Taxonomy Overview

This document captures the canonical ByteBiota classification system implemented in src/bytebiota/taxonomy.py. All automated tooling (simulation runtime, CLI scripts, and web dashboard) relies on these heuristics, so any change must be made in the code first and only then recorded here.

Classification Pipeline {#classification-pipeline}

  1. Opcode Profiling – The classifier computes the percentage share of each opcode family across the genome (analyze_opcode_profile). Only opcodes defined in OPCODES are counted; unknown bytes contribute to the invalid opcode tally.
  2. Trait Detection – Behavioral traits are inferred from raw opcode presence and family proportions (detect_behavioral_traits). Traits feed later decisions and are appended to the classification notes.
  3. Hierarchical Assignment – Using the profile and traits, the classifier assigns Kingdom → Phylum → Class → Order → Family → Genus → Species. Each tier applies deterministic rules so that identical genomes always map to the same lineage.

The domain for every organism is fixed to ByteBiota. Sample identifiers (DLA-###) increment per classification run only and are not global.
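The per-run numbering can be pictured with a small allocator. This is only a sketch: the class name `SampleIdAllocator`, the three-digit padding, and starting at 1 are assumptions drawn from the `DLA-###` pattern, not details taken from taxonomy.py.

```python
import itertools


class SampleIdAllocator:
    """Hypothetical helper: hands out DLA-### identifiers for one run."""

    def __init__(self):
        # Each run gets its own counter, so identifiers are not global.
        # Starting at 1 is an assumption.
        self._counter = itertools.count(1)

    def next_id(self) -> str:
        # Zero-pad to three digits to match the DLA-### pattern.
        return f"DLA-{next(self._counter):03d}"


run_a = SampleIdAllocator()
first, second = run_a.next_id(), run_a.next_id()

# A second classification run starts over, so identifiers can repeat
# across runs.
run_b = SampleIdAllocator()
restarted = run_b.next_id()
```

Because identifiers restart with every run, downstream tooling should not treat them as stable keys across runs.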

Opcode Families {#opcode-families}

| Family | Opcodes | Interpretation |
| --- | --- | --- |
| template | NOP_0, NOP_1 and aliases | Template complements for jumps and replication markers |
| data_movement | MOV variants, register swaps, aliases | Memory/register copying and pointer walking |
| arithmetic_logic | ADD/SUB/logic/shift, aliases | Core computation and transforms |
| comparison_flags | CMP/SET flag operations, aliases | Conditional flag management |
| control_flow | Conditional/templated jumps, aliases | Branching without stack usage |
| call_return | CALL/RET and aliases | Explicit subroutine framing |
| system_calls | MAL/DIVIDE and aliases | Memory allocation and replication primitives |
| stack | PUSH/POP and aliases | Stack manipulation outside CALL/RET |
| environment | SENSE_ENV, EMIT_SIGNAL, HARVEST_ENERGY, READ/WRITE_STORAGE, SUBMIT_TASK | External interaction channels |
| register_adjust | INC/DEC of general registers | Register cursor tuning |

Family percentages are reported in the final notes (dominant family, valid vs invalid opcode counts) and surfaced through the CLI and web UI.
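As a rough illustration of the profiling step, the sketch below tallies family shares for a genome. The mapping `OPCODE_FAMILY` is a deliberately tiny, hypothetical stand-in for the real OPCODES table: only the numeric codes quoted elsewhere in this document are included, and placing XOR in arithmetic_logic is inferred from the "logic" entry in that family's row. Using the valid-opcode count as the denominator is also an assumption.

```python
from collections import Counter

# Hypothetical, partial opcode table. Only codes quoted in this document
# appear here; the real OPCODES table in taxonomy.py is larger.
OPCODE_FAMILY = {
    11: "arithmetic_logic",  # XOR (family placement assumed)
    23: "system_calls",      # MAL
    24: "system_calls",      # DIVIDE
    41: "environment",
    42: "environment",
    43: "environment",       # HARVEST_ENERGY
    64: "environment",
    65: "environment",
    66: "environment",       # SUBMIT_TASK
}


def family_profile(genome: bytes, opcode_family=OPCODE_FAMILY):
    """Return (percent share per family, valid count, invalid count)."""
    counts = Counter()
    invalid = 0
    for byte in genome:
        family = opcode_family.get(byte)
        if family is None:
            invalid += 1  # unknown bytes only feed the invalid tally
        else:
            counts[family] += 1
    valid = sum(counts.values())
    # Assumption: shares are relative to the number of valid opcodes.
    shares = {fam: 100.0 * n / valid for fam, n in counts.items()} if valid else {}
    return shares, valid, invalid


shares, valid, invalid = family_profile(bytes([11, 23, 24, 66, 255]))
dominant = max(shares, key=shares.get)
```

For this five-byte genome, byte 255 is not in the table and lands in the invalid tally, while the two system-call opcodes make system_calls the dominant family.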

Behavioral Traits {#behavioral-traits}

Traits are simple heuristics that gate deeper decisions:

  • Exhibits replication capability – Genome includes opcode DIVIDE (24).
  • Capable of memory allocation – Genome includes opcode MAL (23).
  • Interacts with environment – Any environment opcode present (41, 42, 43, 64, 65, 66).
    • Task-solving behavior – Specifically includes SUBMIT_TASK (66).
    • Energy harvesting capability – Includes HARVEST_ENERGY (43).
  • Complex control flow patterns – Control flow share > 15%.
    • Moderate branching behavior – Control flow share between 5% and 15%.
  • Heavy computational behavior – Arithmetic/logic share > 20%.
  • Data-intensive operations – Data movement share > 40%.
  • Uses subroutine structure – Both shift instructions (SHL, SHR) are present (heuristic proxy for CALL/RET templating).
  • Stack-based memory management – Stack family share > 5%.

Note: Traits are cumulative and listed in the notes field for downstream analytics.
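The trait list above translates fairly directly into a chain of checks. The following is a sketch under stated assumptions rather than the body of detect_behavioral_traits: the function name `detect_traits`, its parameters, and the treatment of "between 5% and 15%" as inclusive at both ends are illustrative choices. The numeric codes for SHL and SHR are not given in this document, so the caller passes them in.

```python
ENVIRONMENT_OPCODES = {41, 42, 43, 64, 65, 66}


def detect_traits(opcodes_present, shares, shl_opcode, shr_opcode):
    """opcodes_present: set of opcode ints; shares: family -> percent."""
    traits = []
    if 24 in opcodes_present:  # DIVIDE
        traits.append("Exhibits replication capability")
    if 23 in opcodes_present:  # MAL
        traits.append("Capable of memory allocation")
    if opcodes_present & ENVIRONMENT_OPCODES:
        traits.append("Interacts with environment")
        if 66 in opcodes_present:  # SUBMIT_TASK
            traits.append("Task-solving behavior")
        if 43 in opcodes_present:  # HARVEST_ENERGY
            traits.append("Energy harvesting capability")

    control_flow = shares.get("control_flow", 0.0)
    if control_flow > 15:
        traits.append("Complex control flow patterns")
    elif control_flow >= 5:  # boundary handling is an assumption
        traits.append("Moderate branching behavior")

    if shares.get("arithmetic_logic", 0.0) > 20:
        traits.append("Heavy computational behavior")
    if shares.get("data_movement", 0.0) > 40:
        traits.append("Data-intensive operations")
    # Both shift instructions present: proxy for subroutine structure.
    if shl_opcode in opcodes_present and shr_opcode in opcodes_present:
        traits.append("Uses subroutine structure")
    if shares.get("stack", 0.0) > 5:
        traits.append("Stack-based memory management")
    return traits


# 200 and 201 are placeholder codes for SHL/SHR, not real ByteBiota values.
example_traits = detect_traits(
    {23, 24, 66},
    {"control_flow": 10.0, "arithmetic_logic": 25.0},
    shl_opcode=200,
    shr_opcode=201,
)
```

Returning the traits as an ordered list mirrors how they are appended to the classification notes.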

Kingdom Heuristics {#kingdom-heuristics}

Kingdom selection is driven by opcode family percentages and derived traits. Thresholds are inclusive unless otherwise stated.

Detailed Kingdom Descriptions {#kingdom-descriptions}

Kingdom Digitalis Plantae ("Producers") {#kingdom-plantae}

Behavior: Generate or store resources (entropy, data, or CPU energy).
Habitat: Idle CPU time, stable memory sectors.
Defining Traits:
- Minimal movement
- Often exhibit cyclic or periodic processes
- May produce entropy, cache, or data for others

Sample Classification:

Domain: VirtualMachine
Kingdom: Digitalis Plantae
Phylum: XORic
Class: Stationaria
Family: SelfModidae
Genus: Entropus
Species: Entropus generata

Identification Notes:
- Low instruction diversity
- Self-contained loops with predictable timing
- No direct replication or parasitism

Kingdom Digitalis Animata ("Consumers") {#kingdom-animata}

Behavior: Actively consume cycles, data, or other code; move and replicate.
Habitat: Dynamic memory regions, network-accessible systems.
Defining Traits:
- Aggressive replication
- Competitive for CPU resources
- Exhibit adaptive logic (mutation-driven)

Sample Classification:

Domain: x86
Kingdom: Digitalis Animata
Phylum: Polymorphid
Class: Migrata
Family: Forkidae
Genus: Copyloopus
Species: Copyloopus simplexor

Identification Notes:
- Uses instructions like MOV, XOR, JMP
- Contains replication loops and jump redirection
- Mutation logs show frequent opcode substitutions

Kingdom Digitalis Symbiota ("Cooperators") {#kingdom-symbiota}

Behavior: Coexist or share logic with other programs.
Habitat: Clustered process spaces or shared memory.
Defining Traits:
- Cooperative inter-process communication
- Shared code libraries and data
- Possible specialization (e.g. I/O handling, computation)

Sample Classification:

Domain: CloudEnv
Kingdom: Digitalis Symbiota
Phylum: Moduleformid
Class: Collaborata
Family: Networkidae
Genus: Modulix
Species: Modulix sharewareii

Identification Notes:
- Communicates via shared sockets or message queues
- Exhibits non-zero dependency graph density (>0.3)
- May use RPC or function exports for cooperation

Kingdom Digitalis Parasitica ("Parasites") {#kingdom-parasitica}

Behavior: Infect or attach to host code; exploit other processes for replication or energy.
Habitat: Host executables, vulnerable system processes.
Defining Traits:
- Code injection or hijacking behavior
- Partial independence: requires host function calls
- Often polymorphic or obfuscated

Sample Classification:

Domain: ExecutableSpace
Kingdom: Digitalis Parasitica
Phylum: Viraliform
Class: Intrusiva
Family: Obfuscidae
Genus: Hijackus
Species: Hijackus polymorphis

Identification Notes:
- High entropy regions in code
- Unused or modified import tables
- Self-decrypting or re-encoding segments

Kingdom Digitalis Explorata ("Survey Scouts") {#kingdom-explorata}

Behavior: Environment-forward scouts that chart new resource gradients before tasks appear.
Habitat: Transitional memory zones bordering active compute regions.
Defining Traits:
- Frequent sensor instructions without committing to task submission
- Heavy data-movement pipelines to shuttle sampled bytes into working stores
- Minimal reliance on system calls; prefers observation over replication

Sample Classification:

Domain: SensorGrid
Kingdom: Digitalis Explorata
Phylum: Polymorphid
Class: Migrata
Family: Surveyidae
Genus: Pathfindus
Species: Pathfindus gradientis

Identification Notes:
- Environment opcode ratio typically 6–18%
- MOV-heavy loops cycling through neighbourhood addresses
- Rarely issues SUBMIT_TASK; data is staged for others

Kingdom Digitalis Architecta ("Structural Planners") {#kingdom-architecta}

Behavior: Blueprint-driven orchestrators that arrange layered execution pipelines.
Habitat: Structured memory corridors with reliable stack depth.
Defining Traits:
- Dense CALL/RET usage with balanced PUSH/POP framing
- Control-flow instructions above 18% with modular subroutine reuse
- Prioritises deterministic progress over opportunistic replication

Sample Classification:

Domain: VirtualStack
Kingdom: Digitalis Architecta
Phylum: Moduleformid
Class: Threadalis
Family: Structuridae
Genus: Architectus
Species: Architectus latticia

Identification Notes:
- High control flow percentage (>18%)
- Balanced stack operations
- Modular subroutine structure

Kingdom Digitalis Chimera ("Adaptive Hybrids") {#kingdom-chimera}

Behavior: Hybrid strategists that splice producer, consumer, and parasitic routines on demand.
Habitat: Interface layers between cooperative clusters and high-risk parasitic zones.
Defining Traits:
- At least five opcode families dominate (>10%), signalling a broad capability set
- Mix of environment sensing, sporadic system calls, and mid-weight computation
- Opportunistic: can pivot between harvesting, computation, and exploitation

Sample Classification:

Domain: HybridCluster
Kingdom: Digitalis Chimera
Phylum: SelfEvolvidae
Class: Adaptive
Family: Hybrididae
Genus: Chimetrus
Species: Chimetrus polymorphus

Identification Notes:
- Opcode diversity index consistently ≥ 0.5, with six dominant families common
- Alternates between MAL bursts and environment probes without committing fully
- Produces mixed trait reports (environmental, computational, and stack usage)

Kingdom Digitalis Anomalica ("True Unknowns") {#kingdom-anomalica}

Behavior: Extreme outliers that still evade all heuristic buckets.
Habitat: Corrupted checkpoints, partially recovered genomes, or exotic instruction sets.
Defining Traits:
- Contradictory signal, e.g. high template usage with aggressive system calls
- Opcode families fail threshold triggers (<5% everywhere) yet exhibit non-random order
- Often artefacts of truncation, compression, or emergent opcode synthesis

Sample Classification:

Domain: EntropicSoup
Kingdom: Digitalis Anomalica
Phylum: Undefined
Class: Indeterminata
Family: Unknownidae
Genus: Nullius
Species: Nullius incognita

Identification Notes:
- Manual review required: automated heuristics return "mixed/low confidence"
- Frequently associated with incomplete capture or experimental opcode injections
- Treat as quarantine candidates until further behavioural evidence recorded

Classification Procedure {#classification-procedure}

Step 1: Sample Extraction {#sample-extraction}

  1. Isolate binary or code snippet in sandboxed environment
  2. Record system metadata (architecture, environment, memory offset)

Step 2: Behavioral Observation {#behavioral-observation}

  • Measure CPU and memory usage over time
  • Note replication, communication, or parasitic traits
  • Observe mutation or adaptive responses under environmental changes

Step 3: Genomic Analysis {#genomic-analysis}

  • Disassemble code → extract opcode sequences
  • Identify repeating motifs (analogous to genes)
  • Compare against taxonomy database to find genus/species match

Step 4: Classification Assignment {#classification-assignment}

Determine the following attributes:

Domain:
Kingdom:
Phylum:
Class:
Order:
Family:
Genus:
Species:
Notes:

| Kingdom | Core Signals |
| --- | --- |
| Digitalis Parasitica | System calls > 10%, environment < 5%, replication trait present. Indicates host-reliant replicators. |
| Digitalis Plantae | Environment ≥ 18% and task-solving trait present. Resource producers focused on environment tasks. |
| Digitalis Animata | Arithmetic/logic ≥ 22% with replication trait. Active computational organisms. |
| Digitalis Symbiota | Environment ≥ 12%, system calls < 15%, interacts with environment. Cooperative, low system overhead. |
| Digitalis Fragmenta | Template ≥ 55% with arithmetic < 5% and data < 5%. Mostly inert template fragments. |
| Digitalis Minimalis | Replication trait with environment < 5%, system calls < 5%, arithmetic < 6%. Simplest viable replicators. |
| Digitalis Dormanta | No traits detected and environment/system/arithmetic < 2%. Dormant or inactive captures. |
| Digitalis Explorata | Environment 6–18%, data ≥ 25%, interacts but no task-solving. Survey scouts. |
| Digitalis Architecta | Uses subroutine trait, control flow ≥ 18%, stack ≥ 6%, data ≥ 18%. Structured planners. |
| Digitalis Chimera | ≥ 5 opcode families above 10%, environment or system ≥ 8%, and at least one high-power axis (arithmetic ≥ 12 or control_flow ≥ 12 or data_movement ≥ 20). Adaptive hybrids. |
| Digitalis Mutata | > 3 families above 10% with low environment/system (< 12%) when no higher rule matches. Transitional mixes. |
| Digitalis Anomalica | Fallback when no heuristic matches; reserved for true outliers or corrupted genomes. |

The ordering above matters: the classifier checks each branch sequentially and returns the first match.
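A first-match cascade of this kind might look like the sketch below. It is not the code in taxonomy.py: the function name `classify_kingdom` and the short trait keys (`replication`, `environment`, `task_solving`, `subroutine`) are hypothetical, the 6–18% range is read as inclusive, and "low environment/system" is read as both shares below 12%.

```python
def classify_kingdom(shares, traits):
    """shares: family -> percent; traits: set of short trait keys."""
    def share(family):
        return shares.get(family, 0.0)

    env = share("environment")
    system = share("system_calls")
    arith = share("arithmetic_logic")
    data = share("data_movement")
    flow = share("control_flow")
    families_above_10 = sum(1 for value in shares.values() if value > 10)

    # Rules are listed in the documented order; the first match wins.
    rules = [
        ("Digitalis Parasitica",
         system > 10 and env < 5 and "replication" in traits),
        ("Digitalis Plantae", env >= 18 and "task_solving" in traits),
        ("Digitalis Animata", arith >= 22 and "replication" in traits),
        ("Digitalis Symbiota",
         env >= 12 and system < 15 and "environment" in traits),
        ("Digitalis Fragmenta",
         share("template") >= 55 and arith < 5 and data < 5),
        ("Digitalis Minimalis",
         "replication" in traits and env < 5 and system < 5 and arith < 6),
        ("Digitalis Dormanta",
         not traits and env < 2 and system < 2 and arith < 2),
        ("Digitalis Explorata",
         6 <= env <= 18 and data >= 25
         and "environment" in traits and "task_solving" not in traits),
        ("Digitalis Architecta",
         "subroutine" in traits and flow >= 18
         and share("stack") >= 6 and data >= 18),
        ("Digitalis Chimera",
         families_above_10 >= 5 and (env >= 8 or system >= 8)
         and (arith >= 12 or flow >= 12 or data >= 20)),
        ("Digitalis Mutata",
         families_above_10 > 3 and env < 12 and system < 12),
    ]
    for kingdom, matched in rules:
        if matched:
            return kingdom
    return "Digitalis Anomalica"  # fallback when nothing matches


scout = classify_kingdom(
    {"environment": 10.0, "data_movement": 30.0}, {"environment"}
)
```

Keeping the rules in a single ordered list makes the precedence explicit: reordering two entries changes which kingdom an overlapping genome receives.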

Lower Taxa {#lower-taxa}

Phylum

  • XORic – Genomes containing opcode XOR (11).
  • Polymorphid – More than 12 distinct valid opcodes present.
  • Viraliform – Replication trait plus system calls share > 5%.
  • Moduleformid – Subroutine trait detected (paired shifts).
  • SelfEvolvidae – System calls share > 10% even without other traits.
  • Standardformid – Default when no prior branch applies.

Class

  • Collaborata – Environment share > 15%.
  • Migrata – Replication trait present without high environment share.
  • Stationaria – Arithmetic/logic share > 20% (without replication trait).
  • Intrusiva – Environment interaction trait present.
  • Adaptive – Residual class.

Order

  • Threadalis – Subroutine trait present.
  • Linearis – Otherwise.

Family

  • Forkidae – Subroutine trait present.
  • SelfModidae – System calls share > 10%.
  • Obfuscidae – Arithmetic/logic share > 20%.
  • Standardidae – All remaining genomes.
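The four tiers above can be sketched as small first-match chains. Treating every list as "first match in the listed order" is an assumption (the phylum default wording suggests it, the other tiers do not say so explicitly), and the function name `lower_taxa` and the short trait keys are hypothetical.

```python
def lower_taxa(opcodes_present, shares, traits):
    """opcodes_present: set of valid opcode ints; shares: family -> percent;
    traits: set of short trait keys (hypothetical names)."""
    system = shares.get("system_calls", 0.0)
    arith = shares.get("arithmetic_logic", 0.0)
    env = shares.get("environment", 0.0)
    replicates = "replication" in traits
    subroutine = "subroutine" in traits

    if 11 in opcodes_present:  # XOR
        phylum = "XORic"
    elif len(opcodes_present) > 12:
        phylum = "Polymorphid"
    elif replicates and system > 5:
        phylum = "Viraliform"
    elif subroutine:
        phylum = "Moduleformid"
    elif system > 10:
        phylum = "SelfEvolvidae"
    else:
        phylum = "Standardformid"

    if env > 15:
        klass = "Collaborata"
    elif replicates:
        klass = "Migrata"
    elif arith > 20:
        klass = "Stationaria"
    elif "environment" in traits:
        klass = "Intrusiva"
    else:
        klass = "Adaptive"

    order = "Threadalis" if subroutine else "Linearis"

    if subroutine:
        family = "Forkidae"
    elif system > 10:
        family = "SelfModidae"
    elif arith > 20:
        family = "Obfuscidae"
    else:
        family = "Standardidae"

    return {"phylum": phylum, "class": klass, "order": order, "family": family}


lineage = lower_taxa({11, 24}, {"system_calls": 12.0}, {"replication"})
```

Note that under this reading a genome with the subroutine trait always lands in both Threadalis and Forkidae, since the two tiers test the same signal.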

Genus & Species

Genus is driven by dominant behavioral emphasis; species reflects secondary thresholds. Because thresholds overlap, evaluate them in the listed order.

| Condition | Genus | Species Logic |
| --- | --- | --- |
| Data movement > 30% | Copyloopus | Replication trait and system calls > 10% → Copyloopus dominatus; replication and register adjust close to data share → Copyloopus iterator; replication with arithmetic > 10% → Copyloopus calculator; replication otherwise → Copyloopus simplexor; non-replicators default to Copyloopus iterator when register adjust is high, else Copyloopus datahandler. |
| Arithmetic/logic > 15% | Computus | Heavy computational trait → Computus maximus; otherwise Computus regularis. |
| Environment > 10% | Interactus | Task-solving trait → Interactus industrius; else Interactus socialis. |
| Replication trait present | Replicatus | System calls > 10% → Replicatus virulentus; otherwise Replicatus vulgaris. |
| Default | Standardus | Always Standardus basicus. |

Species are intended to be descriptive rather than exhaustive; expand the table when new behavioral niches are formalized in code.
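The genus and species rules can be sketched as nested first-match checks. Two phrases in the Copyloopus rules are not quantified in this document ("close to data share" and "register adjust is high"), so the parameters `close_margin` and `high_register_adjust` below are placeholders with invented defaults, not values from the classifier. The function name and trait keys are likewise hypothetical.

```python
def genus_species(shares, traits, close_margin=10.0, high_register_adjust=10.0):
    """Return (genus, species). Both keyword defaults are placeholder
    thresholds introduced for this sketch only."""
    data = shares.get("data_movement", 0.0)
    arith = shares.get("arithmetic_logic", 0.0)
    env = shares.get("environment", 0.0)
    system = shares.get("system_calls", 0.0)
    reg = shares.get("register_adjust", 0.0)
    replicates = "replication" in traits

    if data > 30:
        if replicates and system > 10:
            species = "dominatus"
        elif replicates and abs(data - reg) <= close_margin:
            species = "iterator"
        elif replicates and arith > 10:
            species = "calculator"
        elif replicates:
            species = "simplexor"
        elif reg >= high_register_adjust:
            species = "iterator"
        else:
            species = "datahandler"
        genus = "Copyloopus"
    elif arith > 15:
        genus = "Computus"
        species = "maximus" if "heavy_computation" in traits else "regularis"
    elif env > 10:
        genus = "Interactus"
        species = "industrius" if "task_solving" in traits else "socialis"
    elif replicates:
        genus = "Replicatus"
        species = "virulentus" if system > 10 else "vulgaris"
    else:
        genus, species = "Standardus", "basicus"
    return genus, f"{genus} {species}"


label = genus_species(
    {"data_movement": 35.0, "system_calls": 12.0}, {"replication"}
)
```

Because the genus conditions overlap (a genome can exceed both the data-movement and arithmetic thresholds), the order of the outer branches decides the result, which is why the table should be read top to bottom.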

Maintaining the Taxonomy

  • Update the classifier first, then synchronize this document to keep the wiki aligned with executable logic.
  • When introducing new heuristics, add regression coverage in tests/test_taxonomy.py to preserve historical splits (see current fixtures for examples of Explorata, Architecta, and Chimera).
  • For significant overhauls, document rationale and threshold tuning in docs/split_anomalica.md before summarizing here.

Related files:

  • scripts/taxonomy_classifier.py – CLI wrapper for classifying checkpoint genomes.
  • tests/test_taxonomy.py – Regression suite covering critical branches.
  • docs/taxonomy-detailed.md – Historical design notes and extended analysis strategies.
  • docs/simple_taxonomy.md – Legacy analogy table for quick reference.

Taxonomy Classifier {#taxonomy-classifier}

TaxonomyClassifier (src/bytebiota/taxonomy.py) converts opcode profiles and inferred traits into the hierarchical labels listed above. Use this section to record heuristic thresholds, new trait signals, or seed-bank category expectations whenever classifier logic evolves.