Taxonomy Overview
This document captures the canonical ByteBiota classification system implemented in `src/bytebiota/taxonomy.py`. All automated tooling (simulation runtime, CLI scripts, and web dashboard) relies on these heuristics, so any change must be consistent with the code before it is recorded here.
Classification Pipeline {#classification-pipeline}¶
- Opcode Profiling: The classifier computes the percentage share of each opcode family across the genome (`analyze_opcode_profile`). Only opcodes defined in `OPCODES` are counted; unknown bytes contribute to the invalid opcode tally.
- Trait Detection: Behavioral traits are inferred from raw opcode presence and family proportions (`detect_behavioral_traits`). Traits feed later decisions and are appended to the classification notes.
- Hierarchical Assignment: Using the profile and traits, the classifier assigns Kingdom → Phylum → Class → Order → Family → Genus → Species. Each tier applies deterministic rules so that identical genomes always map to the same lineage.
The domain for every organism is fixed to ByteBiota. Sample identifiers (`DLA-###`) increment per classification run only and are not global.
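The per-run numbering can be pictured with a small counter object. This is a minimal sketch rather than the classifier's own code: `ClassificationRun` and `next_sample_id` are hypothetical names, and starting at 1 with three-digit zero padding is an assumption read off the `DLA-###` pattern.

```python
import itertools


class ClassificationRun:
    """Hypothetical per-run context that hands out DLA-### identifiers."""

    def __init__(self):
        # A fresh counter per run: identifiers restart and are not global.
        self._counter = itertools.count(1)

    def next_sample_id(self):
        # Assumed format: "DLA-" followed by a zero-padded three-digit number.
        return f"DLA-{next(self._counter):03d}"
```

Two separate runs created this way would each begin at the same first identifier, which is why sample identifiers should not be used as global keys.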
Opcode Families {#opcode-families}¶
Family | Opcodes | Interpretation |
---|---|---|
template | NOP_0, NOP_1 and aliases | Template complements for jumps and replication markers |
data_movement | MOV variants, register swaps, aliases | Memory/register copying and pointer walking |
arithmetic_logic | ADD/SUB/logic/shift, aliases | Core computation and transforms |
comparison_flags | CMP/SET flag operations, aliases | Conditional flag management |
control_flow | Conditional/templated jumps, aliases | Branching without stack usage |
call_return | CALL/RET and aliases | Explicit subroutine framing |
system_calls | MAL/DIVIDE and aliases | Memory allocation and replication primitives |
stack | PUSH/POP and aliases | Stack manipulation outside CALL/RET |
environment | SENSE_ENV, EMIT_SIGNAL, HARVEST_ENERGY, READ/WRITE_STORAGE, SUBMIT_TASK | External interaction channels |
register_adjust | INC/DEC of general registers | Register cursor tuning |
Family percentages are reported in the final notes (dominant family, valid vs invalid opcode counts) and surfaced through the CLI and web UI.
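As a rough illustration of how family shares and the valid/invalid tallies could be derived, the sketch below profiles a byte sequence. It is not the real `analyze_opcode_profile`: `FAMILY_BY_OPCODE` and `profile_families` are hypothetical, the map contains only the numeric opcodes quoted in this document, and dividing by the valid opcode count (rather than genome length) is an assumption.

```python
from collections import Counter

# Hypothetical subset of the opcode table: only codes quoted in this document.
# The real OPCODES mapping covers every family in the table above.
FAMILY_BY_OPCODE = {
    11: "arithmetic_logic",  # XOR
    23: "system_calls",      # MAL
    24: "system_calls",      # DIVIDE
    41: "environment", 42: "environment", 43: "environment",
    64: "environment", 65: "environment", 66: "environment",
}


def profile_families(genome):
    """Return (percent share per family, valid count, invalid count)."""
    counts = Counter()
    invalid = 0
    for byte in genome:
        family = FAMILY_BY_OPCODE.get(byte)
        if family is None:
            invalid += 1  # unknown bytes only feed the invalid tally
        else:
            counts[family] += 1
    valid = sum(counts.values())
    # Assumption: shares are computed over valid opcodes only.
    shares = {f: 100.0 * n / valid for f, n in counts.items()} if valid else {}
    return shares, valid, invalid
```

For example, `profile_families(bytes([11, 11, 23, 24, 255]))` yields equal shares for `arithmetic_logic` and `system_calls`, with four valid opcodes and one invalid byte.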
Behavioral Traits {#behavioral-traits}¶
Traits are simple heuristics that gate deeper decisions:
- Exhibits replication capability: Genome includes opcode `DIVIDE` (24).
- Capable of memory allocation: Genome includes opcode `MAL` (23).
- Interacts with environment: Any environment opcode present (41, 42, 43, 64, 65, 66).
    - Task-solving behavior: Specifically includes `SUBMIT_TASK` (66).
    - Energy harvesting capability: Includes `HARVEST_ENERGY` (43).
- Complex control flow patterns: Control flow share > 15%.
- Moderate branching behavior: Control flow share between 5% and 15%.
- Heavy computational behavior: Arithmetic/logic share > 20%.
- Data-intensive operations: Data movement share > 40%.
- Uses subroutine structure: Both shift instructions (`SHL`, `SHR`) are present (heuristic proxy for CALL/RET templating).
- Stack-based memory management: Stack family share > 5%.
Note: Traits are cumulative and listed in the `notes` field for downstream analytics.
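The trait rules above can be sketched as a single function over the set of opcodes present and the family shares. This is an illustration under assumptions, not the real `detect_behavioral_traits`: `sketch_traits` is a hypothetical name, opcodes are passed as mnemonics because the numeric codes for `SHL` and `SHR` are not given here, `READ_STORAGE` and `WRITE_STORAGE` are assumed expansions of "READ/WRITE_STORAGE", and the 5% to 15% band is treated as inclusive.

```python
# Assumed mnemonics for the environment family listed in the opcode table.
ENVIRONMENT_MNEMONICS = {
    "SENSE_ENV", "EMIT_SIGNAL", "HARVEST_ENERGY",
    "READ_STORAGE", "WRITE_STORAGE", "SUBMIT_TASK",
}


def sketch_traits(present, shares):
    """present: set of opcode mnemonics; shares: family name -> percent."""
    traits = []
    if "DIVIDE" in present:
        traits.append("Exhibits replication capability")
    if "MAL" in present:
        traits.append("Capable of memory allocation")
    if present & ENVIRONMENT_MNEMONICS:
        traits.append("Interacts with environment")
        if "SUBMIT_TASK" in present:
            traits.append("Task-solving behavior")
        if "HARVEST_ENERGY" in present:
            traits.append("Energy harvesting capability")
    control = shares.get("control_flow", 0.0)
    if control > 15:
        traits.append("Complex control flow patterns")
    elif control >= 5:  # assumed inclusive lower bound
        traits.append("Moderate branching behavior")
    if shares.get("arithmetic_logic", 0.0) > 20:
        traits.append("Heavy computational behavior")
    if shares.get("data_movement", 0.0) > 40:
        traits.append("Data-intensive operations")
    if {"SHL", "SHR"} <= present:  # both shifts present
        traits.append("Uses subroutine structure")
    if shares.get("stack", 0.0) > 5:
        traits.append("Stack-based memory management")
    return traits
```

Because traits are cumulative, the function appends every match instead of returning at the first one.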
Kingdom Heuristics {#kingdom-heuristics}¶
Kingdom selection is driven by opcode family percentages and derived traits. Thresholds are inclusive unless otherwise stated.
Detailed Kingdom Descriptions {#kingdom-descriptions}¶
Kingdom Digitalis Plantae ("Producers") {#kingdom-plantae}¶
Behavior: Generate or store resources (entropy, data, or CPU energy).
Habitat: Idle CPU time, stable memory sectors.
Defining Traits:
- Minimal movement
- Often exhibit cyclic or periodic processes
- May produce entropy, cache, or data for others
Sample Classification:
Domain: VirtualMachine
Kingdom: Digitalis Plantae
Phylum: XORic
Class: Stationaria
Family: SelfModidae
Genus: Entropus
Species: Entropus generata
Identification Notes:
- Low instruction diversity
- Self-contained loops with predictable timing
- No direct replication or parasitism
Kingdom Digitalis Animata ("Consumers") {#kingdom-animata}¶
Behavior: Actively consume cycles, data, or other code; move and replicate.
Habitat: Dynamic memory regions, network-accessible systems.
Defining Traits:
- Aggressive replication
- Competitive for CPU resources
- Exhibit adaptive logic (mutation-driven)
Sample Classification:
Domain: x86
Kingdom: Digitalis Animata
Phylum: Polymorphid
Class: Migrata
Family: Forkidae
Genus: Copyloopus
Species: Copyloopus simplexor
Identification Notes:
- Uses instructions like `MOV`, `XOR`, `JMP`
- Contains replication loops and jump redirection
- Mutation logs show frequent opcode substitutions
Kingdom Digitalis Symbiota ("Cooperators") {#kingdom-symbiota}¶
Behavior: Coexist or share logic with other programs.
Habitat: Clustered process spaces or shared memory.
Defining Traits:
- Cooperative inter-process communication
- Shared code libraries and data
- Possible specialization (e.g. I/O handling, computation)
Sample Classification:
Domain: CloudEnv
Kingdom: Digitalis Symbiota
Phylum: Moduleformid
Class: Collaborata
Family: Networkidae
Genus: Modulix
Species: Modulix sharewareii
Identification Notes:
- Communicates via shared sockets or message queues
- Exhibits non-zero dependency graph density (>0.3)
- May use RPC or function exports for cooperation
Kingdom Digitalis Parasitica ("Parasites") {#kingdom-parasitica}¶
Behavior: Infect or attach to host code; exploit other processes for replication or energy.
Habitat: Host executables, vulnerable system processes.
Defining Traits:
- Code injection or hijacking behavior
- Partial independence: requires host function calls
- Often polymorphic or obfuscated
Sample Classification:
Domain: ExecutableSpace
Kingdom: Digitalis Parasitica
Phylum: Viraliform
Class: Intrusiva
Family: Obfuscidae
Genus: Hijackus
Species: Hijackus polymorphis
Identification Notes:
- High entropy regions in code
- Unused or modified import tables
- Self-decrypting or re-encoding segments
Kingdom Digitalis Explorata ("Survey Scouts") {#kingdom-explorata}¶
Behavior: Environment-forward scouts that chart new resource gradients before tasks appear.
Habitat: Transitional memory zones bordering active compute regions.
Defining Traits:
- Frequent sensor instructions without committing to task submission
- Heavy data-movement pipelines to shuttle sampled bytes into working stores
- Minimal reliance on system calls; prefers observation over replication
Sample Classification:
Domain: SensorGrid
Kingdom: Digitalis Explorata
Phylum: Polymorphid
Class: Migrata
Family: Surveyidae
Genus: Pathfindus
Species: Pathfindus gradientis
Identification Notes:
- Environment opcode ratio typically 6–18%
- MOV-heavy loops cycling through neighbourhood addresses
- Rarely issues SUBMIT_TASK; data is staged for others
Kingdom Digitalis Architecta ("Structural Planners") {#kingdom-architecta}¶
Behavior: Blueprint-driven orchestrators that arrange layered execution pipelines.
Habitat: Structured memory corridors with reliable stack depth.
Defining Traits:
- Dense CALL/RET usage with balanced PUSH/POP framing
- Control-flow instructions above 18% with modular subroutine reuse
- Prioritises deterministic progress over opportunistic replication
Sample Classification:
Domain: VirtualStack
Kingdom: Digitalis Architecta
Phylum: Moduleformid
Class: Threadalis
Family: Structuridae
Genus: Architectus
Species: Architectus latticia
Identification Notes:
- High control flow percentage (>18%)
- Balanced stack operations
- Modular subroutine structure
Kingdom Digitalis Chimera ("Adaptive Hybrids") {#kingdom-chimera}¶
Behavior: Hybrid strategists that splice producer, consumer, and parasitic routines on demand.
Habitat: Interface layers between cooperative clusters and high-risk parasitic zones.
Defining Traits:
- At least five opcode families dominate (>10%), signalling a broad capability set
- Mix of environment sensing, sporadic system calls, and mid-weight computation
- Opportunistic: can pivot between harvesting, computation, and exploitation
Sample Classification:
Domain: HybridCluster
Kingdom: Digitalis Chimera
Phylum: SelfEvolvidae
Class: Adaptive
Family: Hybrididae
Genus: Chimetrus
Species: Chimetrus polymorphus
Identification Notes:
- Opcode diversity index consistently ≥ 0.5, with six dominant families common
- Alternates between MAL bursts and environment probes without committing fully
- Produces mixed trait reports (environmental, computational, and stack usage)
Kingdom Digitalis Anomalica ("True Unknowns") {#kingdom-anomalica}¶
Behavior: Extreme outliers that still evade all heuristic buckets.
Habitat: Corrupted checkpoints, partially recovered genomes, or exotic instruction sets.
Defining Traits:
- Contradictory signal, e.g., high template usage with aggressive system calls
- Opcode families fail threshold triggers (<5% everywhere) yet exhibit non-random order
- Often artefacts of truncation, compression, or emergent opcode synthesis
Sample Classification:
Domain: EntropicSoup
Kingdom: Digitalis Anomalica
Phylum: Undefined
Class: Indeterminata
Family: Unknownidae
Genus: Nullius
Species: Nullius incognita
Identification Notes:
- Manual review required: automated heuristics return "mixed/low confidence"
- Frequently associated with incomplete capture or experimental opcode injections
- Treat as quarantine candidates until further behavioural evidence is recorded
Classification Procedure {#classification-procedure}¶
Step 1: Sample Extraction {#sample-extraction}¶
- Isolate binary or code snippet in sandboxed environment
- Record system metadata (architecture, environment, memory offset)
Step 2: Behavioral Observation {#behavioral-observation}¶
- Measure CPU and memory usage over time
- Note replication, communication, or parasitic traits
- Observe mutation or adaptive responses under environmental changes
Step 3: Genomic Analysis {#genomic-analysis}¶
- Disassemble code → extract opcode sequences
- Identify repeating motifs (analogous to genes)
- Compare against taxonomy database to find genus/species match
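One simple way to look for repeating motifs in an extracted opcode sequence is to count fixed-length windows. The sketch below is illustrative only: `repeated_motifs`, the window length of 3, and the minimum count of 2 are assumptions, not part of the documented procedure.

```python
from collections import Counter


def repeated_motifs(opcodes, length=3, min_count=2):
    """Return every window of `length` opcodes seen at least `min_count` times."""
    windows = (
        tuple(opcodes[i:i + length])
        for i in range(len(opcodes) - length + 1)
    )
    counts = Counter(windows)
    return {motif: n for motif, n in counts.items() if n >= min_count}
```

Overlapping windows are counted independently, so a long run of one opcode reports many occurrences of the same motif.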
Step 4: Classification Assignment {#classification-assignment}¶
Determine the following attributes:
Domain:
Kingdom:
Phylum:
Class:
Order:
Family:
Genus:
Species:
Notes:
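The attribute list above maps naturally onto a small record type. The sketch below is a hypothetical container, not a structure taken from `taxonomy.py`: `ClassificationRecord`, its field names, and the `lineage` helper are assumptions, and the default domain follows the statement that the domain is fixed to ByteBiota.

```python
from dataclasses import dataclass, field


@dataclass
class ClassificationRecord:
    """Hypothetical record holding one organism's assigned lineage."""

    kingdom: str
    phylum: str
    class_: str  # trailing underscore because `class` is a Python keyword
    order: str
    family: str
    genus: str
    species: str
    domain: str = "ByteBiota"  # fixed for every organism per this document
    notes: list = field(default_factory=list)

    def lineage(self):
        # Join the ranks from broadest to narrowest for display.
        ranks = [self.domain, self.kingdom, self.phylum, self.class_,
                 self.order, self.family, self.genus, self.species]
        return " > ".join(ranks)
```

The `notes` list is where dominant-family and trait messages would accumulate in this sketch.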
Kingdom | Core Signals |
---|---|
Digitalis Parasitica | System calls > 10%, environment < 5%, replication trait present. Indicates host-reliant replicators. |
Digitalis Plantae | Environment ≥ 18% and task-solving trait present. Resource producers focused on environment tasks. |
Digitalis Animata | Arithmetic/logic ≥ 22% with replication trait. Active computational organisms. |
Digitalis Symbiota | Environment ≥ 12%, system calls < 15%, interacts with environment. Cooperative, low system overhead. |
Digitalis Fragmenta | Template ≥ 55% with arithmetic < 5% and data < 5%. Mostly inert template fragments. |
Digitalis Minimalis | Replication trait with environment < 5%, system calls < 5%, arithmetic < 6%. Simplest viable replicators. |
Digitalis Dormanta | No traits detected and environment/system/arithmetic < 2%. Dormant or inactive captures. |
Digitalis Explorata | Environment 6–18%, data ≥ 25%, interacts but no task-solving. Survey scouts. |
Digitalis Architecta | Uses subroutine trait, control flow ≥ 18%, stack ≥ 6%, data ≥ 18%. Structured planners. |
Digitalis Chimera | ≥ 5 opcode families above 10%, environment or system ≥ 8%, and at least one high-power axis (arithmetic ≥ 12%, control_flow ≥ 12%, or data_movement ≥ 20%). Adaptive hybrids. |
Digitalis Mutata | >3 families above 10% with low environment/system (< 12%) when no higher rule matches. Transitional mixes. |
Digitalis Anomalica | Fallback when no heuristic matches; reserved for true outliers or corrupted genomes. |
The ordering above matters: the classifier checks each branch sequentially and returns the first match.
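The first-match behaviour can be sketched as an ordered rule list. This is an illustration built from the table, not the classifier's code: `classify_kingdom` and the short trait keys (`"replication"`, `"environment"`, `"task_solving"`, `"subroutine"`) are hypothetical, and the Mutata rule is read as requiring both environment and system calls below 12%.

```python
def classify_kingdom(shares, traits):
    """Return the first kingdom whose rule matches.

    shares: family name -> percent; traits: set of shorthand trait keys.
    """
    def s(name):
        return shares.get(name, 0.0)

    env, system, arith = s("environment"), s("system_calls"), s("arithmetic_logic")
    data, control = s("data_movement"), s("control_flow")
    strong = sum(1 for value in shares.values() if value > 10)  # families above 10%
    replicates = "replication" in traits
    interacts = "environment" in traits
    solves_tasks = "task_solving" in traits

    # Order matters: earlier rules shadow later ones.
    rules = [
        ("Digitalis Parasitica", system > 10 and env < 5 and replicates),
        ("Digitalis Plantae", env >= 18 and solves_tasks),
        ("Digitalis Animata", arith >= 22 and replicates),
        ("Digitalis Symbiota", env >= 12 and system < 15 and interacts),
        ("Digitalis Fragmenta", s("template") >= 55 and arith < 5 and data < 5),
        ("Digitalis Minimalis", replicates and env < 5 and system < 5 and arith < 6),
        ("Digitalis Dormanta", not traits and max(env, system, arith) < 2),
        ("Digitalis Explorata",
         6 <= env <= 18 and data >= 25 and interacts and not solves_tasks),
        ("Digitalis Architecta",
         "subroutine" in traits and control >= 18 and s("stack") >= 6 and data >= 18),
        ("Digitalis Chimera",
         strong >= 5 and max(env, system) >= 8
         and (arith >= 12 or control >= 12 or data >= 20)),
        ("Digitalis Mutata", strong > 3 and env < 12 and system < 12),
    ]
    for kingdom, matched in rules:
        if matched:
            return kingdom
    return "Digitalis Anomalica"  # fallback when nothing matches
```

A profile with 20% environment share and the task-solving trait lands in Plantae even though it also satisfies the Symbiota rule, because Plantae is checked first.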
Lower Taxa {#lower-taxa}¶
Phylum¶
- XORic: Genomes containing opcode `XOR` (11).
- Polymorphid: More than 12 distinct valid opcodes present.
- Viraliform: Replication trait plus system calls share > 5%.
- Moduleformid: Subroutine trait detected (paired shifts).
- SelfEvolvidae: System calls share > 10% even without other traits.
- Standardformid: Default when no prior branch applies.
Class¶
- Collaborata: Environment share > 15%.
- Migrata: Replication trait present without high environment share.
- Stationaria: Arithmetic/logic share > 20% (without replication trait).
- Intrusiva: Environment interaction trait present.
- Adaptive: Residual class.
Order¶
- Threadalis: Subroutine trait present.
- Linearis: Otherwise.
Family¶
- Forkidae: Subroutine trait present.
- SelfModidae: System calls share > 10%.
- Obfuscidae: Arithmetic/logic share > 20%.
- Standardidae: All remaining genomes.
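The Phylum, Class, Order, and Family rules above can be sketched as four short decision chains. This assumes each list is evaluated top to bottom with the first match winning, which the "default" and "residual" entries suggest but the document does not state outright. `assign_lower_taxa`, its parameters, and the shorthand trait keys are hypothetical.

```python
def assign_lower_taxa(shares, traits, has_xor, distinct_opcodes):
    """Return (phylum, class, order, family) for one genome profile."""
    system = shares.get("system_calls", 0.0)
    arith = shares.get("arithmetic_logic", 0.0)
    env = shares.get("environment", 0.0)
    replicates = "replication" in traits
    subroutine = "subroutine" in traits

    if has_xor:
        phylum = "XORic"
    elif distinct_opcodes > 12:
        phylum = "Polymorphid"
    elif replicates and system > 5:
        phylum = "Viraliform"
    elif subroutine:
        phylum = "Moduleformid"
    elif system > 10:
        phylum = "SelfEvolvidae"
    else:
        phylum = "Standardformid"

    if env > 15:
        klass = "Collaborata"
    elif replicates:
        klass = "Migrata"
    elif arith > 20:
        klass = "Stationaria"
    elif "environment" in traits:
        klass = "Intrusiva"
    else:
        klass = "Adaptive"

    order = "Threadalis" if subroutine else "Linearis"

    if subroutine:
        family = "Forkidae"
    elif system > 10:
        family = "SelfModidae"
    elif arith > 20:
        family = "Obfuscidae"
    else:
        family = "Standardidae"

    return phylum, klass, order, family
```

Each tier is decided independently from the same profile, so a genome can be Viraliform by phylum and SelfModidae by family at once.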
Genus & Species¶
Genus is driven by dominant behavioral emphasis; species reflects secondary thresholds. Because thresholds overlap, evaluate them in the listed order.
Condition | Genus | Species Logic |
---|---|---|
Data movement > 30% | Copyloopus | Replication trait and system calls > 10% → Copyloopus dominatus; replication and register adjust close to data share → Copyloopus iterator; replication with arithmetic > 10% → Copyloopus calculator; replication otherwise → Copyloopus simplexor; non-replicators default to Copyloopus iterator when register adjust is high, else Copyloopus datahandler. |
Arithmetic/logic > 15% | Computus | Heavy computational trait → Computus maximus; otherwise Computus regularis. |
Environment > 10% | Interactus | Task-solving trait → Interactus industrius; else Interactus socialis. |
Replication trait present | Replicatus | System calls > 10% → Replicatus virulentus; otherwise Replicatus vulgaris. |
Default | Standardus | Always Standardus basicus. |
Species are intended to be descriptive rather than exhaustive; expand the table when new behavioral niches are formalized in code.
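The ordered evaluation of the genus table can be sketched as below. `assign_genus_species` and the shorthand trait keys are hypothetical. The Copyloopus species branches are deliberately left unresolved (returned as `None`) because they depend on "close to" and "high" register-adjust comparisons whose exact thresholds this document does not give.

```python
def assign_genus_species(shares, traits):
    """Return (genus, species); species is None where rules are unspecified."""
    system = shares.get("system_calls", 0.0)

    if shares.get("data_movement", 0.0) > 30:
        # Species logic needs register_adjust comparisons not quantified here.
        return "Copyloopus", None
    if shares.get("arithmetic_logic", 0.0) > 15:
        if "heavy_computation" in traits:
            return "Computus", "Computus maximus"
        return "Computus", "Computus regularis"
    if shares.get("environment", 0.0) > 10:
        if "task_solving" in traits:
            return "Interactus", "Interactus industrius"
        return "Interactus", "Interactus socialis"
    if "replication" in traits:
        if system > 10:
            return "Replicatus", "Replicatus virulentus"
        return "Replicatus", "Replicatus vulgaris"
    return "Standardus", "Standardus basicus"
```

Since the conditions overlap, a replicator with 35% data movement resolves to Copyloopus in this sketch and never reaches the Replicatus row.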
Maintaining the Taxonomy¶
- Update the classifier first, then synchronize this document to keep the wiki aligned with executable logic.
- When introducing new heuristics, add regression coverage in `tests/test_taxonomy.py` to preserve historical splits (see current fixtures for examples of Explorata, Architecta, and Chimera).
- For significant overhauls, document rationale and threshold tuning in `docs/split_anomalica.md` before summarizing here.
Related Resources¶
- `scripts/taxonomy_classifier.py`: CLI wrapper for classifying checkpoint genomes.
- `tests/test_taxonomy.py`: Regression suite covering critical branches.
- `docs/taxonomy-detailed.md`: Historical design notes and extended analysis strategies.
- `docs/simple_taxonomy.md`: Legacy analogy table for quick reference.
Taxonomy Classifier {#taxonomy-classifier}¶
`TaxonomyClassifier` (`src/bytebiota/taxonomy.py`) converts opcode profiles and inferred traits into the hierarchical labels listed above. Use this section to record heuristic thresholds, new trait signals, or seed-bank category expectations whenever classifier logic evolves.