Opcode Reference
Canonical list of ByteBiota virtual CPU instructions, sourced from src/bytebiota/isa.py (current as of October 2025). Any change to the opcode enum, aliases, or handler behavior must be reflected here after the code merges. Families align with OPCODE_FAMILIES to support taxonomy, analytics, and mutation tooling.
Canonical Instructions {#instruction-set}
Byte | Hex | Name | Family | Summary |
---|---|---|---|---|
0 | 0x00 | NOP_0 | template | Template complement used for forward/backward jump anchors. |
1 | 0x01 | NOP_1 | template | Second template complement; paired with NOP_0 for relocation-safe jumps. |
2 | 0x02 | MOV_RR | data_movement | Copy value from BX into AX. |
3 | 0x03 | MOV_RM | data_movement | Store AX byte into memory at [BX], applying copy-time mutations when targeting offspring. |
4 | 0x04 | MOV_MR | data_movement | Load byte from memory [BX] into AX. |
5 | 0x05 | ADD | arithmetic_logic | Integer addition on AX using BX as source. |
6 | 0x06 | SUB | arithmetic_logic | Subtract BX from AX. |
7 | 0x07 | INC | arithmetic_logic | Increment AX. |
8 | 0x08 | DEC | arithmetic_logic | Decrement AX. |
9 | 0x09 | AND | arithmetic_logic | Bitwise AND: AX &= BX. |
10 | 0x0A | OR | arithmetic_logic | Bitwise OR: AX \|= BX. |
11 | 0x0B | XOR | arithmetic_logic | Bitwise XOR: AX ^= BX; hallmark of many replicators. |
12 | 0x0C | SHL | arithmetic_logic | Shift AX left; used for scaling and template prep. |
13 | 0x0D | SHR | arithmetic_logic | Shift AX right. |
14 | 0x0E | CMP | comparison_flags | Compare AX and BX; updates flags without modifying registers. |
15 | 0x0F | SETZ | comparison_flags | Set AX to 1 if zero flag true, else 0. |
16 | 0x10 | SETNZ | comparison_flags | Set AX to 1 if zero flag false. |
17 | 0x11 | JZ | control_flow | Jump via relative offset when zero flag true. |
18 | 0x12 | JNZ | control_flow | Jump via relative offset when zero flag false. |
19 | 0x13 | JMP_FWD_TEMPLATE | control_flow | Scan forward for matching template marker and jump there. |
20 | 0x14 | JMP_BACK_TEMPLATE | control_flow | Scan backward for matching template marker and jump there. |
21 | 0x15 | CALL_TEMPLATE | call_return | Push return address and jump to forward template. |
22 | 0x16 | RET | call_return | Pop return address from stack and jump back. |
23 | 0x17 | MAL | system_calls | Request heap allocation from the scheduler/reaper. |
24 | 0x18 | DIVIDE | system_calls | Spawn a child organism from pending allocation. |
25 | 0x19 | PUSH | stack | Push AX onto the stack, decrementing SP. |
26 | 0x1A | POP | stack | Pop top of stack into AX, incrementing SP. |
27 | 0x1B | INC_BX | register_adjust | Increment pointer register BX. |
28 | 0x1C | INC_CX | register_adjust | Increment loop/work register CX. |
29 | 0x1D | INC_DX | register_adjust | Increment auxiliary register DX. |
30 | 0x1E | DEC_BX | register_adjust | Decrement BX. |
31 | 0x1F | DEC_CX | register_adjust | Decrement CX. |
32 | 0x20 | DEC_DX | register_adjust | Decrement DX. |
33 | 0x21 | MOV_AX_BX | data_movement | Copy BX into AX (explicit variant for mutation heuristics). |
34 | 0x22 | MOV_BX_AX | data_movement | Copy AX into BX. |
35 | 0x23 | MOV_AX_CX | data_movement | Copy CX into AX. |
36 | 0x24 | MOV_CX_AX | data_movement | Copy AX into CX. |
37 | 0x25 | MOV_AX_DX | data_movement | Copy DX into AX. |
38 | 0x26 | MOV_DX_AX | data_movement | Copy AX into DX. |
39 | 0x27 | MOV_RM_DX | data_movement | Store AX byte into memory at [DX]. |
40 | 0x28 | MOV_MR_DX | data_movement | Load byte from memory [DX] into AX. |
41 | 0x29 | SENSE_ENV | environment | Read local resource level into AX and signal intensity into CX. |
42 | 0x2A | EMIT_SIGNAL | environment | Emit signal with intensity AX at address BX. |
43 | 0x2B | HARVEST_ENERGY | environment | Convert environmental resources into internal energy, result in AX. |
64 | 0x40 | READ_STORAGE | environment | Load value from shared environment storage slot BX into AX. |
65 | 0x41 | WRITE_STORAGE | environment | Write AX into shared storage slot BX; previous value returns in CX. |
66 | 0x42 | SUBMIT_TASK | environment | Submit task value in AX for reward returned in BX. |
67 | 0x43 | LOAD_SIZE_CX | register_adjust | Load current organism size into CX. |
68 | 0x44 | LOAD_START_BX | register_adjust | Load organism start address into BX. |
69 | 0x45 | LOOP_TEMPLATE | control_flow | Decrement CX; jump backward to template while CX > 0. |
70 | 0x46 | SET_CHILD_SIZE_CX | system_calls | Clamp pending child size to CX prior to division. |
255 | 0xFF | INVALID | sentinel | Reserved invalid opcode used for robustness testing; sets error flag on execution. |
Alias Opcodes {#alias-opcodes}
Alias opcodes provide degeneracy so mutations can swap instructions without breaking behavior. Each alias shares an implementation handler with its canonical target via ALIAS_TARGETS.
Byte | Hex | Name | Family | Alias of |
---|---|---|---|---|
44 | 0x2C | NOP_ALIAS_A | template | NOP_0 |
45 | 0x2D | NOP_ALIAS_B | template | NOP_1 |
46 | 0x2E | MOV_ALIAS_A | data_movement | MOV_RR |
47 | 0x2F | MOV_ALIAS_B | data_movement | MOV_RM |
48 | 0x30 | MOV_ALIAS_C | data_movement | MOV_MR |
49 | 0x31 | ALU_ALIAS_ADD | arithmetic_logic | ADD |
50 | 0x32 | ALU_ALIAS_SUB | arithmetic_logic | SUB |
51 | 0x33 | ALU_ALIAS_LOGIC | arithmetic_logic | XOR |
52 | 0x34 | ALU_ALIAS_SHIFT | arithmetic_logic | SHL |
53 | 0x35 | ALU_ALIAS_INC | arithmetic_logic | INC |
54 | 0x36 | CMP_ALIAS | comparison_flags | CMP |
55 | 0x37 | SET_ALIAS | comparison_flags | SETZ |
56 | 0x38 | JUMP_ALIAS_FWD | control_flow | JMP_FWD_TEMPLATE |
57 | 0x39 | JUMP_ALIAS_BACK | control_flow | JMP_BACK_TEMPLATE |
58 | 0x3A | CALL_ALIAS | call_return | CALL_TEMPLATE |
59 | 0x3B | RET_ALIAS | call_return | RET |
60 | 0x3C | MAL_ALIAS | system_calls | MAL |
61 | 0x3D | DIVIDE_ALIAS | system_calls | DIVIDE |
62 | 0x3E | PUSH_ALIAS | stack | PUSH |
63 | 0x3F | POP_ALIAS | stack | POP |
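The alias table above can be read as a simple lookup from alias byte to canonical byte. The sketch below is a hypothetical reconstruction: the real ALIAS_TARGETS in src/bytebiota/isa.py may be keyed by enum members rather than raw integers, and `canonical_opcode` is an illustrative helper name, not a confirmed API.

```python
# Hypothetical shape for ALIAS_TARGETS (assumption: plain int -> int dict).
# Byte values are copied from the alias and canonical tables in this document.
ALIAS_TARGETS = {
    0x2C: 0x00,  # NOP_ALIAS_A     -> NOP_0
    0x2D: 0x01,  # NOP_ALIAS_B     -> NOP_1
    0x2E: 0x02,  # MOV_ALIAS_A     -> MOV_RR
    0x2F: 0x03,  # MOV_ALIAS_B     -> MOV_RM
    0x30: 0x04,  # MOV_ALIAS_C     -> MOV_MR
    0x31: 0x05,  # ALU_ALIAS_ADD   -> ADD
    0x32: 0x06,  # ALU_ALIAS_SUB   -> SUB
    0x33: 0x0B,  # ALU_ALIAS_LOGIC -> XOR
    0x34: 0x0C,  # ALU_ALIAS_SHIFT -> SHL
    0x35: 0x07,  # ALU_ALIAS_INC   -> INC
    0x36: 0x0E,  # CMP_ALIAS       -> CMP
    0x37: 0x0F,  # SET_ALIAS       -> SETZ
    0x38: 0x13,  # JUMP_ALIAS_FWD  -> JMP_FWD_TEMPLATE
    0x39: 0x14,  # JUMP_ALIAS_BACK -> JMP_BACK_TEMPLATE
    0x3A: 0x15,  # CALL_ALIAS      -> CALL_TEMPLATE
    0x3B: 0x16,  # RET_ALIAS       -> RET
    0x3C: 0x17,  # MAL_ALIAS       -> MAL
    0x3D: 0x18,  # DIVIDE_ALIAS    -> DIVIDE
    0x3E: 0x19,  # PUSH_ALIAS      -> PUSH
    0x3F: 0x1A,  # POP_ALIAS       -> POP
}


def canonical_opcode(byte: int) -> int:
    """Return the canonical opcode for an alias; other bytes pass through unchanged."""
    return ALIAS_TARGETS.get(byte, byte)
```

Resolving through a single dictionary keeps dispatch tables and analytics in agreement about which handler an alias byte stands for.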
Opcode Family Taxonomy {#opcode-family-taxonomy}
ByteBiota defines an opcode table spanning 0x00–0x46 (decimal 0–70), plus an INVALID sentinel at 0xFF, organized into biological-style codon families. Each family groups instructions by their functionality, similar to how multiple DNA codons can encode the same amino acid in biology. Within each family, certain opcodes are designated as canonical instructions providing the core behavior, while additional alias opcodes map to the same functionality. These aliases serve as redundant "synonyms" for the canonical instructions, introducing mutational robustness: a one-byte mutation is likely to land on an opcode with similar behavior, preserving the organism's function despite genetic drift.
Template / NOP Family {#template-nop-family}
- Canonical Opcodes: NOP_0, NOP_1
- Alias Opcodes: NOP_ALIAS_A, NOP_ALIAS_B
- Description: These are neutral no-operation instructions used primarily for template matching and safe drift. They do not alter machine state, but serve as markers in the code that other instructions (like jumps or calls) can search for. By including multiple NOP variants, ByteBiota provides a neutral mutation space: a NOP can mutate into its alias without changing program behavior.
- Classification: Any instruction that performs no action (aside from acting as a label or filler) falls under this Template/NOP category. They facilitate alignment and pattern matching in code (e.g. identifying jump targets) while allowing evolutionary experiments with minimal functional consequence.
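To make the marker role concrete, here is a minimal sketch of forward template search. It assumes Tierra-style complement matching (a template of NOP_0/NOP_1 bytes matches the run with every bit inverted), which the instruction table hints at by calling the NOPs "template complements" but which is not confirmed here. The helper names `read_template` and `find_forward` are hypothetical, and alias NOPs are ignored for brevity.

```python
from typing import List, Optional

NOP_0, NOP_1 = 0x00, 0x01


def read_template(code: bytes, start: int) -> List[int]:
    """Collect the run of NOP_0/NOP_1 bytes that begins at `start`."""
    template = []
    i = start
    while i < len(code) and code[i] in (NOP_0, NOP_1):
        template.append(code[i])
        i += 1
    return template


def find_forward(code: bytes, start: int, template: List[int]) -> Optional[int]:
    """Return the address just past the first complement of `template`
    found at or after `start`, or None if there is no match."""
    if not template:
        return None
    # Assumption: NOP_0 pairs with NOP_1 and vice versa.
    complement = bytes(NOP_1 if b == NOP_0 else NOP_0 for b in template)
    pos = code.find(complement, start)
    return None if pos < 0 else pos + len(complement)


# A jump opcode at address 0 followed by template 0,0,1; the complement
# 1,1,0 sits at addresses 6..8, so execution would resume at address 9.
code = bytes([0x13, 0x00, 0x00, 0x01, 0x07, 0x07, 0x01, 0x01, 0x00, 0x08])
template = read_template(code, 1)
print(find_forward(code, 1 + len(template), template))  # 9
```

Because the search keys on a pattern rather than an absolute address, inserting or deleting bytes between the jump and its target does not break the jump.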
Data Movement Family {#data-movement-family}
- Canonical Opcodes: MOV_RR, MOV_RM, MOV_MR, MOV_RM_DX, MOV_MR_DX, MOV_AX_BX, MOV_BX_AX, MOV_AX_CX, MOV_CX_AX, MOV_AX_DX, MOV_DX_AX
- Alias Opcodes: MOV_ALIAS_A, MOV_ALIAS_B, MOV_ALIAS_C
- Description: Instructions in this family move data between CPU registers and the organism's memory (often called the "soup"). They include the register-to-register move MOV_RR (copies BX into AX), register-to-memory and memory-to-register moves (MOV_RM/MOV_MR addressing memory through BX, or MOV_RM_DX/MOV_MR_DX addressing through DX), and explicit register copies between AX and BX, CX, or DX (MOV_AX_BX, MOV_BX_AX, and so on). The alias opcodes (MOV_ALIAS_A, etc.) execute the same handlers as their canonical counterparts, providing extra codons that still implement data transfer.
- Classification: Any instruction whose primary behavior is copying or moving bytes between locations (registers or memory) is classified as Data Movement. They do not perform computation on the data; they simply relocate values. In analysis, all such moves (including their aliases) are counted together since they achieve the same fundamental operation.
Arithmetic & Logic Family {#arithmetic-logic-family}
- Canonical Opcodes: ADD, SUB, INC, DEC, AND, OR, XOR, SHL, SHR
- Alias Opcodes: ALU_ALIAS_ADD, ALU_ALIAS_SUB, ALU_ALIAS_LOGIC, ALU_ALIAS_SHIFT, ALU_ALIAS_INC
- Description: This family covers basic arithmetic and bitwise operations on registers (the operations of an arithmetic/logic unit, or ALU). Examples include addition (ADD), subtraction (SUB), increment/decrement by one (INC, DEC), bitwise logical operations (AND, OR, XOR), and bit shifts (SHL for shift-left, SHR for shift-right). The alias instructions (grouped by type: add, sub, logic, shift, inc) map to the same underlying arithmetic/logic handlers as the canonical opcodes. This redundancy means a random mutation that swaps an ADD with an ALU_ALIAS_ADD still results in an addition operation, preserving the creature's behavior.
- Classification: Any instruction that performs a mathematical or bitwise transformation on data (rather than moving it) falls into the Arithmetic & Logic category. For instance, if an opcode modifies a register's value through addition, subtraction, XOR, etc., it is classified here. In an automated analysis, all arithmetic/logic operations (including aliases) would be tallied together since they serve a common purpose of transforming data values.
Comparison & Flags Family {#comparison-flags-family}
- Canonical Opcodes: CMP, SETZ, SETNZ
- Alias Opcodes: CMP_ALIAS, SET_ALIAS
- Description: These instructions perform comparisons and set flags, which later instructions use to alter control flow or make decisions. CMP compares AX and BX and updates the status flags (such as the zero flag when they are equal) without modifying either register. SETZ ("Set if Zero") and SETNZ ("Set if Not Zero") set AX to 1 or 0 depending on the zero flag. The aliases in this family map to the same comparison or setting behaviors as their canonical versions.
- Classification: Any opcode that evaluates a condition or directly manipulates status flags for branching logic is classified under Comparison & Flags. For instance, an instruction that checks if a value is zero and records that result (either in a flag or register) belongs to this group. Automated classification would identify these by their characteristic of not producing a significant data output, but rather affecting the state used by subsequent conditional jumps or instructions (often the CPU flags or a dedicated register).
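The semantics listed in the instruction table for this family can be sketched in a few lines. This is an illustration only: the `Cpu` dataclass and function names are hypothetical, only the zero flag is modeled, and the assumption that SETNZ writes 0 when the zero flag is set is not stated in the table.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    """Toy register file; the real CPU state in ByteBiota may differ."""
    ax: int = 0
    bx: int = 0
    zero_flag: bool = False


def cmp_(cpu: Cpu) -> None:
    # CMP: compare AX and BX, update flags, leave both registers untouched.
    cpu.zero_flag = (cpu.ax == cpu.bx)


def setz(cpu: Cpu) -> None:
    # SETZ: AX = 1 if the zero flag is set, else 0.
    cpu.ax = 1 if cpu.zero_flag else 0


def setnz(cpu: Cpu) -> None:
    # SETNZ: AX = 1 if the zero flag is clear (assumed 0 otherwise).
    cpu.ax = 0 if cpu.zero_flag else 1
```

Materializing the flag in AX lets an organism use a comparison result as ordinary data, for example as an operand to later arithmetic.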
Control Flow Family {#control-flow-family}
- Canonical Opcodes: JZ, JNZ, JMP_FWD_TEMPLATE, JMP_BACK_TEMPLATE, LOOP_TEMPLATE
- Alias Opcodes: JUMP_ALIAS_FWD, JUMP_ALIAS_BACK
- Description: This family handles changes in the execution sequence. JZ (Jump if Zero) and JNZ (Jump if Not Zero) are conditional branches that alter the next instruction address based on a flag (commonly the zero flag from a prior compare). JMP_FWD_TEMPLATE and JMP_BACK_TEMPLATE implement template-based branching: the organism can jump forward or backward to a matching template (a sequence of NOPs) in its code, a mechanism akin to labeled jumps or loops. LOOP_TEMPLATE extends that idea by automatically decrementing CX and repeatedly jumping via templates while CX remains non-zero, giving organisms a compact, mutation-resilient looping primitive. The alias opcodes here map to the same underlying jump handlers (forward or backward), giving evolution some leeway in the exact opcode value while preserving the jump behavior.
- Classification: Any instruction that can change the program counter (the next instruction to execute) falls under Control Flow. This includes unconditional jumps, conditional branches, and template-driven loops. In classification logic, these are identified by their effect of redirecting execution to a new location (often based on a condition or a marker). During analysis, all such opcodes (including their aliases) would be grouped to measure how much the organism's code relies on branching and looping structures.
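The LOOP_TEMPLATE rule ("decrement CX; jump backward to template while CX > 0") can be sketched as a pure function. The sketch assumes one-byte instructions, an already-resolved template address, and no register wraparound. `loop_template` and `count_body_runs` are hypothetical names, not the handler in isa.py.

```python
from typing import Tuple


def loop_template(cx: int, ip: int, loop_target: int) -> Tuple[int, int]:
    """Return (new_cx, new_ip) after executing LOOP_TEMPLATE at address `ip`."""
    cx -= 1
    if cx > 0:
        return cx, loop_target  # jump back to the matched template
    return cx, ip + 1           # counter exhausted: fall through


def count_body_runs(cx: int) -> int:
    """Run a two-instruction toy program (body at 0, LOOP_TEMPLATE at 1)
    and count how many times the body executes."""
    body_at, loop_at = 0, 1
    ip, runs = body_at, 0
    while ip <= loop_at:
        if ip == body_at:
            runs += 1
            ip += 1
        else:
            cx, ip = loop_template(cx, ip, body_at)
    return runs


print(count_body_runs(3))  # 3
```

Under these assumptions a loop body placed before LOOP_TEMPLATE runs CX times for CX >= 1, which is the usual shape of a copy loop sized by LOAD_SIZE_CX.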
Calls & Returns Family {#calls-returns-family}
- Canonical Opcodes: CALL_TEMPLATE, RET
- Alias Opcodes: CALL_ALIAS, RET_ALIAS
- Description: These opcodes implement subroutine calls and returns, enabling modular code and reuse of routines. CALL_TEMPLATE pushes a return address and jumps to a forward template (similar to a label marked by NOPs), so that a subsequent RET can return execution to the calling point. RET pops the stored return address from the stack and jumps back, completing the subroutine cycle. The alias versions of call and return map to the same behaviors, ensuring that slight mutations in the call or return opcode still correctly invoke or return from subroutines.
- Classification: Any instruction related to invoking a subroutine or function, and subsequently returning from it, is classified in the Calls & Returns family. They are a specialized subset of control flow instructions that also typically interact with the stack to save/restore return addresses. When classifying program behavior, these opcodes indicate structured code reuse and can be counted to see how often an organism utilizes subroutines.
System Calls Family {#system-calls-family}
- Canonical Opcodes: MAL, DIVIDE, SET_CHILD_SIZE_CX
- Alias Opcodes: MAL_ALIAS, DIVIDE_ALIAS
- Description: These instructions trigger system-level operations fundamental to the lifecycle of a digital organism. MAL ("malloc") allocates new memory in the organism's address space. DIVIDE finalizes reproduction by spawning the pending child organism. SET_CHILD_SIZE_CX allows the organism to trim the required copy length for the pending child after over-allocating, so that division can complete once the intended genome length has been written while still leaving headroom for structural mutations. The aliases map to the same underlying handlers for these system operations, so a mutation in the opcode still results in a memory allocation or divide action.
- Classification: Instructions that interact with the runtime system or virtual hardware in a way that goes beyond normal computation belong to System Calls. They often have side effects like allocating memory or creating new processes/organisms. In classification terms, these opcodes are identified by their unique roles: if an instruction triggers replication or memory management (not just arithmetic or control flow), it is a system-call type. Automated analysis might flag these in a program to understand reproductive or self-modifying behavior.
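The allocate, trim, divide sequence described above might look roughly like the following. Everything here is a sketch under assumptions: `PendingChild`, the function names, and the clamping bounds (0 up to the allocated size) are hypothetical, and the real scheduler/reaper interaction is omitted.

```python
from dataclasses import dataclass


@dataclass
class PendingChild:
    start: int      # address of the reserved block in the soup
    allocated: int  # bytes reserved by MAL
    size: int       # bytes DIVIDE will treat as the child's genome


def mal(free_start: int, requested: int) -> PendingChild:
    # MAL: reserve a block; the child size initially equals the allocation.
    return PendingChild(start=free_start, allocated=requested, size=requested)


def set_child_size_cx(child: PendingChild, cx: int) -> None:
    # SET_CHILD_SIZE_CX: clamp the pending size to CX (assumed bounds).
    child.size = max(0, min(cx, child.allocated))


def divide(soup: bytearray, child: PendingChild) -> bytes:
    # DIVIDE: the child's genome is the trimmed prefix of the allocation.
    return bytes(soup[child.start:child.start + child.size])


soup = bytearray(32)
child = mal(free_start=16, requested=12)   # over-allocate 12 bytes
soup[16:24] = bytes([0x07] * 8)            # parent copies only 8 bytes
set_child_size_cx(child, 8)                # trim to what was written
print(len(divide(soup, child)))  # 8
```

Over-allocating and then trimming leaves room for insertions during copying without forcing the parent to fill the whole block before dividing.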
Stack Operations Family {#stack-operations-family}
- Canonical Opcodes: PUSH, POP
- Alias Opcodes: PUSH_ALIAS, POP_ALIAS
- Description: Stack operations provide temporary storage and retrieval of values using a LIFO (last-in-first-out) structure. PUSH takes a value (usually from a register) and places it on the stack (decrementing the stack pointer), and POP removes the top value from the stack (incrementing the stack pointer, and typically loading that value into a register). These are essential for preserving registers across subroutine calls or for other scratch storage needs. The alias variants perform the identical push or pop behavior, giving neutrality to mutations in these opcodes.
- Classification: Any instruction that directly manipulates the stack (implicitly via a stack pointer register) is classified under Stack Ops. This includes pushing data onto or popping data off the stack. In analyzing an organism's code, the presence of push/pop indicates use of a stack-based memory discipline, often correlating with nested subroutine calls or complex arithmetic needing extra registers. All push/pop variants are grouped together when measuring stack usage.
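As a minimal sketch of the PUSH/POP convention described above (PUSH decrements SP, POP increments it), the class below models a downward-growing stack. The fixed size, the "SP one past the end means empty" convention, and the exceptions raised on overflow or underflow are assumptions for illustration; the real VM may wrap or set an error flag instead.

```python
class Stack:
    """Toy downward-growing stack; names and limits are hypothetical."""

    def __init__(self, size: int = 16) -> None:
        self.cells = [0] * size
        self.sp = size  # empty stack: SP points one past the last cell

    def push(self, value: int) -> None:
        if self.sp == 0:
            raise OverflowError("stack full")
        self.sp -= 1                 # PUSH decrements SP first
        self.cells[self.sp] = value

    def pop(self) -> int:
        if self.sp == len(self.cells):
            raise IndexError("stack empty")
        value = self.cells[self.sp]
        self.sp += 1                 # POP increments SP afterwards
        return value
```

A typical use is saving AX before a CALL_TEMPLATE and restoring it after RET, since both the call mechanism and ordinary pushes share the same LIFO discipline.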
Register Adjust Family {#register-adjust-family}
- Canonical Opcodes: INC_BX, INC_CX, INC_DX, DEC_BX, DEC_CX, DEC_DX, LOAD_SIZE_CX, LOAD_START_BX
- Alias Opcodes: (none)
- Description: These are fine-grained register tuning instructions. Alongside the dedicated increment/decrement opcodes for BX, CX, and DX, ByteBiota includes helpers that synchronise registers with organism metadata: LOAD_SIZE_CX writes the current genome length into CX, and LOAD_START_BX restores BX to the organism's starting address. Together they let organisms navigate memory or reset pointers in small steps while staying aligned with structural mutations. There are no separate alias opcodes in this category; each adjustment/helper is its own opcode (though conceptually they are similar operations).
- Classification: We classify these as a distinct family because they target a specific behavior: minor adjustments to registers or reloading canonical pointer state. In terms of opcode behavior, they are essentially a subset of arithmetic/metadata operations confined to specific registers. For classification purposes, one identifies these by the combination of targeted register effects and the limited scope (affecting only a specific register). They indicate iterative or fine-tuning behavior in code (for example, stepping through an array or resetting a pointer before copying). In analysis, although they perform arithmetic-like updates, we consider them separately to highlight how an organism manipulates pointers and synchronises with its structural metadata versus doing general computation.
Environment & Communication Family {#environment-communication-family}
- Canonical Opcodes: SENSE_ENV, EMIT_SIGNAL, HARVEST_ENERGY, READ_STORAGE, WRITE_STORAGE, SUBMIT_TASK
- Alias Opcodes: (none)
- Description: This family encompasses instructions that interface with the external environment or communal resources, beyond the organism's own memory. They enable the organism to sense and interact with its world:
  - SENSE_ENV reads the local resource level into AX and the signal intensity into CX.
  - EMIT_SIGNAL emits a signal with intensity AX at address BX, letting the organism communicate with its surroundings (for cooperation or competition).
  - HARVEST_ENERGY converts environmental resources into usable energy for the organism, returning the result in AX.
  - READ_STORAGE (0x40) reads a value from a persistent environment storage slot. The slot number is given in register BX, and it returns the stored value in AX. This lets an organism retrieve shared or previously stored data.
  - WRITE_STORAGE (0x41) writes a value to an environment storage slot. It takes a slot number in BX and a value in AX, writes the value to that slot, and returns the previous value in CX. This allows organisms to update shared state while possibly using the old value (e.g. for atomic exchange or logging).
  - SUBMIT_TASK (0x42) submits a task solution (provided in AX) to the environment for a reward. If the task value is correct (matching an externally defined challenge), the environment awards energy (returned in BX) and likely generates a new task. This opcode thus provides a mechanism for organisms to gain energy by performing computational work (a form of digital resource gathering or reward).
- Classification: Any instruction that involves external interaction, whether sensing state outside the organism, communicating, or using special environmental services, is classified in the Environment & Communication family. These are identified by their I/O-like behavior (inputs/outputs via registers that correspond to environment data or actions) rather than purely internal computation. In classification and analysis, these opcodes are crucial to understanding an organism's strategy: for example, heavy use of SUBMIT_TASK or HARVEST_ENERGY would indicate a focus on task-solving for energy, whereas frequent EMIT_SIGNAL might indicate communication with neighbors. When analyzing opcode usage, these instructions stand out since they often use dedicated environment interfaces and have side effects beyond the organism's own state.
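The register contract of READ_STORAGE and WRITE_STORAGE can be illustrated with a dictionary-backed store. This is a sketch only: representing storage as a dict, treating unset slots as 0, and passing registers as a dict are all assumptions made for the example, not the behavior of the _read_storage handler.

```python
from typing import Dict


def read_storage(storage: Dict[int, int], regs: Dict[str, int]) -> None:
    # READ_STORAGE: AX <- storage[BX] (unset slots assumed to read as 0).
    regs["AX"] = storage.get(regs["BX"], 0)


def write_storage(storage: Dict[int, int], regs: Dict[str, int]) -> None:
    # WRITE_STORAGE: CX <- old storage[BX], then storage[BX] <- AX.
    regs["CX"] = storage.get(regs["BX"], 0)
    storage[regs["BX"]] = regs["AX"]


shared = {}
regs = {"AX": 5, "BX": 3, "CX": 0}
write_storage(shared, regs)   # slot 3 becomes 5, CX receives the old 0
regs["AX"] = 8
write_storage(shared, regs)   # slot 3 becomes 8, CX receives the old 5
print(regs["CX"])  # 5
```

Returning the previous value in CX gives WRITE_STORAGE exchange semantics: an organism can tell whether another organism wrote to the slot since its last visit.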
Automated Opcode Classification and Analysis Strategies {#automated-classification}
When documenting or analyzing digital organisms, we often want to classify their code automatically to understand their behavioral tendencies. Given the above taxonomy, we can devise a systematic approach to categorize and quantify the usage of each opcode family in an organism's genome.
Static Opcode Frequency Analysis {#static-analysis}
This approach examines the organism's code without executing it. We scan the genome, map each opcode to its family (using the alias→canonical mapping so that aliases are counted in the same bucket), and then count frequencies or proportions of each family's usage. This yields a profile of the code's structure: for example, what percentage of instructions are Arithmetic vs. Control Flow vs. Environment calls.
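A static profile along these lines might be computed as follows. The byte ranges are transcribed from the tables in this document; the `FAMILY_OF` dictionary and `family_profile` function are hypothetical stand-ins for whatever OPCODE_FAMILIES provides in the real code. Since each alias is listed in the same family as its canonical target, a direct byte-to-family lookup already places aliases in the right bucket.

```python
from collections import Counter
from typing import Dict

# (first byte, last byte, family), transcribed from the opcode tables.
_FAMILY_RANGES = [
    (0, 1, "template"), (2, 4, "data_movement"), (5, 13, "arithmetic_logic"),
    (14, 16, "comparison_flags"), (17, 20, "control_flow"),
    (21, 22, "call_return"), (23, 24, "system_calls"), (25, 26, "stack"),
    (27, 32, "register_adjust"), (33, 40, "data_movement"),
    (41, 43, "environment"), (44, 45, "template"), (46, 48, "data_movement"),
    (49, 53, "arithmetic_logic"), (54, 55, "comparison_flags"),
    (56, 57, "control_flow"), (58, 59, "call_return"),
    (60, 61, "system_calls"), (62, 63, "stack"), (64, 66, "environment"),
    (67, 68, "register_adjust"), (69, 69, "control_flow"),
    (70, 70, "system_calls"), (255, 255, "sentinel"),
]
FAMILY_OF = {b: fam for lo, hi, fam in _FAMILY_RANGES for b in range(lo, hi + 1)}


def family_profile(genome: bytes) -> Dict[str, float]:
    """Return the share of each opcode family in `genome` (shares sum to 1)."""
    counts = Counter(FAMILY_OF.get(b, "undefined") for b in genome)
    total = sum(counts.values())
    return {fam: n / total for fam, n in counts.items()} if total else {}


# NOP_0, NOP_ALIAS_A, ADD, ALU_ALIAS_ADD
print(family_profile(bytes([0x00, 0x2C, 0x05, 0x31])))
# {'template': 0.5, 'arithmetic_logic': 0.5}
```

Bytes outside the defined table are reported as "undefined" rather than dropped, so heavily mutated genomes remain visible in the profile.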
Dynamic Execution Trace Analysis {#dynamic-analysis}
This method involves actually running the organism in a controlled environment and logging which instructions execute (and how often). By observing an execution trace, we gather statistics on opcode usage during real behavior. This can be more insightful in understanding what the organism actually does in a typical scenario, as opposed to what it could do.
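Assuming the VM can report the instruction pointer at every step (the hook itself is hypothetical and not shown), turning a trace into usage statistics is a small counting exercise. `trace_counts` is an illustrative name.

```python
from collections import Counter
from typing import Iterable


def trace_counts(genome: bytes, visited_ips: Iterable[int]) -> Counter:
    """Count how often each opcode byte was executed, given the sequence of
    instruction pointers recorded during a run."""
    return Counter(genome[ip] for ip in visited_ips)


# Toy genome: INC (0x07), DEC (0x08), DIVIDE (0x18).
genome = bytes([0x07, 0x08, 0x18])
# A recorded run that loops over the first two instructions three times
# before reaching DIVIDE once.
visited = [0, 1, 0, 1, 0, 1, 2]
counts = trace_counts(genome, visited)
print(counts[0x07], counts[0x08], counts[0x18])  # 3 3 1
```

Here each opcode appears once in the genome, yet the executed counts are 3, 3, and 1, which is exactly the gap between what code contains and what it does.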
Hybrid Approaches {#hybrid-approaches}
Often the best insight comes from combining static and dynamic analysis. For example, one might first do a static classification to know all the instructions present and their family counts. Then, by running the organism, one can weight those counts by actual execution frequency.
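One way to combine the two views, sketched under the assumption that a byte-to-family mapping and per-opcode execution counts are already available, is to report each family's static share next to its executed share. The function name and the tuple layout are illustrative choices.

```python
from collections import Counter
from typing import Dict, Mapping, Tuple


def hybrid_profile(
    genome: bytes,
    exec_counts: Mapping[int, int],
    family_of: Mapping[int, str],
) -> Dict[str, Tuple[float, float]]:
    """Return {family: (static_share, executed_share)}."""
    static = Counter(family_of.get(b, "undefined") for b in genome)
    dynamic = Counter()
    for opcode, n in exec_counts.items():
        dynamic[family_of.get(opcode, "undefined")] += n
    s_total = sum(static.values()) or 1  # avoid division by zero
    d_total = sum(dynamic.values()) or 1
    return {
        fam: (static[fam] / s_total, dynamic[fam] / d_total)
        for fam in set(static) | set(dynamic)
    }


# Toy inputs: three INC bytes and one DIVIDE; INC executed 9 times, DIVIDE once.
family_of = {0x07: "arithmetic_logic", 0x18: "system_calls"}
profile = hybrid_profile(bytes([0x07, 0x07, 0x07, 0x18]), {0x07: 9, 0x18: 1}, family_of)
print(profile["arithmetic_logic"])  # (0.75, 0.9)
```

A family whose executed share is far above its static share marks a hot loop; one with a static share but zero executed share points to dead or rarely reached code.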
Maintenance Notes {#opcode-maintenance}
- Keep taxonomy (../biology/taxonomy-overview.md) and species profiles aligned when opcode families expand or new behavior traits are added.
- Update alias mappings in this document whenever ALIAS_TARGETS changes to avoid stale guidance for mutation tooling or documentation.
- For execution semantics, consult handler docstrings in InstructionSet; key routines include environment interactions (_sense_env, _read_storage), reproduction (_divide, _set_child_size_cx), and loop helpers (_loop_template).
Related files: src/bytebiota/isa.py, src/bytebiota/taxonomy.py.