Opcode Reference

Canonical list of ByteBiota virtual CPU instructions, sourced from src/bytebiota/isa.py (current as of October 2025). Any change to the opcode enum, aliases, or handler behavior must be reflected here after the code merges. Families align with OPCODE_FAMILIES to support taxonomy, analytics, and mutation tooling.

Canonical Instructions {#instruction-set}

| Byte | Hex | Name | Family | Summary |
|---|---|---|---|---|
| 0 | 0x00 | NOP_0 | template | Template complement used for forward/backward jump anchors. |
| 1 | 0x01 | NOP_1 | template | Second template complement; paired with NOP_0 for relocation-safe jumps. |
| 2 | 0x02 | MOV_RR | data_movement | Copy value from BX into AX. |
| 3 | 0x03 | MOV_RM | data_movement | Store AX byte into memory at [BX], applying copy-time mutations when targeting offspring. |
| 4 | 0x04 | MOV_MR | data_movement | Load byte from memory [BX] into AX. |
| 5 | 0x05 | ADD | arithmetic_logic | Integer addition on AX using BX as source. |
| 6 | 0x06 | SUB | arithmetic_logic | Subtract BX from AX. |
| 7 | 0x07 | INC | arithmetic_logic | Increment AX. |
| 8 | 0x08 | DEC | arithmetic_logic | Decrement AX. |
| 9 | 0x09 | AND | arithmetic_logic | Bitwise AND: AX &= BX. |
| 10 | 0x0A | OR | arithmetic_logic | Bitwise OR: AX \|= BX. |
| 11 | 0x0B | XOR | arithmetic_logic | Bitwise XOR: AX ^= BX; hallmark of many replicators. |
| 12 | 0x0C | SHL | arithmetic_logic | Shift AX left; used for scaling and template prep. |
| 13 | 0x0D | SHR | arithmetic_logic | Shift AX right. |
| 14 | 0x0E | CMP | comparison_flags | Compare AX and BX; updates flags without modifying registers. |
| 15 | 0x0F | SETZ | comparison_flags | Set AX to 1 if zero flag true, else 0. |
| 16 | 0x10 | SETNZ | comparison_flags | Set AX to 1 if zero flag false. |
| 17 | 0x11 | JZ | control_flow | Jump via relative offset when zero flag true. |
| 18 | 0x12 | JNZ | control_flow | Jump via relative offset when zero flag false. |
| 19 | 0x13 | JMP_FWD_TEMPLATE | control_flow | Scan forward for matching template marker and jump there. |
| 20 | 0x14 | JMP_BACK_TEMPLATE | control_flow | Scan backward for matching template marker and jump there. |
| 21 | 0x15 | CALL_TEMPLATE | call_return | Push return address and jump to forward template. |
| 22 | 0x16 | RET | call_return | Pop return address from stack and jump back. |
| 23 | 0x17 | MAL | system_calls | Request heap allocation from the scheduler/reaper. |
| 24 | 0x18 | DIVIDE | system_calls | Spawn a child organism from pending allocation. |
| 25 | 0x19 | PUSH | stack | Push AX onto the stack, decrementing SP. |
| 26 | 0x1A | POP | stack | Pop top of stack into AX, incrementing SP. |
| 27 | 0x1B | INC_BX | register_adjust | Increment pointer register BX. |
| 28 | 0x1C | INC_CX | register_adjust | Increment loop/work register CX. |
| 29 | 0x1D | INC_DX | register_adjust | Increment auxiliary register DX. |
| 30 | 0x1E | DEC_BX | register_adjust | Decrement BX. |
| 31 | 0x1F | DEC_CX | register_adjust | Decrement CX. |
| 32 | 0x20 | DEC_DX | register_adjust | Decrement DX. |
| 33 | 0x21 | MOV_AX_BX | data_movement | Copy BX into AX (explicit variant for mutation heuristics). |
| 34 | 0x22 | MOV_BX_AX | data_movement | Copy AX into BX. |
| 35 | 0x23 | MOV_AX_CX | data_movement | Copy CX into AX. |
| 36 | 0x24 | MOV_CX_AX | data_movement | Copy AX into CX. |
| 37 | 0x25 | MOV_AX_DX | data_movement | Copy DX into AX. |
| 38 | 0x26 | MOV_DX_AX | data_movement | Copy AX into DX. |
| 39 | 0x27 | MOV_RM_DX | data_movement | Store AX byte into memory at [DX]. |
| 40 | 0x28 | MOV_MR_DX | data_movement | Load byte from memory [DX] into AX. |
| 41 | 0x29 | SENSE_ENV | environment | Read local resource level into AX and signal intensity into CX. |
| 42 | 0x2A | EMIT_SIGNAL | environment | Emit signal with intensity AX at address BX. |
| 43 | 0x2B | HARVEST_ENERGY | environment | Convert environmental resources into internal energy, result in AX. |
| 64 | 0x40 | READ_STORAGE | environment | Load value from shared environment storage slot BX into AX. |
| 65 | 0x41 | WRITE_STORAGE | environment | Write AX into shared storage slot BX; previous value returns in CX. |
| 66 | 0x42 | SUBMIT_TASK | environment | Submit task value in AX for reward returned in BX. |
| 67 | 0x43 | LOAD_SIZE_CX | register_adjust | Load current organism size into CX. |
| 68 | 0x44 | LOAD_START_BX | register_adjust | Load organism start address into BX. |
| 69 | 0x45 | LOOP_TEMPLATE | control_flow | Decrement CX; jump backward to template while CX > 0. |
| 70 | 0x46 | SET_CHILD_SIZE_CX | system_calls | Clamp pending child size to CX prior to division. |
| 255 | 0xFF | INVALID | sentinel | Reserved invalid opcode used for robustness testing; sets error flag on execution. |

Alias Opcodes {#alias-opcodes}

Alias opcodes provide degeneracy so mutations can swap instructions without breaking behavior. Each alias shares an implementation handler with its canonical target via ALIAS_TARGETS.

| Byte | Hex | Name | Family | Alias of |
|---|---|---|---|---|
| 44 | 0x2C | NOP_ALIAS_A | template | NOP_0 |
| 45 | 0x2D | NOP_ALIAS_B | template | NOP_1 |
| 46 | 0x2E | MOV_ALIAS_A | data_movement | MOV_RR |
| 47 | 0x2F | MOV_ALIAS_B | data_movement | MOV_RM |
| 48 | 0x30 | MOV_ALIAS_C | data_movement | MOV_MR |
| 49 | 0x31 | ALU_ALIAS_ADD | arithmetic_logic | ADD |
| 50 | 0x32 | ALU_ALIAS_SUB | arithmetic_logic | SUB |
| 51 | 0x33 | ALU_ALIAS_LOGIC | arithmetic_logic | XOR |
| 52 | 0x34 | ALU_ALIAS_SHIFT | arithmetic_logic | SHL |
| 53 | 0x35 | ALU_ALIAS_INC | arithmetic_logic | INC |
| 54 | 0x36 | CMP_ALIAS | comparison_flags | CMP |
| 55 | 0x37 | SET_ALIAS | comparison_flags | SETZ |
| 56 | 0x38 | JUMP_ALIAS_FWD | control_flow | JMP_FWD_TEMPLATE |
| 57 | 0x39 | JUMP_ALIAS_BACK | control_flow | JMP_BACK_TEMPLATE |
| 58 | 0x3A | CALL_ALIAS | call_return | CALL_TEMPLATE |
| 59 | 0x3B | RET_ALIAS | call_return | RET |
| 60 | 0x3C | MAL_ALIAS | system_calls | MAL |
| 61 | 0x3D | DIVIDE_ALIAS | system_calls | DIVIDE |
| 62 | 0x3E | PUSH_ALIAS | stack | PUSH |
| 63 | 0x3F | POP_ALIAS | stack | POP |
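
The alias table can be read as a simple lookup. The following is a minimal sketch, assuming ALIAS_TARGETS behaves like a dictionary from alias name to canonical name; the real structure in src/bytebiota/isa.py may differ (for example, it may hold enum members), and `canonical_name` is a hypothetical helper introduced only for illustration.

```python
# Hypothetical subset of the alias table above, keyed by opcode name.
# The real ALIAS_TARGETS in src/bytebiota/isa.py may be shaped differently.
ALIAS_TARGETS = {
    "NOP_ALIAS_A": "NOP_0",
    "NOP_ALIAS_B": "NOP_1",
    "MOV_ALIAS_A": "MOV_RR",
    "ALU_ALIAS_LOGIC": "XOR",
    "SET_ALIAS": "SETZ",
}


def canonical_name(name: str) -> str:
    """Return the canonical opcode for an alias; canonical names pass through."""
    return ALIAS_TARGETS.get(name, name)


print(canonical_name("ALU_ALIAS_LOGIC"))  # XOR
print(canonical_name("ADD"))              # ADD
```

Falling back to the input name keeps the helper total over both tables, so callers do not need to know whether an opcode is an alias.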

Opcode Family Taxonomy {#opcode-family-taxonomy}

ByteBiota defines an opcode table spanning 0x00–0x46 (decimal 0–70), plus an INVALID sentinel at 0xFF, organized into biological-style codon families. Each family groups instructions by function, much as multiple DNA codons can encode the same amino acid. Within a family, canonical opcodes provide the core behavior, and alias opcodes map to the same handlers. The aliases act as redundant "synonyms" for the canonical instructions and add mutational robustness: a one-byte mutation is more likely to land on an opcode with the same or similar behavior, preserving the organism's function despite genetic drift.

Template / NOP Family {#template-nop-family}

  • Canonical Opcodes: NOP_0, NOP_1
  • Alias Opcodes: NOP_ALIAS_A, NOP_ALIAS_B
  • Description: These are neutral no-operation instructions used primarily for template matching and safe drift. They do not alter machine state, but serve as markers in the code that other instructions (such as jumps or calls) can search for. Multiple NOP variants give ByteBiota a neutral mutation space: a NOP can mutate into its own alias (NOP_0 to NOP_ALIAS_A, NOP_1 to NOP_ALIAS_B) without changing program behavior.
  • Classification: Any instruction that performs no action (aside from acting as a label or filler) falls under this Template/NOP category. They facilitate alignment and pattern matching in code (e.g. identifying jump targets) while allowing evolutionary experiments with minimal functional consequence.

Data Movement Family {#data-movement-family}

  • Canonical Opcodes: MOV_RR, MOV_RM, MOV_MR, MOV_RM_DX, MOV_MR_DX, MOV_AX_BX, MOV_BX_AX, MOV_AX_CX, MOV_CX_AX, MOV_AX_DX, MOV_DX_AX
  • Alias Opcodes: MOV_ALIAS_A, MOV_ALIAS_B, MOV_ALIAS_C
  • Description: Instructions in this family move data between CPU registers and the organism's memory (often called the "soup"). They include register-to-register moves (MOV_RR copies BX into AX), register-to-memory and memory-to-register moves (MOV_RM/MOV_MR address memory through BX; MOV_RM_DX/MOV_MR_DX address it through DX), and explicit register copies between AX and BX, CX, or DX. The alias opcodes (MOV_ALIAS_A, etc.) execute the same handlers as their canonical counterparts, providing extra codons that still implement data transfer.
  • Classification: Any instruction whose primary behavior is copying or moving bytes between locations (registers or memory) is classified as Data Movement. They do not perform computation on the data; they simply relocate values. In analysis, all such moves (including their aliases) are counted together since they achieve the same fundamental operation.

Arithmetic & Logic Family {#arithmetic-logic-family}

  • Canonical Opcodes: ADD, SUB, INC, DEC, AND, OR, XOR, SHL, SHR
  • Alias Opcodes: ALU_ALIAS_ADD, ALU_ALIAS_SUB, ALU_ALIAS_LOGIC, ALU_ALIAS_SHIFT, ALU_ALIAS_INC
  • Description: This family covers basic arithmetic and bitwise operations on registers (the ALU, or arithmetic/logic unit, operations). Examples include addition (ADD), subtraction (SUB), increment/decrement by one (INC, DEC), bitwise logical operations (AND, OR, XOR), and bit shifts (SHL for shift-left, SHR for shift-right). The alias instructions (grouped by type: add, sub, logic, shift, inc) map to the same underlying arithmetic/logic handlers as the canonical opcodes. This redundancy means a random mutation that swaps an ADD with an ALU_ALIAS_ADD still results in an addition operation, preserving the creature's behavior.
  • Classification: Any instruction that performs a mathematical or bitwise transformation on data (rather than moving it) falls into the Arithmetic & Logic category. For instance, if an opcode modifies a register's value through addition, subtraction, XOR, etc., it is classified here. In an automated analysis, all arithmetic/logic operations (including aliases) would be tallied together since they serve a common purpose of transforming data values.
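
The handler sharing described for this family can be sketched as a dispatch table. This is an illustration under stated assumptions, not the code in src/bytebiota/isa.py: the 16-bit register width (`WORD_MASK`), the shift amount of one bit, and the names `ALU_HANDLERS`, `ALU_ALIASES`, and `alu` are all hypothetical.

```python
import operator

WORD_MASK = 0xFFFF  # assumed register width; this page does not specify it

# Canonical handlers keyed by opcode name. Each takes (AX, BX) and returns
# the new AX. Shifting by exactly one bit is an assumption.
ALU_HANDLERS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "AND": operator.and_,
    "OR": operator.or_,
    "XOR": operator.xor,
    "SHL": lambda ax, _bx: ax << 1,
    "SHR": lambda ax, _bx: ax >> 1,
    "INC": lambda ax, _bx: ax + 1,
    "DEC": lambda ax, _bx: ax - 1,
}

# Aliases from the alias table resolve to a canonical handler.
ALU_ALIASES = {
    "ALU_ALIAS_ADD": "ADD",
    "ALU_ALIAS_SUB": "SUB",
    "ALU_ALIAS_LOGIC": "XOR",
    "ALU_ALIAS_SHIFT": "SHL",
    "ALU_ALIAS_INC": "INC",
}


def alu(name: str, ax: int, bx: int) -> int:
    """Apply the named ALU opcode (or alias) and return the wrapped AX."""
    handler = ALU_HANDLERS[ALU_ALIASES.get(name, name)]
    return handler(ax, bx) & WORD_MASK


# An alias and its canonical target produce the same result.
assert alu("ADD", 2, 3) == alu("ALU_ALIAS_ADD", 2, 3) == 5
```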

Comparison & Flags Family {#comparison-flags-family}

  • Canonical Opcodes: CMP, SETZ, SETNZ
  • Alias Opcodes: CMP_ALIAS, SET_ALIAS
  • Description: These instructions perform comparisons and set flags that later instructions use for branching and decisions. CMP compares AX and BX and updates the status flags (for example, setting the zero flag when the two are equal) without modifying either register. SETZ and SETNZ write 1 or 0 into AX depending on the zero flag ("Set if Zero" and "Set if Not Zero"). CMP_ALIAS shares the CMP handler, and SET_ALIAS shares the SETZ handler.
  • Classification: Any opcode that evaluates a condition or directly manipulates status flags for branching logic is classified under Comparison & Flags. For instance, an instruction that checks if a value is zero and records that result (either in a flag or register) belongs to this group. Automated classification would identify these by their characteristic of not producing a significant data output, but rather affecting the state used by subsequent conditional jumps or instructions (often the CPU flags or a dedicated register).
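
A minimal sketch of the flag semantics described for this family follows. The `Cpu` dataclass and the `op_*` function names are hypothetical, and the assumption that SETNZ writes 0 when the zero flag is true is inferred by symmetry with SETZ rather than stated in the table.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    """Hypothetical register file holding only what this sketch needs."""
    ax: int = 0
    bx: int = 0
    zero_flag: bool = False


def op_cmp(cpu: Cpu) -> None:
    """CMP: set the zero flag when AX equals BX; registers stay untouched."""
    cpu.zero_flag = cpu.ax == cpu.bx


def op_setz(cpu: Cpu) -> None:
    """SETZ: AX becomes 1 if the zero flag is true, else 0."""
    cpu.ax = 1 if cpu.zero_flag else 0


def op_setnz(cpu: Cpu) -> None:
    """SETNZ: AX becomes 1 if the zero flag is false, else 0 (assumed)."""
    cpu.ax = 0 if cpu.zero_flag else 1


cpu = Cpu(ax=5, bx=5)
op_cmp(cpu)    # equal operands set the zero flag, AX is still 5
op_setz(cpu)   # AX is now 1
```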

Control Flow Family {#control-flow-family}

  • Canonical Opcodes: JZ, JNZ, JMP_FWD_TEMPLATE, JMP_BACK_TEMPLATE, LOOP_TEMPLATE
  • Alias Opcodes: JUMP_ALIAS_FWD, JUMP_ALIAS_BACK
  • Description: This family handles changes in the execution sequence. JZ (Jump if Zero) and JNZ (Jump if Not Zero) are conditional branches that jump by a relative offset depending on the zero flag (commonly set by a prior compare). JMP_FWD_TEMPLATE and JMP_BACK_TEMPLATE implement template-based branching: the organism jumps forward or backward to a matching template (a sequence of NOPs) in its code, a mechanism akin to labeled jumps or loops. LOOP_TEMPLATE extends that idea by decrementing CX and jumping backward to a template while CX remains greater than zero, giving organisms a compact, mutation-resilient looping primitive. The alias opcodes here map to the same underlying jump handlers (forward or backward), giving evolution some leeway in the exact opcode value while preserving the jump behavior.
  • Classification: Any instruction that can change the program counter (the next instruction to execute) falls under Control Flow. This includes unconditional jumps, conditional branches, and template-driven loops. In classification logic, these are identified by their effect of redirecting execution to a new location (often based on a condition or a marker). During analysis, all such opcodes (including their aliases) would be grouped to measure how much the organism's code relies on branching and looping structures.
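
To make the template mechanism concrete, here is a sketch of a forward template search. It assumes Tierra-style complement matching (the template after the jump is inverted, NOP_0 to NOP_1 and vice versa, and the inverted pattern is the target), that execution resumes just past the match, and that alias NOPs are ignored; the actual matching rules live in the ByteBiota handlers and may differ. `read_template` and `find_forward` are hypothetical names.

```python
from typing import List, Optional

NOP_0, NOP_1 = 0x00, 0x01


def read_template(code: bytes, start: int) -> List[int]:
    """Collect the run of NOP_0/NOP_1 bytes beginning at `start`."""
    end = start
    while end < len(code) and code[end] in (NOP_0, NOP_1):
        end += 1
    return list(code[start:end])


def find_forward(code: bytes, ip: int) -> Optional[int]:
    """Return the address just past the complementary template, or None.

    `ip` is the address of the jump opcode; its template follows directly.
    """
    template = read_template(code, ip + 1)
    if not template:
        return None  # no template to match (failure policy is assumed)
    target = bytes(NOP_1 if b == NOP_0 else NOP_0 for b in template)
    search_from = ip + 1 + len(template)
    pos = code.find(target, search_from)
    return None if pos == -1 else pos + len(target)


# JMP_FWD_TEMPLATE (0x13) followed by template NOP_0 NOP_1; the
# complement NOP_1 NOP_0 appears later, right before a DEC (0x08).
genome = bytes([0x13, 0x00, 0x01, 0x07, 0x07, 0x01, 0x00, 0x08])
landing = find_forward(genome, 0)
```

A backward search would mirror this with `bytes.rfind` over the region before the jump.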

Calls & Returns Family {#calls-returns-family}

  • Canonical Opcodes: CALL_TEMPLATE, RET
  • Alias Opcodes: CALL_ALIAS, RET_ALIAS
  • Description: These opcodes implement subroutine calls and returns, enabling modular code and reuse of routines. CALL_TEMPLATE pushes a return address onto the stack, finds a forward template in the code (similar to a label marked by NOPs), and transfers control to the code following it, so that a subsequent RET can resume execution at the calling point. RET pops the stored return address from the stack and jumps back, completing the subroutine cycle. The alias versions of call and return map to the same behaviors, ensuring that slight mutations in the call or return opcode still correctly invoke or return from subroutines.
  • Classification: Any instruction related to invoking a subroutine or function, and subsequently returning from it, is classified in the Calls & Returns family. They are a specialized subset of control flow instructions that also typically interact with the stack to save/restore return addresses. When classifying program behavior, these opcodes indicate structured code reuse and can be counted to see how often an organism utilizes subroutines.
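
The call/return cycle can be sketched as follows. This is a simplified model under assumptions: the stack is a Python list, the template lookup has already produced `target_addr`, and RET on an empty stack leaves the instruction pointer unchanged. `Cpu`, `call_template`, and `ret` are hypothetical names, not the handlers in InstructionSet.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cpu:
    """Hypothetical CPU state: instruction pointer plus a call stack."""
    ip: int = 0
    stack: List[int] = field(default_factory=list)


def call_template(cpu: Cpu, return_addr: int, target_addr: int) -> None:
    """Push the return address, then jump to the resolved template target."""
    cpu.stack.append(return_addr)
    cpu.ip = target_addr


def ret(cpu: Cpu) -> bool:
    """Pop the return address and jump back; False if the stack is empty."""
    if not cpu.stack:
        return False  # assumed policy for an unmatched RET
    cpu.ip = cpu.stack.pop()
    return True


cpu = Cpu(ip=10)
call_template(cpu, return_addr=13, target_addr=40)  # now executing at 40
ret(cpu)                                            # back at 13
```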

System Calls Family {#system-calls-family}

  • Canonical Opcodes: MAL, DIVIDE, SET_CHILD_SIZE_CX
  • Alias Opcodes: MAL_ALIAS, DIVIDE_ALIAS
  • Description: These instructions trigger system-level operations fundamental to the lifecycle of a digital organism. MAL ("malloc") requests a heap allocation from the scheduler/reaper; the result is the pending allocation that the organism copies its offspring into. DIVIDE finalizes reproduction by spawning the pending child organism. SET_CHILD_SIZE_CX lets the organism trim the required copy length for the pending child after over-allocating, so that division can complete once the intended genome length has been written while still leaving headroom for structural mutations. The aliases map to the same underlying handlers for these system operations, so a mutation in the opcode still results in a memory allocation or divide action.
  • Classification: Instructions that interact with the runtime system or virtual hardware in a way that goes beyond normal computation belong to System Calls. They often have side effects like allocating memory or creating new processes/organisms. In classification terms, these opcodes are identified by their unique roles: if an instruction triggers replication or memory management (not just arithmetic or control flow), it is a system-call type. Automated analysis might flag these in a program to understand reproductive or self-modifying behavior.
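
The interplay of MAL, SET_CHILD_SIZE_CX, and DIVIDE can be illustrated with a small model. Everything here is a sketch under assumptions: `PendingChild` and its fields are hypothetical, the clamp bounds (at least 1, at most the allocation) are guesses, and the real scheduler/reaper checks are not shown on this page.

```python
from dataclasses import dataclass


@dataclass
class PendingChild:
    """Hypothetical record for an allocation awaiting division."""
    allocated: int    # bytes granted by MAL
    required: int     # bytes that must be written before DIVIDE succeeds
    written: int = 0  # bytes copied into the child so far


def mal(size: int) -> PendingChild:
    """MAL: by default the whole allocation must be filled."""
    return PendingChild(allocated=size, required=size)


def set_child_size(child: PendingChild, cx: int) -> None:
    """SET_CHILD_SIZE_CX: clamp the required length to CX (bounds assumed)."""
    child.required = max(1, min(cx, child.allocated))


def divide(child: PendingChild) -> bool:
    """DIVIDE: succeeds once the required length has been written."""
    return child.written >= child.required


child = mal(100)            # over-allocate for headroom
child.written = 80          # the parent copied an 80-byte genome
blocked = divide(child)     # False: 100 bytes are still required
set_child_size(child, 80)   # trim the requirement to the real length
ready = divide(child)       # True
```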

Stack Operations Family {#stack-operations-family}

  • Canonical Opcodes: PUSH, POP
  • Alias Opcodes: PUSH_ALIAS, POP_ALIAS
  • Description: Stack operations provide temporary storage and retrieval of values using a LIFO (last-in-first-out) structure. PUSH takes a value (usually from a register) and places it on the stack (decrementing the stack pointer), and POP removes the top value from the stack (incrementing the stack pointer, and typically loading that value into a register). These are essential for preserving registers across subroutine calls or for other scratch storage needs. The alias variants perform the identical push or pop behavior, giving neutrality to mutations in these opcodes.
  • Classification: Any instruction that directly manipulates the stack (implicitly via a stack pointer register) is classified under Stack Ops. This includes pushing data onto or popping data off the stack. In analyzing an organism's code, the presence of push/pop indicates use of a stack-based memory discipline, often correlating with nested subroutine calls or complex arithmetic needing extra registers. All push/pop variants are grouped together when measuring stack usage.
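
The stack-pointer direction described above (PUSH decrements SP, POP increments it) corresponds to a descending stack. The sketch below assumes a fixed-depth stack and simple overflow/underflow signalling; the depth, the `Stack` class, and the failure behavior are hypothetical.

```python
from typing import List, Optional


class Stack:
    """Hypothetical descending stack: SP starts one past the last cell."""

    def __init__(self, size: int = 8) -> None:
        self.cells: List[int] = [0] * size
        self.sp: int = size  # empty stack

    def push(self, value: int) -> bool:
        """PUSH: decrement SP, then store. False on overflow (assumed)."""
        if self.sp == 0:
            return False
        self.sp -= 1
        self.cells[self.sp] = value
        return True

    def pop(self) -> Optional[int]:
        """POP: load, then increment SP. None on underflow (assumed)."""
        if self.sp == len(self.cells):
            return None
        value = self.cells[self.sp]
        self.sp += 1
        return value


stack = Stack(size=2)
stack.push(11)
stack.push(22)
top = stack.pop()  # 22, last in, first out
```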

Register Adjust Family {#register-adjust-family}

  • Canonical Opcodes: INC_BX, INC_CX, INC_DX, DEC_BX, DEC_CX, DEC_DX, LOAD_SIZE_CX, LOAD_START_BX
  • Alias Opcodes: (none)
  • Description: These are fine-grained register tuning instructions. Alongside the dedicated increment/decrement opcodes for BX, CX, and DX, ByteBiota includes helpers that synchronise registers with organism metadata: LOAD_SIZE_CX writes the current genome length into CX, and LOAD_START_BX restores BX to the organism's starting address. Together they let organisms navigate memory or reset pointers in small steps while staying aligned with structural mutations. There are no alias opcodes in this category: each adjustment or helper is its own opcode, although conceptually they are similar operations.
  • Classification: We classify these as a distinct family because they target a specific behavior: minor adjustments to registers or reloading canonical pointer state. In terms of opcode behavior, they are essentially a subset of arithmetic/metadata operations confined to specific registers. For classification purposes, one identifies these by the combination of targeted register effects and the limited scope (affecting only a specific register). They indicate iterative or fine-tuning behavior in code (for example, stepping through an array or resetting a pointer before copying). In analysis, although they perform arithmetic-like updates, we consider them separately to highlight how an organism manipulates pointers and synchronises with its structural metadata versus doing general computation.

Environment & Communication Family {#environment-communication-family}

  • Canonical Opcodes: SENSE_ENV, EMIT_SIGNAL, HARVEST_ENERGY, READ_STORAGE, WRITE_STORAGE, SUBMIT_TASK
  • Alias Opcodes: (none)
  • Description: This family encompasses instructions that interface with the external environment or communal resources, beyond the organism's own memory. They enable the organism to sense and interact with its world:
      • SENSE_ENV reads the local resource level into AX and the signal intensity into CX.
      • EMIT_SIGNAL emits a signal with intensity AX at address BX, sending a communication into the environment (for cooperation or competition).
      • HARVEST_ENERGY converts environmental resources into usable energy for the organism and returns the result in AX.
      • READ_STORAGE (0x40) reads a value from a persistent environment storage slot. The slot number is given in register BX, and the stored value is returned in AX. This lets an organism retrieve shared or previously stored data.
      • WRITE_STORAGE (0x41) writes a value to an environment storage slot. It takes a slot number in BX and a value in AX, writes the value to that slot, and returns the previous value in CX. This allows organisms to update shared state while still seeing the old value (e.g. for atomic exchange or logging).
      • SUBMIT_TASK (0x42) submits a task solution (provided in AX) to the environment for a reward. If the task value is correct (matching an externally defined challenge), the environment awards energy (returned in BX) and may generate a new task. This opcode gives organisms a way to gain energy by performing computational work (a form of digital resource gathering or reward).
  • Classification: Any instruction that involves external interaction (sensing state outside the organism, communicating, or using special environmental services) is classified in the Environment & Communication family. These are identified by their I/O-like behavior (inputs/outputs via registers that correspond to environment data or actions) rather than purely internal computation. In classification and analysis, these opcodes are crucial to understanding an organism's strategy: for example, heavy use of SUBMIT_TASK or HARVEST_ENERGY would indicate a focus on task-solving for energy, whereas frequent EMIT_SIGNAL might indicate communication with neighbors. When analyzing opcode usage, these instructions stand out since they often use dedicated environment interfaces and have side effects beyond the organism's own state.
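
The register contracts of the storage and task opcodes can be sketched with a toy environment. The `Environment` class, the slot count, the modulo wrap on slot indices, the fixed task answer, and the reward size are all hypothetical; only the direction of data flow (slot in BX, value in AX, previous value to CX, reward to BX) comes from the descriptions above.

```python
from typing import List


class Environment:
    """Toy stand-in for the shared environment; all parameters are assumed."""

    def __init__(self, slots: int = 16, task_answer: int = 42, reward: int = 10) -> None:
        self.storage: List[int] = [0] * slots
        self.task_answer = task_answer
        self.reward = reward

    def read_storage(self, bx: int) -> int:
        """READ_STORAGE: value for AX from slot BX (index wrap is assumed)."""
        return self.storage[bx % len(self.storage)]

    def write_storage(self, bx: int, ax: int) -> int:
        """WRITE_STORAGE: store AX in slot BX, return the old value for CX."""
        slot = bx % len(self.storage)
        previous = self.storage[slot]
        self.storage[slot] = ax
        return previous

    def submit_task(self, ax: int) -> int:
        """SUBMIT_TASK: reward for BX; zero for a wrong answer (assumed)."""
        return self.reward if ax == self.task_answer else 0


env = Environment()
old = env.write_storage(3, 7)   # slot 3 held 0, now holds 7
seen = env.read_storage(3)      # 7
```

Returning the previous value from `write_storage` is what makes the exchange usable as a swap: an organism can claim a slot and learn in the same step whether another organism had written there first.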

Automated Opcode Classification and Analysis Strategies {#automated-classification}

When documenting or analyzing digital organisms, we often want to classify their code automatically to understand their behavioral tendencies. Given the above taxonomy, we can devise a systematic approach to categorize and quantify the usage of each opcode family in an organism's genome.

Static Opcode Frequency Analysis {#static-analysis}

This approach examines the organism's code without executing it. We scan the genome, map each opcode to its family (using the alias→canonical mapping to ensure aliases are counted in the same bucket), and then count frequencies or proportions of each family's usage. This yields a profile of the code's structure, for example what percentage of instructions are Arithmetic, Control Flow, or Environment calls.
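
A minimal sketch of such a static profile is shown below. It assumes a byte-to-family mapping shaped like a dictionary; the `OPCODE_FAMILIES` literal here is a hand-copied subset of the tables on this page, not the real object from src/bytebiota/isa.py, and `family_profile` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict

# Hypothetical subset: opcode byte -> family name, copied from the tables.
# Aliases already carry their canonical family, so they share a bucket.
OPCODE_FAMILIES: Dict[int, str] = {
    0x00: "template",
    0x01: "template",
    0x2C: "template",          # NOP_ALIAS_A
    0x02: "data_movement",
    0x05: "arithmetic_logic",
    0x0B: "arithmetic_logic",
    0x31: "arithmetic_logic",  # ALU_ALIAS_ADD
    0x13: "control_flow",
    0x17: "system_calls",
    0x18: "system_calls",
}


def family_profile(genome: bytes) -> Dict[str, float]:
    """Return each family's share of the genome; unmapped bytes are 'unknown'."""
    if not genome:
        return {}
    counts = Counter(OPCODE_FAMILIES.get(b, "unknown") for b in genome)
    return {family: n / len(genome) for family, n in counts.items()}


# NOP_0, NOP_ALIAS_A, ADD, ALU_ALIAS_ADD: half template, half arithmetic.
profile = family_profile(bytes([0x00, 0x2C, 0x05, 0x31]))
```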

Dynamic Execution Trace Analysis {#dynamic-analysis}

This method involves actually running the organism in a controlled environment and logging which instructions execute (and how often). By observing an execution trace, we gather statistics on opcode usage during real behavior. This can be more insightful in understanding what the organism actually does in a typical scenario, as opposed to what it could do.
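
As a sketch of the logging side, the function below counts executed opcodes given any stepping function. The `step` callable is an assumption standing in for a real interpreter hook: it is expected to execute one instruction and return its opcode byte, or None when the organism halts. The scripted trace is made-up data for illustration only.

```python
from collections import Counter
from typing import Callable, Optional


def run_traced(step: Callable[[], Optional[int]], max_steps: int) -> Counter:
    """Call `step` up to `max_steps` times and tally the opcodes it reports."""
    counts: Counter = Counter()
    for _ in range(max_steps):
        opcode = step()
        if opcode is None:  # organism halted
            break
        counts[opcode] += 1
    return counts


# Toy stand-in for an interpreter: replays a scripted trace of
# INC_CX, LOOP_TEMPLATE, INC_CX, LOOP_TEMPLATE, DIVIDE.
scripted = iter([0x1C, 0x45, 0x1C, 0x45, 0x18])
trace_counts = run_traced(lambda: next(scripted, None), max_steps=100)
```

The `max_steps` bound matters in practice because an organism stuck in a loop would otherwise never return control to the analysis.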

Hybrid Approaches {#hybrid-approaches}

Often the best insight comes from combining static and dynamic analysis. For example, one might first do a static classification to know all the instructions present and their family counts. Then, by running the organism, one can weight those counts by actual execution frequency.
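
One way to combine the two views is to report, per family, both how many instructions are present and how often they ran. The sketch below assumes execution hits are recorded per genome address; `hybrid_profile`, the `hits` mapping, and the `families` mapping are hypothetical names, and the sample numbers are invented for illustration.

```python
from collections import Counter
from typing import Dict


def hybrid_profile(
    genome: bytes,
    hits: Dict[int, int],
    families: Dict[int, str],
) -> Dict[str, Dict[str, int]]:
    """Per family: static instruction count and total executions.

    `hits` maps a genome address to how many times it executed;
    addresses missing from `hits` count as never executed.
    """
    static: Counter = Counter()
    executed: Counter = Counter()
    for addr, byte in enumerate(genome):
        family = families.get(byte, "unknown")
        static[family] += 1
        executed[family] += hits.get(addr, 0)
    return {f: {"static": static[f], "executed": executed[f]} for f in static}


# Two ADDs and one JMP_FWD_TEMPLATE; the second ADD never runs.
report = hybrid_profile(
    genome=bytes([0x05, 0x05, 0x13]),
    hits={0: 10, 2: 10},
    families={0x05: "arithmetic_logic", 0x13: "control_flow"},
)
```

A family with a high static count but few executions points at dead or rarely reached code, which static counting alone would overstate.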

Maintenance Notes {#opcode-maintenance}

  • Keep taxonomy (../biology/taxonomy-overview.md) and species profiles aligned when opcode families expand or new behavior traits are added.
  • Update alias mappings in this document whenever ALIAS_TARGETS changes to avoid stale guidance for mutation tooling or documentation.
  • For execution semantics, consult handler docstrings in InstructionSet; key routines include environment interactions (_sense_env, _read_storage), reproduction (_divide, _set_child_size_cx), and loop helpers (_loop_template).

Related files: src/bytebiota/isa.py, src/bytebiota/taxonomy.py.