Distributed architecture
This page documents the distributed ByteBiota architecture, including worker → server result payloads and server-side aggregation.
Architecture Overview {#architecture-overview}
The distributed ByteBiota system consists of three main components:
- Server - Centralized coordinator managing seed bank, global state aggregation, and worker coordination
- Worker - Distributed execution nodes running organism time slices with configurable resource limits
- Web UI - Integrated with server for real-time monitoring and control
Communication Pattern {#communication-pattern}
- Workers establish a persistent WebSocket for assignments, result batches, and config pushes
- HTTP is reserved for lifecycle endpoints (register/deregister), configuration snapshots, and rare fallbacks (assignment fetch when the queue is empty, organism hydration, bulk-ack)
- Server still exposes REST APIs for administrative tooling, but steady-state orchestration stays on WebSocket
- Critical: Server actively pushes work assignments to workers via background task
Work Assignment System {#work-assignment-system}
The server uses a background task (`_background_work_assignment`) that runs every second to push work assignments to all active workers. This ensures:
- Responsive work distribution: Workers receive assignments within 1 second
- Automatic recovery: Dead organisms are cleaned up from assignments
- Environment synchronization: Server state is included in work assignments
- Queue management: Only pushes work when worker queues are low
Key Methods:
- `_push_work_if_needed(worker_id)` - Pushes a work assignment to a specific worker
- `get_work_assignment(worker_id)` - Creates a work assignment with organism data
- The background task runs continuously while the simulation is active (see the sketch below)
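A minimal sketch of that push loop (the method names `_background_work_assignment`, `_push_work_if_needed`, and `get_work_assignment` come from this page; the queue-size check, accessor names, and `asyncio` wiring are illustrative assumptions):

```python
import asyncio

# Hypothetical threshold: only push when the worker's local queue runs low.
QUEUE_LOW_WATERMARK = 2

async def _background_work_assignment(server) -> None:
    """Runs once per second while the simulation is active (sketch)."""
    while server.simulation_active:
        for worker_id in server.worker_manager.active_worker_ids():  # assumed accessor
            await _push_work_if_needed(server, worker_id)
        await asyncio.sleep(1.0)  # "workers receive assignments within 1 second"

async def _push_work_if_needed(server, worker_id: str) -> None:
    # Only push when the worker's reported queue is low (assumed accessor names).
    if server.worker_manager.queue_size(worker_id) > QUEUE_LOW_WATERMARK:
        return
    assignment = server.worker_manager.get_work_assignment(worker_id)
    if assignment is None:
        return
    # Dead organisms are filtered out of the assignment before transmission.
    await server.websocket_manager.send_work_assignment(worker_id, assignment)
```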
Component Details {#component-details}
Server Component {#server-component}
Auto-reload behaviour: the server CLI now starts in no-reload mode by default. Use `--reload` when you want the development watcher to restart on code changes.
Location: src/bytebiota/server/
Key Files:
- `server.py` - Main FastAPI server with core orchestration, WebSocket handlers, and lifecycle management
- `worker_manager.py` - Worker lifecycle and work assignment management
- `seed_bank_service.py` - Centralized seed bank with diversity management
- `state_aggregator.py` - Aggregates statistics from all workers
- `checkpoint_service.py` - Coordinates distributed checkpointing
- `websocket_manager.py` - WebSocket connection management and message handling
- `config.py` - Server-specific configuration
Service Modules (refactored from server.py):
- `analytics_service.py` - Advanced analytics endpoints (phylogenetic analysis, population forecasting, etc.)
- `simulation_api_service.py` - Simulation control, organism data, taxonomy, and wiki endpoints
- `monitoring_api_service.py` - System health monitoring and distributed system metrics
- `ui_routes_service.py` - Web UI page routes and template rendering
Refactoring Benefits:
- Modularity: Each service handles a specific domain (analytics, simulation, monitoring, UI)
- Maintainability: Easier to locate and modify specific functionality
- Testability: Services can be tested independently
- Scalability: Services can be extracted to separate processes if needed
- Code Size: Reduced server.py from 2581 to 1637 lines (36.5% reduction)
Worker Component {#worker-component}
Location: src/bytebiota/worker/
Core Files:
- `worker.py` - Main worker orchestration and lifecycle management
- `executor.py` - Local simulation execution engine
- `resource_limiter.py` - CPU/memory throttling and monitoring
- `sync_client.py` - HTTP client for server communication
- `config.py` - Worker-specific configuration
Manager Modules (refactored from worker.py):
- `batch_manager.py` - Adaptive batching logic and result coordination
- `assignment_handler.py` - Work assignment coordination and execution
- `organism_manager.py` - Organism lifecycle and factory management
- `config_sync.py` - Configuration synchronization with the server
- `websocket_client.py` - WebSocket communication with the server (fixed race condition in the assignment queue)
- `connection_manager.py` - Network connection and WebSocket management
- `statistics_tracker.py` - Progress monitoring and metrics collection
Supporting Files:
- `offline_cache.py` - Offline result caching for resilience
- `error_handler.py` - Error handling and recovery
- `resource_detector.py` - System resource detection
Platform Support:
ByteBiota workers are designed for cross-platform deployment:
- Windows 10/11: Full support with Windows-specific process priority classes
- macOS 12+: Full support with Unix nice levels (may require elevated permissions for negative values)
- Linux (Ubuntu 20.04+): Full support with standard Unix nice levels
Cross-Platform Features:
- Process priority management using platform-appropriate APIs (Windows priority classes vs Unix nice levels)
- CPU detection with fallbacks for all platforms (`os.cpu_count()`, `multiprocessing.cpu_count()`, platform-specific methods)
- WebSocket event loop compatibility (ProactorEventLoop on Windows, SelectorEventLoop on Unix)
- File path handling using `pathlib.Path` for cross-platform compatibility
Resource Management:
Workers support configurable resource limits with presets:
- `minimal`: 10% CPU, 256MB memory
- `background`: 25% CPU, 512MB memory
- `standard`: 50% CPU, 1024MB memory
- `full`: 100% CPU, 4096MB memory
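The presets map naturally onto a small limits table, and a duty-cycle throttle is one way to hold a CPU percentage; the sketch below uses the preset values from this page, while the `ResourceLimits` class and the throttling strategy are illustrative assumptions:

```python
import time
from dataclasses import dataclass

@dataclass
class ResourceLimits:
    cpu_percent: int
    memory_mb: int

# Preset values documented above.
PRESETS = {
    "minimal":    ResourceLimits(cpu_percent=10,  memory_mb=256),
    "background": ResourceLimits(cpu_percent=25,  memory_mb=512),
    "standard":   ResourceLimits(cpu_percent=50,  memory_mb=1024),
    "full":       ResourceLimits(cpu_percent=100, memory_mb=4096),
}

def throttled_run(step_fn, limits: ResourceLimits, should_stop, slice_seconds: float = 0.1) -> None:
    """Run step_fn in short bursts, idling to keep CPU usage near the target (sketch)."""
    duty = limits.cpu_percent / 100.0
    while not should_stop():
        burst_end = time.monotonic() + slice_seconds * duty
        while time.monotonic() < burst_end:
            step_fn()
        if duty < 1.0:
            time.sleep(slice_seconds * (1.0 - duty))  # sleep away the rest of the slice
```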
Console Output:
Workers show essential information only:
- Always shown: Startup banner, registration status, connection events, periodic progress summaries (once per minute), error messages, and final statistics
- Verbose mode (`--verbose` flag): Additionally shows detailed organism execution, genome verification, memory allocation, and per-assignment details
Use verbose mode for debugging. Normal mode provides a clean user experience for production runs.
Worker Architecture:
The worker has been refactored into a modular architecture with specialized manager classes:
- DistributedWorker: Main orchestration class that coordinates all managers
- BatchManager: Handles adaptive batching and result submission optimization
- AssignmentHandler: Manages work assignment execution and local queue coordination
- OrganismManager: Handles organism creation, caching, and lifecycle management
- ConfigSyncManager: Manages configuration synchronization with the server
- ConnectionManager: Handles network parameters, WebSocket, and offline mode
- StatisticsTracker: Tracks progress, metrics, and performance statistics
Execution safeguards {#worker-execution-safeguards}
Workers apply the same stochastic reaper heuristics that power the distributed LocalExecutor. Each time slice they calculate population-average age/error figures, apply the configured multipliers, and only reap when the `reap_chance` roll succeeds and the organism exceeds the dynamic thresholds. This keeps long-lived lineages alive under heavy batching while maintaining consistent selection pressure across workers.
Stagnation seeding {#stagnation-seeding}
`StateAggregator` honours `reaper.stagnation_spawn_interval` / `_count` the same way LocalExecutor does. When births stall for that many global steps, the server injects fresh ancestor genomes (seed-bank first, ancestor fallback) and assigns them to workers immediately. This prevents long-running distributed experiments from freezing at a fixed census.
Simulation resets {#simulation-resets}
Workers now treat the `simulation_id` returned from HTTP assignments as authoritative. If the server restarts with a new run (for example via `--reset`), the next assignment response carries the new identifier; the worker clears its local queues, re-registers automatically, and establishes a fresh WebSocket without manual intervention.
Worker Loop:
1. Register with server (one-time)
2. Poll server for work assignment
3. Execute assigned organism time slices locally
4. Submit results (organism state updates, births, deaths)
5. Periodic heartbeat
6. Handle seed bank synchronization
Adaptive batching {#adaptive-batching}
The distributed worker system implements adaptive batching to optimize network communication and reduce server load. The `BatchManager` class handles intelligent batching of execution results based on work rate and time intervals.
Configuration Parameters
The `AdaptiveBatchConfig` class supports the following parameters:
- `target_interval_seconds`: Target time interval between batch submissions (default: 300)
- `min_batch_size`: Minimum number of results per batch (default: 8)
- `max_batch_size`: Maximum number of results per batch (default: 200)
- `adjustment_factor`: Factor for batch size adjustments (default: 0.15)
- `work_rate_threshold`: Threshold for work rate calculations (default: 200)
Adaptive batching is always enabled on distributed workers; the CLI and environment toggle have been removed to prevent accidental regressions.
Batching Logic
The adaptive batching system uses multiple criteria to determine when to submit batches:
- Time-based submission: Submit when target interval is reached
- Size-based submission: Submit when current adaptive batch size is reached
- Partial submission: Submit when minimum batch size is reached and half the target interval has passed
The system continuously adjusts batch size based on submission intervals to optimize performance.
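A condensed sketch of those criteria (the parameter names and defaults mirror `AdaptiveBatchConfig` above; the `should_submit` helper itself is illustrative):

```python
import time
from dataclasses import dataclass

@dataclass
class AdaptiveBatchConfig:
    target_interval_seconds: float = 300.0
    min_batch_size: int = 8
    max_batch_size: int = 200
    adjustment_factor: float = 0.15
    work_rate_threshold: float = 200.0

def should_submit(cfg: AdaptiveBatchConfig, pending: int,
                  last_flush: float, adaptive_size: int) -> bool:
    """Return True when any documented submission criterion is met (sketch)."""
    elapsed = time.monotonic() - last_flush
    if elapsed >= cfg.target_interval_seconds:      # time-based submission
        return True
    if pending >= adaptive_size:                    # size-based submission
        return True
    if pending >= cfg.min_batch_size and elapsed >= cfg.target_interval_seconds / 2:
        return True                                 # partial submission
    return False
```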
API Endpoints {#api-endpoints}
```
# Worker Management
POST   /api/workers/register          # Worker registration
POST   /api/workers/{id}/heartbeat    # Worker heartbeat
DELETE /api/workers/{id}              # Worker deregistration

# Work Assignment
GET    /api/workers/{id}/assignment   # Get work for worker
POST   /api/workers/{id}/results      # Submit execution results

# Seed Bank (Centralized)
GET    /api/seedbank/genomes          # Get genome from seed bank
POST   /api/seedbank/genomes          # Submit genome to seed bank
GET    /api/seedbank/stats            # Seed bank statistics

# Global State
GET    /api/simulation/stats          # Aggregated simulation stats
GET    /api/simulation/organisms      # Combined organism data
POST   /api/simulation/control        # Start/stop/pause

# Checkpointing
POST   /api/checkpoint/create         # Trigger distributed checkpoint
GET    /api/checkpoint/status         # Checkpoint status

# WebSocket (UI only)
WS     /ws/realtime                   # Real-time stats for web UI
```
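As a quick illustration, a worker could exercise the registration and fallback-assignment endpoints like this (a sketch using the `requests` package; the request body and any response fields beyond `worker_id` and `organism_ids` are assumptions):

```python
import requests

SERVER_URL = "http://localhost:8000"  # assumed default; adjust to your deployment

# Register once and remember the assigned worker_id.
resp = requests.post(f"{SERVER_URL}/api/workers/register",
                     json={"hostname": "my-machine"}, timeout=10)
resp.raise_for_status()
worker_id = resp.json()["worker_id"]

# HTTP fallback: fetch a work assignment when the WebSocket queue is empty.
assignment = requests.get(f"{SERVER_URL}/api/workers/{worker_id}/assignment",
                          timeout=10).json()
print(assignment.get("organism_ids", []))
```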
Work Assignment Strategy {#work-assignment-strategy}
Workers execute time slices for assigned organisms and return results.
Assignment Model:
```python
@dataclass
class WorkAssignment:
    organism_ids: List[int]          # Organisms assigned to this worker
    time_slice_steps: int            # Steps to execute per organism
    seed_bank_genomes: List[bytes]   # New genomes for reproduction
    simulation_config: Config        # Current simulation parameters
```
Assignment Logic:
- Round-robin distribution of organisms across available workers
- Dynamic load balancing based on worker resource utilization
- Seed bank synchronization to maintain genetic diversity
- Configurable time slice size (default: 1000 steps per organism)
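For example, a plain round-robin pass over active workers could look like this (a sketch; the hypothetical `round_robin_assign` helper ignores the load-balancing and seed-bank aspects listed above):

```python
from itertools import cycle
from typing import Dict, List

def round_robin_assign(organism_ids: List[int], worker_ids: List[str]) -> Dict[str, List[int]]:
    """Distribute organisms across workers one at a time (sketch)."""
    assignments: Dict[str, List[int]] = {wid: [] for wid in worker_ids}
    if not worker_ids:
        return assignments
    for organism_id, worker_id in zip(organism_ids, cycle(worker_ids)):
        assignments[worker_id].append(organism_id)
    return assignments

# Example: 5 organisms over 2 workers -> {'w1': [1, 3, 5], 'w2': [2, 4]}
print(round_robin_assign([1, 2, 3, 4, 5], ["w1", "w2"]))
```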
Worker local stats schema {#worker-local-stats-schema}
Workers submit execution results that include `local_stats` with the following fields. Snapshot batches flush on the adaptive schedule (default target 300 s with a minimum of 8 assignments) to balance server load with timely observability. Emergency flushes fire at 2× the target or when the batch reaches the hard cap.
- steps_executed: number
- execution_time: seconds
- organisms_processed: number
- memory_occupancy: float (0–1) of local soup occupancy
- environment_stats:
  - total_resources: number
  - total_signals: number
  - task_attempts: number
  - task_successes: number
- mutation_stats:
  - global_instructions: number
  - copy_bit_flips: number
  - background_flips: number
  - insertions: number
  - deletions: number
  - indels: number (insertions + deletions; convenience aggregate)
- resource_usage: throttle stats
- execution_stats: cumulative execution counters
- allocation_failures: number of MAL allocation failures observed during the assignment
- allocation_failure_rate: failures per executed step (1.0 when no progress occurs but failures are logged)
Workers also include organism update records with basic phenotype data and genome bytes where applicable.
WebSocket Communication {#websocket-communication}
The distributed system uses WebSocket for real-time, bidirectional communication between server and workers, eliminating HTTP polling and reducing network overhead.
Architecture Overview {#websocket-architecture}
- WebSocket-First Communication: Assignments, snapshot batches, heartbeats, and tuning deltas stay on the worker WebSocket
- Server-Push Assignments: Server pushes work assignments to workers when queue is low
- Real-Time Result Submission: Workers submit results immediately via WebSocket
- Offline Operation: Workers cache results locally when disconnected, resume on reconnection
- Adaptive Heartbeats: Heartbeat frequency adapts based on worker queue size (30/60/120s)
- Limited HTTP Fallback: HTTP remains for registration, deregistration, config snapshots, and last-resort assignment/organism fetches
Message Protocol {#websocket-message-protocol}
All messages use a JSON envelope with optional gzip compression:
```
{
  "type": "WORK_ASSIGNMENT|RESULT_SUBMISSION|HEARTBEAT|...",
  "simulation_id": "sim-12345-abc",
  "timestamp": 1234567890.123,
  "compressed": false,
  "payload": { ... }
}
```
Message Types:
- `HANDSHAKE`: Initial connection setup with simulation_id and config
- `WORK_ASSIGNMENT`: Work assignment with organism data included
- `RESULT_SUBMISSION`: Execution results with submission_id for deduplication
- `RESULT_ACK`: Server acknowledgment of result submission with submission_id and accepted status
- `HEARTBEAT`: Worker status (queue size, cache size, offline mode)
- `CONFIG_UPDATE`: Server-pushed configuration changes
- `SIMULATION_CHANGE`: Notification of simulation restart/change
- `CHUNK`: Large message chunk for reliable transmission
- `ERROR`: Error messages with backpressure handling
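A sketch of how a client might build such an envelope, compressing payloads above the documented 1 KB threshold (the `build_envelope` helper and the base64 wrapping of compressed bodies are assumptions, not the actual wire format):

```python
import base64
import gzip
import json
import time

COMPRESSION_THRESHOLD = 1024  # bytes; mirrors the WEBSOCKET_COMPRESSION_THRESHOLD default

def build_envelope(msg_type: str, simulation_id: str, payload: dict) -> str:
    """Wrap a payload in the JSON envelope, gzip-compressing large bodies (sketch)."""
    body = json.dumps(payload)
    compressed = len(body.encode()) > COMPRESSION_THRESHOLD
    if compressed:
        # Assumption: compressed payloads travel as base64-encoded gzip text.
        body = base64.b64encode(gzip.compress(body.encode())).decode()
    return json.dumps({
        "type": msg_type,
        "simulation_id": simulation_id,
        "timestamp": time.time(),
        "compressed": compressed,
        "payload": body if compressed else payload,
    })
```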
Result Acknowledgment {#result-acknowledgment}
Server sends acknowledgments for result submissions:
```json
{
  "type": "RESULT_ACK",
  "payload": {
    "submission_id": "uuid-string",
    "accepted": true,
    "dedup": false,
    "message": "Success"
  }
}
```
Fields:
- `submission_id`: Matches the submission_id from the original RESULT_SUBMISSION
- `accepted`: Boolean indicating if the result was accepted and processed
- `dedup`: Boolean indicating if this was a duplicate submission
- `message`: Human-readable status message
Organism Data Transmission {#organism-data-transmission}
Work assignments include complete organism data to eliminate HTTP fallback:
```
{
  "type": "WORK_ASSIGNMENT",
  "payload": {
    "assignment_id": "uuid",
    "organism_ids": [1, 2, 3],
    "organism_data": [
      {
        "id": 1,
        "genome": [...],
        "energy": 100,
        "registers": {...},
        "start_addr": 1000
      }
    ],
    "time_slice_steps": 1000,
    "global_step": 42
  }
}
```
Dead Organism Handling:
- Organisms that die are removed from both state aggregator and worker assignments
- Work assignments filter out dead organisms before transmission
- Workers receive only valid organisms, preventing HTTP fallback errors
Offline Operation {#offline-operation}
Workers operate in offline mode when WebSocket is disconnected:
- Result Caching: Cache results locally up to 100MB (configurable)
- Work Continuation: Continue processing local work queue
- Cache Management: FIFO cleanup when cache is full
- Reconnection: Submit cached results on reconnection
- Simulation Change: Clear cache if simulation_id changes
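A minimal in-memory FIFO cache along these lines (illustrative only; the real `offline_cache.py` presumably persists results to disk while enforcing the configurable 100 MB limit):

```python
from collections import deque

class OfflineResultCache:
    """FIFO cache of serialized results, capped by total size (sketch)."""

    def __init__(self, max_bytes: int = 100 * 1024 * 1024):
        self._max_bytes = max_bytes
        self._entries = deque()  # holds serialized result blobs (bytes)
        self._size = 0

    def add(self, serialized_result: bytes) -> None:
        self._entries.append(serialized_result)
        self._size += len(serialized_result)
        while self._size > self._max_bytes and self._entries:
            self._size -= len(self._entries.popleft())  # FIFO cleanup when full

    def drain(self):
        """Yield cached results for resubmission after reconnection."""
        while self._entries:
            entry = self._entries.popleft()
            self._size -= len(entry)
            yield entry
```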
Connection Management {#connection-management}
- Registration: HTTP POST `/api/workers/register` → receive worker_id
- WebSocket Connection: `ws://host:port/ws/worker/{worker_id}`
- Reconnection: Indefinite attempts with exponential backoff (1–300 s, 5 min max)
- Health Monitoring: Server tracks heartbeat timeouts (3× interval)
- Deduplication: Result submissions use (worker_id, submission_id) with 1-hour TTL
Worker ID Generation {#worker-id-generation}
Workers use persistent, machine-based IDs by default:
Default Format: worker-{hostname}-{machine-hash}
- Example: worker-macbook-pro-a1b2c3
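One plausible way to derive that format (the use of `uuid.getnode()` and SHA-256 for the machine hash is an assumption, not the project's actual derivation):

```python
import hashlib
import socket
import uuid

def default_worker_id() -> str:
    """Build a persistent worker-{hostname}-{machine-hash} identifier (sketch)."""
    hostname = socket.gethostname().lower().replace(" ", "-")
    machine_hash = hashlib.sha256(str(uuid.getnode()).encode()).hexdigest()[:6]
    return f"worker-{hostname}-{machine_hash}"

print(default_worker_id())  # e.g. worker-macbook-pro-a1b2c3
```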
Custom IDs:
```bash
# Environment variable
export WORKER_ID="gpu-worker"
python -m bytebiota worker
# → worker-gpu-worker

# CLI argument
python -m bytebiota worker --worker-id "my-worker"
# → worker-my-worker
```
Multiple Workers:
The server automatically handles collisions by adding suffixes:
- First worker: worker-macbook-pro-a1b2c3
- Second worker: worker-macbook-pro-a1b2c3-1
- Third worker: worker-macbook-pro-a1b2c3-2
Benefits:
- Persistent IDs across restarts
- Work statistics preserved
- Automatic multi-worker support
- No manual instance management
Reconnection and Recovery {#reconnection-recovery}
The system implements robust reconnection mechanisms to handle server restarts and network interruptions:
Indefinite Reconnection Strategy:
- Workers attempt reconnection indefinitely with exponential backoff
- Backoff intervals: 1, 2, 4, 8, 16, 32, 60, 120, 300 seconds (5 min max)
- No hard limits on reconnection attempts - workers will always try to reconnect
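The schedule above maps directly onto a small retry helper (a sketch; the real client also resets the backoff after a successful reconnect):

```python
import itertools
import time

BACKOFF_SECONDS = [1, 2, 4, 8, 16, 32, 60, 120, 300]  # schedule from this page

def reconnect_forever(connect) -> None:
    """Retry indefinitely, walking the backoff schedule and capping at 300 s (sketch)."""
    for attempt in itertools.count():
        if connect():
            return
        delay = BACKOFF_SECONDS[min(attempt, len(BACKOFF_SECONDS) - 1)]
        time.sleep(delay)
```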
Simulation Continuity:
- Same Simulation: When reconnecting to the same simulation, workers:
- Submit cached results generated during offline period
- Sync configuration with server
- Resume work assignments immediately
- Maintain simulation state continuity
- New Simulation: When server starts a new simulation:
- Clear local state and cached results
- Re-register with new worker ID
- Fetch updated configuration
- Start fresh with new simulation
Connection State Management:
- HTTP registration and WebSocket connection are coordinated
- Successful HTTP re-registration resets WebSocket connection state
- Circuit breaker prevents permanent disconnection
- Periodic health checks detect and recover from connection issues
Server-Side Recovery:
- Server assigns existing organisms to reconnected workers
- Ensures running simulations continue with reconnected workers
- Maintains work distribution across all active workers
- Handles worker ID collisions intelligently with force mode support
Assignment Queue Synchronization {#assignment-queue-synchronization}
The WebSocket client uses proper synchronization to handle the race condition between async message handlers and the synchronous main work loop:
- Async Context: Uses `get_nowait()` when called from an async context
- Sync Context: Uses `run_until_complete()` with a timeout when called from a sync context
- Fallback: Handles cases where no event loop is running
- Debugging: Logs queue operations to track assignment flow
This ensures work assignments are reliably retrieved regardless of the calling context.
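In outline, the context-aware retrieval might look like the following sketch (the `asyncio.Queue`, the explicit `loop` argument, and the timeout value are assumptions about the client's internals):

```python
import asyncio
from typing import Optional

def get_assignment(queue: asyncio.Queue, loop: asyncio.AbstractEventLoop,
                   timeout: float = 1.0) -> Optional[dict]:
    """Fetch the next assignment from either an async or a sync caller (sketch)."""
    try:
        asyncio.get_running_loop()
        running_async = True
    except RuntimeError:
        running_async = False

    if running_async:
        # Async context: never block the event loop.
        try:
            return queue.get_nowait()
        except asyncio.QueueEmpty:
            return None

    # Sync context: drive the client's own loop with a timeout (assumed wiring).
    try:
        return loop.run_until_complete(asyncio.wait_for(queue.get(), timeout))
    except asyncio.TimeoutError:
        return None
```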
Request Optimization {#request-optimization}
The distributed system implements several optimizations to prevent server overload:
- WebSocket Push: Eliminates HTTP polling entirely
- Result batching: Workers target 5-minute snapshot submissions (configurable) and fall back to an emergency flush at 2× the target or when the batch hits the hard cap
- Adaptive Heartbeats: Heartbeats adapt to queue size (30/60/120s)
- Rate Limiting: Server implements rate limiting to prevent abuse
- Compression: Messages >1KB are gzip compressed
- Chunking: Messages >10MB are split into chunks
Result streaming pipeline {#result-streaming}
Workers split result traffic into a fast event lane and a slower snapshot lane to keep the server responsive while suppressing redundant data:
- Event fast lane {#result-event-fast-lane}: births, deaths, and seed submissions are packaged as lean payloads and flushed immediately so `WorkerManager.update_work_results()` can react in real time. These envelopes carry only lifecycle events and omit organism snapshots. Event results have `step_count=0` and do not trigger tuning system assessments, preventing excessive tuning activity.
- Snapshot batching {#result-snapshot-batching}: assignment summaries accumulate inside `BatchManager` until both the target interval (default 300 s) and the adaptive minimum batch size (default 8) are satisfied. An emergency flush fires at 2× the target interval or when the batch reaches 200 assignments. `BATCH_TARGET_INTERVAL_SECONDS`, `BATCH_MIN_SIZE`, and `BATCH_MAX_SIZE` override these thresholds.
- Per-organism throttling {#result-snapshot-throttling}: `LocalExecutor` records the last snapshot per organism and suppresses repeats unless one of the following occurs: age advanced by ≥25,000 instructions, energy shifted by ≥5%, error count increased, or 600 s elapsed since the last broadcast. Runtime overrides (`snapshot_age_threshold`, `snapshot_energy_delta`, `snapshot_heartbeat_interval`) let tuning policies adjust the heuristics.
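The throttling thresholds translate into a simple predicate; the sketch below uses the documented defaults, while the snapshot record fields are assumed names:

```python
import time
from dataclasses import dataclass

@dataclass
class LastSnapshot:
    age: int
    energy: float
    error_count: int
    sent_at: float  # monotonic timestamp of the last broadcast

def should_broadcast(last: LastSnapshot, age: int, energy: float, error_count: int,
                     age_threshold: int = 25_000,
                     energy_delta: float = 0.05,
                     heartbeat_interval: float = 600.0) -> bool:
    """True when any of the documented snapshot criteria is met (sketch)."""
    if age - last.age >= age_threshold:
        return True
    if last.energy and abs(energy - last.energy) / last.energy >= energy_delta:
        return True
    if error_count > last.error_count:
        return True
    return time.monotonic() - last.sent_at >= heartbeat_interval
```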
Configuration {#worker-config}
Key environment variables for tuning worker behavior:
WebSocket Settings:
- `WEBSOCKET_URL`: WebSocket server URL (auto-derived from SERVER_URL)
- `OFFLINE_CACHE_MAX_MB`: Maximum offline cache size (default: 100MB)
- `ADAPTIVE_HEARTBEAT`: Enable adaptive heartbeat (default: true)
- `WEBSOCKET_RECONNECT_MAX_ATTEMPTS`: Max reconnection attempts (default: unlimited)
- `WEBSOCKET_MAX_MESSAGE_SIZE`: Max message size (default: 10MB)
Batching Controls:
- `BATCH_SIZE`: Number of work cycles per batch (default: 8; higher = more efficient, less responsive)
- `BATCH_TARGET_INTERVAL_SECONDS`: Target delay between snapshot flushes (default: 300)
- `BATCH_MIN_SIZE`: Minimum merged assignments before flushing (default: 8)
- `BATCH_MAX_SIZE`: Hard cap for backlog size before forcing submission (default: 200)
Adaptive batching is always enabled; the legacy `ADAPTIVE_BATCHING` environment variable is now ignored.
Server Settings:
- `WEBSOCKET_ENABLED`: Enable WebSocket communication (default: true)
- `WEBSOCKET_MAX_MESSAGE_SIZE`: Max message size (default: 10MB)
- `WEBSOCKET_COMPRESSION_THRESHOLD`: Compression threshold (default: 1KB)
Legacy HTTP Settings (fallback only):
- `POLL_INTERVAL`: Assignment polling interval (default: 2.0s)
- `HEARTBEAT_INTERVAL`: Heartbeat frequency (default: 10.0s)
With WebSocket enabled, network traffic is reduced by ~95% compared to HTTP polling.
Aggregated memory occupancy {#aggregated-memory-occupancy}
The server aggregates a global `memory_occupancy` value by summing the `size` of all tracked organisms and dividing by the configured soup size from `Config.soup.size`.
Formula: occupancy = (Σ organism.size) / soup_size.
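Expressed as code (a sketch that assumes each tracked organism state exposes a `size` field):

```python
def aggregate_memory_occupancy(organisms, soup_size: int) -> float:
    """occupancy = (sum of organism sizes) / soup_size (sketch)."""
    if not soup_size:
        return 0.0
    return sum(o["size"] for o in organisms) / soup_size
```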
Monitoring Metrics {#monitoring-metrics}
The `StateAggregator` calculates comprehensive metrics for the monitoring dashboard by aggregating data from all workers and organism states. These metrics are stored in `historical_stats` and served via the `/api/simulation/metrics` endpoint.
Historical stats sampling {#historical-stats-sampling}
The server supports two retrieval modes for chart time series via `get_historical_stats(limit, time_slice=False)`:
- Recent window (default): returns the most recent `limit` points. This preserves legacy behavior and is used by all existing callers that do not specify `time_slice`.
- Time-sliced: when `time_slice=true` is passed through the API, the server samples evenly across the entire retained history to produce `limit` representative points, always ensuring the latest point is included. This mirrors the standalone monitor's `/data?time_slice=true` behavior.
API usage:
```
GET /api/simulation/metrics?points=1000&time_slice=true    # time-sliced across full timeline
GET /api/simulation/metrics?points=1000&time_slice=false   # most recent 1000 points
```
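The time-sliced mode can be approximated as even index sampling that always keeps the newest point (a sketch, not the server's actual implementation):

```python
from typing import List

def sample_history(history: List[dict], limit: int, time_slice: bool = False) -> List[dict]:
    """Return the most recent `limit` points, or an even sample across all of them (sketch)."""
    if not time_slice or len(history) <= limit:
        return history[-limit:]
    if limit <= 1:
        return history[-1:]
    step = (len(history) - 1) / (limit - 1)
    indices = {round(i * step) for i in range(limit)}
    indices.add(len(history) - 1)  # always include the latest point
    return [history[i] for i in sorted(indices)]
```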
Organism-Level Metrics {#organism-level-metrics}
Basic Metrics:
- `average_size`: Mean organism size in bytes
- `average_age`: Mean age in instructions executed
- `average_errors`: Mean error count per organism
- `average_energy`: Mean energy level per organism
Task Metrics:
- `total_task_attempts`: Sum of task attempts across all organisms
- `total_task_rewards`: Sum of task rewards earned
- `active_priority_boosts`: Count of organisms with priority boosts
Environment Metrics {#environment-metrics}
Resource Levels:
- `average_resource_level`: Mean resource level per organism
- `average_signal_level`: Mean signal level per organism
- `current_task_value`: Current environment task reward value
Economic Metrics {#economic-metrics}
Resource Utilization:
- `storage_utilization`: Fraction of soup memory used by organisms
- `energy_efficiency`: Energy consumed per instruction ratio
- `total_cpu_cost`: Total CPU cost (instructions × cost per instruction)
- `total_rent_collected`: Total rent collected for memory usage
- `seed_usage_events`: Number of seed bank usage events
Allocation Metrics {#allocation-metrics}
Memory Pressure:
- `allocation_failures`: Sum of the most recent MAL failures reported by each worker
- `allocation_failure_rate`: Failures per executed step (falls back to 1.0 when workers cannot advance but keep failing MAL)
- `cumulative_failures`: Running total of all failures since the simulation started
These figures allow Hybrid Tuning and dashboards to distinguish "no diversity" warnings caused by true evolutionary stagnation from simple soup exhaustion.
Implementation {#monitoring-implementation}
The metrics are calculated in `StateAggregator._recalculate_global_stats()` using three helper methods:
- `_calculate_organism_metrics()`: Computes organism-level aggregations
- `_calculate_environment_metrics()`: Computes environment and resource metrics
- `_calculate_economic_metrics()`: Computes economic and efficiency metrics
All metrics are stored in `historical_stats` with top-level fields for easy access by the monitoring API.
Mutation Statistics {#mutation-statistics}
Mutation Metrics:
- `copy_bit_flips`: Copy-time bit flips during genome reproduction
- `background_flips`: Background mutations triggered by instruction count intervals
- `insertions`: Structural mutations that add bytes to genomes
- `deletions`: Structural mutations that remove bytes from genomes
- `global_instructions`: Total instruction count across all organisms
Implementation Details:
- Workers track mutation counters in MutationEngine.mutation_counters
- Server aggregates mutation stats from all workers in _calculate_mutation_metrics()
- Separate tracking for insertions and deletions (not just combined structural mutations)
- Background mutations triggered every 2M instructions (configurable via `BACKGROUND_FLIP_INTERVAL`)
- Copy-time mutations occur during genome reproduction with configurable rates
Configuration:
- `INSERTION_DELETION_RATE`: Probability of structural mutations (default: 0.14)
- `INDEL_INSERTION_BIAS`: Bias toward insertions vs deletions (default: 0.6)
- `BACKGROUND_FLIP_INTERVAL`: Instructions between background mutations (default: 2000000)
Global Step Counting {#global-step-counting}
The distributed system maintains a global step count that accumulates the actual number of steps executed by all workers. This is critical for maintaining simulation consistency and proper metrics reporting.
Step Counting Process:
1. Workers execute organism time slices and report the total number of steps executed
2. Server receives execution results with a `step_count` field containing the actual steps executed
3. Server accumulates these step counts into the global step counter: global_step += results.step_count
4. Global step count is used in metrics reporting and simulation state tracking
Implementation: The `StateAggregator.process_execution_results()` method adds the actual step count from worker results to the global step counter, ensuring accurate simulation progress tracking across all distributed workers.
Worker Result Submission {#worker-result-submission}
The distributed system requires workers to always submit results to the server, even when all organisms die, to ensure continuous work assignment and prevent simulation stalls.
Critical Bug Fix: The `AssignmentHandler.execute_assignment()` method now returns an empty `ExecutionResults` object instead of `None` when no organisms are available for execution. This ensures the worker always submits results to the server, allowing the simulation to continue.
Configuration: Workers use adaptive batching with configurable submission thresholds:
- `BATCH_TARGET_INTERVAL_SECONDS`: Controls the adaptive batch target interval for time-based submissions
- `BATCH_MIN_SIZE`: Minimum number of assignments before a flush
- `BATCH_MAX_SIZE`: Hard cap that triggers an emergency flush
Implementation: The worker's main loop processes results only when `result` is truthy, so returning empty results instead of `None` ensures continuous result submission and prevents simulation stalls.
Memory Allocation Issues {#memory-allocation-issues}
The distributed system has a critical issue where organisms cannot reproduce due to memory allocation failures, preventing population growth and causing the simulation to reach a steady state with no progress.
Root Cause: The ancestor program is designed to overallocate memory for reproduction (adding 16 extra bytes to the original 48-byte size, requiring 64 bytes total), but the memory allocation algorithm fails to find contiguous blocks of this size.
Symptoms:
- Repeated "ALLOCATION FAILED: Organism X needs 64 bytes but allocation failed" errors
- All organisms die without reproducing, leading to population decline
- Server continuously creates new organisms to replace dead ones, but they also die
- Simulation reaches steady state with no work progress
Investigation Results:
- Energy parameters are generous (`ENERGY_INITIAL=2000`, `ENERGY_MAX=3000`, `CPU_COST_PER_INSTRUCTION=0.001`)
- Memory sizes are large (`SOUP_SIZE=10000000`, `LOCAL_SOUP_SIZE=5000000`)
- Worker result submission is working correctly
- Issue persists even with increased memory sizes and separate worker local memory
Current Status: The memory allocation algorithm or the ancestor program's reproduction strategy needs to be redesigned to allow successful organism reproduction and population growth.
Mutation metrics aggregation {#mutation-metrics-aggregation}
The server aggregates per-worker mutation counters into `global_stats.mutation_metrics` with keys:
- copy_time_mutations: sum of worker `copy_bit_flips`
- background_mutations: sum of worker `background_flips`
- structural_mutations: sum of worker `indels` if present, else `insertions + deletions`
- global_instructions: sum of worker `global_instructions`
- total_mutations: copy_time_mutations + background_mutations + structural_mutations
Diversity metrics {#diversity-metrics}
The server computes `diversity_metrics` from organism states:
- unique_genomes: count of distinct genome hashes observed across live organisms
- total_genomes: current population size
- genome_diversity_ratio: unique_genomes / max(1, total_genomes)
- dominant_genome_ratio: share of the population occupied by the most common genome hash
- dominant_genome_count: number of organisms sharing that dominant hash
- size_distribution: histogram of organism sizes
- taxonomy_distribution: counts by `taxonomy.kingdom` when present
- missing_genome_count: organisms that did not report genome bytes (should remain 0)
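A sketch of how the core ratios follow from genome hashes (it assumes each organism state carries genome bytes; the use of SHA-256 as the hash is also an assumption):

```python
import hashlib
from collections import Counter
from typing import Dict, List

def diversity_metrics(organisms: List[dict]) -> Dict[str, float]:
    """Compute the basic diversity ratios from live organism states (sketch)."""
    hashes = [hashlib.sha256(bytes(o["genome"])).hexdigest()
              for o in organisms if o.get("genome")]
    total = len(organisms)
    counts = Counter(hashes)
    dominant_count = max(counts.values(), default=0)
    return {
        "unique_genomes": len(counts),
        "total_genomes": total,
        "genome_diversity_ratio": len(counts) / max(1, total),
        "dominant_genome_ratio": dominant_count / max(1, total),
        "dominant_genome_count": dominant_count,
        "missing_genome_count": total - len(hashes),
    }
```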
Server API {#server-api}
The distributed server provides a comprehensive REST API for web interface integration and external tooling.
Web Interface Integration {#web-interface-integration}
The server integrates the UI system located in `src/bytebiota/ui/`:
- Static Assets: Serves CSS, JavaScript, and images from `src/bytebiota/ui/static/`
- HTML Templates: Renders Jinja2 templates from `src/bytebiota/ui/templates/`
- Icon Generation: Dynamic SVG icon generation for organisms
Icon Generation API {#icon-generation-api}
The server provides dynamic icon generation through the integrated `ByteBiotaIconGenerator`:
Endpoint: /api/organism/{organism_id}/icon
Method: GET
Description: Generate or retrieve an SVG icon for a specific organism
Response:
```json
{
  "icon_path": "/static/icons/Replicatus_vulgaris_Digitalis_Animata.svg"
}
```
Process:
1. Retrieves organism data from state aggregator
2. Extracts taxonomic classification and behavioral traits
3. Generates deterministic SVG icon using genome hash as seed
4. Saves icon to src/bytebiota/ui/static/icons/
5. Returns relative path for web serving
Error Handling: Falls back to the default icon (`/static/icons/organism.svg`) if generation fails
Icon Generation Features
- Deterministic: Same organism always generates identical icon
- Taxonomic Mapping: Visual elements reflect classification hierarchy
- Trait Encoding: Behavioral traits add visual overlays
- Caching: Generated icons are cached by filename
- Scalable: SVG format supports any display size