# Worker self-update plan {#worker-self-update}
## Overview
This follow-up plan describes the work required after Phase A (artifact packaging) to enable ByteBiota workers to download and install updates automatically. It assumes that the packaging workflow defined in `auto-upgrade.md` is in place and that versioned binaries and checksums are published for each platform.
## Guiding principles
- Opt-in rollout first. Operators should be able to toggle self-update per deployment or per worker pool while we gain confidence.
- Reuse existing infrastructure. Extend the existing FastAPI services and worker CLI rather than introducing parallel frameworks or redundant schedulers.
- Auditable behaviour. Every upgrade attempt must emit structured logs so we can trace version adoption across the fleet.
## Phase breakdown
### Phase B1: Server-side update orchestration {#phase-b1-server}
Objective: expose the authoritative manifest that workers consult before attempting an upgrade.
| Task | Description | Key files / modules | Exit criteria |
|---|---|---|---|
| B1.1 | Create update service scaffold | `src/bytebiota/server/update_api_service.py`, `src/bytebiota/server/app.py` | FastAPI routes `/api/worker-updates/latest` and `/api/worker-updates/report` registered with shared auth dependencies. |
| B1.2 | Define manifest schema | `src/bytebiota/server/schemas/update_manifest.py`, `wiki/operations/environment.md` | Pydantic models enforce required fields (version, platform triples, checksum, download URL). Schema documented in the wiki. |
| B1.3 | Manifest storage + caching | `scripts/build_worker.sh`, new S3/GitHub Release publication script | Build pipeline uploads `worker-release.json`; server reads it from storage with a 5-minute cache and exposes an `ETag` header. |
| B1.4 | Observability | `src/bytebiota/server/logging.py` | Each manifest fetch, report, and error logged with correlation IDs. |
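B1.2 calls for Pydantic models. To keep this illustration dependency-free, the sketch below uses standard-library dataclasses to show the same required-field checks. All of the following are assumptions for illustration, not the final `update_manifest.py` schema:

- the field names (`version`, `artifacts`, `platform`, `sha256`, `url`);
- the three-part `MAJOR.MINOR.PATCH` version format;
- the HTTPS-only download rule.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical validation rules; the planned module would express these as Pydantic validators.
_SHA256_RE = re.compile(r"[0-9a-f]{64}")
_VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


@dataclass(frozen=True)
class PlatformArtifact:
    platform: str  # platform triple, e.g. "x86_64-unknown-linux-gnu" (illustrative)
    sha256: str    # lowercase hex digest of the binary
    url: str       # download location of the binary

    def __post_init__(self) -> None:
        if not _SHA256_RE.fullmatch(self.sha256):
            raise ValueError(f"invalid SHA-256 for {self.platform!r}")
        if not self.url.startswith("https://"):  # assumed policy: HTTPS only
            raise ValueError(f"download URL must use HTTPS: {self.url!r}")


@dataclass(frozen=True)
class UpdateManifest:
    version: str
    artifacts: tuple[PlatformArtifact, ...]

    def __post_init__(self) -> None:
        if not _VERSION_RE.fullmatch(self.version):
            raise ValueError(f"version must be MAJOR.MINOR.PATCH: {self.version!r}")
        if not self.artifacts:
            raise ValueError("manifest must list at least one platform")

    @classmethod
    def from_dict(cls, data: dict) -> UpdateManifest:
        # A missing required key raises KeyError; an unexpected key raises TypeError.
        return cls(
            version=data["version"],
            artifacts=tuple(PlatformArtifact(**a) for a in data["artifacts"]),
        )

    def for_platform(self, platform: str) -> PlatformArtifact | None:
        """Return the artifact for one platform triple, or None if unsupported."""
        return next((a for a in self.artifacts if a.platform == platform), None)
```

A worker that finds no entry for its own triple would simply skip the update.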
Dependencies: Phase A manifest artifacts, secrets for storage bucket access, existing auth middleware.
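The caching behaviour in B1.3 (5-minute cache plus an `ETag` header) could look roughly like the sketch below. `ManifestCache` and its injected `loader` and `clock` are hypothetical names. The ETag is assumed to be the quoted SHA-256 of the manifest bytes, so identical content always yields the same tag. The `If-None-Match` handling is deliberately simplified: a single strong tag, no `*`, no tag lists.

```python
from __future__ import annotations

import hashlib
import time
from typing import Callable, Optional


class ManifestCache:
    """Cache raw manifest bytes for a fixed TTL and derive an ETag from them."""

    def __init__(self, loader: Callable[[], bytes], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader        # hypothetical: reads worker-release.json from storage
        self._ttl = ttl_seconds      # 300 s mirrors the 5-minute cache in B1.3
        self._clock = clock          # injectable so tests can control time
        self._body: Optional[bytes] = None
        self._etag: Optional[str] = None
        self._fetched_at = float("-inf")

    def get(self) -> tuple[bytes, str]:
        now = self._clock()
        if self._body is None or now - self._fetched_at >= self._ttl:
            self._body = self._loader()
            # Quoted hex digest: a new ETag appears only when the content changes.
            self._etag = '"' + hashlib.sha256(self._body).hexdigest() + '"'
            self._fetched_at = now
        return self._body, self._etag

    def not_modified(self, if_none_match: Optional[str]) -> bool:
        """True when the client's If-None-Match equals the current ETag (respond 304)."""
        _, etag = self.get()
        return if_none_match == etag
```

A FastAPI route would call `get()`, set the `ETag` response header, and return 304 when `not_modified()` is true. Workers can then poll cheaply without re-downloading an unchanged manifest.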
### Phase B2: Worker updater client {#phase-b2-worker}
Objective: teach the worker process how to check for, download, and apply updates safely.
| Task | Description | Key files / modules | Exit criteria |
|---|---|---|---|
| B2.1 | Version embedding | `worker.spec`, `scripts/build_worker.sh`, `src/bytebiota/worker/worker.py` | `bytebiota-worker --version` returns the packaged version and build metadata. |
| B2.2 | Updater module | `src/bytebiota/worker/updater.py` | Module exports `check_for_update(config)`, `download_update(manifest_entry)`, and `apply_update(path)`, with unit tests covering success and failure paths. |
| B2.3 | CLI integration | `src/bytebiota/worker/worker.py` | New CLI flags `--no-auto-update` and `--force-update`; environment variable `AUTO_UPDATE=0` documented and respected. |
| B2.4 | Download + verification | `src/bytebiota/worker/updater.py`, `tests/worker/test_updater.py` | Stream the download to a temp path, verify SHA-256, and atomically swap the binary (Windows fallback uses a helper script). |
| B2.5 | Rollback safeguards | `src/bytebiota/worker/updater.py` | On a failed checksum or apply, restore the previous binary and emit the structured log `worker_update_failure` with a reason. |
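The streaming-and-verification step of B2.4 might be sketched as follows. `download_to_temp` and `ChecksumMismatch` are hypothetical names, not the planned `download_update(manifest_entry)` API. `source` is any binary file-like object, such as an HTTP response body.

The temp file is created in the same directory as the installed binary. This keeps the later `os.replace` on one filesystem, which is what makes the rename atomic on POSIX.

```python
import hashlib
import os
import tempfile
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # read in 64 KiB chunks so large binaries never sit fully in memory


class ChecksumMismatch(Exception):
    """Raised when the downloaded bytes do not match the manifest's SHA-256."""


def download_to_temp(source: BinaryIO, expected_sha256: str, dest_dir: str) -> str:
    """Stream `source` into a temp file in `dest_dir`, hashing as we go.

    Returns the temp file path on success. On any failure the partial
    file is deleted before the exception propagates.
    """
    digest = hashlib.sha256()
    fd, tmp_path = tempfile.mkstemp(prefix=".worker-update-", dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as out:
            while chunk := source.read(CHUNK_SIZE):
                digest.update(chunk)
                out.write(chunk)
            out.flush()
            os.fsync(out.fileno())  # make sure bytes are on disk before we trust them
        actual = digest.hexdigest()
        if actual != expected_sha256.lower():
            raise ChecksumMismatch(f"expected {expected_sha256}, got {actual}")
    except BaseException:
        os.unlink(tmp_path)  # never leave an unverified binary behind
        raise
    return tmp_path
```

Hashing during the download avoids a second read of the file. Cleaning up on every exception means a checksum mismatch leaves the install directory unchanged.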
Dependencies: Phase B1 manifest API availability, packaging version metadata, OS-specific file replacement strategy (see task B2.4).
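The backup-then-swap rollback in B2.5 could be sketched as below, assuming POSIX rename semantics; Windows would need the helper-script fallback from B2.4. The following are all hypothetical:

- `swap_with_rollback` (the function name);
- the `.previous` backup suffix;
- the `health_check` callback (for example, running the new binary with `--version`);
- the logger name.

```python
import json
import logging
import os
import shutil
from typing import Callable

log = logging.getLogger("bytebiota.worker.updater")  # hypothetical logger name


def swap_with_rollback(new_path: str, target_path: str,
                       health_check: Callable[[str], bool]) -> bool:
    """Install `new_path` over `target_path`, restoring the old binary on failure."""
    backup_path = target_path + ".previous"    # hypothetical backup naming
    shutil.copy2(target_path, backup_path)     # keep the old binary for rollback
    shutil.copymode(target_path, new_path)     # give the new file the old permission bits
    try:
        os.replace(new_path, target_path)      # atomic rename on POSIX
        if not health_check(target_path):
            raise RuntimeError("post-update health check failed")
    except Exception as exc:
        os.replace(backup_path, target_path)   # restore the previous binary
        log.error(json.dumps({"event": "worker_update_failure", "reason": str(exc)}))
        return False
    return True  # backup stays on disk for manual rollback
```

Leaving the `.previous` copy in place after a successful update gives operators a one-step manual rollback path.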
### Phase B3: Handshake and policy enforcement {#phase-b3-policy}
Objective: exchange version data between workers and server so we can coordinate staged rollouts and minimum supported versions.
| Task | Description | Key files / modules | Exit criteria |
|---|---|---|---|
| B3.1 | Handshake payload update | `src/bytebiota/worker/connection_manager.py`, `src/bytebiota/server/schemas/worker_registration.py` | Worker registration includes `current_version`; server responds with `minimum_version` and `recommended_version`. |
| B3.2 | Policy enforcement | `src/bytebiota/worker/worker.py` | Worker refuses to start when the server's `minimum_version` is higher than the worker's own version, unless `--force-start` is provided; the event is logged. |
| B3.3 | Feature flag plumbing | `src/bytebiota/config/worker_config.py`, `wiki/operations/environment.md` | New config fields `auto_update_enabled` and `auto_update_channel`, documented and given safe defaults. |
| B3.4 | Metrics + dashboards | `src/bytebiota/worker/metrics.py`, `wiki/analytics/metrics-and-observability.md` | Emit metrics (`worker_update_attempt`, `worker_update_success`, `worker_update_failure`, `worker_version_current`). Grafana panel spec added to the analytics wiki. |
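The startup check in B3.2 hinges on comparing versions numerically rather than as strings; `"1.10.0" < "1.9.0"` is true for strings. The sketch below assumes plain three-part versions with no pre-release tags. `startup_decision` and its return values are hypothetical; the caller would log whichever decision is returned.

```python
from typing import Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Parse 'MAJOR.MINOR.PATCH' into an int tuple so comparisons are numeric."""
    return tuple(int(part) for part in v.split("."))


def startup_decision(current_version: str, registration_response: dict,
                     force_start: bool = False) -> str:
    """Decide how to proceed after the registration handshake (hypothetical labels)."""
    minimum = registration_response["minimum_version"]
    recommended = registration_response.get("recommended_version")
    current = parse_version(current_version)
    if current < parse_version(minimum):
        # Below the supported floor: refuse unless the operator overrides.
        return "start-forced" if force_start else "refuse"
    if recommended and current < parse_version(recommended):
        return "start-update-recommended"
    return "start"
```

Tuple comparison orders `(1, 10, 0)` after `(1, 9, 3)`, which plain string comparison gets wrong.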
Dependencies: Telemetry stack online (Prometheus exporters, logging sinks), update API accessible from worker networks.
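B2.3 and B3.3 together define three sources for the auto-update switch: config fields, the `AUTO_UPDATE` environment variable, and CLI flags. The sketch below resolves them with an assumed precedence: CLI `--no-auto-update` overrides `AUTO_UPDATE`, which overrides the config default.

The following are assumptions:

- `UpdateSettings` and `resolve_update_settings` (hypothetical names);
- treating `AUTO_UPDATE=1` as an explicit opt-in (only `AUTO_UPDATE=0` is specified above);
- the `"production"` default channel name.

```python
import argparse
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass
class UpdateSettings:
    auto_update_enabled: bool = False           # safe default: opt-in rollout
    auto_update_channel: str = "production"     # assumed default channel name
    force_update: bool = False


def resolve_update_settings(argv: Sequence[str], env: Mapping[str, str],
                            config: Optional[UpdateSettings] = None) -> UpdateSettings:
    """Merge config, environment, and CLI flags (assumed precedence: CLI > env > config)."""
    parser = argparse.ArgumentParser(prog="bytebiota-worker", add_help=False)
    parser.add_argument("--no-auto-update", action="store_true")
    parser.add_argument("--force-update", action="store_true")
    args, _unknown = parser.parse_known_args(list(argv))  # ignore unrelated worker flags

    base = config or UpdateSettings()
    enabled = base.auto_update_enabled
    env_flag = env.get("AUTO_UPDATE")
    if env_flag in ("0", "1"):        # "0" is documented; "1" as opt-in is an assumption
        enabled = env_flag == "1"
    if args.no_auto_update:           # explicit CLI opt-out always wins
        enabled = False
    return UpdateSettings(enabled, base.auto_update_channel, args.force_update)
```

Passing `env` and `argv` in explicitly, rather than reading `os.environ` and `sys.argv` inside the function, keeps the precedence rules easy to unit-test.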
### Phase B4: Integration + rollout {#phase-b4-rollout}
Objective: validate the end-to-end flow and provide operators with runbooks for safe deployment.
| Task | Description | Key files / modules | Exit criteria |
|---|---|---|---|
| B4.1 | Integration tests | `tests/integration/test_worker_update_flow.py` | Simulated manifest server delivers a new version; test asserts download, apply, and restart behaviour. |
| B4.2 | Canary deployment automation | `.github/workflows/canary-worker-update.yml`, `scripts/promote_worker_release.py` | Workflow promotes the manifest to the canary channel only; manual approval gates the production channel. |
| B4.3 | Operator documentation | `DEPLOYMENT.md`, `wiki/operations/runbooks/worker-update.md` | Runbooks cover enabling updates, forcing upgrades, rolling back, and reading metrics. |
| B4.4 | Post-deploy monitoring | `wiki/analytics/metrics-and-observability.md` | Alert thresholds defined for failure rate and version skew. |
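The promotion gate in B4.2 can be modelled as a small state machine. In this sketch, `ChannelState` and the `approved_by` argument are hypothetical. The rule that a version must already be live on canary before it can reach production is an assumption drawn from the staged-rollout principle, not a stated requirement.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ChannelState:
    # Maps channel name -> manifest version currently served on that channel.
    channels: Dict[str, str] = field(default_factory=dict)

    def promote(self, version: str, channel: str, approved_by: Optional[str] = None) -> None:
        if channel == "canary":
            # Automated workflow may promote to canary without approval.
            self.channels["canary"] = version
            return
        if channel == "production":
            # Assumed rule: production only receives what canary already runs.
            if self.channels.get("canary") != version:
                raise ValueError("version must be live on canary before production")
            if not approved_by:
                raise PermissionError("production promotion requires manual approval")
            self.channels["production"] = version
            return
        raise ValueError(f"unknown channel {channel!r}")
```

In the workflow, the `approved_by` value would come from the manual approval step rather than from the automated canary job.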
Dependencies: Prior phases complete, staging environment available for dry runs, release engineering sign-off.
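B4.4's two alert signals reduce to simple ratios:

- failure rate: `failures / attempts`;
- version skew: the fraction of reporting workers not on the recommended version.

The sketch shows that arithmetic only. The 5% and 20% thresholds are placeholders, not agreed values, and in production these would more likely be alerting rules over the B3.4 metrics.

```python
from collections import Counter
from typing import Iterable, List


def failure_rate(attempts: int, failures: int) -> float:
    """Share of update attempts that failed; 0.0 when nothing was attempted."""
    return failures / attempts if attempts else 0.0


def version_skew(versions: Iterable[str], recommended: str) -> float:
    """Fraction of reporting workers that are not on the recommended version."""
    counts = Counter(versions)
    total = sum(counts.values())
    return (total - counts[recommended]) / total if total else 0.0


def fired_alerts(attempts: int, failures: int, versions: Iterable[str], recommended: str,
                 max_failure_rate: float = 0.05, max_skew: float = 0.2) -> List[str]:
    """Return the names of alerts whose placeholder thresholds are exceeded."""
    fired = []
    if failure_rate(attempts, failures) > max_failure_rate:
        fired.append("worker_update_failure_rate")
    if version_skew(versions, recommended) > max_skew:
        fired.append("worker_version_skew")
    return fired
```

For example, with 5 failures in 200 attempts and 8 of 10 workers on the recommended version, both ratios stay at or under their thresholds (2.5% and 20%), so no alert fires.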
## Telemetry and safety requirements
- Emit structured logs and metrics for each update attempt (`worker_update_attempt`, `worker_update_success`, `worker_update_failure`).
- Add feature flag configuration to `WorkerConfig` and document environment variables in `wiki/operations/environment.md`.
- Write integration tests covering a simulated update cycle (download stub file, checksum mismatch, rollback path).
- Define rollback process: store previous binary before applying update; revert on failure; document manual rollback command sequence.
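A structured update log line might be produced as below: one JSON object per event, carrying a correlation ID (as in B1.4) so attempt, success, and failure records for one upgrade can be joined. `log_update_event`, the logger name, and the `from_version`/`to_version`/`ts` keys are hypothetical; only the three event names come from this page.

```python
import json
import logging
import time
import uuid
from typing import Any, Optional

logger = logging.getLogger("bytebiota.worker.update")  # hypothetical logger name

UPDATE_EVENTS = {"worker_update_attempt", "worker_update_success", "worker_update_failure"}


def log_update_event(event: str, *, from_version: str, to_version: str,
                     correlation_id: Optional[str] = None, **fields: Any) -> str:
    """Emit one JSON log line for an update event and return it."""
    if event not in UPDATE_EVENTS:
        raise ValueError(f"unknown update event {event!r}")
    record = {
        "event": event,
        "from_version": from_version,
        "to_version": to_version,
        # Reuse the same ID across attempt/success/failure of one upgrade.
        "correlation_id": correlation_id or uuid.uuid4().hex,
        "ts": time.time(),
        **fields,  # extra context such as reason="checksum_mismatch"
    }
    line = json.dumps(record, sort_keys=True)
    level = logging.ERROR if event == "worker_update_failure" else logging.INFO
    logger.log(level, line)
    return line
```

Rejecting unknown event names keeps log-based dashboards from silently missing a misspelled event.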
## Deployment considerations
- Start with staged rollouts: enable auto-update on canary worker pools and observe metrics before a broad rollout.
- Document operator runbooks for forced updates, rolling back to a previous version, and troubleshooting failed downloads.
- Update `DEPLOYMENT.md` once the feature ships to production behind its feature flag.
- Coordinate release timing with server maintenance windows so that workers never download a manifest that references unpublished binaries.
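One way to enforce the last point is a pre-publish gate: refuse to publish a manifest while any artifact it references is missing. In the sketch, `unpublished_artifacts` and the `is_published` callback are hypothetical; a real check might issue an HTTP HEAD request per URL. The `artifacts`/`url` manifest keys are likewise assumed.

```python
from typing import Callable, List


def unpublished_artifacts(manifest: dict, is_published: Callable[[str], bool]) -> List[str]:
    """Return artifact URLs referenced by the manifest that are not yet published.

    An empty list means the manifest is safe to publish.
    """
    return [a["url"] for a in manifest["artifacts"] if not is_published(a["url"])]
```

Running this as the final step before `worker-release.json` is uploaded means binaries are always published before the manifest that points at them.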