Orchestration
Orchestration turns workbench intent into sandboxed execution. It coordinates sidecar placement, session lifecycle, and runtime signals across hosts.
Execution Lifecycle (Simplified)
- The orchestrator validates policies and checks capacity.
- A sidecar is selected or started on an available host.
- A session is created, and its queued executions run in order.
- Events stream to clients with buffering and replay support.
- Completion, failure, or cancellation updates metrics and metadata.
The orchestrator should fail before allocation when policy or capacity is invalid. Once a sidecar starts, failures should be visible as session events and metrics, not hidden as missing output.
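The lifecycle above can be sketched in a few lines. This is a minimal illustration and not the orchestrator's actual implementation: `PreflightError`, `Host`, `Session`, `run_session`, and the event names are hypothetical, and stopping the queue on the first failed execution is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class PreflightError(Exception):
    """Raised before any sidecar is allocated (hypothetical name)."""


@dataclass
class Host:
    name: str
    free_slots: int


@dataclass
class Session:
    host: Host
    events: List[dict] = field(default_factory=list)

    def emit(self, kind: str, **data) -> None:
        # Every state change is recorded as an event so nothing fails silently.
        self.events.append({"type": kind, **data})


def run_session(policy_ok: bool, hosts: List[Host],
                tasks: List[Callable[[], str]]) -> Session:
    # Steps 1-2: reject invalid policy or missing capacity before allocating.
    if not policy_ok:
        raise PreflightError("policy rejected")
    host = next((h for h in hosts if h.free_slots > 0), None)
    if host is None:
        raise PreflightError("no capacity")

    # Step 3: the sidecar slot is taken; from here on, failures become events.
    host.free_slots -= 1
    session = Session(host)
    session.emit("session_started", host=host.name)
    try:
        for index, task in enumerate(tasks):  # queued executions run in order
            try:
                output = task()
            except Exception as exc:
                # Assumption: the first failure stops the rest of the queue.
                session.emit("execution_failed", index=index, reason=str(exc))
                break
            session.emit("execution_completed", index=index, output=output)
    finally:
        # Step 5: release capacity whether the session succeeded or failed.
        host.free_slots += 1
        session.emit("session_closed")
    return session
```

A caller sees two distinct failure modes: a `PreflightError` when nothing was allocated, and an `execution_failed` event with a `reason` once the session exists.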
What Orchestration Covers
- Placement and capacity: Host health, resource-aware limits, and pool membership.
- Execution control: Per-session queues, timeouts, and cancellation.
- Batch and simulation runs: Large task sets can queue and retry with backoff.
- Autoscaling (optional): Standby hosts can be promoted and webhooks can request new capacity.
- Observability hooks: Health endpoints and metrics for fleet visibility.
Together, these controls keep workbench and protocol workloads predictable even when the compute layer is distributed.
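As one illustration of the batch behavior listed above, retry with backoff might look like the sketch below. The text does not say which backoff policy is used, so capped exponential backoff is an assumption here, and `backoff_delays`, `run_with_retry`, and their defaults are hypothetical names and values.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Delay before each retry: base * 2^attempt, never more than cap."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def run_with_retry(task: Callable[[], T], retries: int = 3,
                   base: float = 0.5, cap: float = 8.0,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Run task, retrying up to `retries` times with capped exponential backoff."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    delays = backoff_delays(retries, base, cap)
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                # Out of retries: surface the last error to the caller.
                raise
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the policy can be exercised in tests without waiting. A production version would likely add jitter and retry only on errors known to be transient.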
Operator Preflight
For a hosted pool or protocol-backed operator endpoint, check these before accepting traffic:
| Check | Why it matters |
|---|---|
| /health or equivalent host probe returns healthy | The process and required dependencies are up. |
| A new session can be created and cancelled | Lifecycle control works before customer work starts. |
| A short command or prompt emits events | Streaming and execution are wired through the same path users will hit. |
| Capacity counters move after the run | Placement decisions are using live resource state. |
| Failure events include a reason | Operators can debug without guessing which subsystem failed. |
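The checks in the table could be automated along these lines. This is a sketch under stated assumptions: `OperatorClient` and all of its methods are a hypothetical client surface, not a documented API, and events are assumed to be dictionaries with a `type` key and, for failures, a `reason` key.

```python
from typing import Dict, List, Protocol


class OperatorClient(Protocol):
    """Hypothetical client surface; a real operator endpoint will differ."""

    def health(self) -> bool: ...
    def create_session(self) -> str: ...
    def cancel_session(self, session_id: str) -> bool: ...
    def run(self, session_id: str, command: str) -> List[dict]: ...
    def capacity_counter(self) -> int: ...


def preflight(client: OperatorClient) -> Dict[str, bool]:
    """Run each preflight check once and report pass/fail per check."""
    results: Dict[str, bool] = {}

    # Check 1: the host probe reports healthy.
    results["health"] = client.health()

    # Check 2: a session can be created and then cancelled.
    probe = client.create_session()
    results["create_and_cancel"] = client.cancel_session(probe)

    # Checks 3-4: a short command emits events and moves the capacity counter.
    session = client.create_session()
    before = client.capacity_counter()
    events = client.run(session, "echo preflight")
    results["events_emitted"] = len(events) > 0
    results["capacity_moved"] = client.capacity_counter() != before

    # Check 5: a deliberately failing command yields failure events with a reason.
    failed = [e for e in client.run(session, "exit 1") if e.get("type") == "failed"]
    results["failure_has_reason"] = bool(failed) and all(e.get("reason") for e in failed)

    client.cancel_session(session)
    return results
```

An operator would accept traffic only when every value in the returned mapping is `True`, for example with `all(preflight(client).values())`.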