Execution Engine

The Execution Engine is where the rubber meets the road. It takes the ranked provider list from the Scoring Layer, sends the request, handles streaming responses, manages retries on failure, and executes fallback chains when providers are down.

Request Lifecycle

Ranked Candidate List
        |
        v
[1] Attempt Primary     --> Send to top-scored provider
        |
        v
[2] Stream / Await      --> Handle streaming or blocking response
        |
        v
[3] Success?            --> Yes: return result + record trace
        |                   No: continue to fallback
        v
[4] Attempt Fallback 1  --> Send to next candidate
        |
        v
[5] Exhaustion?         --> Yes: return error with full trace
        |                   No: continue chain
        v
[6] Record Trace        --> Persist full execution trace
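
The lifecycle above can be sketched as a loop over the ranked candidates. This is a minimal illustration, not the actual Layerr implementation: the `Candidate`, `AttemptRecord`, and `ChainResult` types, the `executeWithFallback` name, and the injected `send` function are all assumptions made for this example.

```typescript
// Hypothetical types for illustration; the real Layerr types are not shown here.
interface Candidate {
  provider: string;
  model: string;
}

interface AttemptRecord {
  provider: string;
  ok: boolean;
  error?: string;
}

interface ChainResult<T> {
  result?: T; // undefined when every candidate failed
  attempts: AttemptRecord[];
}

type SendFn<T> = (candidate: Candidate) => Promise<T>;

// Walk the ranked list in order and stop at the first success.
// Every attempt is recorded so the caller can persist a full trace.
async function executeWithFallback<T>(
  ranked: Candidate[],
  send: SendFn<T>,
): Promise<ChainResult<T>> {
  const attempts: AttemptRecord[] = [];
  for (const candidate of ranked) {
    try {
      const result = await send(candidate);
      attempts.push({ provider: candidate.provider, ok: true });
      return { result, attempts };
    } catch (err) {
      attempts.push({
        provider: candidate.provider,
        ok: false,
        error: err instanceof Error ? err.message : String(err),
      });
    }
  }
  // Exhaustion: no result, but the attempt list is still returned.
  return { attempts };
}
```

Passing `send` in as a parameter keeps the loop independent of any particular provider adapter, which also makes it easy to exercise with a stub.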

  • File: gateway/handler.ts
  • Transforms the internal Layerr request format into provider-specific formats (OpenAI, Anthropic, Ollama, custom)
  • Injects API keys from the Secrets Manager
  • Sets headers, timeouts, and retry policies

  • OpenAI-compatible: providers/openai-compat/adapter.ts
  • Ollama: providers/ollama/adapter.ts
  • Each adapter normalizes request and response formats so the rest of the system is provider-agnostic

  • File: server.ts (primary handler for /v1/chat/completions)
  • Manages Server-Sent Events (SSE) streams
  • Supports cancellation mid-stream
  • Counts tokens in real time for cost attribution

  • File: runtime/protection/classifier.ts
  • Classifies errors as transient (retryable) or permanent
  • Implements exponential backoff with jitter
  • Circuit breaker pattern: after N consecutive failures, the provider is temporarily removed from rotation
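
To make the last two bullets concrete, here is a hedged sketch of exponential backoff with jitter and a consecutive-failure circuit breaker. It is not taken from runtime/protection/classifier.ts: the function and class names, the "full jitter" variant, the default base and cap delays, and the cooldown parameter are all assumptions chosen for illustration.

```typescript
// Full-jitter backoff: the delay is drawn uniformly from
// [0, min(capMs, baseMs * 2^attempt)). Defaults are assumed values.
function backoffDelayMs(
  attempt: number, // 0-based retry attempt
  baseMs: number = 250,
  capMs: number = 8000,
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Minimal circuit breaker: opens after `threshold` consecutive failures
// and reports open until `cooldownMs` has elapsed.
class CircuitBreaker {
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private consecutiveFailures = 0;
  private openedAt: number | null = null;

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAt = null;
  }

  recordFailure(now: number): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.threshold) {
      this.openedAt = now; // (re)start the cooldown window
    }
  }

  // True while the provider should be skipped in rotation.
  isOpen(now: number): boolean {
    if (this.openedAt === null) return false;
    return now - this.openedAt < this.cooldownMs;
  }
}
```

In this sketch the failure count is not reset when the cooldown ends, so a single further failure reopens the breaker immediately, while a success closes it fully. Timestamps are passed in explicitly (for example `Date.now()`) so the behaviour can be tested without real delays.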

The protection classifier (runtime/protection/classifier.ts) categorises errors:

| Error Type | Examples | Action |
|---|---|---|
| Transient | 429 rate limit, 503 unavailable, timeout | Retry with backoff |
| Auth | 401, 403 | Fail fast, alert admin |
| Content | 400 bad request, context too long | Fail fast, log for analysis |
| Provider Down | Connection refused, DNS failure | Immediately fallback, mark unhealthy |
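
The error categories above could be expressed as a small pure function. The sketch below is an assumption-laden illustration rather than the contents of runtime/protection/classifier.ts: the `ProviderError` shape, the `classifyError` name, the use of Node.js error codes `ECONNREFUSED` and `ENOTFOUND` to detect connection and DNS failures, and the `unknown` catch-all for errors the table does not list are all hypothetical. Side effects such as alerting an admin or marking a provider unhealthy are left to the caller.

```typescript
type ErrorCategory = "transient" | "auth" | "content" | "provider_down" | "unknown";
type ErrorAction = "retry_with_backoff" | "fail_fast" | "fallback";

// Hypothetical normalized error shape for this example.
interface ProviderError {
  status?: number; // HTTP status, if a response was received
  code?: string; // Node.js-style network error code
  timedOut?: boolean; // set when the request exceeded its timeout
}

interface Classification {
  category: ErrorCategory;
  action: ErrorAction;
}

// Connection refused and DNS lookup failure, as reported by Node.js.
const CONNECTION_CODES = new Set(["ECONNREFUSED", "ENOTFOUND"]);

function classifyError(err: ProviderError): Classification {
  if (err.code !== undefined && CONNECTION_CODES.has(err.code)) {
    return { category: "provider_down", action: "fallback" };
  }
  if (err.timedOut === true || err.status === 429 || err.status === 503) {
    return { category: "transient", action: "retry_with_backoff" };
  }
  if (err.status === 401 || err.status === 403) {
    return { category: "auth", action: "fail_fast" };
  }
  if (err.status === 400) {
    return { category: "content", action: "fail_fast" };
  }
  // Assumption: anything the table does not cover fails fast.
  return { category: "unknown", action: "fail_fast" };
}
```

For example, `classifyError({ status: 429 })` yields the transient category with a retry action, while `classifyError({ code: "ECONNREFUSED" })` yields an immediate fallback.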
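
Timeout profiles (runtime/timeout/profiles.ts):

| Profile | Timeout | Use Case |
|---|---|---|
| Fast | 5 seconds | Autocomplete, quick fixes |
| Standard | 30 seconds | General coding tasks |
| Deep | 120 seconds | Architecture design, complex reasoning |
| Custom | User-defined | Specialised workloads |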
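
One way to apply these profiles is to resolve the profile to a millisecond budget and race the provider call against a timer. The sketch below is illustrative only and is not the content of runtime/timeout/profiles.ts: `resolveTimeoutMs`, `withTimeout`, and the lowercase profile keys are assumed names. It relies on the standard `AbortController` and `setTimeout` globals.

```typescript
type TimeoutProfile = "fast" | "standard" | "deep" | "custom";

// Values taken from the profile table; "custom" is supplied by the caller.
const PROFILE_TIMEOUT_MS: Record<Exclude<TimeoutProfile, "custom">, number> = {
  fast: 5_000,
  standard: 30_000,
  deep: 120_000,
};

function resolveTimeoutMs(profile: TimeoutProfile, customMs?: number): number {
  if (profile === "custom") {
    if (customMs === undefined || !(customMs > 0)) {
      throw new Error("custom profile requires a positive timeout");
    }
    return customMs;
  }
  return PROFILE_TIMEOUT_MS[profile];
}

// Rejects if `work` has not settled within `timeoutMs`, and aborts the
// signal so the underlying request can be cancelled.
async function withTimeout<T>(
  timeoutMs: number,
  work: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`timed out after ${timeoutMs} ms`));
    }, timeoutMs);
  });
  try {
    return await Promise.race([work(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // avoid a dangling timer once the race is decided
  }
}
```

A caller might write `withTimeout(resolveTimeoutMs("fast"), (signal) => fetch(url, { signal }))`, where `url` stands in for whatever endpoint the adapter targets.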

Every execution is recorded as a trace with:

interface ExecutionTrace {
  traceId: string;
  workspaceId: string;
  request: {
    intent: IntentClassification;
    workload: WorkloadProfile;
    strategy: Strategy;
    providers: ProviderCandidate[]; // ranked list
  };
  attempts: Attempt[];
  finalOutcome: {
    provider: string;
    model: string;
    latencyMs: number;
    tokensIn: number;
    tokensOut: number;
    costUsd: number;
    qualityScore: number;
  };
  timestamps: {
    routedAt: Date;
    firstTokenAt: Date;
    completedAt: Date;
  };
}
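
As an illustration of how the timestamp fields in a trace can be used, the sketch below derives time to first token, total duration, and output throughput. It is a hypothetical helper, not part of the documented API: `TraceTimingView` is a cut-down view containing only the fields the function reads, and `summarizeTrace` and `TraceSummary` are names invented for this example.

```typescript
// Only the fields this sketch reads; a structural subset of ExecutionTrace.
interface TraceTimingView {
  finalOutcome: { tokensOut: number };
  timestamps: { routedAt: Date; firstTokenAt: Date; completedAt: Date };
}

interface TraceSummary {
  timeToFirstTokenMs: number;
  totalMs: number;
  outputTokensPerSecond: number;
}

function summarizeTrace(trace: TraceTimingView): TraceSummary {
  const { routedAt, firstTokenAt, completedAt } = trace.timestamps;
  const timeToFirstTokenMs = firstTokenAt.getTime() - routedAt.getTime();
  const totalMs = completedAt.getTime() - routedAt.getTime();
  // Throughput is measured over the generation window only.
  const generationMs = completedAt.getTime() - firstTokenAt.getTime();
  const outputTokensPerSecond =
    generationMs > 0 ? (trace.finalOutcome.tokensOut * 1000) / generationMs : 0;
  return { timeToFirstTokenMs, totalMs, outputTokensPerSecond };
}
```

Because TypeScript types are structural, any object with these fields, including a full `ExecutionTrace`, can be passed to `summarizeTrace`.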

| File | What It Does |
|---|---|
| server.ts | Main server entrypoint. All API routes flow through here |
| gateway/handler.ts | Request transformation and gateway routing |
| providers/openai-compat/adapter.ts | OpenAI-compatible API adapter |
| providers/ollama/adapter.ts | Ollama local model adapter |
| providers/resolution.ts | Provider URL resolution and health checking |
| providers/registry.ts | Provider registration and metadata |
| runtime/protection/classifier.ts | Error classification and retry logic |
| runtime/timeout/profiles.ts | Timeout profile definitions |
| src/features/runtime/execution/executionModel.ts | Frontend execution data model |
| src/features/runtime/lifecycle/lifecycleModel.ts | Execution lifecycle state machine |


The Execution Engine exchanges data with these components:

  1. Scoring Layer → provides the ranked candidate list
  2. Secrets Manager → provides API keys for provider auth
  3. Provider Health → receives success/failure signals for real-time health updates
  4. Replay → receives full execution traces for storage
  5. Economics → receives token counts and costs for attribution
  6. Explainability → receives execution details for post-hoc explanations