Execution Engine

The Execution Engine carries out the routing decision. It takes the ranked provider list from the Scoring Layer, sends the request to the top-scored provider, handles streaming responses, retries on failure, and walks the fallback chain when providers are down.

Request Lifecycle

Ranked Candidate List
    |
    v
[1] Attempt Primary      --> Send to top-scored provider
    |
    v
[2] Stream / Await       --> Handle streaming or blocking response
    |
    v
[3] Success?             --> Yes: return result + record trace
    |                        No: continue to fallback
    v
[4] Attempt Fallback 1   --> Send to next candidate
    |
    v
[5] Exhaustion?          --> Yes: return error with full trace
    |                        No: continue chain
    v
[6] Record Trace         --> Persist full execution trace
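
The lifecycle above can be sketched as a loop over the ranked candidates. This is a minimal illustration, not the engine's actual code: `Candidate`, `AttemptRecord`, `ExecutionResult`, and `executeWithFallback` are hypothetical names, and retries within a single provider are left out for brevity.

```typescript
// Hypothetical types; the real definitions live in the Layerr codebase.
interface Candidate {
  provider: string;
  model: string;
}

interface AttemptRecord {
  provider: string;
  ok: boolean;
  error?: string;
}

type ExecutionResult<T> =
  | { ok: true; value: T; attempts: AttemptRecord[] }
  | { ok: false; attempts: AttemptRecord[] };

// Walk the ranked list in order. Return on the first success, or return a
// failure carrying every attempt once all candidates are exhausted.
async function executeWithFallback<T>(
  candidates: Candidate[],
  send: (candidate: Candidate) => Promise<T>,
): Promise<ExecutionResult<T>> {
  const attempts: AttemptRecord[] = [];
  for (const candidate of candidates) {
    try {
      const value = await send(candidate);
      attempts.push({ provider: candidate.provider, ok: true });
      return { ok: true, value, attempts };
    } catch (err) {
      // Record the failure and continue down the chain.
      attempts.push({
        provider: candidate.provider,
        ok: false,
        error: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return { ok: false, attempts };
}
```

Returning the attempt list on both paths keeps the trace complete whether the request succeeds on the first provider or exhausts the chain.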

Core Components

Request Builder

File: gateway/handler.ts
Transforms the internal Layerr request format into provider-specific formats (OpenAI, Anthropic, Ollama, custom)
Injects API keys from the Secrets Manager
Sets headers, timeouts, and retry policies
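
As a rough sketch of the transformation step, the function below maps one internal request onto OpenAI-style and Anthropic-style payloads. The `InternalRequest` and `BuiltRequest` shapes and the `buildRequest` name are assumptions made for illustration; only the endpoint paths, header names, and body fields reflect the public provider APIs.

```typescript
// Hypothetical internal request shape; the real Layerr format may differ.
interface InternalRequest {
  model: string;
  system?: string;
  messages: { role: "user" | "assistant"; content: string }[];
  maxTokens: number;
  stream: boolean;
}

interface BuiltRequest {
  path: string;
  headers: Record<string, string>;
  body: Record<string, unknown>;
  timeoutMs: number;
}

type ProviderKind = "openai" | "anthropic";

function buildRequest(
  kind: ProviderKind,
  req: InternalRequest,
  apiKey: string, // assumed to be fetched from the Secrets Manager by the caller
  timeoutMs: number,
): BuiltRequest {
  if (kind === "openai") {
    // OpenAI-style chat completions carry the system prompt as a message.
    const messages: { role: string; content: string }[] = req.system
      ? [{ role: "system", content: req.system }, ...req.messages]
      : req.messages;
    return {
      path: "/v1/chat/completions",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: { model: req.model, messages, max_tokens: req.maxTokens, stream: req.stream },
      timeoutMs,
    };
  }
  // Anthropic's Messages API takes the system prompt as a top-level field.
  return {
    path: "/v1/messages",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
    },
    body: {
      model: req.model,
      system: req.system,
      messages: req.messages,
      max_tokens: req.maxTokens,
      stream: req.stream,
    },
    timeoutMs,
  };
}
```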

Provider Adapters

OpenAI-compatible: providers/openai-compat/adapter.ts
Ollama: providers/ollama/adapter.ts
Each adapter normalizes request/response formats so the rest of the system is provider-agnostic
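
To illustrate what "provider-agnostic" means on the response side, here is a sketch of two adapters that map provider responses onto one shape. `ProviderAdapter` and `NormalizedResponse` are hypothetical names, not the interfaces in the adapter files; the response fields are subsets of the public OpenAI chat completion and Ollama `/api/chat` formats.

```typescript
// Hypothetical normalized shape shared by all adapters.
interface NormalizedResponse {
  text: string;
  tokensIn: number;
  tokensOut: number;
}

interface ProviderAdapter<Raw> {
  name: string;
  normalize(raw: Raw): NormalizedResponse;
}

// Subset of an OpenAI-style chat completion response.
interface OpenAIChatResponse {
  choices: { message: { content: string | null } }[];
  usage?: { prompt_tokens: number; completion_tokens: number };
}

// Subset of a non-streaming Ollama /api/chat response.
interface OllamaChatResponse {
  message: { content: string };
  prompt_eval_count?: number;
  eval_count?: number;
}

const openAICompatAdapter: ProviderAdapter<OpenAIChatResponse> = {
  name: "openai-compat",
  normalize: (raw) => ({
    text: raw.choices[0]?.message.content ?? "",
    tokensIn: raw.usage?.prompt_tokens ?? 0,
    tokensOut: raw.usage?.completion_tokens ?? 0,
  }),
};

const ollamaAdapter: ProviderAdapter<OllamaChatResponse> = {
  name: "ollama",
  normalize: (raw) => ({
    text: raw.message.content,
    // Ollama may omit the counts, so default to zero.
    tokensIn: raw.prompt_eval_count ?? 0,
    tokensOut: raw.eval_count ?? 0,
  }),
};
```

Downstream code can then read `text`, `tokensIn`, and `tokensOut` without knowing which provider answered.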

Streaming Handler

File: server.ts (primary handler for /v1/chat/completions)
Manages Server-Sent Events (SSE) streams
Supports cancellation mid-stream
Counts tokens in real time for cost attribution
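
A minimal sketch of SSE handling with mid-stream cancellation follows. It is not the code in `server.ts`: `SseParser` and `readStream` are hypothetical names, the stream is modelled as an `AsyncIterable<string>` of already-decoded text, and the `[DONE]` sentinel is the convention used by OpenAI-style streams.

```typescript
// Incremental SSE parser: feed it raw text chunks and get back complete
// `data:` payloads. Events are separated by a blank line.
class SseParser {
  private buffer = "";

  push(chunk: string): string[] {
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, "\n");
    const events = this.buffer.split("\n\n");
    // The last element is an incomplete event (or ""); keep it for later.
    this.buffer = events.pop() ?? "";
    const payloads: string[] = [];
    for (const event of events) {
      const data = event
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""))
        .join("\n");
      if (data !== "") payloads.push(data);
    }
    return payloads;
  }
}

// Consume chunks until the stream ends, "[DONE]" arrives, or the caller aborts.
async function readStream(
  chunks: AsyncIterable<string>,
  signal: AbortSignal,
  onData: (payload: string) => void,
): Promise<"done" | "cancelled"> {
  const parser = new SseParser();
  for await (const chunk of chunks) {
    // Cancellation is checked between chunks, before any payload is emitted.
    if (signal.aborted) return "cancelled";
    for (const payload of parser.push(chunk)) {
      if (payload === "[DONE]") return "done";
      onData(payload);
    }
  }
  return "done";
}
```

Buffering the trailing partial event matters because network chunks do not align with event boundaries. The `onData` callback is also a natural place to update a running token count.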

Retry & Circuit Breaker

File: runtime/protection/classifier.ts
Classifies errors as transient (retryable) or permanent
Implements exponential backoff with jitter
Circuit breaker pattern: after N consecutive failures, the provider is temporarily removed from rotation
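
The two mechanisms can be sketched as follows. This is an illustration under assumptions, not the contents of `classifier.ts`: the "full jitter" variant of backoff is one common choice, and the base delay, cap, threshold, and cooldown values are placeholders.

```typescript
// Full-jitter exponential backoff: a random delay in
// [0, min(capMs, baseMs * 2^attempt)). `random` is injectable for testing.
function backoffDelayMs(
  attempt: number, // 0 for the first retry
  baseMs = 250,
  capMs = 10_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Minimal circuit breaker: opens after `threshold` consecutive failures and
// stays open for `cooldownMs`. Time is passed in so behavior is deterministic.
class CircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAt = null;
  }

  recordFailure(now: number): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.threshold) this.openedAt = now;
  }

  // True while the provider should be skipped.
  isOpen(now: number): boolean {
    if (this.openedAt === null) return false;
    if (now - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: allow a trial request. One more failure re-opens
      // the breaker; a success resets it.
      this.openedAt = null;
      this.consecutiveFailures = this.threshold - 1;
      return false;
    }
    return true;
  }
}
```

Jitter spreads retries out so that many clients hitting the same rate limit do not retry in lockstep.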

Error Classification

The protection classifier (runtime/protection/classifier.ts) categorises errors:

Error Type	Examples	Action
Transient	429 rate limit, 503 unavailable, timeout	Retry with backoff
Auth	401, 403	Fail fast, alert admin
Content	400 bad request, context too long	Fail fast, log for analysis
Provider Down	Connection refused, DNS failure	Fall back immediately, mark unhealthy
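
The table above maps naturally onto a small classification function. The sketch below is an assumption about how such a mapping might look, not the actual classifier: the `ProviderFailure` shape, the `unknown` class, and its default action are all hypothetical. The network codes are the standard Node.js error codes for connection refused, DNS lookup failure, and timeout.

```typescript
type ErrorClass = "transient" | "auth" | "content" | "provider_down" | "unknown";
type ErrorAction = "retry_with_backoff" | "fail_fast" | "fallback";

// Hypothetical failure shape: an HTTP status, or a network-level error code.
interface ProviderFailure {
  status?: number;
  code?: string; // e.g. "ECONNREFUSED", "ENOTFOUND", "ETIMEDOUT"
}

function classifyFailure(failure: ProviderFailure): ErrorClass {
  if (failure.code === "ECONNREFUSED" || failure.code === "ENOTFOUND") {
    return "provider_down";
  }
  if (failure.code === "ETIMEDOUT") return "transient";
  switch (failure.status) {
    case 429:
    case 503:
      return "transient";
    case 401:
    case 403:
      return "auth";
    case 400:
      return "content";
    default:
      // Not covered by the table; treated as its own class in this sketch.
      return "unknown";
  }
}

const ACTION_FOR: Record<ErrorClass, ErrorAction> = {
  transient: "retry_with_backoff",
  auth: "fail_fast",
  content: "fail_fast",
  provider_down: "fallback",
  unknown: "fallback", // assumption: try the next candidate when unsure
};
```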

Timeout Profiles

Profile	Timeout	Use Case
Fast	5 seconds	Autocomplete, quick fixes
Standard	30 seconds	General coding tasks
Deep	120 seconds	Architecture design, complex reasoning
Custom	User-defined	Specialised workloads
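
One way to express these profiles in code is a lookup table plus an abort-based timeout wrapper. This is a sketch only: the names `TIMEOUT_MS`, `resolveTimeoutMs`, and `withTimeout` are hypothetical and need not match `runtime/timeout/profiles.ts`.

```typescript
type ProfileName = "fast" | "standard" | "deep";

// Values taken from the table above, in milliseconds.
const TIMEOUT_MS: Record<ProfileName, number> = {
  fast: 5_000,
  standard: 30_000,
  deep: 120_000,
};

// A custom profile supplies its own duration.
function resolveTimeoutMs(profile: ProfileName | { customMs: number }): number {
  if (typeof profile === "string") return TIMEOUT_MS[profile];
  if (!Number.isFinite(profile.customMs) || profile.customMs <= 0) {
    throw new RangeError("customMs must be a positive, finite number");
  }
  return profile.customMs;
}

// Signal an abort after `timeoutMs`. The timeout only takes effect if `work`
// honors the signal, for example by passing it to fetch().
function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  return work(controller.signal).finally(() => clearTimeout(timer));
}
```

A call site might look like `withTimeout((signal) => fetch(url, { signal }), resolveTimeoutMs("fast"))`, where `url` is whatever endpoint the request builder produced.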

Execution Trace Format

Every execution is recorded as a trace with the following shape:

interface ExecutionTrace {
  traceId: string;
  workspaceId: string;
  request: {
    intent: IntentClassification;
    workload: WorkloadProfile;
    strategy: Strategy;
    providers: ProviderCandidate[]; // ranked list
  };
  attempts: Attempt[];
  finalOutcome: {
    provider: string;
    model: string;
    latencyMs: number;
    tokensIn: number;
    tokensOut: number;
    costUsd: number;
    qualityScore: number;
  };
  timestamps: {
    routedAt: Date;
    firstTokenAt: Date;
    completedAt: Date;
  };
}
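
The three timestamps are enough to derive a few latency figures. The helpers below are a sketch with hypothetical names (`latencyBreakdown`, `outputTokensPerSecond`); whether the engine computes `latencyMs` this way is not stated here.

```typescript
// Only the timestamp portion of ExecutionTrace is needed for this sketch.
interface TraceTimestamps {
  routedAt: Date;
  firstTokenAt: Date;
  completedAt: Date;
}

interface LatencyBreakdown {
  timeToFirstTokenMs: number; // routing decision to first token
  generationMs: number; // first token to completion
  totalMs: number; // routing decision to completion
}

function latencyBreakdown(t: TraceTimestamps): LatencyBreakdown {
  const routed = t.routedAt.getTime();
  const first = t.firstTokenAt.getTime();
  const completed = t.completedAt.getTime();
  return {
    timeToFirstTokenMs: first - routed,
    generationMs: completed - first,
    totalMs: completed - routed,
  };
}

// Output throughput over the generation phase; 0 when the phase has no duration.
function outputTokensPerSecond(tokensOut: number, t: TraceTimestamps): number {
  const { generationMs } = latencyBreakdown(t);
  return generationMs > 0 ? (tokensOut * 1000) / generationMs : 0;
}
```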

File Reference

File	What It Does
`server.ts`	Main server entrypoint. All API routes flow through here
`gateway/handler.ts`	Request transformation and gateway routing
`providers/openai-compat/adapter.ts`	OpenAI-compatible API adapter
`providers/ollama/adapter.ts`	Ollama local model adapter
`providers/resolution.ts`	Provider URL resolution and health checking
`providers/registry.ts`	Provider registration and metadata
`runtime/protection/classifier.ts`	Error classification and retry logic
`runtime/timeout/profiles.ts`	Timeout profile definitions
`src/features/runtime/execution/executionModel.ts`	Frontend execution data model
`src/features/runtime/lifecycle/lifecycleModel.ts`	Execution lifecycle state machine

Integration

Scoring Layer → provides ranked candidate list
Secrets Manager → provides API keys for provider auth
Provider Health → receives success/failure signals for real-time health updates
Replay → receives full execution traces for storage
Economics → receives token counts and costs for attribution
Explainability → receives execution details for post-hoc explanations