ADR-001: Latency Quality Gate
Architecture Decision Record — Five-layer latency control system for the CO₂Router routing hot path.
Measured results: 77ms p95 total, 59ms p95 compute (250 decision samples), confirming the 100ms budget target. The architecture described in this ADR is the live production implementation.
1. Context
CO₂Router has a contractual p99 latency ceiling of 200ms cross-region and an internal quality gate of 100ms for the routing hot path. This ADR documents the multi-layer latency control architecture implemented to guarantee predictable performance. A governance layer that adds more than 100ms to the critical path of a CI/CD pipeline or a pre-scaling event is not deployable in practice.
2. Decision
Implement a five-layer latency control system that enforces the 100ms hot path budget while preserving signal quality and maintaining full observability.
Layer 1 — Hot Path Budget Enforcement
Deadline budget middleware on all routing endpoints. Total budget 100ms, allocated as follows:
| Component | Budget | Notes |
|---|---|---|
| Signal resolution | 40ms | Cache lookup only — no API calls on hot path |
| Scoring + selection | 20ms | Multi-objective arbitration |
| Governance + lease | 10ms | Policy trace assembly |
| Persistence | 0ms blocking | Fire-and-forget async write |
| Response serialization | 10ms | Zod validation + JSON |
| Buffer | 20ms | 20% safety margin |
If signal resolution exceeds budget, use Last-Known-Good immediately. Never block the response waiting for a provider.
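The budget allocation above can be enforced with a small per-request tracker. A minimal sketch, assuming a hypothetical `DeadlineBudget` helper (the ADR does not name the actual middleware class):

```typescript
// Hypothetical sketch of the deadline-budget tracker used by the
// routing middleware. Stage allocations mirror the table above.
class DeadlineBudget {
  private readonly start: number;

  constructor(private readonly totalMs: number) {
    this.start = Date.now();
  }

  // Milliseconds left before the total budget is exhausted.
  remainingMs(): number {
    return Math.max(0, this.totalMs - (Date.now() - this.start));
  }

  // True if a stage with the given allocation still fits.
  canSpend(stageMs: number): boolean {
    return this.remainingMs() >= stageMs;
  }
}

// Usage: if signal resolution (40ms allocation) no longer fits,
// fall back to Last-Known-Good immediately rather than block.
const budget = new DeadlineBudget(100);
const signalSource = budget.canSpend(40) ? "cacheLookup" : "lastKnownGood";
```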
Layer 2 — Cache Architecture Hardening
- Cache warming interval: 15 seconds (reduced from 30s)
- Predictive cache warming — pre-warm regions likely to be requested based on demand patterns
- Write-through cache on all provider responses
- Redis pipeline / batch operations for multi-region lookups
- Local in-process L1 cache (5-second TTL) before Redis — eliminates Redis round-trip for repeat requests
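The L1 in-process layer above is essentially a TTL map consulted before Redis. A minimal sketch of such a cache with the 5-second TTL from the list (class name hypothetical):

```typescript
// Hypothetical sketch of the in-process L1 cache consulted before Redis.
type Entry<T> = { value: T; expiresAt: number };

class L1Cache<T> {
  private store = new Map<string, Entry<T>>();

  constructor(private readonly ttlMs: number = 5_000) {}

  // Returns the cached value, or undefined on a miss or expired entry.
  get(key: string): T | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(key); // lazy eviction on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```

A hit here skips the Redis round-trip entirely, which is what makes repeat requests for the same region nearly free within the 5-second window.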
Layer 3 — Provider Call Isolation
Provider calls never happen on the hot path. The full flow is:
```
Background Worker → Redis Cache → Hot Path

// Hot path reads only from:
// 1. L1 in-process cache (5s TTL)
// 2. Redis warm cache (15min / 1hr / 6hr TTLs)
// 3. Last-Known-Good fallback

// Circuit breaker for background workers: 5s timeout
```
Layer 4 — Database Query Optimization
- Redis-based decision ID generation (avoids DB round-trip on hot path)
- Batch persistence operations via `Prisma.$transaction`
- Connection pool monitoring — alert if utilization exceeds 80%
- All decision writes are fire-and-forget from the hot path
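The fire-and-forget pattern above can be sketched as follows; `persistDecision` and `recordDecision` are hypothetical stand-ins for the real batched Prisma write:

```typescript
// Hypothetical sketch of fire-and-forget decision persistence.
// In production the write would batch via Prisma.$transaction.
interface Decision {
  id: string;
  region: string;
}

const pendingWrites: Decision[] = [];

async function persistDecision(decision: Decision): Promise<void> {
  // Stand-in for the batched database write.
  pendingWrites.push(decision);
}

function recordDecision(decision: Decision): void {
  // The hot path never awaits the write; failures are
  // logged and alerted on, never surfaced to the caller.
  void persistDecision(decision).catch((err) => {
    console.error("decision persistence failed", decision.id, err);
  });
}
```

The `void` operator makes the intent explicit: the returned promise is deliberately not awaited, so persistence contributes 0ms of blocking time to the response.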
Layer 5 — Observability
Every latency boundary is instrumented:
```
// Metrics emitted per routing decision
co2router.routing.latency.ms        // p50, p95, p99 — end-to-end
co2router.routing.cache_lookup_ms   // per-region, per-tier
co2router.routing.scoring_ms        // multi-objective scoring time
co2router.routing.governance_ms     // policy trace assembly time

// Alert thresholds
p99 > 80ms  → WARNING
p99 > 100ms → ALERT (hot path budget breach)
p99 > 200ms → CRITICAL (SLA breach)

// Request header for clients
X-CO2Router-Deadline-Ms: <remaining budget ms>
```
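The alert thresholds map directly to a severity function. A minimal sketch (function name hypothetical, thresholds taken from the list above):

```typescript
// Hypothetical severity mapping for the p99 alert thresholds above.
type Severity = "OK" | "WARNING" | "ALERT" | "CRITICAL";

function alertLevel(p99Ms: number): Severity {
  if (p99Ms > 200) return "CRITICAL"; // SLA breach
  if (p99Ms > 100) return "ALERT";    // hot path budget breach
  if (p99Ms > 80) return "WARNING";
  return "OK";
}
```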
3. Cache TTL Architecture
| Cache Tier | TTL | Triggered By |
|---|---|---|
| L1 in-process | 5 seconds | All hot path reads |
| Redis warm — live | 15 minutes | Background worker, healthy provider |
| Redis warm — degraded | 1 hour | Provider returning stale data |
| Redis warm — fallback | 6 hours | Provider circuit open, LKG applied |
| Last-Known-Good | Until eviction | Fallback when all tiers miss |
| Ember structural baseline | 30 days | Monthly data, used as final data-backed fallback |
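The Redis warm tiers in the table imply a TTL chosen by provider health. A sketch of that mapping (type and value names hypothetical):

```typescript
// Hypothetical TTL selection by provider health, per the tier table.
type ProviderHealth = "healthy" | "stale" | "circuit_open";

const warmCacheTtlSeconds: Record<ProviderHealth, number> = {
  healthy: 15 * 60,       // live tier: 15 minutes
  stale: 60 * 60,         // degraded tier: 1 hour
  circuit_open: 6 * 3600, // fallback tier: 6 hours, LKG applied
};
```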
4. Five-Level Fallback Chain
```
Level 1: Live provider data (p95 < 40ms — cache hit)
Level 2: Warm cache 15min (background worker recent success)
Level 3: Warm cache 1–6hr (provider degraded but recovering)
Level 4: Ember structural baseline (region carbon baseline, no real-time signal)
Level 5: Static 450 gCO₂/kWh global default (all data paths failed)

// Each fallback level sets:
fallbackUsed: true
syntheticFlag: true (levels 4–5)
qualityTier: "LOW"
leaseExpiresAt: now + 30 minutes
```
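The chain above amounts to an ordered resolution over the data tiers. A minimal sketch under stated assumptions: `resolveSignal` is a hypothetical illustration, not the production function, and takes the first four tiers as an array of possibly-missing intensity values.

```typescript
// Hypothetical sketch of the five-level fallback chain. Indices 0-3 of
// `tiers` correspond to levels 1-4; level 5 is the static global default.
type Signal = { intensity: number; level: number; syntheticFlag: boolean };

function resolveSignal(tiers: Array<number | undefined>): Signal {
  for (let i = 0; i < 4; i++) {
    const value = tiers[i];
    if (value !== undefined) {
      // Level 4 (Ember baseline) carries no real-time signal,
      // so it is flagged synthetic like level 5.
      return { intensity: value, level: i + 1, syntheticFlag: i >= 3 };
    }
  }
  // Level 5: static 450 gCO₂/kWh default when all data paths fail.
  return { intensity: 450, level: 5, syntheticFlag: true };
}
```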
5. Consequences
Positive
- Consistent sub-100ms latency on the hot path (measured: 77ms p95)
- Measurable, alertable, auditable latency per component
- Signal degradation is surfaced early, preventing cascading failures
- Clear performance contract with infrastructure and all API consumers
Negative
- Last-Known-Good fallback more frequent during provider degradation windows
- Budget tracking adds implementation complexity in middleware
- Predictive cache warming requires demand pattern modeling (warm-up period)
6. Validation
Verified: 77ms p95 total, 59ms p95 compute across 250 live decision samples. Budget target of 100ms confirmed with 23ms margin. Architecture described in this ADR is the live production implementation as of March 30, 2026.