ADR-001: Latency Quality Gate
Architecture Decision Record — Five-layer latency control system for the CO₂Router routing hot path.
Measured results: 77ms p95 total, 59ms p95 compute (250 decision samples), confirming the 100ms budget target. The architecture described in this ADR is the live production implementation.
1. Context
CO₂Router has a contractual p99 latency ceiling of 200ms cross-region and an internal quality gate of 100ms for the routing hot path. This ADR documents the multi-layer latency control architecture implemented to guarantee predictable performance. A governance layer that adds more than 100ms to the critical path of a CI/CD pipeline or a pre-scaling event is not deployable in practice.
2. Decision
Implement a five-layer latency control system that enforces the 100ms hot path budget while preserving signal quality and maintaining full observability.
Layer 1 — Hot Path Budget Enforcement
Deadline budget middleware on all routing endpoints. Total budget 100ms, allocated as follows:
| Component | Budget | Notes |
|---|---|---|
| Signal resolution | 40ms | Cache lookup only — no API calls on hot path |
| Scoring + selection | 20ms | Multi-objective arbitration |
| Governance + lease | 10ms | Policy trace assembly |
| Persistence | 0ms blocking | Fire-and-forget async write |
| Response serialization | 10ms | Zod validation + JSON |
| Buffer | 20ms | 20% safety margin |
If signal resolution exceeds budget, use Last-Known-Good immediately. Never block the response waiting for a provider.
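The budget allocation above can be enforced with a small per-request tracker. A minimal sketch, assuming a hypothetical `DeadlineBudget` helper (the ADR does not name the actual middleware class):

```typescript
// Hypothetical sketch of the deadline-budget tracker used by the
// routing middleware. Stage allocations mirror the table above.
class DeadlineBudget {
  private readonly start: number;

  constructor(private readonly totalMs: number) {
    this.start = Date.now();
  }

  // Milliseconds left before the total budget is exhausted.
  remainingMs(): number {
    return Math.max(0, this.totalMs - (Date.now() - this.start));
  }

  // True if a stage with the given allocation still fits.
  canSpend(stageMs: number): boolean {
    return this.remainingMs() >= stageMs;
  }
}

// Usage: if signal resolution (40ms allocation) no longer fits,
// fall back to Last-Known-Good immediately rather than block.
const budget = new DeadlineBudget(100);
const signalSource = budget.canSpend(40) ? "cacheLookup" : "lastKnownGood";
```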
Layer 2 — Cache Architecture Hardening
- Cache warming interval: 15 seconds (reduced from 30s)
- Predictive cache warming — pre-warm regions likely to be requested based on demand patterns
- Write-through cache on all provider responses
- Redis pipeline / batch operations for multi-region lookups
- Local in-process L1 cache (5-second TTL) before Redis — eliminates Redis round-trip for repeat requests
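The L1 in-process layer above is essentially a TTL map consulted before Redis. A minimal sketch of such a cache with the 5-second TTL from the list (class name hypothetical):

```typescript
// Hypothetical sketch of the in-process L1 cache consulted before Redis.
type Entry<T> = { value: T; expiresAt: number };

class L1Cache<T> {
  private store = new Map<string, Entry<T>>();

  constructor(private readonly ttlMs: number = 5_000) {}

  // Returns the cached value, or undefined on a miss or expired entry.
  get(key: string): T | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(key); // lazy eviction on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```

A hit here skips the Redis round-trip entirely, which is what makes repeat requests for the same region nearly free within the 5-second window.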
Layer 3 — Provider Call Isolation
Provider calls never happen on the hot path. The full flow is:
```
Background Worker → Redis Cache → Hot Path

// Hot path reads only from:
// 1. L1 in-process cache (5s TTL)
// 2. Redis warm cache (15min / 1hr / 6hr TTLs)
// 3. Last-Known-Good fallback

// Circuit breaker for background workers: 5s timeout
```
Layer 4 — Database Query Optimization
- Redis-based decision ID generation (avoids DB round-trip on hot path)
- Batch persistence operations via `Prisma.$transaction`
- Connection pool monitoring — alert if utilization exceeds 80%
- All decision writes are fire-and-forget from the hot path
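The fire-and-forget pattern above can be sketched as follows; `persistDecision` and `recordDecision` are hypothetical stand-ins for the real batched Prisma write:

```typescript
// Hypothetical sketch of fire-and-forget decision persistence.
// In production the write would batch via Prisma.$transaction.
interface Decision {
  id: string;
  region: string;
}

const pendingWrites: Decision[] = [];

async function persistDecision(decision: Decision): Promise<void> {
  // Stand-in for the batched database write.
  pendingWrites.push(decision);
}

function recordDecision(decision: Decision): void {
  // The hot path never awaits the write; failures are
  // logged and alerted on, never surfaced to the caller.
  void persistDecision(decision).catch((err) => {
    console.error("decision persistence failed", decision.id, err);
  });
}
```

The `void` operator makes the intent explicit: the returned promise is deliberately not awaited, so persistence contributes 0ms of blocking time to the response.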
Layer 5 — Observability
Every latency boundary is instrumented:
```
// Metrics emitted per routing decision
co2router.routing.latency.ms        // p50, p95, p99 — end-to-end
co2router.routing.cache_lookup_ms   // per-region, per-tier
co2router.routing.scoring_ms        // multi-objective scoring time
co2router.routing.governance_ms     // policy trace assembly time

// Alert thresholds
p99 > 80ms  → WARNING
p99 > 100ms → ALERT (hot path budget breach)
p99 > 200ms → CRITICAL (SLA breach)

// Request header for clients
X-CO2Router-Deadline-Ms: <remaining budget ms>
```
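The alert thresholds map directly to a severity function. A minimal sketch (function name hypothetical, thresholds taken from the list above):

```typescript
// Hypothetical severity mapping for the p99 alert thresholds above.
type Severity = "OK" | "WARNING" | "ALERT" | "CRITICAL";

function alertLevel(p99Ms: number): Severity {
  if (p99Ms > 200) return "CRITICAL"; // SLA breach
  if (p99Ms > 100) return "ALERT";    // hot path budget breach
  if (p99Ms > 80) return "WARNING";
  return "OK";
}
```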
3. Cache TTL Architecture
| Cache Tier | TTL | Triggered By |
|---|---|---|
| L1 in-process | 5 seconds | All hot path reads |
| Redis warm — live | 15 minutes | Background worker, healthy provider |
| Redis warm — degraded | 1 hour | Provider returning stale data |
| Redis warm — fallback | 6 hours | Provider circuit open, LKG applied |
| Last-Known-Good | Until eviction | Fallback when all tiers miss |
| Ember structural baseline | 30 days | Monthly data, used as final data-backed fallback |
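The Redis warm tiers in the table imply a TTL chosen by provider health. A sketch of that mapping (type and value names hypothetical):

```typescript
// Hypothetical TTL selection by provider health, per the tier table.
type ProviderHealth = "healthy" | "stale" | "circuit_open";

const warmCacheTtlSeconds: Record<ProviderHealth, number> = {
  healthy: 15 * 60,       // live tier: 15 minutes
  stale: 60 * 60,         // degraded tier: 1 hour
  circuit_open: 6 * 3600, // fallback tier: 6 hours, LKG applied
};
```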
4. Five-Level Fallback Chain
```
Level 1: Live provider data (p95 < 40ms — cache hit)
Level 2: Warm cache 15min (background worker recent success)
Level 3: Warm cache 1–6hr (provider degraded but recovering)
Level 4: Ember structural baseline (region carbon baseline, no real-time signal)
Level 5: Static 450 gCO₂/kWh global default (all data paths failed)

// Each fallback level sets:
fallbackUsed: true
syntheticFlag: true (levels 4–5)
qualityTier: "LOW"
leaseExpiresAt: now + 30 minutes
```
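The chain above amounts to an ordered resolution over the data tiers. A minimal sketch under stated assumptions: `resolveSignal` is a hypothetical illustration, not the production function, and takes the first four tiers as an array of possibly-missing intensity values.

```typescript
// Hypothetical sketch of the five-level fallback chain. Indices 0-3 of
// `tiers` correspond to levels 1-4; level 5 is the static global default.
type Signal = { intensity: number; level: number; syntheticFlag: boolean };

function resolveSignal(tiers: Array<number | undefined>): Signal {
  for (let i = 0; i < 4; i++) {
    const value = tiers[i];
    if (value !== undefined) {
      // Level 4 (Ember baseline) carries no real-time signal,
      // so it is flagged synthetic like level 5.
      return { intensity: value, level: i + 1, syntheticFlag: i >= 3 };
    }
  }
  // Level 5: static 450 gCO₂/kWh default when all data paths fail.
  return { intensity: 450, level: 5, syntheticFlag: true };
}
```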
5. Consequences
Positive
- Consistent sub-100ms latency on the hot path (measured: 77ms p95)
- Measurable, alertable, auditable latency per component
- Signal degradation is surfaced early, preventing cascading failures
- Clear performance contract with infrastructure and all API consumers
Negative
- Last-Known-Good fallback more frequent during provider degradation windows
- Budget tracking adds implementation complexity in middleware
- Predictive cache warming requires demand pattern modeling (warm-up period)
6. Validation
Verified: 77ms p95 total, 59ms p95 compute across 250 live decision samples. Budget target of 100ms confirmed with 23ms margin. Architecture described in this ADR is the live production implementation as of March 30, 2026.