Accepted · Implemented · Verified

ADR-001: Latency Quality Gate

Architecture Decision Record — Five-layer latency control system for the CO₂Router routing hot path.

p95 Total (Measured)
77ms
p95 Compute (Measured)
59ms
Hot Path Budget
100ms
p99 SLA Contract
≤200ms

Measured Results: p95 total 77ms, p95 compute 59ms (250 decision samples). Budget target of 100ms is confirmed. The architecture described in this ADR is the live production implementation.

1. Context

CO₂Router has a contractual p99 latency ceiling of 200ms cross-region and an internal quality gate of 100ms for the routing hot path. This ADR documents the multi-layer latency control architecture implemented to guarantee predictable performance. A governance layer that adds more than 100ms to the critical path of a CI/CD pipeline or a pre-scaling event is not deployable in practice.

2. Decision

Implement a five-layer latency control system that enforces the 100ms hot path budget while preserving signal quality and maintaining full observability.

Layer 1 — Hot Path Budget Enforcement

Deadline budget middleware on all routing endpoints. Total budget 100ms, allocated as follows:

ComponentBudgetNotes
Signal resolution40msCache lookup only — no API calls on hot path
Scoring + selection20msMulti-objective arbitration
Governance + lease10msPolicy trace assembly
Persistence0ms blockingFire-and-forget async write
Response serialization10msZod validation + JSON
Buffer20ms20% safety margin

If signal resolution exceeds budget, use Last-Known-Good immediately. Never block the response waiting for a provider.

Layer 2 — Cache Architecture Hardening

  • Cache warming interval: 15 seconds (reduced from 30s)
  • Predictive cache warming — pre-warm regions likely to be requested based on demand patterns
  • Write-through cache on all provider responses
  • Redis pipeline / batch operations for multi-region lookups
  • Local in-process L1 cache (5-second TTL) before Redis — eliminates Redis round-trip for repeat requests

Layer 3 — Provider Call Isolation

Provider calls never happen on the hot path. The full flow is:

Background Worker → Redis Cache → Hot Path

// Hot path reads only from:
// 1. L1 in-process cache (5s TTL)
// 2. Redis warm cache (15min / 1hr / 6hr TTLs)
// 3. Last-Known-Good fallback

// Circuit breaker for background workers: 5s timeout

Layer 4 — Database Query Optimization

  • Redis-based decision ID generation (avoids DB round-trip on hot path)
  • Batch persistence operations via Prisma.$transaction
  • Connection pool monitoring — alert if utilization exceeds 80%
  • All decision writes are fire-and-forget from the hot path

Layer 5 — Observability

Every latency boundary is instrumented:

// Metrics emitted per routing decision
co2router.routing.latency.ms          // p50, p95, p99 — end-to-end
co2router.routing.cache_lookup_ms     // per-region, per-tier
co2router.routing.scoring_ms          // multi-objective scoring time
co2router.routing.governance_ms       // policy trace assembly time

// Alert thresholds
p99 > 80ms  → WARNING
p99 > 100ms → ALERT (hot path budget breach)
p99 > 200ms → CRITICAL (SLA breach)

// Request header for clients
X-CO2Router-Deadline-Ms: <remaining budget ms>

3. Cache TTL Architecture

Cache TierTTLTriggered By
L1 in-process5 secondsAll hot path reads
Redis warm — live15 minutesBackground worker, healthy provider
Redis warm — degraded1 hourProvider returning stale data
Redis warm — fallback6 hoursProvider circuit open, LKG applied
Last-Known-GoodUntil evictionFallback when all tiers miss
Ember structural baseline30 daysMonthly data, used as final data-backed fallback

4. Five-Level Fallback Chain

Level 1: Live provider data (p95 < 40ms — cache hit)
Level 2: Warm cache 15min (background worker recent success)
Level 3: Warm cache 1–6hr (provider degraded but recovering)
Level 4: Ember structural baseline (region carbon baseline, no real-time signal)
Level 5: Static 450 gCO₂/kWh global default (all data paths failed)

// Each fallback level sets:
fallbackUsed: true
syntheticFlag: true (levels 4–5)
qualityTier: "LOW"
leaseExpiresAt: now + 30 minutes

5. Consequences

Positive

  • Consistent sub-100ms p99 on hot path (measured: 77ms p95)
  • Measurable, alertable, auditable latency per component
  • Early signal degradation is surfaced and prevents cascading failures
  • Clear performance contract with infrastructure and all API consumers

Negative

  • Last-Known-Good fallback more frequent during provider degradation windows
  • Budget tracking adds implementation complexity in middleware
  • Predictive cache warming requires demand pattern modeling (warm-up period)

6. Validation

Verified: 77ms p95 total, 59ms p95 compute across 250 live decision samples. Budget target of 100ms confirmed with 23ms margin. Architecture described in this ADR is the live production implementation as of March 30, 2026.