Changelog
A detailed history of every ModelSwitch release. Subscribe to the RSS feed for release notifications.
v2.4.0
Latest · March 6, 2025
Latency routing mode now uses live P95 telemetry per region rather than static averages
Added `x-budget-usd` header for per-request cost guardrails with automatic model fallback
Semantic cache layer: identical and near-duplicate prompts reuse cached completions (opt-in)
Proxy overhead reduced from 18ms to 6ms average through WASM routing core
Failover chain now executes in parallel after a 2s primary timeout instead of sequentially
Streaming responses no longer drop the final `[DONE]` token on long outputs
DPA auto-attachment now correctly includes APAC SCCs for Singapore-homed accounts
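The parallel failover entry above can be pictured with a small sketch. It is only one reading of that entry, not ModelSwitch's routing code: `call_model` and `route_with_failover` are hypothetical names, the provider call is simulated with a sleep, and keeping the primary in the race after the timeout is an assumption. Only the 2s timeout comes from the changelog.

```python
import asyncio

PRIMARY_TIMEOUT_S = 2.0  # from the changelog entry; everything else is hypothetical


async def call_model(name: str, delay: float) -> str:
    """Hypothetical stand-in for an upstream provider call."""
    await asyncio.sleep(delay)
    return name


async def route_with_failover(primary, fallbacks, timeout=PRIMARY_TIMEOUT_S):
    """Wait for the primary up to `timeout`, then race all fallbacks at once.

    `primary` and each item of `fallbacks` are zero-argument callables that
    return a coroutine, so no fallback is started unless it is needed.
    """
    primary_task = asyncio.create_task(primary())
    done, _ = await asyncio.wait({primary_task}, timeout=timeout)
    if done:
        return primary_task.result()

    # Primary timed out: start every fallback in parallel (assumption: the
    # primary stays in the race rather than being cancelled).
    tasks = {primary_task} | {asyncio.create_task(f()) for f in fallbacks}
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # drop the slower candidates
    # Note: if the first finisher raised, this re-raises; a fuller version
    # would keep waiting on the remaining candidates instead.
    return next(iter(done)).result()
```

With sequential failover, the worst case grows with the sum of the candidates' response times. Racing the fallbacks bounds it by the timeout plus the fastest fallback.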
v2.3.0
Stable · February 14, 2025
GPT-4o and GPT-4o Mini added to the official model registry with real-time pricing data
EU-West (Frankfurt) PoP launched, reducing median latency for European customers by 38%
Webhook alerts for budget threshold breaches (configurable at 50%, 80%, 100%)
`routing_metadata` object now includes `failover_chain` array showing evaluated candidates
API key rotation generates replacement before invalidating old key (zero-downtime rotation)
Fixed: `top_p` and `frequency_penalty` parameters are no longer dropped on cost-routed requests
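To illustrate the budget threshold alerts listed in this release, here is a minimal sketch of how crossings at 50%, 80%, and 100% might be detected between two spend readings. The function name `newly_crossed` and the idea of comparing a previous and a current spend figure are assumptions for illustration; the changelog only states the three threshold levels.

```python
THRESHOLDS = (0.5, 0.8, 1.0)  # 50%, 80%, 100% as listed in the changelog


def newly_crossed(previous_spend, current_spend, budget, thresholds=THRESHOLDS):
    """Return the thresholds crossed between two spend readings.

    Hypothetical helper: a threshold counts as crossed when spend moves from
    strictly below it to at or above it, so each alert would fire once.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    before = previous_spend / budget
    after = current_spend / budget
    return [t for t in thresholds if before < t <= after]
```

For example, a jump from 40 to 85 against a budget of 100 would report both the 50% and 80% thresholds in a single check.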
v2.2.0
Stable · January 28, 2025
SOC2 Type II audit completed. Certificate available in Trust Center.
Python SDK (OpenAI-compatible) now documented with ModelSwitch header examples
Cost routing mode: token budget projection now estimates output tokens using an input-length heuristic
Request logs are now queryable via the dashboard with up to 30-day retention on Scale+ plans
Fixed HTTP 429 propagation when upstream provider rate limits trigger at the tail of the failover chain
v2.1.0
Stable · January 10, 2025
Multi-region request routing: requests are now load-balanced across the nearest available PoP
`x-failover-chain` header allows per-request custom model priority ordering
Dashboard redesigned with bento-style layout and live cost attribution by project
Model registry expanded with context window and per-capability metadata
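As a rough sketch of how the per-request `x-failover-chain` header from this release might be assembled on the client side (optionally alongside the `x-budget-usd` header that appears elsewhere in this changelog): the value formats below are assumptions, namely a comma-separated model list and a two-decimal dollar amount. The `Authorization: Bearer` scheme is also assumed, based on the API being described as OpenAI-compatible, and `build_headers` is a hypothetical helper.

```python
def build_headers(api_key, failover_chain=None, budget_usd=None):
    """Build request headers for a chat completion call.

    Assumptions (not confirmed by the changelog): bearer-token auth,
    comma-separated model names, and a plain decimal budget string.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if failover_chain:
        # Models listed in priority order, first entry tried first (assumed).
        headers["x-failover-chain"] = ",".join(failover_chain)
    if budget_usd is not None:
        headers["x-budget-usd"] = f"{budget_usd:.2f}"
    return headers
```

The resulting dictionary can be passed to any HTTP client. Check the header reference for the actual value syntax before relying on this format.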
v2.0.0
Major · December 18, 2024
Complete rewrite of the routing engine in Rust/WASM for sub-10ms overhead
Three routing modes introduced: cost, performance, latency
`routing_metadata` field added to all chat completion responses
Custom response headers: `x-modelswitch-latency`, `x-modelswitch-model`, and others
API key format changed from `sk-ms-` prefix to `ms_live_` / `ms_test_` prefix. Old keys deprecated March 2025.
Streaming endpoint path changed from `/v1/stream` to unified `/v1/chat/completions` with SSE
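For teams auditing stored credentials after the key format change, a small check like the following may help flag keys that still use the old prefix. The three prefixes come from the entry above; `classify_key` and its return labels are hypothetical.

```python
def classify_key(key: str) -> str:
    """Classify an API key by its prefix.

    Prefixes are taken from the v2.0.0 notes; the labels are illustrative.
    """
    if key.startswith("ms_live_"):
        return "live"
    if key.startswith("ms_test_"):
        return "test"
    if key.startswith("sk-ms-"):
        return "legacy"  # old format, deprecated March 2025
    return "unknown"
```

Running this over a secrets inventory and filtering for `"legacy"` gives a list of keys to rotate.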