Release History

Changelog

A detailed history of every ModelSwitch release. Subscribe to the RSS feed for release notifications.

v2.4.0

Latest · March 6, 2025

New

Latency routing mode now uses live P95 telemetry per region rather than static averages

New

Added `x-budget-usd` header for per-request cost guardrails with automatic model fallback

New

Semantic cache layer: identical and near-duplicate prompts reuse cached completions (opt-in)

Improved

Average proxy overhead reduced from 18ms to 6ms through the WASM routing core

Improved

Failover chain now executes in parallel after a 2s primary timeout instead of sequentially

Fixed

Streaming responses no longer drop the final `[DONE]` token on long outputs

Fixed

DPA auto-attachment now correctly includes APAC SCCs for Singapore-homed accounts
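
As a rough illustration of the parallel failover behavior described in this release, the sketch below waits up to 2 seconds for a primary call and then races the remaining candidates concurrently. It is a client-side analogy, not ModelSwitch's routing code: `route_with_failover`, the callables passed to it, cancelling the primary on timeout, and the first-success-wins rule are all assumptions made for the example.

```python
import asyncio

PRIMARY_TIMEOUT_S = 2.0  # the 2s primary timeout named in the entry above


async def route_with_failover(primary, fallbacks, timeout=PRIMARY_TIMEOUT_S):
    """Try `primary`; if it times out, run every fallback at once.

    `primary` and each item of `fallbacks` are zero-argument coroutine
    functions (hypothetical stand-ins for calls to individual models).
    """
    try:
        # wait_for cancels the primary call if the timeout elapses (assumed behavior).
        return await asyncio.wait_for(primary(), timeout)
    except asyncio.TimeoutError:
        pass

    # Start all fallback candidates concurrently rather than one after another.
    tasks = [asyncio.ensure_future(candidate()) for candidate in fallbacks]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished  # first successful candidate wins (assumed)
            except Exception:
                continue  # this candidate failed; keep waiting for the others
        raise RuntimeError("all failover candidates failed")
    finally:
        # Stop any candidates that are still running.
        for task in tasks:
            task.cancel()
```

With a sequential chain the worst case is the sum of every candidate's latency; racing the candidates bounds it by the timeout plus the fastest successful fallback.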

v2.3.0

Stable · February 14, 2025

New

GPT-4o and GPT-4o Mini added to the official model registry with real-time pricing data

New

EU-West (Frankfurt) PoP launched, reducing median latency for European customers by 38%

New

Webhook alerts for budget threshold breaches (configurable at 50%, 80%, 100%)

Improved

`routing_metadata` object now includes a `failover_chain` array showing the evaluated candidates

Improved

API key rotation generates replacement before invalidating old key (zero-downtime rotation)

Fixed

`top_p` and `frequency_penalty` parameters are no longer dropped on cost-routed requests
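
To make the budget alert levels from this release concrete, the sketch below computes which of the 50%, 80%, and 100% thresholds a given spend has reached. The function name `crossed_thresholds` and its arguments are hypothetical; the changelog does not describe how ModelSwitch evaluates thresholds or what the webhook payload contains.

```python
ALERT_THRESHOLDS_PCT = (50, 80, 100)  # levels listed in the webhook entry above


def crossed_thresholds(spend_usd, budget_usd, thresholds=ALERT_THRESHOLDS_PCT):
    """Return the threshold percentages that `spend_usd` has reached."""
    if budget_usd <= 0:
        raise ValueError("budget_usd must be positive")
    # Compare spend * 100 against budget * pct to avoid dividing first.
    return [pct for pct in thresholds if spend_usd * 100 >= budget_usd * pct]
```

For example, a spend of 85 against a budget of 100 reaches the 50% and 80% levels but not the 100% level.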

v2.2.0

Stable · January 28, 2025

New

SOC 2 Type II audit completed. Certificate available in Trust Center.

New

Python SDK (OpenAI-compatible) now documented with ModelSwitch header examples

New

Cost routing mode: token budget projection now estimates output tokens using an input-length heuristic

Improved

Request logs are now queryable via the dashboard with up to 30-day retention on Scale+ plans

Fixed

HTTP 429 is now propagated correctly when an upstream provider's rate limit triggers on the last model in the failover chain

v2.1.0

Stable · January 10, 2025

New

Multi-region request routing: requests are now load-balanced across the nearest available PoP

New

`x-failover-chain` header allows per-request custom model priority ordering

Improved

Dashboard redesigned with bento-style layout and live cost attribution by project

Improved

Model registry expanded with context window and per-capability metadata
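
The `x-failover-chain` header from this release could be set per request along the lines of the sketch below. Only the header name comes from the changelog. The comma-separated value format, the Bearer authorization scheme, the model names, and the helper `build_failover_headers` are assumptions for illustration, so check the API reference for the actual format.

```python
def build_failover_headers(api_key, failover_chain):
    """Build request headers with a custom model priority order.

    `failover_chain` is a list of model names, highest priority first.
    """
    if not failover_chain:
        raise ValueError("failover_chain must contain at least one model")
    return {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
        # Assumed encoding: model names joined by commas, in priority order.
        "x-failover-chain": ",".join(failover_chain),
    }


# Placeholder key and model names, not real identifiers.
headers = build_failover_headers("ms_test_example", ["model-a", "model-b"])
```

The resulting dictionary can be passed to any HTTP client alongside the usual chat completion request body.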

v2.0.0

Major · December 18, 2024

New

Complete rewrite of the routing engine in Rust/WASM for sub-10ms overhead

New

Three routing modes introduced: cost, performance, latency

New

`routing_metadata` field added to all chat completion responses

New

Custom response headers: `x-modelswitch-latency`, `x-modelswitch-model`, and others

Breaking

API key format changed from `sk-ms-` prefix to `ms_live_` / `ms_test_` prefix. Old keys deprecated March 2025.

Breaking

Streaming endpoint path changed from `/v1/stream` to unified `/v1/chat/completions` with SSE
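
When auditing a codebase for the two breaking changes above, a small helper like the following may be useful. It is a sketch: the prefixes and paths are taken from the entries above, while the function names `classify_api_key` and `migrate_path` and the returned labels are hypothetical.

```python
LEGACY_STREAM_PATH = "/v1/stream"
UNIFIED_PATH = "/v1/chat/completions"


def classify_api_key(key):
    """Label a key by its prefix: 'live', 'test', 'legacy', or 'unknown'."""
    if key.startswith("ms_live_"):
        return "live"
    if key.startswith("ms_test_"):
        return "test"
    if key.startswith("sk-ms-"):
        return "legacy"  # pre-2.0.0 format, deprecated March 2025
    return "unknown"


def migrate_path(path):
    """Map the old streaming path to the unified endpoint; leave others as-is."""
    return UNIFIED_PATH if path == LEGACY_STREAM_PATH else path
```

Keys labeled `legacy` need to be replaced, and any call to `/v1/stream` should move to `/v1/chat/completions`, which streams over SSE.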