API Reference
The ModelSwitch API is fully compatible with the OpenAI API specification. Swap your base_url to start routing, optimizing, and monitoring all your LLM traffic through one unified endpoint.
Base URL: https://api.modelswitchai.tech/v1
Authentication
Authenticate using a ModelSwitch API key passed as a Bearer token in the Authorization header. API keys begin with ms_live_ for production and ms_test_ for sandbox.
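Because the two key prefixes are documented, the environment can be checked client-side before any request is sent. The helper below is a hypothetical sketch, not part of any SDK:

```javascript
// Hypothetical helper (not part of any SDK): infers the environment
// from a ModelSwitch API key's documented prefix.
function keyEnvironment(apiKey) {
  if (apiKey.startsWith("ms_live_")) return "production";
  if (apiKey.startsWith("ms_test_")) return "sandbox";
  throw new Error("Unrecognized ModelSwitch API key prefix");
}
```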
Quickstart
Zero configuration required. Works with any existing OpenAI SDK.
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.MODELSWITCH_API_KEY,
  baseURL: "https://api.modelswitchai.tech/v1",
  defaultHeaders: {
    "x-routing-mode": "cost", // cost | performance | latency
  },
});

const response = await client.chat.completions.create({
  model: "auto", // ModelSwitch selects the best model
  messages: [
    { role: "user", content: "Analyze the attached quarterly report" }
  ],
});

// All standard OpenAI fields, plus:
console.log(response.routing_metadata);
// {
//   selected_model: "gpt-4o-mini",
//   routing_mode: "cost",
//   latency_ms: 312,
//   cost_saved: "$0.0048",
//   reason: "Routed to GPT-4o Mini — 96.9% cheaper...",
//   failover_chain: ["gpt-4o-mini", "gpt-3.5-turbo", "gpt-4o"]
// }
```
Request Headers
ModelSwitch reads custom HTTP headers to control routing behavior per request. All headers are optional and have sensible defaults.
| Header | Required | Default | Description |
|---|---|---|---|
| x-routing-mode | Optional | performance | Routing strategy. One of: cost, performance, latency |
| x-failover-chain | Optional | auto | Comma-separated model priority chain for failover |
| x-budget-usd | Optional | unlimited | Maximum USD spend per request. Request rejected if exceeded. |
| x-max-retries | Optional | 2 | Number of failover attempts before returning an error |
| x-timeout-ms | Optional | 30000 | Per-upstream maximum timeout in milliseconds |
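Since these headers are plain strings, it can help to assemble them in one place and catch invalid values before they reach the proxy. The helper below is a hypothetical sketch (not part of any SDK); the header names and allowed values come from the table above:

```javascript
// Hypothetical helper (not part of any SDK): assembles the optional
// per-request routing headers and validates values client-side.
function buildRoutingHeaders({ mode, failoverChain, budgetUsd, maxRetries, timeoutMs } = {}) {
  const headers = {};
  if (mode !== undefined) {
    if (!["cost", "performance", "latency"].includes(mode)) {
      throw new Error(`Unknown routing mode: ${mode}`);
    }
    headers["x-routing-mode"] = mode;
  }
  if (failoverChain !== undefined) {
    // The proxy expects a comma-separated model priority chain.
    headers["x-failover-chain"] = failoverChain.join(",");
  }
  if (budgetUsd !== undefined) {
    if (budgetUsd <= 0) throw new Error("x-budget-usd must be positive");
    headers["x-budget-usd"] = String(budgetUsd);
  }
  if (maxRetries !== undefined) headers["x-max-retries"] = String(maxRetries);
  if (timeoutMs !== undefined) headers["x-timeout-ms"] = String(timeoutMs);
  return headers;
}
```

The returned object can be passed straight through as the `headers` request option of the OpenAI SDK, as shown in the examples below.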
Response Headers
Every ModelSwitch response includes the following headers for observability and debugging.
| Header | Type | Description |
|---|---|---|
| x-modelswitch-latency | integer | Total proxy processing time in milliseconds |
| x-modelswitch-model | string | Final model ID selected by the routing engine |
| x-modelswitch-routing-mode | string | Active routing strategy. One of: cost, performance, latency |
| x-modelswitch-request-id | string | Unique request trace ID for debugging and billing |
| x-modelswitch-version | string | ModelSwitch API version (current: 2.4.0) |
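These headers can be read from any header lookup function, such as `response.headers.get` on a fetch Response. The helper below is a hypothetical sketch (not part of any SDK) that gathers them into a single object:

```javascript
// Hypothetical helper (not part of any SDK): collects the
// x-modelswitch-* observability headers via any header lookup
// function (e.g. fetch's response.headers.get) into one object.
function readModelSwitchHeaders(getHeader) {
  const latency = getHeader("x-modelswitch-latency");
  return {
    latencyMs: latency === null ? null : Number(latency),
    model: getHeader("x-modelswitch-model"),
    routingMode: getHeader("x-modelswitch-routing-mode"),
    requestId: getHeader("x-modelswitch-request-id"),
    version: getHeader("x-modelswitch-version"),
  };
}
```

Logging `requestId` alongside application errors makes it possible to correlate a failed call with its billing and debugging trace.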
Failover Routing
Define custom failover chains to ensure your application survives provider outages and rate limits. ModelSwitch automatically retries the request with the next model in the chain, with no added latency penalty.
```javascript
// Configure explicit failover chains per request
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "..." }],
  // ModelSwitch reads the x-failover-chain header
}, {
  headers: {
    "x-routing-mode": "performance",
    "x-failover-chain": "gpt-4o,gpt-4o-mini,gpt-3.5-turbo",
    "x-max-retries": "3",
    "x-timeout-ms": "5000",
  }
});
```
Cost Guardrails
Set hard budget limits on a per-request basis. Requests that would exceed the budget are automatically downgraded to a cheaper model or rejected, depending on your policy.
```javascript
// Set budget guardrails per request
const response = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "..." }],
}, {
  headers: {
    "x-routing-mode": "cost",
    "x-budget-usd": "0.01", // max $0.01 per request
    "x-max-tokens-budget": "4096", // hard token cap
  }
});
```
Python SDK
ModelSwitch is fully compatible with the official openai Python library.
```python
import openai

client = openai.OpenAI(
    api_key="ms_live_...",
    base_url="https://api.modelswitchai.tech/v1",
    default_headers={
        "x-routing-mode": "latency",
    },
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello"}],
)

metadata = response.routing_metadata
print(f"Routed to: {metadata['selected_model']}")
print(f"Latency: {metadata['latency_ms']}ms")
```