API Reference v2.4

The ModelSwitch API is fully compatible with the OpenAI API specification. Swap your base_url to start routing, optimizing, and monitoring all your LLM traffic through one unified endpoint.

Base URL
https://api.modelswitchai.tech/v1

Authentication

Authenticate using a ModelSwitch API key passed as a Bearer token in the Authorization header. API keys begin with ms_live_ for production and ms_test_ for sandbox.

# All requests require Bearer authentication
Authorization: Bearer ms_live_xxxxxxxxxxxxxxxxx

Quickstart

Zero configuration required. Works with any existing OpenAI SDK.

Node.js / TypeScript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.MODELSWITCH_API_KEY,
  baseURL: "https://api.modelswitchai.tech/v1",
  defaultHeaders: {
    "x-routing-mode": "cost", // cost | performance | latency
  },
});

const response = await client.chat.completions.create({
  model: "auto",           // ModelSwitch selects the best model
  messages: [
    { role: "user", content: "Analyze the attached quarterly report" }
  ],
});

// All standard OpenAI fields +
console.log(response.routing_metadata);
// {
//   selected_model: "gpt-4o-mini",
//   routing_mode: "cost",
//   latency_ms: 312,
//   cost_saved: "$0.0048",
//   reason: "Routed to GPT-4o Mini — 96.9% cheaper...",
//   failover_chain: ["gpt-4o-mini", "gpt-3.5-turbo", "gpt-4o"]
// }

Request Headers

ModelSwitch reads custom HTTP headers to control routing behavior per request. All headers are optional and have sensible defaults.

Header            Required  Default      Description
x-routing-mode    Optional  performance  Routing strategy. One of: cost, performance, latency
x-failover-chain  Optional  auto         Comma-separated model priority chain for failover
x-budget-usd      Optional  unlimited    Maximum USD spend per request. Request rejected if exceeded.
x-max-retries     Optional  2            Number of failover attempts before returning an error
x-timeout-ms      Optional  30000        Per-upstream maximum timeout in milliseconds
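
When several of these options vary per request, it can help to assemble the header object in one place. A minimal sketch (the buildRoutingHeaders helper and its option names are illustrative, not part of any SDK; only the x-* header names come from the table above):

```typescript
// Illustrative helper: assembles ModelSwitch routing headers from typed options.
interface RoutingOptions {
  mode?: "cost" | "performance" | "latency";
  failoverChain?: string[]; // ordered model priority list
  budgetUsd?: number;       // maximum USD spend per request
  maxRetries?: number;      // failover attempts before erroring
  timeoutMs?: number;       // per-upstream timeout in milliseconds
}

function buildRoutingHeaders(opts: RoutingOptions): Record<string, string> {
  const headers: Record<string, string> = {};
  if (opts.mode) headers["x-routing-mode"] = opts.mode;
  if (opts.failoverChain) headers["x-failover-chain"] = opts.failoverChain.join(",");
  if (opts.budgetUsd !== undefined) headers["x-budget-usd"] = String(opts.budgetUsd);
  if (opts.maxRetries !== undefined) headers["x-max-retries"] = String(opts.maxRetries);
  if (opts.timeoutMs !== undefined) headers["x-timeout-ms"] = String(opts.timeoutMs);
  return headers;
}
```

The resulting object can be passed as the headers option on any create call, as in the examples below.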

Response Headers

Every ModelSwitch response includes the following headers for observability and debugging.

Header                      Type     Description
x-modelswitch-latency       integer  Total proxy processing time in milliseconds
x-modelswitch-model         string   Final model ID selected by the routing engine
x-modelswitch-routing-mode  string   Active routing strategy: cost | performance | latency
x-modelswitch-request-id    string   Unique request trace ID for debugging and billing
x-modelswitch-version       string   ModelSwitch API version (current: 2.4.0)
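
With the official openai Node SDK (v4+), the raw HTTP response, and therefore these headers, is available via .withResponse(). The parsing helper below is an illustrative sketch; only the header names come from the table above:

```typescript
// Illustrative: extract ModelSwitch observability headers from a fetch
// Response's Headers object into a typed structure.
interface ModelSwitchTrace {
  latencyMs: number | null;  // x-modelswitch-latency
  model: string | null;      // x-modelswitch-model
  routingMode: string | null;
  requestId: string | null;
}

function parseModelSwitchHeaders(headers: Headers): ModelSwitchTrace {
  const latency = headers.get("x-modelswitch-latency");
  return {
    latencyMs: latency !== null ? Number(latency) : null,
    model: headers.get("x-modelswitch-model"),
    routingMode: headers.get("x-modelswitch-routing-mode"),
    requestId: headers.get("x-modelswitch-request-id"),
  };
}

// Usage with the OpenAI Node SDK:
// const { data, response } = await client.chat.completions
//   .create({ model: "auto", messages })
//   .withResponse();
// const trace = parseModelSwitchHeaders(response.headers);
```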

Failover Routing

Define custom failover chains so your application survives provider outages and rate limits. When the primary model fails, ModelSwitch automatically retries the request against the next model in the chain without any action on your part.

// Configure explicit failover chains per request
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "..." }],
  // ModelSwitch reads the x-failover-chain header
}, {
  headers: {
    "x-routing-mode": "performance",
    "x-failover-chain": "gpt-4o,gpt-4o-mini,gpt-3.5-turbo",
    "x-max-retries": "3",
    "x-timeout-ms": "5000",
  }
});
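
Since x-failover-chain is an ordered, comma-separated list, it is easy to build programmatically. A small client-side helper can normalize the list before sending (illustrative, not part of the SDK):

```typescript
// Illustrative: build a clean x-failover-chain value from a model list.
// Trims whitespace, drops empty entries, and removes duplicates while
// preserving priority order.
function toFailoverChain(models: string[]): string {
  const seen = new Set<string>();
  const chain: string[] = [];
  for (const raw of models) {
    const model = raw.trim();
    if (model && !seen.has(model)) {
      seen.add(model);
      chain.push(model);
    }
  }
  return chain.join(",");
}
```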

Cost Guardrails

Set hard budget limits on a per-request basis. Requests that would exceed the budget are automatically downgraded to a cheaper model or rejected, depending on your policy.

// Set budget guardrails per request
const response = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "..." }],
}, {
  headers: {
    "x-routing-mode": "cost",
    "x-budget-usd": "0.01",       // max $0.01 per request
    "x-max-tokens-budget": "4096", // hard token cap
  }
});
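
The guardrail decision described above (downgrade to a cheaper model or reject, depending on policy) can be sketched as follows. This is a conceptual model of the proxy-side behavior, not its actual implementation; the function name, the cost estimate, and the downgrade target are all assumptions:

```typescript
// Conceptual sketch of the budget guardrail decision, not the proxy's
// actual implementation. estimatedCostUsd and downgradeModel are assumed
// inputs for illustration.
type GuardrailAction =
  | { action: "proceed" }
  | { action: "downgrade"; model: string }
  | { action: "reject"; reason: string };

function applyBudgetGuardrail(
  estimatedCostUsd: number,
  budgetUsd: number,
  downgradeModel: string | null, // null models a reject-only policy
): GuardrailAction {
  if (estimatedCostUsd <= budgetUsd) return { action: "proceed" };
  if (downgradeModel) return { action: "downgrade", model: downgradeModel };
  return {
    action: "reject",
    reason: `estimated cost $${estimatedCostUsd} exceeds budget $${budgetUsd}`,
  };
}
```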

Python SDK

ModelSwitch is fully compatible with the official openai Python library.

import openai

client = openai.OpenAI(
    api_key="ms_live_...",
    base_url="https://api.modelswitchai.tech/v1",
    default_headers={
        "x-routing-mode": "latency",
    },
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello"}],
)

metadata = response.routing_metadata
print(f"Routed to: {metadata['selected_model']}")
print(f"Latency: {metadata['latency_ms']}ms")

Error Codes

Status  Code                   Description
400     invalid_request_error  Malformed request body. Check the messages field.
401     authentication_error   Invalid or missing API key.
429     rate_limit_error       Request rate exceeded. Upgrade your plan for higher limits.
500     internal_error         Internal proxy error. All upstream models were unreachable.
503     upstream_error         Upstream model provider returned an unexpected error.
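
A reasonable client treats 429, 500, and 503 as transient and 400/401 as terminal. A minimal classification sketch (the isRetryable helper is illustrative; OpenAI.APIError and its status field are part of the official openai Node SDK):

```typescript
// Illustrative: classify ModelSwitch status codes from the table above.
// 429 (rate limit), 500 (all upstreams unreachable), and 503 (upstream
// error) may succeed on retry; 400 and 401 will never succeed as sent.
function isRetryable(status: number): boolean {
  return status === 429 || status === 500 || status === 503;
}

// With the OpenAI Node SDK, failed requests throw OpenAI.APIError,
// whose .status field carries the HTTP status code:
// try {
//   await client.chat.completions.create({ model: "auto", messages });
// } catch (err) {
//   if (err instanceof OpenAI.APIError &&
//       err.status !== undefined && isRetryable(err.status)) {
//     // back off and retry
//   }
// }
```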