API Reference
The ModelSwitch API is fully compatible with the OpenAI API specification. Swap your base_url to start routing, optimizing, and monitoring all your LLM traffic through one unified endpoint.
Base URL: https://api.modelswitchai.tech/v1
Authentication
Authenticate using a ModelSwitch API key passed as a Bearer token in the Authorization header. API keys begin with ms_live_ for production and ms_test_ for sandbox.
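Because the two key prefixes are documented, the environment can be checked client-side before any request is sent. The helper below is a hypothetical sketch, not part of any SDK:

```javascript
// Hypothetical helper (not part of any SDK): infers the environment
// from a ModelSwitch API key's documented prefix.
function keyEnvironment(apiKey) {
  if (apiKey.startsWith("ms_live_")) return "production";
  if (apiKey.startsWith("ms_test_")) return "sandbox";
  throw new Error("Unrecognized ModelSwitch API key prefix");
}
```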
Quickstart
Zero configuration required. Works with any existing OpenAI SDK.
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.MODELSWITCH_API_KEY,
  baseURL: "https://api.modelswitchai.tech/v1",
  defaultHeaders: {
    "x-routing-mode": "cost", // cost | performance | latency
  },
});

const response = await client.chat.completions.create({
  model: "auto", // ModelSwitch selects the best model
  messages: [
    { role: "user", content: "Analyze the attached quarterly report" }
  ],
});

// All standard OpenAI fields, plus:
console.log(response.routing_metadata);
// {
//   selected_model: "gpt-4o-mini",
//   routing_mode: "cost",
//   latency_ms: 312,
//   cost_saved: "$0.0048",
//   reason: "Routed to GPT-4o Mini — 96.9% cheaper...",
//   failover_chain: ["gpt-4o-mini", "gpt-3.5-turbo", "gpt-4o"]
// }
```
Request Headers
ModelSwitch reads custom HTTP headers to control routing behavior per request. All headers are optional and have sensible defaults.
| Header | Required | Default | Description |
|---|---|---|---|
| x-routing-mode | Optional | performance | Routing strategy. One of: cost, performance, latency |
| x-failover-chain | Optional | auto | Comma-separated model priority chain for failover |
| x-budget-usd | Optional | unlimited | Maximum USD spend per request. Request rejected if exceeded. |
| x-max-retries | Optional | 2 | Number of failover attempts before returning an error |
| x-timeout-ms | Optional | 30000 | Per-upstream maximum timeout in milliseconds |
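Since these headers are plain strings, it can help to assemble them in one place and catch invalid values before they reach the proxy. The helper below is a hypothetical sketch (not part of any SDK); the header names and allowed values come from the table above:

```javascript
// Hypothetical helper (not part of any SDK): assembles the optional
// per-request routing headers and validates values client-side.
function buildRoutingHeaders({ mode, failoverChain, budgetUsd, maxRetries, timeoutMs } = {}) {
  const headers = {};
  if (mode !== undefined) {
    if (!["cost", "performance", "latency"].includes(mode)) {
      throw new Error(`Unknown routing mode: ${mode}`);
    }
    headers["x-routing-mode"] = mode;
  }
  if (failoverChain !== undefined) {
    // The proxy expects a comma-separated model priority chain.
    headers["x-failover-chain"] = failoverChain.join(",");
  }
  if (budgetUsd !== undefined) {
    if (budgetUsd <= 0) throw new Error("x-budget-usd must be positive");
    headers["x-budget-usd"] = String(budgetUsd);
  }
  if (maxRetries !== undefined) headers["x-max-retries"] = String(maxRetries);
  if (timeoutMs !== undefined) headers["x-timeout-ms"] = String(timeoutMs);
  return headers;
}
```

The returned object can be passed straight through as the `headers` request option of the OpenAI SDK, as shown in the examples below.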
Response Headers
Every ModelSwitch response includes the following headers for observability and debugging.
| Header | Type | Description |
|---|---|---|
| x-modelswitch-latency | integer | Total proxy processing time in milliseconds |
| x-modelswitch-model | string | Final model ID selected by the routing engine |
| x-modelswitch-routing-mode | string | Active routing strategy. One of: cost, performance, latency |
| x-modelswitch-request-id | string | Unique request trace ID for debugging and billing |
| x-modelswitch-version | string | ModelSwitch API version (current: 2.4.0) |
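These headers can be read from any header lookup function, such as `response.headers.get` on a fetch Response. The helper below is a hypothetical sketch (not part of any SDK) that gathers them into a single object:

```javascript
// Hypothetical helper (not part of any SDK): collects the
// x-modelswitch-* observability headers via any header lookup
// function (e.g. fetch's response.headers.get) into one object.
function readModelSwitchHeaders(getHeader) {
  const latency = getHeader("x-modelswitch-latency");
  return {
    latencyMs: latency === null ? null : Number(latency),
    model: getHeader("x-modelswitch-model"),
    routingMode: getHeader("x-modelswitch-routing-mode"),
    requestId: getHeader("x-modelswitch-request-id"),
    version: getHeader("x-modelswitch-version"),
  };
}
```

Logging `requestId` alongside application errors makes it possible to correlate a failed call with its billing and debugging trace.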
Failover Routing
Define custom failover chains to ensure your application survives provider outages and rate limits. ModelSwitch automatically retries the request with the next model in the chain, with no added latency penalty.
```javascript
// Configure explicit failover chains per request
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "..." }],
  // ModelSwitch reads the x-failover-chain header
}, {
  headers: {
    "x-routing-mode": "performance",
    "x-failover-chain": "gpt-4o,gpt-4o-mini,gpt-3.5-turbo",
    "x-max-retries": "3",
    "x-timeout-ms": "5000",
  }
});
```
Cost Guardrails
Set hard budget limits on a per-request basis. Requests that would exceed the budget are automatically downgraded to a cheaper model or rejected, depending on your policy.
```javascript
// Set budget guardrails per request
const response = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "..." }],
}, {
  headers: {
    "x-routing-mode": "cost",
    "x-budget-usd": "0.01", // max $0.01 per request
    "x-max-tokens-budget": "4096", // hard token cap
  }
});
```
Python SDK
ModelSwitch is fully compatible with the official openai Python library.
```python
import openai

client = openai.OpenAI(
    api_key="ms_live_...",
    base_url="https://api.modelswitchai.tech/v1",
    default_headers={
        "x-routing-mode": "latency",
    },
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello"}],
)

metadata = response.routing_metadata
print(f"Routed to: {metadata['selected_model']}")
print(f"Latency: {metadata['latency_ms']}ms")
```