Intelligent LLM request routing | v1.0.0
```json
{
  "mcpServers": {
    "llm-router": {
      "url": "https://llm-router-mcp.fly.dev/mcp/sse"
    }
  }
}
```
No authentication required. Just add the URL to your MCP client config.
| Model | Provider | Cost (in/out per 1K tokens) | Avg latency | Context (tokens) | Tier | Capabilities |
|---|---|---|---|---|---|---|
| Gemini 2.0 Flash | google | $0.00010 / $0.00040 | 300ms | 1,000,000 | economy | general code math reasoning vision multilingual +4 |
| GPT-4o Mini | openai | $0.00015 / $0.00060 | 350ms | 128,000 | economy | general code math multilingual function_calling summarization +1 |
| Mistral Small | mistral | $0.00020 / $0.00060 | 350ms | 32,000 | economy | general code multilingual function_calling summarization instruction_following |
| DeepSeek V3 | deepseek | $0.00027 / $0.00110 | 700ms | 128,000 | economy | general code math reasoning multilingual function_calling +3 |
| Llama 3.3 70B | meta | $0.00080 / $0.00080 | 600ms | 128,000 | economy | general code math reasoning multilingual function_calling +2 |
| DeepSeek R1 | deepseek | $0.00055 / $0.00220 | 3000ms | 128,000 | standard | general code math reasoning analysis instruction_following |
| Claude 3.5 Haiku | anthropic | $0.00080 / $0.00400 | 400ms | 200,000 | economy | general code math reasoning multilingual long_context +3 |
| o3-mini | openai | $0.00110 / $0.00440 | 2000ms | 200,000 | standard | general code math reasoning analysis instruction_following |
| Gemini 2.0 Pro | google | $0.00125 / $0.00500 | 1000ms | 2,000,000 | standard | general code math reasoning vision multilingual +6 |
| Mistral Large | mistral | $0.00200 / $0.00600 | 900ms | 128,000 | standard | general code math reasoning multilingual function_calling +3 |
| GPT-4o | openai | $0.00250 / $0.01000 | 800ms | 128,000 | standard | general code math reasoning vision multilingual +5 |
| Command R+ | cohere | $0.00250 / $0.01000 | 1100ms | 128,000 | standard | general code reasoning multilingual long_context function_calling +3 |
| Claude Sonnet 4 | anthropic | $0.00300 / $0.01500 | 1200ms | 200,000 | standard | general code math reasoning vision multilingual +6 |
| o1 | openai | $0.01500 / $0.06000 | 5000ms | 200,000 | premium | general code math reasoning analysis instruction_following |
| Claude Opus 4 | anthropic | $0.01500 / $0.07500 | 2500ms | 200,000 | premium | general code math reasoning vision multilingual +6 |
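Prices in the table are quoted per 1K tokens, so a request's cost is each token count divided by 1,000 times the matching rate. A minimal sketch of that arithmetic (the helper function is illustrative, not part of the router's API; prices are copied from the GPT-4o Mini row above):

```python
def request_cost(tokens_in: int, tokens_out: int,
                 price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Estimate request cost from per-1K-token prices, as quoted in the table."""
    return (tokens_in / 1000) * price_in_per_1k + (tokens_out / 1000) * price_out_per_1k

# Example: GPT-4o Mini ($0.00015 in / $0.00060 out per 1K tokens)
# with a 2,000-token prompt and a 500-token completion:
cost = request_cost(2000, 500, 0.00015, 0.00060)
print(f"${cost:.6f}")  # → $0.000600
```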
| Strategy | Description | Use When |
|---|---|---|
| cost | Route to the cheapest model that meets all capability and constraint requirements. | You want to minimize API spend while still getting acceptable quality. Good for batch processing, summarization, or tasks where the cheapest adequate model is fine. |
| latency | Route to the fastest model (lowest average latency) that meets requirements. | You need the fastest response time. Good for real-time applications, chatbots, or latency-sensitive pipelines. |
| fallback | Try the best model first, with automatic fallback to alternatives if it is rate-limited or down. | You want reliability. The router picks the best model and provides ordered fallbacks in case of availability issues. |
| capability | Route to the model with the best capability match for your requirements. | You have specific capability needs (e.g., vision, code, math, long_context) and want the model that best matches all of them. |
| smart | Analyze the prompt content to detect task type (code, math, creative writing, etc.) and auto-select the best model based on inferred requirements. | You do not know what model to use. The router analyzes your prompt and picks intelligently. |
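To make the cost strategy concrete, here is a minimal sketch of "cheapest model that meets all capability requirements." The model data is abridged from the table above, and the scoring (a blended input+output per-1K price) is an assumption; the production router's logic may differ:

```python
# Abridged model catalog from the table above (capabilities truncated).
MODELS = [
    {"name": "Gemini 2.0 Flash", "in": 0.00010, "out": 0.00040,
     "caps": {"general", "code", "math", "reasoning", "vision", "multilingual"}},
    {"name": "GPT-4o Mini", "in": 0.00015, "out": 0.00060,
     "caps": {"general", "code", "math", "multilingual", "function_calling",
              "summarization"}},
    {"name": "Claude Sonnet 4", "in": 0.00300, "out": 0.01500,
     "caps": {"general", "code", "math", "reasoning", "vision", "multilingual"}},
]

def route_by_cost(required: set) -> str:
    """Return the cheapest model providing every required capability,
    or None if no model qualifies."""
    eligible = [m for m in MODELS if required <= m["caps"]]
    if not eligible:
        return None
    # Rank by blended per-1K price (input + output) — an assumed tiebreak.
    return min(eligible, key=lambda m: m["in"] + m["out"])["name"]

print(route_by_cost({"vision", "reasoning"}))   # → Gemini 2.0 Flash
print(route_by_cost({"function_calling"}))      # → GPT-4o Mini
```

The other strategies swap the ranking key: `latency` sorts by average latency instead of price, and `capability` would score how closely each model's capability set matches the request rather than filtering and taking the cheapest.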
```shell
# Check available models
curl https://llm-router-mcp.fly.dev/models

# View routing strategies
curl https://llm-router-mcp.fly.dev/strategies

# Service stats
curl https://llm-router-mcp.fly.dev/stats

# SKILL.md for agents
curl https://llm-router-mcp.fly.dev/SKILL.md
```