LLM Router MCP Server

Intelligent LLM request routing | v1.0.0

15 Models | 7 Providers | 5 Strategies | 6 MCP Tools

MCP Configuration

{
  "mcpServers": {
    "llm-router": {
      "url": "https://llm-router-mcp.fly.dev/mcp/sse"
    }
  }
}

No authentication required. Just add the URL to your MCP client config.
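
If your client already has other servers configured, the entry can be merged in rather than pasted over the file. The following is a minimal sketch in Python, assuming the client stores its servers under a top-level "mcpServers" key as in the snippet above; the helper name add_llm_router, the "other-server" entry, and its example.com URL are hypothetical and used only for illustration.

```python
import json

# Entry taken from the MCP Configuration snippet above.
LLM_ROUTER_ENTRY = {"url": "https://llm-router-mcp.fly.dev/mcp/sse"}


def add_llm_router(config: dict) -> dict:
    """Return a copy of config with llm-router added under mcpServers."""
    servers = dict(config.get("mcpServers", {}))  # copy so the input is not mutated
    servers["llm-router"] = dict(LLM_ROUTER_ENTRY)
    return {**config, "mcpServers": servers}


# Hypothetical existing client config with one unrelated server.
existing = {"mcpServers": {"other-server": {"url": "https://example.com/mcp"}}}
merged = add_llm_router(existing)
print(json.dumps(merged, indent=2))
```

Where the config file lives depends on the MCP client, so reading and writing the file is left out here.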

Model Registry

Model | Provider | Cost (input / output, USD per 1K tokens) | Latency | Context (tokens) | Tier | Capabilities
Gemini 2.0 Flash | google | $0.00010 / $0.00040 | 300ms | 1,000,000 | economy | general code math reasoning vision multilingual +4
GPT-4o Mini | openai | $0.00015 / $0.00060 | 350ms | 128,000 | economy | general code math multilingual function_calling summarization +1
Mistral Small | mistral | $0.00020 / $0.00060 | 350ms | 32,000 | economy | general code multilingual function_calling summarization instruction_following
DeepSeek V3 | deepseek | $0.00027 / $0.00110 | 700ms | 128,000 | economy | general code math reasoning multilingual function_calling +3
Llama 3.3 70B | meta | $0.00080 / $0.00080 | 600ms | 128,000 | economy | general code math reasoning multilingual function_calling +2
DeepSeek R1 | deepseek | $0.00055 / $0.00220 | 3000ms | 128,000 | standard | general code math reasoning analysis instruction_following
Claude 3.5 Haiku | anthropic | $0.00080 / $0.00400 | 400ms | 200,000 | economy | general code math reasoning multilingual long_context +3
o3-mini | openai | $0.00110 / $0.00440 | 2000ms | 200,000 | standard | general code math reasoning analysis instruction_following
Gemini 2.0 Pro | google | $0.00125 / $0.00500 | 1000ms | 2,000,000 | standard | general code math reasoning vision multilingual +6
Mistral Large | mistral | $0.00200 / $0.00600 | 900ms | 128,000 | standard | general code math reasoning multilingual function_calling +3
GPT-4o | openai | $0.00250 / $0.01000 | 800ms | 128,000 | standard | general code math reasoning vision multilingual +5
Command R+ | cohere | $0.00250 / $0.01000 | 1100ms | 128,000 | standard | general code reasoning multilingual long_context function_calling +3
Claude Sonnet 4 | anthropic | $0.00300 / $0.01500 | 1200ms | 200,000 | standard | general code math reasoning vision multilingual +6
o1 | openai | $0.01500 / $0.06000 | 5000ms | 200,000 | premium | general code math reasoning analysis instruction_following
Claude Opus 4 | anthropic | $0.01500 / $0.07500 | 2500ms | 200,000 | premium | general code math reasoning vision multilingual +6
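
The per-1K prices in the registry translate into a per-request cost by scaling each price by the token count. This is a minimal sketch in Python, assuming the prices are USD per 1,000 tokens; the function name estimate_cost and the token counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Estimated request cost in USD, given per-1K-token prices."""
    return ((input_tokens / 1000) * price_in_per_1k
            + (output_tokens / 1000) * price_out_per_1k)


# GPT-4o Mini row from the registry: $0.00015 in / $0.00060 out per 1K tokens.
# 2,000 input tokens and 500 output tokens: 2 * 0.00015 + 0.5 * 0.00060.
cost = estimate_cost(2000, 500, 0.00015, 0.00060)
print(f"${cost:.5f}")  # prints $0.00060
```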

Routing Strategies

Strategy | Description | Use When
cost | Route to the cheapest model that meets all capability and constraint requirements. | You want to minimize API spend while still getting acceptable quality. Good for batch processing, summarization, or tasks where the cheapest adequate model is fine.
latency | Route to the fastest model (lowest average latency) that meets requirements. | You need the fastest response time. Good for real-time applications, chatbots, or latency-sensitive pipelines.
fallback | Try the best model first, with automatic fallback to alternatives if it is rate-limited or down. | You want reliability. The router picks the best model and provides ordered fallbacks in case of availability issues.
capability | Route to the model with the best capability match for your requirements. | You have specific capability needs (e.g., vision, code, math, long_context) and want the model that best matches all of them.
smart | Analyze the prompt content to detect task type (code, math, creative writing, etc.) and auto-select the best model based on inferred requirements. | You do not know what model to use. The router analyzes your prompt and picks intelligently.
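
To make the difference between the cost and latency strategies concrete, here is a minimal sketch in Python of how such a selection could work over a few registry rows. It is not the server's actual implementation: the Model class, the route function, and the choice to rank cost by input price plus output price are all assumptions, and the capability sets contain only what the registry table shows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_in: float        # USD per 1K input tokens
    cost_out: float       # USD per 1K output tokens
    latency_ms: int
    capabilities: frozenset


# Subset of the registry. Rows shown with "+N" in the table have further
# capabilities that are not listed there, so they are omitted here too.
MODELS = [
    Model("Gemini 2.0 Flash", 0.00010, 0.00040, 300,
          frozenset({"general", "code", "math", "reasoning", "vision", "multilingual"})),
    Model("Mistral Small", 0.00020, 0.00060, 350,
          frozenset({"general", "code", "multilingual", "function_calling",
                     "summarization", "instruction_following"})),
    Model("DeepSeek R1", 0.00055, 0.00220, 3000,
          frozenset({"general", "code", "math", "reasoning", "analysis",
                     "instruction_following"})),
    Model("o3-mini", 0.00110, 0.00440, 2000,
          frozenset({"general", "code", "math", "reasoning", "analysis",
                     "instruction_following"})),
]


def route(strategy: str, required: set) -> list:
    """Return models that have every required capability, best first."""
    eligible = [m for m in MODELS if required <= m.capabilities]
    if strategy == "cost":
        # Assumption: rank by combined input + output price.
        return sorted(eligible, key=lambda m: m.cost_in + m.cost_out)
    if strategy == "latency":
        return sorted(eligible, key=lambda m: m.latency_ms)
    raise ValueError(f"unsupported strategy in this sketch: {strategy}")


cheapest = route("cost", {"analysis"})[0].name
fastest = route("latency", {"analysis"})[0].name
print(cheapest)  # DeepSeek R1
print(fastest)   # o3-mini
```

With the same capability requirement, the two strategies pick different models: DeepSeek R1 is cheaper, while o3-mini has the lower listed latency.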

Quick Start

# Check available models
curl https://llm-router-mcp.fly.dev/models

# View routing strategies
curl https://llm-router-mcp.fly.dev/strategies

# Service stats
curl https://llm-router-mcp.fly.dev/stats

# SKILL.md for agents
curl https://llm-router-mcp.fly.dev/SKILL.md