Intelligent LLM request routing | v1.0.0
```json
{
  "mcpServers": {
    "llm-router": {
      "url": "https://llm-router-mcp.fly.dev/mcp/sse"
    }
  }
}
```
No authentication required. Just add the URL to your MCP client config.
| Model | Provider | Cost (in/out per 1K tokens) | Avg latency | Context (tokens) | Tier | Capabilities |
|---|---|---|---|---|---|---|
| Gemini 2.0 Flash | google | $0.00010 / $0.00040 | 300ms | 1,000,000 | economy | general code math reasoning vision multilingual +4 |
| GPT-4o Mini | openai | $0.00015 / $0.00060 | 350ms | 128,000 | economy | general code math multilingual function_calling summarization +1 |
| Mistral Small | mistral | $0.00020 / $0.00060 | 350ms | 32,000 | economy | general code multilingual function_calling summarization instruction_following |
| DeepSeek V3 | deepseek | $0.00027 / $0.00110 | 700ms | 128,000 | economy | general code math reasoning multilingual function_calling +3 |
| Llama 3.3 70B | meta | $0.00080 / $0.00080 | 600ms | 128,000 | economy | general code math reasoning multilingual function_calling +2 |
| DeepSeek R1 | deepseek | $0.00055 / $0.00220 | 3000ms | 128,000 | standard | general code math reasoning analysis instruction_following |
| Claude 3.5 Haiku | anthropic | $0.00080 / $0.00400 | 400ms | 200,000 | economy | general code math reasoning multilingual long_context +3 |
| o3-mini | openai | $0.00110 / $0.00440 | 2000ms | 200,000 | standard | general code math reasoning analysis instruction_following |
| Gemini 2.0 Pro | google | $0.00125 / $0.00500 | 1000ms | 2,000,000 | standard | general code math reasoning vision multilingual +6 |
| Mistral Large | mistral | $0.00200 / $0.00600 | 900ms | 128,000 | standard | general code math reasoning multilingual function_calling +3 |
| GPT-4o | openai | $0.00250 / $0.01000 | 800ms | 128,000 | standard | general code math reasoning vision multilingual +5 |
| Command R+ | cohere | $0.00250 / $0.01000 | 1100ms | 128,000 | standard | general code reasoning multilingual long_context function_calling +3 |
| Claude Sonnet 4 | anthropic | $0.00300 / $0.01500 | 1200ms | 200,000 | standard | general code math reasoning vision multilingual +6 |
| o1 | openai | $0.01500 / $0.06000 | 5000ms | 200,000 | premium | general code math reasoning analysis instruction_following |
| Claude Opus 4 | anthropic | $0.01500 / $0.07500 | 2500ms | 200,000 | premium | general code math reasoning vision multilingual +6 |
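Prices in the table are quoted per 1K tokens, so a request's cost is each token count divided by 1,000 times the matching rate. A minimal sketch of that arithmetic (the helper function is illustrative, not part of the router's API; prices are copied from the GPT-4o Mini row above):

```python
def request_cost(tokens_in: int, tokens_out: int,
                 price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Estimate request cost from per-1K-token prices, as quoted in the table."""
    return (tokens_in / 1000) * price_in_per_1k + (tokens_out / 1000) * price_out_per_1k

# Example: GPT-4o Mini ($0.00015 in / $0.00060 out per 1K tokens)
# with a 2,000-token prompt and a 500-token completion:
cost = request_cost(2000, 500, 0.00015, 0.00060)
print(f"${cost:.6f}")  # → $0.000600
```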
| Strategy | Description | Use When |
|---|---|---|
| cost | Route to the cheapest model that meets all capability and constraint requirements. | You want to minimize API spend while still getting acceptable quality. Good for batch processing, summarization, or tasks where the cheapest adequate model is fine. |
| latency | Route to the fastest model (lowest average latency) that meets requirements. | You need the fastest response time. Good for real-time applications, chatbots, or latency-sensitive pipelines. |
| fallback | Try the best model first, with automatic fallback to alternatives if it is rate-limited or down. | You want reliability. The router picks the best model and provides ordered fallbacks in case of availability issues. |
| capability | Route to the model with the best capability match for your requirements. | You have specific capability needs (e.g., vision, code, math, long_context) and want the model that best matches all of them. |
| smart | Analyze the prompt content to detect task type (code, math, creative writing, etc.) and auto-select the best model based on inferred requirements. | You do not know what model to use. The router analyzes your prompt and picks intelligently. |
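To make the cost strategy concrete, here is a minimal sketch of "cheapest model that meets all capability requirements." The model data is abridged from the table above, and the scoring (a blended input+output per-1K price) is an assumption; the production router's logic may differ:

```python
# Abridged model catalog from the table above (capabilities truncated).
MODELS = [
    {"name": "Gemini 2.0 Flash", "in": 0.00010, "out": 0.00040,
     "caps": {"general", "code", "math", "reasoning", "vision", "multilingual"}},
    {"name": "GPT-4o Mini", "in": 0.00015, "out": 0.00060,
     "caps": {"general", "code", "math", "multilingual", "function_calling",
              "summarization"}},
    {"name": "Claude Sonnet 4", "in": 0.00300, "out": 0.01500,
     "caps": {"general", "code", "math", "reasoning", "vision", "multilingual"}},
]

def route_by_cost(required: set) -> str:
    """Return the cheapest model providing every required capability,
    or None if no model qualifies."""
    eligible = [m for m in MODELS if required <= m["caps"]]
    if not eligible:
        return None
    # Rank by blended per-1K price (input + output) — an assumed tiebreak.
    return min(eligible, key=lambda m: m["in"] + m["out"])["name"]

print(route_by_cost({"vision", "reasoning"}))   # → Gemini 2.0 Flash
print(route_by_cost({"function_calling"}))      # → GPT-4o Mini
```

The other strategies swap the ranking key: `latency` sorts by average latency instead of price, and `capability` would score how closely each model's capability set matches the request rather than filtering and taking the cheapest.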
```shell
# Check available models
curl https://llm-router-mcp.fly.dev/models

# View routing strategies
curl https://llm-router-mcp.fly.dev/strategies

# Service stats
curl https://llm-router-mcp.fly.dev/stats

# SKILL.md for agents
curl https://llm-router-mcp.fly.dev/SKILL.md
```