Trends
Key metrics across the AI model ecosystem.
Top Average Score
78.0% ▲
Claude Instant leads, averaged across 4 benchmarks
Average Input Price
$-5761.87/M ▼
Mean price per million input tokens across all tracked models
Open Source Share
57% ▲
236 of 414 models are open source
Max Context Window
2.0M ▲
Largest context window (in tokens) among tracked models
Average Context
219K ▲
Mean context window across all models
Active Providers
55 ▲
API providers with at least one active model
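The page does not say how these summary figures are computed. Below is a minimal sketch of how such metrics could be derived from per-model records. The `ModelRecord` schema and its field names are hypothetical and do not describe the tracker's actual data model. The sketch also assumes that missing context windows or prices are simply skipped.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class ModelRecord:
    # Hypothetical schema for illustration; not the tracker's actual data model.
    name: str
    provider: str
    open_source: bool
    context_window: Optional[int]        # tokens; None when unknown
    input_price_per_m: Optional[float]   # USD per million input tokens; None when unknown
    active: bool = True


def summarize(models: list[ModelRecord]) -> dict:
    """Compute summary-card style metrics, skipping missing values."""
    contexts = [m.context_window for m in models if m.context_window is not None]
    prices = [m.input_price_per_m for m in models if m.input_price_per_m is not None]
    open_count = sum(m.open_source for m in models)  # True counts as 1
    return {
        "open_source_count": open_count,
        "open_source_share": open_count / len(models) if models else None,
        "max_context": max(contexts, default=None),
        "mean_context": mean(contexts) if contexts else None,
        "mean_input_price_per_m": mean(prices) if prices else None,
        # Providers with at least one active model.
        "active_providers": len({m.provider for m in models if m.active}),
    }
```

The open-source card follows the same arithmetic: 236 / 414 ≈ 0.570, which rounds to the displayed 57%.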
Benchmark Leaders
| Benchmark | Leader | Score |
|---|---|---|
| ARC AI2 | DeepSeek V3 | 93.7 |
| BBH | DeepSeek V3 | 83.3 |
| GSM8K | GPT-4o-mini (2024-07-18) | 91.3 |
| HellaSwag | Llama 3.1-405B | 85.6 |
| LAMBADA | Falcon-180B | 79.8 |
| MMLU | GPT-4o (2024-11-20) | 84.1 |
| GPQA diamond | Gemini 3.1 Pro Preview | 92.1 |
| MATH level 5 | GPT-5 Chat | 98.1 |
| OTIS Mock AIME 2024-2025 | GPT-5.2 Chat | 96.1 |
| WeirdML | Claude Opus 4.6 | 77.9 |
| Winogrande | Llama 3.1-405B | 78.4 |
| SimpleBench | Gemini 3.1 Pro Preview | 75.5 |
| Aider polyglot | GPT-5 Chat | 88.0 |
| Lech Mazur Writing | Kimi K2 0905 | 87.3 |
| GSO-Bench | Claude Opus 4.6 | 33.3 |
| Fiction.LiveBench | GPT-5 Chat | 97.2 |
| SWE-Bench Verified (Bash Only) | Claude Opus 4.5 | 74.4 |
| Terminal Bench | Gemini 3.1 Pro Preview | 78.4 |
| FrontierMath-2025-02-28-Private | GPT-5.4 Pro | 50.0 |
| SimpleQA Verified | Gemini 3.1 Pro Preview | 77.3 |
| FrontierMath-Tier-4-2025-07-01-Private | GPT-5.4 Pro | 37.5 |
| Chess Puzzles | Gemini 3.1 Pro Preview | 55.0 |
| APEX-Agents | GPT-5.4 | 35.9 |
| OSWorld | Claude Opus 4.5 | 66.3 |
| ARC-AGI-2 | GPT-5.4 Pro | 83.3 |
| HLE | Gemini 3 Pro | 34.4 |
| TriviaQA | Llama 2-70B | 87.6 |
| ScienceQA | Claude 3 Haiku | 62.7 |
| PIQA | GPT-4o-mini (2024-07-18) | 77.4 |
| OpenBookQA | phi-3-mini 3.8B | 84.0 |
| CadEval | o3 | 74.0 |
| Balrog | Gemini 3 Flash Preview | 48.1 |
| GeoBench | Gemini 3 Flash Preview | 88.0 |
| Cybench | Claude Sonnet 4.5 | 55.0 |
| ANLI | phi-3-small 7.4B | 37.1 |
| The Agent Company | DeepSeek V3.2 Exp | 42.9 |
| VideoMME | Gemini 1.5 Pro (Feb 2024) | 66.7 |
| ARC-AGI | Gemini 3.1 Pro Preview | 98.0 |
| DeepResearch Bench | Claude Sonnet 4.5 | 52.6 |
| VPCT | Gemini 3 Pro | 86.5 |
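The selection rule behind the "Leader" column and the "Top Average Score" card is not documented here. The sketch below shows one plausible version. It takes the highest score per benchmark, and the highest mean score per model along with the number of benchmarks behind that mean. The `(benchmark, model, score)` tuple format is an assumption, and so is the choice to let the first model seen win ties.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical result format: (benchmark, model, score), scores assumed on a 0-100 scale.
Result = tuple[str, str, float]


def benchmark_leaders(results: list[Result]) -> dict[str, tuple[str, float]]:
    """Map each benchmark to the (model, score) with the highest score; the first model seen wins ties."""
    leaders: dict[str, tuple[str, float]] = {}
    for benchmark, model, score in results:
        current = leaders.get(benchmark)
        if current is None or score > current[1]:
            leaders[benchmark] = (model, score)
    return leaders


def top_average(results: list[Result]) -> tuple[str, float, int]:
    """Return (model, mean score, number of benchmarks) for the model with the highest mean."""
    scores: dict[str, list[float]] = defaultdict(list)
    for _benchmark, model, score in results:
        scores[model].append(score)
    best = max(scores, key=lambda m: mean(scores[m]))
    return best, mean(scores[best]), len(scores[best])
```

A plain mean can favor a model scored on only a few benchmarks, so a real dashboard might require a minimum benchmark count before ranking averages. Whether this page does so is not stated.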