Trends

Key metrics across the AI model ecosystem.

Top Average Score
78.0% ▲
Claude Instant leads across 4 benchmarks
Average Input Price
$-5761.87/M ▼
Per million tokens across all tracked models
Open Source Share
57% ▲
236 of 414 models are open source
Max Context Window
2.0M ▲
Largest context window available
Average Context
219K ▲
Mean context window across all models
Active Providers
55 ▲
API providers with at least one active model
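
The Open Source Share figure follows from the two counts on its card (236 of 414 models). A minimal sketch of that arithmetic is below; the function name `share_percent` and the choice to round to the nearest whole percent are assumptions for illustration, not the dashboard's documented method.

```python
# Counts taken from the "Open Source Share" card.
open_source_models = 236
total_models = 414


def share_percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer.

    Hypothetical helper: the rounding rule is an assumption.
    """
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole)


print(f"{share_percent(open_source_models, total_models)}%")  # prints "57%"
```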

Benchmark Leaders

Benchmark | Leader | Score
ARC AI2 | DeepSeek V3 | 93.7
BBH | DeepSeek V3 | 83.3
GSM8K | GPT-4o-mini (2024-07-18) | 91.3
HellaSwag | Llama 3.1-405B | 85.6
LAMBADA | Falcon-180B | 79.8
MMLU | GPT-4o (2024-11-20) | 84.1
GPQA diamond | Gemini 3.1 Pro Preview | 92.1
MATH level 5 | GPT-5 Chat | 98.1
OTIS Mock AIME 2024-2025 | GPT-5.2 Chat | 96.1
WeirdML | Claude Opus 4.6 | 77.9
Winogrande | Llama 3.1-405B | 78.4
SimpleBench | Gemini 3.1 Pro Preview | 75.5
Aider polyglot | GPT-5 Chat | 88.0
Lech Mazur Writing | Kimi K2 0905 | 87.3
GSO-Bench | Claude Opus 4.6 | 33.3
Fiction.LiveBench | GPT-5 Chat | 97.2
SWE-Bench Verified (Bash Only) | Claude Opus 4.5 | 74.4
Terminal Bench | Gemini 3.1 Pro Preview | 78.4
FrontierMath-2025-02-28-Private | GPT-5.4 Pro | 50.0
SimpleQA Verified | Gemini 3.1 Pro Preview | 77.3
FrontierMath-Tier-4-2025-07-01-Private | GPT-5.4 Pro | 37.5
Chess Puzzles | Gemini 3.1 Pro Preview | 55.0
APEX-Agents | GPT-5.4 | 35.9
OSWorld | Claude Opus 4.5 | 66.3
ARC-AGI-2 | GPT-5.4 Pro | 83.3
HLE | Gemini 3 Pro | 34.4
TriviaQA | Llama 2-70B | 87.6
ScienceQA | Claude 3 Haiku | 62.7
PIQA | GPT-4o-mini (2024-07-18) | 77.4
OpenBookQA | phi-3-mini 3.8B | 84.0
CadEval | o3 | 74.0
Balrog | Gemini 3 Flash Preview | 48.1
GeoBench | Gemini 3 Flash Preview | 88.0
Cybench | Claude Sonnet 4.5 | 55.0
ANLI | phi-3-small 7.4B | 37.1
The Agent Company | DeepSeek V3.2 Exp | 42.9
VideoMME | Gemini 1.5 Pro (Feb 2024) | 66.7
ARC-AGI | Gemini 3.1 Pro Preview | 98.0
DeepResearch Bench | Claude Sonnet 4.5 | 52.6
VPCT | Gemini 3 Pro | 86.5
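
One way to summarize the leaders table is to count how many benchmarks each model leads. The sketch below does this for a handful of rows copied from the table; the `rows` layout and the `leader_counts` helper are hypothetical names chosen for illustration, and the subset is only for brevity.

```python
from collections import Counter

# (benchmark, leader, score) rows copied from the table above; a subset for brevity.
rows = [
    ("GPQA diamond", "Gemini 3.1 Pro Preview", 92.1),
    ("MATH level 5", "GPT-5 Chat", 98.1),
    ("SimpleBench", "Gemini 3.1 Pro Preview", 75.5),
    ("Aider polyglot", "GPT-5 Chat", 88.0),
    ("Terminal Bench", "Gemini 3.1 Pro Preview", 78.4),
    ("OSWorld", "Claude Opus 4.5", 66.3),
]


def leader_counts(rows):
    """Count how many benchmarks each model leads (hypothetical helper)."""
    return Counter(leader for _benchmark, leader, _score in rows)


# Print models from most to fewest benchmarks led.
for model, count in leader_counts(rows).most_common():
    print(f"{model}: {count}")
```

Running the same tally over every row of the table would give the per-model lead counts for the full set of tracked benchmarks.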