Trends

Key metrics across the AI model ecosystem.

Top Average Score

78.0%▲

Claude Instant leads across 4 benchmarks

Average Input Price

$-5761.87/M▼

Per million tokens across all tracked models

Open Source Share

57%▲

236 of 414 models are open source
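As a quick check, the share shown above follows from the two counts on this page. The sketch below is illustrative only: the function name `open_source_share` is hypothetical, and rounding to a whole percent is an assumption about how the page displays the figure.

```python
def open_source_share(open_count: int, total_count: int) -> float:
    """Return the open source share as a percentage (hypothetical helper)."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return 100.0 * open_count / total_count


# Counts taken from the page: 236 of 414 models are open source.
share = open_source_share(236, 414)
print(f"{share:.0f}%")  # prints "57%"
```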

Max Context Window

2.0M▲

Largest context window available

Average Context

219K▲

Mean context window across all models

Active Providers

55▲

API providers with at least one active model

Benchmark Leaders

Benchmark	Category	Leader	Score
ARC AI2	knowledge	DeepSeek V3	93.7
BBH	reasoning	DeepSeek V3	83.3
GSM8K	math	GPT-4o-mini (2024-07-18)	91.3
HellaSwag	knowledge	Llama 3.1-405B	85.6
LAMBADA	knowledge	Falcon-180B	79.8
MMLU	knowledge	GPT-4o (2024-11-20)	84.1
GPQA diamond	knowledge	Gemini 3.1 Pro Preview	92.1
MATH level 5	math	GPT-5 Chat	98.1
OTIS Mock AIME 2024-2025	math	GPT-5.2 Chat	96.1
WeirdML	coding	Claude Opus 4.6	77.9
Winogrande	knowledge	Llama 3.1-405B	78.4
SimpleBench	reasoning	Gemini 3.1 Pro Preview	75.5
Aider polyglot	coding	GPT-5 Chat	88.0
Lech Mazur Writing	knowledge	Kimi K2 0905	87.3
GSO-Bench	coding	Claude Opus 4.6	33.3
Fiction.LiveBench	knowledge	GPT-5 Chat	97.2
SWE-Bench Verified (Bash Only)	coding	Claude Opus 4.5	74.4
Terminal Bench	coding	Gemini 3.1 Pro Preview	78.4
FrontierMath-2025-02-28-Private	math	GPT-5.4 Pro	50.0
SimpleQA Verified	knowledge	Gemini 3.1 Pro Preview	77.3
FrontierMath-Tier-4-2025-07-01-Private	math	GPT-5.4 Pro	37.5
Chess Puzzles	knowledge	Gemini 3.1 Pro Preview	55.0
APEX-Agents	agentic	GPT-5.4	35.9
OSWorld	agentic	Claude Opus 4.5	66.3
ARC-AGI-2	reasoning	GPT-5.4 Pro	83.3
HLE	knowledge	Gemini 3 Pro	34.4
TriviaQA	knowledge	Llama 2-70B	87.6
ScienceQA	knowledge	Claude 3 Haiku	62.7
PIQA	knowledge	GPT-4o-mini (2024-07-18)	77.4
OpenBookQA	knowledge	phi-3-mini 3.8B	84.0
CadEval	coding	o3	74.0
Balrog	knowledge	Gemini 3 Flash Preview	48.1
GeoBench	knowledge	Gemini 3 Flash Preview	88.0
Cybench	coding	Claude Sonnet 4.5	55.0
ANLI	knowledge	phi-3-small 7.4B	37.1
The Agent Company	agentic	DeepSeek V3.2 Exp	42.9
VideoMME	multimodal	Gemini 1.5 Pro (Feb 2024)	66.7
ARC-AGI	reasoning	Gemini 3.1 Pro Preview	98.0
DeepResearch Bench	knowledge	Claude Sonnet 4.5	52.6
VPCT	knowledge	Gemini 3 Pro	86.5
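
The table above is tab-separated, so it can be tallied with the standard library. The following is a minimal sketch, assuming the rows are available as plain text; `TABLE` holds only a few rows copied from the table for illustration, and `count_leads` is a hypothetical helper name, not part of any site API.

```python
import csv
import io
from collections import Counter

# A handful of rows copied from the table above (tab-separated).
# Paste the full table here to tally every benchmark.
TABLE = (
    "Benchmark\tCategory\tLeader\tScore\n"
    "ARC AI2\tknowledge\tDeepSeek V3\t93.7\n"
    "BBH\treasoning\tDeepSeek V3\t83.3\n"
    "GSM8K\tmath\tGPT-4o-mini (2024-07-18)\t91.3\n"
    "MATH level 5\tmath\tGPT-5 Chat\t98.1\n"
    "Aider polyglot\tcoding\tGPT-5 Chat\t88.0\n"
    "Fiction.LiveBench\tknowledge\tGPT-5 Chat\t97.2\n"
)


def count_leads(tsv_text: str) -> Counter:
    """Count how many benchmarks each model leads (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["Leader"] for row in reader)


leads = count_leads(TABLE)
for model, n in leads.most_common():
    print(f"{model}: {n}")
```

Run against the full table, the same tally gives the number of benchmarks led by each model.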