Model Benchmarks

Compare the performance of different AI models across standardized benchmarks. Higher scores generally indicate better performance, but each benchmark targets a specific capability, so results should be interpreted in the context of the task it tests.

Benchmarks

Massive Multitask Language Understanding (MMLU)

MMLU evaluates a model's knowledge and reasoning across 57 subjects spanning STEM, the humanities, and the social sciences. Each item is a four-option multiple-choice question, and the score is the percentage of questions answered correctly.
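
As a rough illustration of how an MMLU-style score is produced, the sketch below grades four-option multiple-choice items by exact-match accuracy. The `Question` type and the `model_answer` stub are hypothetical stand-ins, not part of any official evaluation harness.

```python
# Minimal sketch of MMLU-style scoring: each item is a multiple-choice
# question with four options (A-D); the score is plain accuracy in percent.
from dataclasses import dataclass

@dataclass
class Question:
    prompt: str
    choices: list[str]  # the four answer options
    answer: str         # gold label: "A", "B", "C", or "D"

def model_answer(q: Question) -> str:
    """Hypothetical model call; a real harness would query an LLM here."""
    return "A"  # placeholder prediction

def mmlu_accuracy(questions: list[Question]) -> float:
    """Fraction of items where the predicted letter matches the gold label."""
    correct = sum(model_answer(q) == q.answer for q in questions)
    return 100.0 * correct / len(questions)

if __name__ == "__main__":
    sample = [
        Question("2 + 2 = ?", ["4", "3", "5", "22"], "A"),
        Question("Capital of France?", ["Berlin", "Paris", "Rome", "Madrid"], "B"),
    ]
    print(f"MMLU-style accuracy: {mmlu_accuracy(sample):.2f}%")
```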

Rank  Model            Provider             Score (%)
1     GPT-3            OpenAI               92.30
2     MiniMax-Text-01  MiniMax              88.50
3     Claude 3         Anthropic            86.80
4     GPT-4            OpenAI               86.50
5     Qwen-14B         Alibaba Cloud        84.20
6     Qwen 2           Alibaba Cloud        84.20
7     Mistral Large    Mistral AI           84.00
8     Claude 2         Anthropic            78.50
9     Cohere Command   Cohere               78.50
10    Grok 1           xAI                  73.00
11    DeepSeek-LLM     DeepSeek             71.30
12    Chinchilla       Google DeepMind      67.60
13    LLaMA            Meta AI              63.40
14    Phi-2            Microsoft            56.70
15    Qwen-7B          Alibaba Cloud        56.70
16    Galactica        Meta AI              52.60
17    GLM-130B         Tsinghua University  44.80

All scores shown were recorded on May 4, 2025.
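
For readers who want to compare scores programmatically, here is a minimal sketch that represents a few rows transcribed from the table above and ranks them by score. The `Entry` structure and its field names are illustrative choices, not a published schema.

```python
# Minimal sketch: represent leaderboard rows and rank them by score.
# Scores transcribed from the MMLU table above (May 4, 2025 snapshot).
from typing import NamedTuple

class Entry(NamedTuple):
    model: str
    provider: str
    score: float  # MMLU score, percent

entries = [
    Entry("Claude 3", "Anthropic", 86.80),
    Entry("GPT-4", "OpenAI", 86.50),
    Entry("Mistral Large", "Mistral AI", 84.00),
    Entry("Grok 1", "xAI", 73.00),
]

# Rank descending: higher MMLU accuracy generally indicates better performance.
ranked = sorted(entries, key=lambda e: e.score, reverse=True)
for rank, e in enumerate(ranked, start=1):
    print(f"{rank}. {e.model} ({e.provider}): {e.score:.2f}%")
```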