AI Model Benchmarks

A performance comparison of leading AI models across intelligence, coding, math, output speed, cost, and arena rankings.

Source: Artificial Analysis
Rank  Model                     Intelligence Index
1     Gemini 3.1 Pro Preview    57
2     GPT-5.4 (xhigh)           57
3     Claude Opus 4.6 (max)     53
4     Muse Spark                52
5     Claude Sonnet 4.6         52
6     GLM-5.1                   51
7     Qwen3.6 Plus              50
8     MiniMax-M2.7              50
9     Grok 4.20                 49
10    G3o9 v2                   49
11    MiMo-V2-Pro               49
12    GPT-5.4 mini (xhigh)      49
13    Kimi K2.5                 47
14    Gemini 3 Flash            46
15    Qwen3.5 307B A17B         45
16    DeepSeek V3.2             42
17    Gemma 4 3.1B              39
18    Claude 4.5 Haiku          37
19    NVIDIA Nemotron 3         36
20    Nova SuperPro             36
21    Gemini 3.1 Flash Lite     34
22    gpt-o3s-120B (high)       33
23    K-EXAONE 9                32
24    Mistral Small 4           27
25    Solar Pro 3               26
26    gpt-o3s-210B (high)       24
27    K2-Think V2               24
28    Llama 4 Maverick          18

Powered by Artificial Analysis

Rank  Model                     Coding Index
1     Claude Mythos Preview     93.9
2     GPT-5.3 Codex             85
3     Claude Opus 4.5           80.9
4     Claude Opus 4.6           80.8
5     Gemini 3.1 Pro            80.6
6     GPT-5.2                   80
7     Claude Sonnet 4.6         79.6
8     Qwen3.6 Plus              78.8
9     DeepSeek V3.2             77.5
10    Grok 4.20                 76.2
11    Gemini 3 Flash            74.8
12    Kimi K2.5                 73.1
13    GLM-5.1                   71.4
14    MiniMax-M2.7              69.8
15    Mistral Large 3           67.3
16    Llama 4 Maverick          64.5


Rank  Model                     Math Index
1     GPT-5.2 Thinking          100
2     DeepSeek-V3.2             96
3     Gemini 3 Pro              95
4     GPT-5 High                94.6
5     Claude Opus 4.5           92.8
6     GLM-5                     92.7
7     Gemini 3 Flash            90.4
8     Qwen3.6 Plus              89.5
9     Claude Sonnet 4.6         88.2
10    Grok 4.20                 87.1
11    DeepSeek-R1               86.7
12    Kimi K2.5                 85.3
13    Mistral Large 3           82.1
14    Llama 4 Maverick          78.4


Rank  Model                     Output Speed (tokens/s)
1     Gemini 3 Flash            362
2     GPT-5.4 mini              298
3     Claude 4.5 Haiku          241
4     Mistral Small 4           215
5     DeepSeek V3.2             189
6     Qwen3.6 Plus              176
7     Gemini 3 Pro              158
8     Llama 4 Maverick          142
9     Claude Sonnet 4.6         128
10    GPT-5.2                   115
11    Grok 4.20                 108
12    Claude Opus 4.6           82

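Throughput figures like those above translate directly into wall-clock decode time. A minimal sketch, ignoring time-to-first-token and network latency (both of which matter in practice):

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Approximate decode time for a response, ignoring time-to-first-token."""
    return num_tokens / tokens_per_second

# A 1,000-token answer at the fastest and slowest rates in the table:
print(round(generation_seconds(1000, 362), 1))  # Gemini 3 Flash  -> 2.8
print(round(generation_seconds(1000, 82), 1))   # Claude Opus 4.6 -> 12.2
```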

Rank  Model                     Blended Cost ($ per 1M tokens)
1     DeepSeek V3.2             0.28
2     Gemini 3 Flash            0.30
3     GPT-5.4 mini              0.40
4     Qwen3.6 Plus              0.50
5     Mistral Small 4           0.60
6     Claude 4.5 Haiku          0.80
7     Llama 4 Maverick          0.90
8     Gemini 3 Pro              1.25
9     Claude Sonnet 4.6         3.00
10    GPT-5.2                   3.75
11    Grok 4.20                 5.00
12    Claude Opus 4.6           15.00

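"Blended" cost is a weighted average of input- and output-token prices. The blend ratio is not stated in the table; the sketch below assumes a 3:1 input:output mix, and the per-token prices in the example are hypothetical, chosen only to illustrate the arithmetic:

```python
def blended_cost(input_price: float, output_price: float,
                 input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of $/1M-token prices; the 3:1 ratio is an assumption."""
    total = input_weight + output_weight
    return (input_price * input_weight + output_price * output_weight) / total

# Hypothetical input/output prices that would blend to $0.28 per 1M tokens:
print(round(blended_cost(0.20, 0.52), 2))  # -> 0.28
```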

Rank  Model                     Elo Rating
1     GPT Image 1.5             1265
2     Gemini 3.1 Flash Image    1258
3     Gemini 3 Pro Image        1215
4     FLUX.2 [max]              1200
5     Seedream 4.0              1185
6     FLUX.2 [dev]              1164
7     Qwen Image Max            1150
8     Ideogram v2               1102
9     Midjourney v6.1           1093
10    DALL-E 3 HD               984

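Arena Elo ratings have a standard probabilistic reading: a 400-point gap corresponds to 10:1 expected odds of winning a head-to-head comparison. A sketch of the usual Elo expected-score formula, applied to two ratings from the table above:

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# GPT Image 1.5 (1265) vs. DALL-E 3 HD (984): a 281-point gap
print(round(elo_expected_score(1265, 984), 2))  # -> 0.83
```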

Rank  Model                     Elo Rating
1     HappyHorse 1.0            1388
2     Seedance 2.0              1273
3     SkyReels V4               1244
4     Kling 3.0 1080p           1242
5     Grok Imagine Video        1229
6     PixVerse V5.6             1223
7     Runway Gen-4.5            1223
8     Veo 3                     1210


Rank  Model                     Price-Performance (Index points per $)
1     Gemini 3 Flash            153.3
2     DeepSeek V3.2             150
3     GPT-5.4 mini              122.5
4     Qwen3.6 Plus              100
5     Claude 4.5 Haiku          46.3
6     Mistral Small 4           45
7     Gemini 3 Pro              40
8     GPT-5.2                   21.3
9     Llama 4 Maverick          20
10    Claude Sonnet 4.6         17.3

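For most models that appear in both tables, the price-performance figure works out to the Intelligence Index divided by the blended cost. That derivation is an inference from the listed values, not something the source states explicitly; a sketch:

```python
def price_performance(intelligence_index: float, blended_cost_per_1m: float) -> float:
    """Index points per dollar: intelligence index / blended $ per 1M tokens."""
    return intelligence_index / blended_cost_per_1m

# DeepSeek V3.2: Intelligence Index 42 at $0.28 per 1M tokens
print(round(price_performance(42, 0.28), 1))  # -> 150.0
# Gemini 3 Flash: Intelligence Index 46 at $0.30 per 1M tokens
print(round(price_performance(46, 0.30), 1))  # -> 153.3
```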