Benchmarks
A comparison of the latest performance of major AI models.
| # | Model | Intelligence Index |
|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 57 |
| 2 | GPT-5.4 (xhigh) | 57 |
| 3 | Claude Opus 4.6 (max) | 53 |
| 4 | Muse Spark | 52 |
| 5 | Claude Sonnet 4.6 | 52 |
| 6 | GLM-5.1 | 51 |
| 7 | Qwen3.6 Plus | 50 |
| 8 | MiniMax-M2.7 | 50 |
| 9 | Grok 4.20 | 49 |
| 10 | G3o9 v2 | 49 |
| 11 | MiMo-V2-Pro | 49 |
| 12 | GPT-5.4 mini (xhigh) | 49 |
| 13 | Kimi K2.5 | 47 |
| 14 | Gemini 3 Flash | 46 |
| 15 | Qwen3.5 307B A17B | 45 |
| 16 | DeepSeek V3.2 | 42 |
| 17 | Gemma 4 3.1B | 39 |
| 18 | Claude 4.5 Haiku | 37 |
| 19 | NVIDIA Nemotron 3 | 36 |
| 20 | Nova SuperPro | 36 |
| 21 | Gemini 3.1 Flash Lite | 34 |
| 22 | gpt-o3s-120B (high) | 33 |
| 23 | K-EXAONE 9 | 32 |
| 24 | Mistral Small 4 | 27 |
| 25 | Solar Pro 3 | 26 |
| 26 | gpt-o3s-210B (high) | 24 |
| 27 | K2-Think V2 | 24 |
| 28 | Llama 4 Maverick | 18 |
Powered by Artificial Analysis
| # | Model | Coding Index |
|---|---|---|
| 1 | Claude Mythos Preview | 93.9 |
| 2 | GPT-5.3 Codex | 85 |
| 3 | Claude Opus 4.5 | 80.9 |
| 4 | Claude Opus 4.6 | 80.8 |
| 5 | Gemini 3.1 Pro | 80.6 |
| 6 | GPT-5.2 | 80 |
| 7 | Claude Sonnet 4.6 | 79.6 |
| 8 | Qwen3.6 Plus | 78.8 |
| 9 | DeepSeek V3.2 | 77.5 |
| 10 | Grok 4.20 | 76.2 |
| 11 | Gemini 3 Flash | 74.8 |
| 12 | Kimi K2.5 | 73.1 |
| 13 | GLM-5.1 | 71.4 |
| 14 | MiniMax-M2.7 | 69.8 |
| 15 | Mistral Large 3 | 67.3 |
| 16 | Llama 4 Maverick | 64.5 |
Powered by Artificial Analysis
| # | Model | Math Index |
|---|---|---|
| 1 | GPT-5.2 Thinking | 100 |
| 2 | DeepSeek-V3.2 | 96 |
| 3 | Gemini 3 Pro | 95 |
| 4 | GPT-5 High | 94.6 |
| 5 | Claude Opus 4.5 | 92.8 |
| 6 | GLM-5 | 92.7 |
| 7 | Gemini 3 Flash | 90.4 |
| 8 | Qwen3.6 Plus | 89.5 |
| 9 | Claude Sonnet 4.6 | 88.2 |
| 10 | Grok 4.20 | 87.1 |
| 11 | DeepSeek-R1 | 86.7 |
| 12 | Kimi K2.5 | 85.3 |
| 13 | Mistral Large 3 | 82.1 |
| 14 | Llama 4 Maverick | 78.4 |
Powered by Artificial Analysis
| # | Model | Output Speed |
|---|---|---|
| 1 | Gemini 3 Flash | 362 tok/s |
| 2 | GPT-5.4 mini | 298 tok/s |
| 3 | Claude 4.5 Haiku | 241 tok/s |
| 4 | Mistral Small 4 | 215 tok/s |
| 5 | DeepSeek V3.2 | 189 tok/s |
| 6 | Qwen3.6 Plus | 176 tok/s |
| 7 | Gemini 3 Pro | 158 tok/s |
| 8 | Llama 4 Maverick | 142 tok/s |
| 9 | Claude Sonnet 4.6 | 128 tok/s |
| 10 | GPT-5.2 | 115 tok/s |
| 11 | Grok 4.20 | 108 tok/s |
| 12 | Claude Opus 4.6 | 82 tok/s |
Powered by Artificial Analysis
| # | Model | Cost (Blended, $ per 1M tokens) |
|---|---|---|
| 1 | DeepSeek V3.2 | 0.28 $/1M |
| 2 | Gemini 3 Flash | 0.3 $/1M |
| 3 | GPT-5.4 mini | 0.4 $/1M |
| 4 | Qwen3.6 Plus | 0.5 $/1M |
| 5 | Mistral Small 4 | 0.6 $/1M |
| 6 | Claude 4.5 Haiku | 0.8 $/1M |
| 7 | Llama 4 Maverick | 0.9 $/1M |
| 8 | Gemini 3 Pro | 1.25 $/1M |
| 9 | Claude Sonnet 4.6 | 3 $/1M |
| 10 | GPT-5.2 | 3.75 $/1M |
| 11 | Grok 4.20 | 5 $/1M |
| 12 | Claude Opus 4.6 | 15 $/1M |
Powered by Artificial Analysis
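To make the speed and cost figures concrete, the sketch below estimates how long a response of a given length would take to generate and roughly what it would cost. The helper name `estimate` and the 2,000-token workload are hypothetical. It also simplifies: a blended price presumably mixes input and output token prices, so applying it to output tokens alone is only an approximation, and the timing ignores time to first token.

```python
# Output speed (tokens/s) and blended cost (USD per 1M tokens),
# copied from the two tables above for a few models.
MODELS = {
    "Gemini 3 Flash": (362, 0.30),
    "DeepSeek V3.2": (189, 0.28),
    "Claude Sonnet 4.6": (128, 3.00),
    "Claude Opus 4.6": (82, 15.00),
}


def estimate(model: str, output_tokens: int) -> tuple[float, float]:
    """Return (seconds, usd) for generating `output_tokens` tokens.

    Assumption: the blended price is applied to output tokens only,
    and generation runs at the listed steady-state speed.
    """
    speed, cost_per_million = MODELS[model]
    seconds = output_tokens / speed
    usd = output_tokens / 1_000_000 * cost_per_million
    return seconds, usd


if __name__ == "__main__":
    for name in MODELS:
        seconds, usd = estimate(name, 2_000)  # hypothetical workload size
        print(f"{name}: {seconds:.1f} s, ${usd:.4f}")
```

On these figures, the fastest and cheapest models differ from the slowest and most expensive by a small factor in speed but by well over an order of magnitude in cost.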
| # | Model | Image Arena |
|---|---|---|
| 1 | GPT Image 1.5 | 1265 ELO |
| 2 | Gemini 3.1 Flash Image | 1258 ELO |
| 3 | Gemini 3 Pro Image | 1215 ELO |
| 4 | FLUX.2 [max] | 1200 ELO |
| 5 | Seedream 4.0 | 1185 ELO |
| 6 | FLUX.2 [dev] | 1164 ELO |
| 7 | Qwen Image Max | 1150 ELO |
| 8 | Ideogram v2 | 1102 ELO |
| 9 | Midjourney v6.1 | 1093 ELO |
| 10 | DALL-E 3 HD | 984 ELO |
Powered by Artificial Analysis
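One way to read the rating gaps in the table above is to convert them into expected head-to-head win rates. The sketch below uses the standard Elo expected-score formula with the conventional 400-point scale. Whether the arena uses exactly this scale is an assumption, and `expected_win_rate` is a hypothetical helper name.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Standard Elo expected score of A against B.

    Assumption: the usual logistic curve with a 400-point scale;
    the arena's own rating system may differ.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # Ratings copied from the Image Arena table above.
    gpt_image = 1265      # GPT Image 1.5
    gemini_flash = 1258   # Gemini 3.1 Flash Image
    dalle3 = 984          # DALL-E 3 HD

    print(f"{expected_win_rate(gpt_image, gemini_flash):.3f}")
    print(f"{expected_win_rate(gpt_image, dalle3):.3f}")
```

Under this assumption, the 7-point gap between the top two entries corresponds to about a 51% expected win rate, which is close to a coin flip. The 281-point gap between first and last corresponds to about 83%.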
| # | Model | Video Arena |
|---|---|---|
| 1 | HappyHorse 1.0 | 1388 ELO |
| 2 | Seedance 2.0 | 1273 ELO |
| 3 | SkyReels V4 | 1244 ELO |
| 4 | Kling 3.0 1080p | 1242 ELO |
| 5 | Grok Imagine Video | 1229 ELO |
| 6 | PixVerse V5.6 | 1223 ELO |
| 7 | Runway Gen-4.5 | 1223 ELO |
| 8 | Veo 3 | 1210 ELO |
Powered by Artificial Analysis
| # | Model | Price-Performance |
|---|---|---|
| 1 | Gemini 3 Flash | 153.3 idx/$ |
| 2 | DeepSeek V3.2 | 150 idx/$ |
| 3 | GPT-5.4 mini | 122.5 idx/$ |
| 4 | Qwen3.6 Plus | 100 idx/$ |
| 5 | Claude 4.5 Haiku | 46.3 idx/$ |
| 6 | Mistral Small 4 | 45 idx/$ |
| 7 | Gemini 3 Pro | 40 idx/$ |
| 8 | GPT-5.2 | 21.3 idx/$ |
| 9 | Llama 4 Maverick | 20 idx/$ |
| 10 | Claude Sonnet 4.6 | 17.3 idx/$ |
Powered by Artificial Analysis
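For the models that appear in both the Intelligence Index and Cost tables, the idx/$ figures appear to equal the Intelligence Index divided by the blended cost. This is inferred from the numbers, not from a stated definition. The sketch below recomputes the ratio under that assumption and sorts by it. `price_performance` is a hypothetical helper name, and Gemini 3 Pro and GPT-5.2 are left out because no Intelligence Index is listed for them.

```python
# (Intelligence Index, blended cost in USD per 1M tokens), copied from the
# tables above. "GPT-5.4 mini" uses the index listed for "GPT-5.4 mini (xhigh)".
DATA = {
    "DeepSeek V3.2": (42, 0.28),
    "Gemini 3 Flash": (46, 0.30),
    "GPT-5.4 mini": (49, 0.40),
    "Qwen3.6 Plus": (50, 0.50),
    "Mistral Small 4": (27, 0.60),
    "Claude 4.5 Haiku": (37, 0.80),
    "Llama 4 Maverick": (18, 0.90),
    "Claude Sonnet 4.6": (52, 3.00),
}


def price_performance(index: float, cost: float) -> float:
    """Index points per dollar (assumed definition: index / blended cost)."""
    return index / cost


# Highest ratio first.
ranked = sorted(
    ((name, price_performance(index, cost)) for name, (index, cost) in DATA.items()),
    key=lambda item: item[1],
    reverse=True,
)

if __name__ == "__main__":
    for rank, (name, value) in enumerate(ranked, start=1):
        print(f"{rank}. {name}: {value:.1f} idx/$")
```

Sorted this way, Gemini 3 Flash comes out narrowly ahead of DeepSeek V3.2, and Claude Sonnet 4.6 has the lowest ratio of the eight models covered.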