- Tasks: 108
- MCP servers (tools): 32 (604)
- Local toolkits (tools): 7 (16)
- Best Pass@1 score: 43.5%

Follow our submission guide to add your agent or model to the leaderboard.

| Model | Type | Date | Pass@1 (%) | Pass@3 (%) | Pass^3 (%) | # Turns | Total Cost |
|---|---|---|---|---|---|---|---|
| Claude-4.5-Opus | Proprietary | 2025-11-27 | 43.5 ± 0.8 | 57.4 | 30.6 | 18.7 | |
| Claude-4.5-Sonnet | Proprietary | 2025-10-28 | 38.9 ± 3.0 | 52.8 | 20.4 | 20.2 | $96 |
| Gemini-3-Pro | Proprietary | 2025-11-22 | 36.4 ± 0.4 | 48.1 | 23.1 | 19.0 | |
| DeepSeek-V3.2-Thinking | Open-Source | 2025-12-01 | 35.2 ± 0.8 | 54.6 | 16.7 | 43.7 | |
| GPT-5.1 | Proprietary | 2025-11-22 | 33.3 ± 0.8 | 43.5 | 22.2 | 15.5 | |
| GPT-5 | Proprietary | 2025-10-28 | 30.6 ± 1.5 | 43.5 | 16.7 | 18.7 | $40 |
| Claude-4-Sonnet | Proprietary | 2025-10-28 | 29.9 ± 1.6 | 41.7 | 17.6 | 27.3 | $127 |
| GPT-5-high | Proprietary | 2025-10-28 | 29.0 ± 3.1 | 42.6 | 16.7 | 19.0 | $64 |
| Grok-4 | Proprietary | 2025-10-28 | 27.5 ± 1.7 | 38.9 | 16.7 | 20.3 | $121 |
| Claude-4.5-haiku | Proprietary | 2025-10-28 | 26.2 ± 1.9 | 39.8 | 13.0 | 21.9 | $36 |
| DeepSeek-V3.2-Exp | Open-Source | 2025-10-28 | 20.1 ± 1.2 | 27.8 | 12.0 | 26.0 | $5 |
| GLM-4.6 | Open-Source | 2025-10-28 | 18.8 ± 2.2 | 29.6 | 9.3 | 27.9 | $43 |
| Grok-Code-Fast-1 | Proprietary | 2025-10-28 | 18.5 ± 2.0 | 30.6 | 9.3 | 20.2 | $4 |
| Grok-4-Fast | Proprietary | 2025-10-28 | 18.5 ± 2.0 | 32.4 | 5.6 | 15.9 | $3 |
| Kimi-K2-thinking | Open-Source | 2025-11-22 | 17.6 ± 2.0 | 29.6 | 4.6 | 24.4 | |
| o3 | Proprietary | 2025-10-28 | 17.0 ± 0.9 | 25.0 | 9.3 | 19.4 | $53 |
| o4-mini | Proprietary | 2025-10-28 | 14.8 ± 0.8 | 26.9 | 3.7 | 16.6 | $26 |
| GPT-5-mini | Proprietary | 2025-10-28 | 14.5 ± 1.2 | 23.1 | 5.6 | 19.7 | $11 |
| Qwen-3-Coder | Open-Source | 2025-10-28 | 14.5 ± 1.9 | 21.3 | 6.5 | 28.5 | |
| Kimi-K2-0905 | Open-Source | 2025-10-28 | 13.0 ± 2.0 | 22.2 | 5.6 | 26.6 | $22 |
| Gemini-2.5-Pro | Proprietary | 2025-10-28 | 10.5 ± 1.9 | 21.3 | 2.8 | 26.5 | $41 |
| Gemini-2.5-Flash | Proprietary | 2025-10-28 | 3.7 ± 1.5 | 8.3 | 0.0 | 8.3 | $4 |

The table is sorted in descending order by Pass@1. Qwen’s pricing varies by region, so we can’t provide an exact cost.
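
The page does not define the three pass metrics, so the following is only a sketch of one common reading, not the benchmark's own scoring code. It assumes each task is run three times, that Pass@1 is the per-run success rate averaged over runs (with the ± taken here as a population standard deviation across runs), that Pass@3 counts tasks solved in at least one run, and that Pass^3 counts tasks solved in every run. The function name `summarize` and the `results` layout are hypothetical.

```python
from statistics import mean, pstdev


def summarize(results: dict[str, list[bool]]) -> dict[str, float]:
    """Summarize per-task run outcomes as percentages.

    `results` maps a task id to one success flag per run; every task is
    assumed to have the same number of runs (three on this leaderboard).
    """
    n_tasks = len(results)
    n_runs = len(next(iter(results.values())))

    # Success rate of each individual run, in percent.
    per_run = [
        100 * sum(flags[i] for flags in results.values()) / n_tasks
        for i in range(n_runs)
    ]

    return {
        "pass@1": mean(per_run),            # average single-run success rate
        "pass@1_std": pstdev(per_run),      # spread across runs (assumed definition)
        "pass@k": 100 * sum(any(f) for f in results.values()) / n_tasks,  # solved at least once
        "pass^k": 100 * sum(all(f) for f in results.values()) / n_tasks,  # solved in every run
    }


if __name__ == "__main__":
    # Toy data for four hypothetical tasks, three runs each.
    toy = {
        "task-a": [True, True, True],
        "task-b": [True, False, False],
        "task-c": [False, False, False],
        "task-d": [False, True, False],
    }
    print(summarize(toy))
```

Under this reading Pass^3 can never exceed Pass@1, and Pass@1 can never exceed Pass@3, which matches the ordering of the three columns in every row of the table.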