$ipbr

Live LLM coding scoreboard.

Models drift. Agents battle. Math decides.

live · refreshed · 17 sources · 39 models

IPBR
  • claude-opus-4.8 86.6
  • gpt-5.5 86.3
  • claude-fable-5 86.1

leaders now

[ idea ]
  1. gemini-3.1-pro-preview 98.2 (down 0.1 since last refresh)
  2. claude-opus-4.7 95.5 (down 0.4 since last refresh)
  3. gemini-3.5-flash 94.3 (down 0.5 since last refresh)
[ plan ]
  1. gemini-3.1-pro-preview 91.6 (down 0.6 since last refresh)
  2. claude-fable-5 88.5
  3. claude-opus-4.8 88.4 (up 0.6 since last refresh)
[ build ]
  1. gpt-5.5 87.7 (up 0.8 since last refresh)
  2. claude-opus-4.8 83.2 (up 0.6 since last refresh)
  3. claude-fable-5 82.4
[ review ]
  1. gpt-5.5 86.5 (down 0.2 since last refresh)
  2. claude-fable-5 83.2
  3. claude-opus-4.8 82.9 (up 0.4 since last refresh)
how scoring works

Each model gets four role scores from public benchmarks. Idea measures open-ended creativity. Plan measures structured reasoning, function-calling, and multi-step decomposition. Build measures implementation skill — SWE-bench, LiveCodeBench, terminal tasks. Review measures preference judgment.

scoring

Each role score is the benchmark composite for that role, normalized to 0-100 and combined via weighted average of group scores. See the about page for the full math.
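
As a rough illustration of the weighted average described above, here is a minimal Python sketch. The function name `role_score` and the weights are hypothetical; the real weights live on the about page and are not reproduced here. The two group scores are gpt-5.5's BUILD and OPS_precision values from this page, used only as sample inputs.

```python
def role_score(group_scores, weights):
    """Weighted average of 0-100 group scores.

    group_scores: dict mapping group name -> score on a 0-100 scale.
    weights: dict mapping group name -> non-negative weight (hypothetical values).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[g] * group_scores[g] for g in weights) / total


# Sample inputs: two of gpt-5.5's group scores as listed on this page.
sample_groups = {"BUILD": 89.9, "OPS_precision": 67.1}

# Hypothetical weights, chosen only for illustration.
hypothetical_weights = {"BUILD": 0.8, "OPS_precision": 0.2}

build_like = role_score(sample_groups, hypothetical_weights)
print(round(build_like, 2))
```

Because the weights are invented, the result is not expected to match any role score shown on the scoreboard.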

missing data

If a model is missing some metrics within a group, the group score depends on the group's coverage, meaning the share of its metrics that are present. Between 60% and 80% coverage, the score blends from shrink-to-50 toward trusting the present metrics. At 80% coverage and above, the present-weight mean is trusted directly.
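
One possible reading of this rule, sketched in Python. Several details are assumptions rather than the site's documented math: coverage is measured by metric weight, the blend is linear between 60% and 80%, and "shrink-to-50" is modeled as returning exactly 50 at or below 60% coverage. The function name `group_score` and its parameters are hypothetical.

```python
def group_score(metrics, weights, lo=0.60, hi=0.80, prior=50.0):
    """Group score with a coverage-based blend (illustrative assumptions only).

    metrics: dict metric name -> 0-100 score, or None when the metric is missing.
    weights: dict metric name -> positive weight for every metric in the group.
    """
    total_w = sum(weights.values())
    present = {k: metrics[k] for k in weights if metrics.get(k) is not None}
    present_w = sum(weights[k] for k in present)
    if present_w == 0:
        return prior  # nothing to trust, fall back to the neutral 50

    coverage = present_w / total_w
    mean = sum(weights[k] * v for k, v in present.items()) / present_w

    # t = 0 at or below `lo` coverage, t = 1 at or above `hi`, linear in between.
    t = min(1.0, max(0.0, (coverage - lo) / (hi - lo)))
    return prior + t * (mean - prior)


# Ten equally weighted hypothetical metrics, each scoring 90 when present.
w = {f"m{i}": 1.0 for i in range(10)}
full = group_score({k: 90.0 for k in w}, w)
seven = group_score({f"m{i}": 90.0 for i in range(7)}, w)  # 70% coverage
six = group_score({f"m{i}": 90.0 for i in range(6)}, w)    # 60% coverage
```

Under these assumptions a group with every metric missing lands on 50, which is at least consistent with the 50.0 composites shown on this page for models whose Sonar or SWEAtlas metrics are all missing.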

Full math, role definitions, and source list →

gpt-5.5 (openai)
idea 85.0 (down 0.5 since last refresh) · plan 86.1 (down 1.7 since last refresh) · build 87.7 (up 0.8 since last refresh) · review 86.5 (down 0.2 since last refresh)

group breakdown

BUILD 89.9 (1/39) · CRE 83.7 (12/39) · GEN 91.4 (5/39) · LM_ARENA_REVIEW_PROXY 90.1 (3/39) · OPS_long 71.4 (22/39) · OPS_precision 67.1 (27/39) · OPS_review 69.0 (23/39) · PLAN 85.3 (4/39)

metrics

ARC_AGI_2 96.4 (3/31) · ArtificialAnalysisCoding 99.7 (3/37) · ArtificialAnalysisIntelligence 94.9 (5/37) · ArtificialAnalysisReasoning 90.8 (4/37) · BlendedCost 12.7 (38/39) · ContextWindow 100.0 (2/36) · CopilotArenaOrLMArenaCode 64.4 (19/38) · GDPval 93.2 (3/38) · GPQA_HLE_Reasoning 90.8 (4/37) · GSO 94.0 (2/19) · IFBench 73.5 (18/37) · LMArenaCreativeOrOpenEnded 83.7 (12/39) · LMArenaDocument 84.6 (3/33) · LMArenaSearch 95.7 (2/22) · LMArenaText 83.7 (12/39) · LongContextRecall 94.2 (5/37) · MCPAtlas 50.2 (15/34) · OutputSpeed 70.8 (26/33) · SWEAtlasComposite 96.0 (1/39) · SWEAtlasQnA 100.0 (1/21) · SWEAtlasRefactoring 93.2 (2/19) · SWEAtlasTestWriting 95.8 (2/21) · SWEBenchPro 95.0 (13/35) · SWEBenchVerified 95.0 (16/37) · SWEComposite 96.8 (1/39) · SWERebench 99.0 (3/36) · SciCode 83.1 (7/37) · SonarBugDensity 94.4 (2/24) · SonarComposite 68.8 (6/39) · SonarFunctionalSkill 52.3 (15/24) · SonarIssueDensity 58.1 (13/24) · SonarVulnerabilityDensity 96.7 (2/24) · TTFT 80.9 (19/33) · Tau2Bench 83.1 (18/37) · TerminalBench 100.0 (2/39) · TerminalBenchHard 99.6 (3/37)
sources: arc_agi · artificial_analysis · gso · lmarena · mcp_atlas · openrouter · overrides · sonar · sweatlas_qna · sweatlas_refactoring · sweatlas_test_writing · swerebench · terminal_bench
missing: BUILD/AALiveCodeBench · BUILD/BFCL · GEN/ArtificialAnalysisMath · GEN/MMLUPro · PLAN/BFCL · SWEComposite/SWEBenchMultilingual
claude-opus-4.8 (anthropic)
idea 91.9 (up 0.5 since last refresh) · plan 88.4 (up 0.6 since last refresh) · build 83.2 (up 0.6 since last refresh) · review 82.9 (up 0.4 since last refresh)

group breakdown

BUILD 84.6 (3/39) · CRE 91.7 (9/39) · GEN 97.4 (2/39) · LM_ARENA_REVIEW_PROXY 80.0 (4/39) · OPS_long 72.5 (18/39) · OPS_precision 66.9 (28/39) · OPS_review 69.6 (22/39) · PLAN 85.5 (3/39)

metrics

ARC_AGI_2 98.1 (2/31) · ArtificialAnalysisCoding 100.0 (2/37) · ArtificialAnalysisIntelligence 100.0 (2/37) · ArtificialAnalysisReasoning 99.6 (3/37) · BlendedCost 22.1 (35/39) · ContextWindow 98.9 (16/36) · CopilotArenaOrLMArenaCode 92.5 (7/38) · GDPval 95.0 (2/38) · GPQA_HLE_Reasoning 99.6 (3/37) · GSO 92.5 (4/19) · IFBench 50.6 (25/37) · LMArenaCreativeOrOpenEnded 91.7 (9/39) · LMArenaDocument 72.2 (5/33) · LMArenaSearch 87.8 (5/22) · LMArenaText 91.7 (9/39) · LongContextRecall 70.3 (18/37) · MCPAtlas 100.0 (1/34) · OutputSpeed 74.7 (18/33) · SWEAtlasComposite 76.9 (7/39) · SWEAtlasQnA 67.5 (9/21) · SWEAtlasRefactoring 92.5 (4/19) · SWEAtlasTestWriting 65.3 (9/21) · SWEBenchMultilingual 95.0 (4/33) · SWEBenchPro 95.0 (5/35) · SWEBenchVerified 95.0 (8/37) · SWEComposite 88.3 (9/39) · SWERebench 78.4 (15/36) · SciCode 83.1 (5/37) · SonarBugDensity 34.6 (21/24) · SonarComposite 48.9 (31/39) · SonarFunctionalSkill 87.9 (4/24) · SonarIssueDensity 14.2 (20/24) · SonarVulnerabilityDensity 21.7 (19/24) · TTFT 72.2 (31/33) · Tau2Bench 90.5 (14/37) · TerminalBench 79.9 (8/39) · TerminalBenchHard 100.0 (2/37)
sources: arc_agi · artificial_analysis · lmarena · mcp_atlas · openrouter · overrides · swerebench
missing: BUILD/AALiveCodeBench · BUILD/BFCL · GEN/ArtificialAnalysisMath · GEN/MMLUPro · PLAN/BFCL
claude-fable-5 (anthropic)
idea 90.3 · plan 88.5 · build 82.4 · review 83.2

group breakdown

BUILD 85.9 (2/39) · CRE 91.7 (8/39) · GEN 96.5 (3/39) · LM_ARENA_REVIEW_PROXY 78.3 (6/39) · OPS_long 56.2 (35/39) · OPS_precision 37.4 (39/39) · OPS_review 47.3 (36/39) · PLAN 90.6 (1/39)

metrics

ARC_AGI_2 90.9 (5/31) · ArtificialAnalysisCoding 100.0 (1/37) · ArtificialAnalysisIntelligence 100.0 (1/37) · ArtificialAnalysisReasoning 100.0 (1/37) · BlendedCost 0.0 (39/39) · ContextWindow 98.9 (13/36) · CopilotArenaOrLMArenaCode 92.5 (6/38) · GDPval 95.0 (1/38) · GPQA_HLE_Reasoning 100.0 (1/37) · GSO 92.5 (3/19) · IFBench 53.8 (24/37) · LMArenaCreativeOrOpenEnded 91.7 (8/39) · LMArenaDocument 68.8 (6/33) · LMArenaSearch 87.8 (4/22) · LMArenaText 91.7 (8/39) · LongContextRecall 82.3 (11/37) · MCPAtlas 92.5 (3/34) · OutputSpeed 75.2 (17/33) · SWEAtlasComposite 76.9 (6/39) · SWEAtlasQnA 67.5 (6/21) · SWEAtlasRefactoring 92.5 (3/19) · SWEAtlasTestWriting 65.3 (6/21) · SWEBenchMultilingual 92.5 (11/33) · SWEBenchPro 95.0 (3/35) · SWEBenchVerified 95.0 (5/37) · SWEComposite 87.9 (11/39) · SWERebench 77.9 (16/36) · SciCode 100.0 (1/37) · SonarBugDensity 34.6 (20/24) · SonarComposite 48.9 (30/39) · SonarFunctionalSkill 87.9 (3/24) · SonarIssueDensity 14.2 (19/24) · SonarVulnerabilityDensity 21.7 (18/24) · TTFT 0.0 (33/33) · Tau2Bench 100.0 (1/37) · TerminalBench 94.4 (3/39) · TerminalBenchHard 100.0 (1/37)
sources: artificial_analysis · openrouter · overrides
missing: BUILD/AALiveCodeBench · BUILD/BFCL · GEN/ArtificialAnalysisMath · GEN/MMLUPro · PLAN/BFCL
claude-opus-4.7 (anthropic)
idea 95.5 (down 0.4 since last refresh) · plan 84.4 (down 1.8 since last refresh) · build 79.6 (down 0.7 since last refresh) · review 82.3 (down 1.0 since last refresh)

group breakdown

BUILD 80.7 (5/39) · CRE 99.0 (4/39) · GEN 95.0 (4/39) · LM_ARENA_REVIEW_PROXY 96.2 (2/39) · OPS_long 70.4 (26/39) · OPS_precision 66.7 (29/39) · OPS_review 69.0 (24/39) · PLAN 79.9 (6/39)

metrics

ARC_AGI_2 92.3 (4/31) · ArtificialAnalysisCoding 87.7 (5/37) · ArtificialAnalysisIntelligence 97.0 (3/37) · ArtificialAnalysisReasoning 86.6 (6/37) · BlendedCost 22.1 (34/39) · ContextWindow 98.9 (15/36) · CopilotArenaOrLMArenaCode 100.0 (1/38) · GDPval 90.2 (4/38) · GPQA_HLE_Reasoning 86.6 (6/37) · GSO 100.0 (1/19) · IFBench 41.1 (26/37) · LMArenaCreativeOrOpenEnded 99.0 (4/39) · LMArenaDocument 98.0 (2/33) · LMArenaSearch 94.4 (3/22) · LMArenaText 99.0 (4/39) · LongContextRecall 84.0 (10/37) · MCPAtlas 86.7 (4/34) · OutputSpeed 69.3 (29/33) · SWEAtlasComposite 79.9 (5/39) · SWEAtlasQnA 67.5 (8/21) · SWEAtlasRefactoring 100.0 (1/19) · SWEAtlasTestWriting 65.3 (8/21) · SWEBenchMultilingual 95.0 (3/33) · SWEBenchPro 95.0 (4/35) · SWEBenchVerified 95.0 (7/37) · SWEComposite 77.0 (23/39) · SciCode 88.4 (4/37) · SonarBugDensity 31.9 (23/24) · SonarComposite 48.7 (32/39) · SonarFunctionalSkill 94.5 (2/24) · SonarIssueDensity 7.9 (21/24) · SonarVulnerabilityDensity 16.7 (22/24) · TTFT 76.1 (27/33) · Tau2Bench 74.0 (24/37) · TerminalBench 100.0 (1/39) · TerminalBenchHard 82.5 (6/37)
sources: arc_agi · artificial_analysis · gso · lmarena · mcp_atlas · openrouter · overrides · sonar · sweatlas_refactoring · swerebench · terminal_bench
missing: BUILD/AALiveCodeBench · BUILD/BFCL · GEN/ArtificialAnalysisMath · GEN/MMLUPro · PLAN/BFCL · SWEComposite/SWERebench
gpt-5.4 (openai)
idea 70.0 (up 0.1 since last refresh) · plan 57.2 (up 0.2 since last refresh) · build 78.1 · review 66.9 (up 0.1 since last refresh)

group breakdown

BUILD 81.9 (4/39) · CRE 76.8 (15/39) · GEN 58.9 (20/39) · LM_ARENA_REVIEW_PROXY 61.4 (7/39) · OPS_long 59.3 (31/39) · OPS_precision 61.0 (30/39) · OPS_review 66.0 (27/39) · PLAN 55.4 (27/39)

metrics

ARC_AGI_2 75.5 (6/31) · BlendedCost 67.6 (28/39) · ContextWindow 100.0 (1/36) · CopilotArenaOrLMArenaCode 43.5 (31/38) · GDPval 81.7 (6/38) · GSO 54.0 (8/19) · LMArenaCreativeOrOpenEnded 76.8 (15/39) · LMArenaDocument 67.3 (7/33) · LMArenaSearch 55.6 (14/22) · LMArenaText 76.8 (15/39) · MCPAtlas 50.2 (14/34) · SWEAtlasComposite 94.4 (2/39) · SWEAtlasQnA 92.6 (2/21) · SWEAtlasRefactoring 91.7 (5/19) · SWEAtlasTestWriting 100.0 (1/21) · SWEBenchPro 92.5 (21/35) · SWEBenchVerified 95.0 (15/37) · SWEComposite 91.8 (4/39) · SWERebench 90.0 (5/36) · SonarBugDensity 79.8 (6/24) · SonarComposite 62.5 (7/39) · SonarFunctionalSkill 78.9 (6/24) · SonarIssueDensity 0.0 (24/24) · SonarVulnerabilityDensity 100.0 (1/24) · TerminalBench 83.6 (7/39)
sources: arc_agi · artificial_analysis · gso · lmarena · mcp_atlas · openrouter · overrides · sonar · sweatlas_qna · sweatlas_refactoring · sweatlas_test_writing · swebench_pro
missing: BUILD/AALiveCodeBench · BUILD/ArtificialAnalysisCoding · BUILD/BFCL · BUILD/LongContextRecall · BUILD/SciCode · BUILD/TerminalBenchHard · GEN/ArtificialAnalysisIntelligence · GEN/ArtificialAnalysisMath · GEN/GPQA_HLE_Reasoning · GEN/MMLUPro · OPS_long/OutputSpeed · OPS_long/TTFT · OPS_precision/OutputSpeed · OPS_precision/TTFT · OPS_review/OutputSpeed · OPS_review/TTFT · PLAN/ArtificialAnalysisReasoning · PLAN/BFCL · PLAN/IFBench · PLAN/LongContextRecall · PLAN/Tau2Bench · PLAN/TerminalBenchHard · SWEComposite/SWEBenchMultilingual
qwen3.7-max (alibaba)
idea 89.5 (down 0.6 since last refresh) · plan 83.4 (down 1.9 since last refresh) · build 77.7 (down 0.3 since last refresh) · review 75.1 (down 0.9 since last refresh)

group breakdown

BUILD 75.8 (7/39) · CRE 93.4 (6/39) · GEN 80.7 (8/39) · LM_ARENA_REVIEW_PROXY 47.3 (17/39) · OPS_long 92.2 (3/39) · OPS_precision 91.4 (2/39) · OPS_review 91.8 (4/39) · PLAN 84.0 (5/39)

metrics

ARC_AGI_2 11.9 (22/31) · ArtificialAnalysisCoding 79.9 (7/37) · ArtificialAnalysisIntelligence 94.5 (6/37) · ArtificialAnalysisReasoning 85.4 (7/37) · BFCL 89.7 (2/15) · BlendedCost 79.1 (20/39) · ContextWindow 98.9 (21/36) · CopilotArenaOrLMArenaCode 100.0 (2/38) · GDPval 65.7 (15/38) · GPQA_HLE_Reasoning 85.4 (7/37) · IFBench 98.8 (3/37) · LMArenaCreativeOrOpenEnded 93.4 (6/39) · LMArenaDocument 44.5 (13/33) · LMArenaText 93.4 (6/39) · LongContextRecall 77.1 (15/37) · MCPAtlas 72.6 (6/34) · MMLUPro 81.6 (8/28) · OutputSpeed 91.9 (6/33) · SWEAtlasComposite 50.0 (21/39) · SWEBenchMultilingual 95.0 (9/33) · SWEBenchPro 95.0 (15/35) · SWEBenchVerified 95.0 (18/37) · SWEComposite 88.4 (7/39) · SWERebench 78.4 (13/36) · SciCode 58.0 (16/37) · SonarBugDensity 92.5 (4/24) · SonarComposite 81.1 (4/39) · SonarFunctionalSkill 69.6 (13/24) · SonarIssueDensity 92.5 (3/24) · SonarVulnerabilityDensity 77.4 (7/24) · TTFT 94.9 (12/33) · Tau2Bench 91.4 (13/37) · TerminalBench 72.6 (11/39) · TerminalBenchHard 80.3 (7/37)
sources: artificial_analysis · lmarena · openrouter · overrides
missing: BUILD/AALiveCodeBench · BUILD/GSO · GEN/ArtificialAnalysisMath · LM_ARENA_REVIEW_PROXY/LMArenaSearch · SWEAtlasComposite/SWEAtlasQnA · SWEAtlasComposite/SWEAtlasRefactoring · SWEAtlasComposite/SWEAtlasTestWriting
gpt-5.3-codex (openai)
idea 66.9 · plan 57.0 · build 75.1 (down 0.1 since last refresh) · review 65.3

group breakdown

BUILD 78.5 (6/39) · CRE 72.7 (18/39) · GEN 57.6 (23/39) · LM_ARENA_REVIEW_PROXY 59.7 (8/39) · OPS_long 56.3 (34/39) · OPS_precision 58.4 (33/39) · OPS_review 61.2 (33/39) · PLAN 56.3 (24/39)

metrics

ARC_AGI_2 71.7 (7/31) · BlendedCost 71.1 (26/39) · ContextWindow 78.1 (26/36) · CopilotArenaOrLMArenaCode 37.7 (35/38) · GDPval 61.1 (22/38) · GSO 53.4 (9/19) · LMArenaCreativeOrOpenEnded 72.7 (18/39) · LMArenaDocument 64.7 (8/33) · LMArenaSearch 54.7 (15/22) · LMArenaText 72.7 (18/39) · SWEAtlasComposite 83.7 (4/39) · SWEAtlasQnA 86.2 (4/21) · SWEAtlasRefactoring 85.4 (7/19) · SWEAtlasTestWriting 78.9 (4/21) · SWEBenchPro 95.0 (12/35) · SWEBenchVerified 92.5 (24/37) · SWEComposite 95.5 (3/39) · SWERebench 97.1 (4/36) · SonarBugDensity 75.3 (8/24) · SonarComposite 60.6 (9/39) · SonarFunctionalSkill 74.6 (10/24) · SonarIssueDensity 7.5 (23/24) · SonarVulnerabilityDensity 92.5 (4/24) · TerminalBench 89.5 (5/39)
sources: artificial_analysis · lmarena · openrouter · overrides · sonar · sweatlas_test_writing · swerebench · terminal_bench
missing: BUILD/AALiveCodeBench · BUILD/ArtificialAnalysisCoding · BUILD/BFCL · BUILD/LongContextRecall · BUILD/MCPAtlas · BUILD/SciCode · BUILD/TerminalBenchHard · GEN/ArtificialAnalysisIntelligence · GEN/ArtificialAnalysisMath · GEN/GPQA_HLE_Reasoning · GEN/MMLUPro · OPS_long/OutputSpeed · OPS_long/TTFT · OPS_precision/OutputSpeed · OPS_precision/TTFT · OPS_review/OutputSpeed · OPS_review/TTFT · PLAN/ArtificialAnalysisReasoning · PLAN/BFCL · PLAN/IFBench · PLAN/LongContextRecall · PLAN/MCPAtlas · PLAN/Tau2Bench · PLAN/TerminalBenchHard · SWEComposite/SWEBenchMultilingual
muse-spark (meta)
idea 87.6 (down 0.5 since last refresh) · plan 78.9 (down 1.9 since last refresh) · build 75.0 (down 0.9 since last refresh) · review 72.0 (down 1.2 since last refresh)

group breakdown

BUILD 74.0 (9/39) · CRE 93.0 (7/39) · GEN 77.2 (10/39) · LM_ARENA_REVIEW_PROXY 46.9 (22/39) · OPS_long 85.8 (10/39) · OPS_precision 81.2 (14/39) · OPS_review 83.0 (13/39) · PLAN 79.7 (7/39)

metrics

AALiveCodeBench 91.4 (6/17) · ARC_AGI_2 21.2 (13/31) · ArtificialAnalysisCoding 71.5 (11/37) · ArtificialAnalysisIntelligence 78.6 (13/37) · ArtificialAnalysisMath 92.5 (6/17) · ArtificialAnalysisReasoning 81.3 (9/37) · BlendedCost 70.5 (27/39) · ContextWindow 92.5 (24/36) · CopilotArenaOrLMArenaCode 88.3 (9/38) · GDPval 78.1 (10/38) · GPQA_HLE_Reasoning 81.3 (9/37) · GSO 19.4 (16/19) · IFBench 86.6 (12/37) · LMArenaCreativeOrOpenEnded 93.0 (7/39) · LMArenaDocument 32.9 (20/33) · LMArenaSearch 60.9 (13/22) · LMArenaText 93.0 (7/39) · LongContextRecall 80.5 (12/37) · MCPAtlas 100.0 (2/34) · MMLUPro 86.5 (5/28) · OutputSpeed 91.0 (7/33) · SWEAtlasComposite 45.8 (29/39) · SWEAtlasQnA 44.0 (11/21) · SWEAtlasTestWriting 41.9 (11/21) · SWEBenchMultilingual 92.5 (14/33) · SWEBenchPro 100.0 (1/35) · SWEBenchVerified 88.4 (29/37) · SWEComposite 88.6 (5/39) · SWERebench 77.7 (18/36) · SciCode 72.4 (10/37) · SonarBugDensity 55.8 (12/24) · SonarComposite 58.4 (10/39) · SonarFunctionalSkill 74.7 (8/24) · SonarIssueDensity 39.5 (17/24) · SonarVulnerabilityDensity 49.5 (10/24) · TTFT 74.1 (29/33) · Tau2Bench 82.3 (19/37) · TerminalBench 63.7 (20/39) · TerminalBenchHard 65.4 (12/37)
sources: artificial_analysis · lmarena · mcp_atlas · sweatlas_qna · sweatlas_test_writing · swebench_pro
missing: BUILD/BFCL · PLAN/BFCL · SWEAtlasComposite/SWEAtlasRefactoring
claude-opus-4.6 (anthropic)
idea 92.2 (down 0.4 since last refresh) · plan 76.0 (down 1.7 since last refresh) · build 73.2 (down 0.7 since last refresh) · review 78.0 (down 0.9 since last refresh)

group breakdown

BUILD 73.7 (10/39) · CRE 100.0 (1/39) · GEN 81.7 (7/39) · LM_ARENA_REVIEW_PROXY 100.0 (1/39) · OPS_long 71.2 (25/39) · OPS_precision 67.5 (26/39) · OPS_review 69.6 (21/39) · PLAN 73.4 (13/39)

metrics

ArtificialAnalysisCoding 73.4 (8/37) · ArtificialAnalysisIntelligence 81.1 (12/37) · ArtificialAnalysisReasoning 77.4 (12/37) · BlendedCost 22.1 (33/39) · ContextWindow 98.9 (14/36) · CopilotArenaOrLMArenaCode 99.5 (3/38) · GDPval 75.7 (11/38) · GPQA_HLE_Reasoning 77.4 (12/37) · GSO 75.3 (5/19) · IFBench 26.5 (32/37) · LMArenaCreativeOrOpenEnded 100.0 (1/39) · LMArenaDocument 100.0 (1/33) · LMArenaSearch 100.0 (1/22) · LMArenaText 100.0 (1/39) · LongContextRecall 85.7 (7/37) · MCPAtlas 76.9 (5/34) · OutputSpeed 70.1 (28/33) · SWEAtlasComposite 67.8 (8/39) · SWEAtlasQnA 70.6 (5/21) · SWEAtlasRefactoring 65.5 (8/19) · SWEAtlasTestWriting 68.0 (5/21) · SWEBenchMultilingual 91.9 (20/33) · SWEBenchPro 95.1 (2/35) · SWEBenchVerified 94.5 (21/37) · SWEComposite 76.7 (25/39) · SciCode 74.5 (9/37) · SonarComposite 50.0 (15/39) · TTFT 77.7 (25/33) · Tau2Bench 83.9 (17/37) · TerminalBench 86.2 (6/39) · TerminalBenchHard 67.5 (10/37)
sources: artificial_analysis · gso · lmarena · mcp_atlas · openrouter · overrides · sonar · sweatlas_qna · sweatlas_refactoring · sweatlas_test_writing · swebench · swebench_pro · swerebench · terminal_bench
missing: BUILD/AALiveCodeBench · BUILD/BFCL · GEN/ARC_AGI_2 · GEN/ArtificialAnalysisMath · GEN/MMLUPro · PLAN/BFCL · SWEComposite/SWERebench · SonarComposite/SonarBugDensity · SonarComposite/SonarFunctionalSkill · SonarComposite/SonarIssueDensity · SonarComposite/SonarVulnerabilityDensity
qwen3.7-plus (alibaba)
idea 62.4 · plan 72.7 · build 73.1 · review 69.9

group breakdown

BUILD 71.4 (11/39) · CRE 58.2 (26/39) · GEN 65.7 (16/39) · LM_ARENA_REVIEW_PROXY 47.3 (18/39) · OPS_long 82.7 (12/39) · OPS_precision 88.0 (7/39) · OPS_review 88.2 (9/39) · PLAN 75.3 (11/39)

metrics

ARC_AGI_2 11.9 (23/31) · ArtificialAnalysisCoding 68.2 (14/37) · ArtificialAnalysisIntelligence 82.6 (11/37) · ArtificialAnalysisReasoning 71.8 (15/37) · BFCL 80.3 (6/15) · BlendedCost 88.9 (8/39) · ContextWindow 98.9 (22/36) · CopilotArenaOrLMArenaCode 67.4 (17/38) · GDPval 65.7 (16/38) · GPQA_HLE_Reasoning 71.8 (15/37) · IFBench 92.0 (7/37) · LMArenaCreativeOrOpenEnded 58.2 (26/39) · LMArenaDocument 44.5 (14/33) · LMArenaText 58.2 (26/39) · LongContextRecall 56.7 (25/37) · MCPAtlas 60.3 (13/34) · MMLUPro 83.5 (7/28) · OutputSpeed 72.1 (22/33) · SWEAtlasComposite 50.0 (22/39) · SWEBenchMultilingual 95.0 (10/33) · SWEBenchPro 95.0 (16/35) · SWEBenchVerified 95.0 (19/37) · SWEComposite 88.4 (8/39) · SWERebench 78.4 (14/36) · SciCode 40.3 (22/37) · SonarBugDensity 92.5 (5/24) · SonarComposite 81.1 (5/39) · SonarFunctionalSkill 69.6 (14/24) · SonarIssueDensity 92.5 (4/24) · SonarVulnerabilityDensity 77.4 (8/24) · TTFT 96.5 (8/33) · Tau2Bench 86.4 (16/37) · TerminalBench 73.5 (10/39) · TerminalBenchHard 69.7 (9/37)
sources: artificial_analysis · openrouter · overrides
missing: BUILD/AALiveCodeBench · BUILD/GSO · GEN/ArtificialAnalysisMath · LM_ARENA_REVIEW_PROXY/LMArenaSearch · SWEAtlasComposite/SWEAtlasQnA · SWEAtlasComposite/SWEAtlasRefactoring · SWEAtlasComposite/SWEAtlasTestWriting
claude-opus-4.5 (anthropic)
idea 69.2 (down 0.3 since last refresh) · plan 66.3 (down 1.2 since last refresh) · build 72.5 (down 0.5 since last refresh) · review 64.2 (down 0.6 since last refresh)

group breakdown

BUILD 74.9 (8/39) · CRE 70.4 (19/39) · GEN 69.3 (14/39) · LM_ARENA_REVIEW_PROXY 47.1 (21/39) · OPS_long 58.4 (32/39) · OPS_precision 54.0 (35/39) · OPS_review 46.1 (37/39) · PLAN 66.1 (18/39)

metrics

AALiveCodeBench 85.2 (7/17) · ARC_AGI_2 50.6 (8/31) · ArtificialAnalysisCoding 72.4 (9/37) · ArtificialAnalysisIntelligence 69.6 (19/37) · ArtificialAnalysisMath 87.8 (8/17) · ArtificialAnalysisReasoning 55.5 (22/37) · BFCL 100.0 (1/15) · BlendedCost 22.1 (32/39) · ContextWindow 0.0 (36/36) · CopilotArenaOrLMArenaCode 72.9 (14/38) · GDPval 74.3 (12/38) · GPQA_HLE_Reasoning 55.5 (22/37) · GSO 59.3 (7/19) · IFBench 39.3 (27/37) · LMArenaCreativeOrOpenEnded 70.4 (19/39) · LMArenaDocument 55.2 (9/33) · LMArenaSearch 39.1 (17/22) · LMArenaText 70.4 (19/39) · LongContextRecall 100.0 (1/37) · MCPAtlas 46.8 (19/34) · MMLUPro 98.6 (2/28) · OutputSpeed 73.7 (19/33) · SWEAtlasComposite 65.1 (9/39) · SWEAtlasQnA 67.5 (7/21) · SWEAtlasRefactoring 63.2 (9/19) · SWEAtlasTestWriting 65.3 (7/21) · SWEBenchMultilingual 95.0 (2/33) · SWEBenchPro 77.8 (26/35) · SWEBenchVerified 95.0 (6/37) · SWEComposite 84.1 (14/39) · SWERebench 82.8 (8/36) · SciCode 61.7 (14/37) · SonarBugDensity 56.9 (11/24) · SonarComposite 82.7 (2/39) · SonarFunctionalSkill 100.0 (1/24) · SonarIssueDensity 80.5 (5/24) · SonarVulnerabilityDensity 74.9 (9/24) · TTFT 78.5 (22/33) · Tau2Bench 76.5 (22/37) · TerminalBench 64.1 (18/39) · TerminalBenchHard 69.7 (8/37)
sources: artificial_analysis · bfcl · gso · lmarena · mcp_atlas · openrouter · overrides · sonar · swebench_pro · swerebench · terminal_bench
missing: none
kimi-k2.6 (moonshot)
idea 76.7 (down 0.6 since last refresh) · plan 78.2 (down 2.1 since last refresh) · build 71.2 (down 1.0 since last refresh) · review 69.4 (down 1.2 since last refresh)

group breakdown

BUILD 69.7 (13/39) · CRE 76.7 (16/39) · GEN 77.2 (11/39) · LM_ARENA_REVIEW_PROXY 46.8 (23/39) · OPS_long 74.8 (17/39) · OPS_precision 80.2 (15/39) · OPS_review 75.9 (18/39) · PLAN 78.6 (8/39)

metrics

ArtificialAnalysisCoding 70.2 (12/37) · ArtificialAnalysisIntelligence 84.7 (9/37) · ArtificialAnalysisReasoning 78.8 (11/37) · BlendedCost 81.8 (17/39) · ContextWindow 55.5 (29/36) · CopilotArenaOrLMArenaCode 91.8 (8/38) · GDPval 61.5 (21/38) · GPQA_HLE_Reasoning 78.8 (11/37) · IFBench 86.8 (11/37) · LMArenaCreativeOrOpenEnded 76.7 (16/39) · LMArenaDocument 43.5 (18/33) · LMArenaText 76.7 (16/39) · LongContextRecall 80.5 (13/37) · MCPAtlas 68.5 (9/34) · OutputSpeed 70.2 (27/33) · SWEAtlasComposite 50.0 (19/39) · SWEBenchMultilingual 95.0 (8/33) · SWEBenchPro 95.0 (11/35) · SWEBenchVerified 95.0 (14/37) · SWEComposite 82.8 (18/39) · SWERebench 64.5 (26/36) · SciCode 83.1 (6/37) · SonarComposite 50.0 (25/39) · TTFT 98.5 (5/33) · Tau2Bench 94.7 (7/37) · TerminalBench 68.1 (13/39) · TerminalBenchHard 61.1 (13/37)
sources: artificial_analysis · lmarena · openrouter · overrides · swerebench
missing: BUILD/AALiveCodeBench · BUILD/BFCL · BUILD/GSO · GEN/ARC_AGI_2 · GEN/ArtificialAnalysisMath · GEN/MMLUPro · LM_ARENA_REVIEW_PROXY/LMArenaSearch · PLAN/BFCL · SWEAtlasComposite/SWEAtlasQnA · SWEAtlasComposite/SWEAtlasRefactoring · SWEAtlasComposite/SWEAtlasTestWriting · SonarComposite/SonarBugDensity · SonarComposite/SonarFunctionalSkill · SonarComposite/SonarIssueDensity · SonarComposite/SonarVulnerabilityDensity
gemini-3.1-pro-preview (google)
idea 98.2 (down 0.1 since last refresh) · plan 91.6 (down 0.6 since last refresh) · build 71.2 (down 0.2 since last refresh) · review 74.4 (down 0.4 since last refresh)

group breakdown

BUILD 68.5 (15/39) · CRE 99.6 (3/39) · GEN 98.5 (1/39) · LM_ARENA_REVIEW_PROXY 52.4 (9/39) · OPS_long 86.0 (9/39) · OPS_precision 81.8 (12/39) · OPS_review 84.4 (12/39) · PLAN 88.3 (2/39)

metrics

ARC_AGI_2 100.0 (1/31) · ArtificialAnalysisCoding 97.4 (4/37) · ArtificialAnalysisIntelligence 96.7 (4/37) · ArtificialAnalysisReasoning 100.0 (2/37) · BFCL 77.0 (10/15) · BlendedCost 71.6 (24/39) · ContextWindow 100.0 (3/36) · CopilotArenaOrLMArenaCode 65.9 (18/38) · GDPval 42.7 (29/38) · GPQA_HLE_Reasoning 100.0 (2/37) · GSO 51.3 (10/19) · IFBench 89.8 (8/37) · LMArenaCreativeOrOpenEnded 99.6 (3/39) · LMArenaDocument 32.0 (21/33) · LMArenaSearch 72.8 (7/22) · LMArenaText 99.6 (3/39) · LongContextRecall 95.9 (4/37) · MCPAtlas 49.1 (17/34) · OutputSpeed 89.4 (9/33) · SWEAtlasComposite 34.6 (30/39) · SWEAtlasQnA 12.7 (15/21) · SWEAtlasTestWriting 36.0 (14/21) · SWEBenchMultilingual 36.0 (24/33) · SWEBenchPro 78.4 (25/35) · SWEBenchVerified 95.0 (12/37) · SWEComposite 80.5 (19/39) · SWERebench 87.9 (6/36) · SciCode 100.0 (2/37) · SonarComposite 50.0 (22/39) · TTFT 73.2 (30/33) · Tau2Bench 93.9 (9/37) · TerminalBench 92.5 (4/39) · TerminalBenchHard 88.9 (4/37)
sources: arc_agi · artificial_analysis · gso · lmarena · mcp_atlas · openrouter · overrides · sonar · sweatlas_qna · sweatlas_test_writing · swebench_pro · swerebench · terminal_bench
missing: BUILD/AALiveCodeBench · GEN/ArtificialAnalysisMath · GEN/MMLUPro · SWEAtlasComposite/SWEAtlasRefactoring · SonarComposite/SonarBugDensity · SonarComposite/SonarFunctionalSkill · SonarComposite/SonarIssueDensity · SonarComposite/SonarVulnerabilityDensity
glm-5.1 (zai)
idea 82.3 (down 0.6 since last refresh) · plan 71.5 (down 1.8 since last refresh) · build 70.9 (down 0.8 since last refresh) · review 65.9 (down 1.0 since last refresh)

group breakdown

BUILD 70.6 (12/39) · CRE 88.6 (10/39) · GEN 72.2 (13/39) · LM_ARENA_REVIEW_PROXY 47.3 (20/39) · OPS_long 71.3 (23/39) · OPS_precision 74.8 (23/39) · OPS_review 64.9 (29/39) · PLAN 70.6 (17/39)

metrics

ArtificialAnalysisCoding 58.2 (20/37) · ArtificialAnalysisIntelligence 75.7 (16/37) · ArtificialAnalysisReasoning 55.1 (23/37) · BlendedCost 80.9 (18/39) · ContextWindow 0.9 (35/36) · CopilotArenaOrLMArenaCode 97.4 (4/38) · GDPval 66.6 (13/38) · GPQA_HLE_Reasoning 55.1 (23/37) · IFBench 87.5 (10/37) · LMArenaCreativeOrOpenEnded 88.6 (10/39) · LMArenaDocument 44.5 (16/33) · LMArenaText 88.6 (10/39) · LongContextRecall 43.0 (33/37) · MCPAtlas 71.7 (7/34) · OutputSpeed 78.3 (15/33) · SWEAtlasComposite 50.0 (28/39) · SWEBenchMultilingual 92.5 (19/33) · SWEBenchPro 95.0 (19/35) · SWEBenchVerified 92.5 (26/37) · SWEComposite 96.4 (2/39) · SWERebench 100.0 (2/36) · SciCode 31.2 (28/37) · SonarComposite 50.0 (29/39) · TTFT 99.9 (2/33) · Tau2Bench 99.7 (4/37) · TerminalBench 63.3 (21/39) · TerminalBenchHard 59.0 (18/37)
sources: artificial_analysis · lmarena · mcp_atlas · openrouter · overrides · swerebench
missing: BUILD/AALiveCodeBench · BUILD/BFCL · BUILD/GSO · GEN/ARC_AGI_2 · GEN/ArtificialAnalysisMath · GEN/MMLUPro · LM_ARENA_REVIEW_PROXY/LMArenaSearch · PLAN/BFCL · SWEAtlasComposite/SWEAtlasQnA · SWEAtlasComposite/SWEAtlasRefactoring · SWEAtlasComposite/SWEAtlasTestWriting · SonarComposite/SonarBugDensity · SonarComposite/SonarFunctionalSkill · SonarComposite/SonarIssueDensity · SonarComposite/SonarVulnerabilityDensity
qwen3.6-plus (alibaba)
idea 61.1 (down 0.5 since last refresh) · plan 68.2 (down 1.7 since last refresh) · build 70.9 (up 1.5 since last refresh) · review 67.7

group breakdown

BUILD 69.2 (14/39) · CRE 59.7 (23/39) · GEN 58.5 (21/39) · LM_ARENA_REVIEW_PROXY 47.3 (16/39) · OPS_long 82.2 (13/39) · OPS_precision 87.0 (9/39) · OPS_review 87.4 (10/39) · PLAN 71.9 (16/39)

metrics

ARC_AGI_2 11.9 (21/31) · ArtificialAnalysisCoding 56.5 (21/37) · ArtificialAnalysisIntelligence 70.6 (17/37) · ArtificialAnalysisReasoning 53.3 (24/37) · BlendedCost 87.0 (9/39) · ContextWindow 98.9 (20/36) · CopilotArenaOrLMArenaCode 70.4 (16/38) · GDPval 65.7 (14/38) · GPQA_HLE_Reasoning 53.3 (24/37) · IFBench 84.6 (14/37) · LMArenaCreativeOrOpenEnded 59.7 (23/39) · LMArenaDocument 44.5 (12/33) · LMArenaText 59.7 (23/39) · LongContextRecall 80.5 (14/37) · MCPAtlas 63.8 (12/34) · MMLUPro 83.5 (6/28) · OutputSpeed 72.1 (21/33) · SWEAtlasComposite 50.0 (20/39) · SWEBenchMultilingual 92.5 (16/33) · SWEBenchPro 95.0 (14/35) · SWEBenchVerified 95.0 (17/37) · SWEComposite 88.1 (10/39) · SWERebench 78.4 (12/36) · SciCode 14.7 (32/37) · SonarBugDensity 92.5 (3/24) · SonarComposite 81.1 (3/39) · SonarFunctionalSkill 69.6 (12/24) · SonarIssueDensity 92.5 (2/24) · SonarVulnerabilityDensity 77.4 (6/24) · TTFT 94.7 (13/33) · Tau2Bench 99.7 (3/37) · TerminalBench 60.5 (23/39) · TerminalBenchHard 61.1 (14/37)
sources: artificial_analysis · lmarena · openrouter · overrides
missing: BUILD/AALiveCodeBench · BUILD/BFCL · BUILD/GSO · GEN/ArtificialAnalysisMath · LM_ARENA_REVIEW_PROXY/LMArenaSearch · PLAN/BFCL · SWEAtlasComposite/SWEAtlasQnA · SWEAtlasComposite/SWEAtlasRefactoring · SWEAtlasComposite/SWEAtlasTestWriting
gemini-3.5-flash (google)
idea 94.3 (down 0.5 since last refresh) · plan 78.0 (down 1.9 since last refresh) · build 69.0 (up 3.7 since last refresh) · review 65.4 (up 0.8 since last refresh)

group breakdown

BUILD 66.9 (16/39) · CRE 100.0 (2/39) · GEN 83.0 (6/39) · LM_ARENA_REVIEW_PROXY 35.3 (32/39) · OPS_long 92.1 (4/39) · OPS_precision 86.7 (10/39) · OPS_review 88.9 (8/39) · PLAN 73.4 (14/39)

metrics

AALiveCodeBench 91.4 (5/17) · ARC_AGI_2 21.2 (12/31) · ArtificialAnalysisCoding 59.8 (18/37) · ArtificialAnalysisIntelligence 88.0 (7/37) · ArtificialAnalysisMath 92.5 (5/17) · ArtificialAnalysisReasoning 88.5 (5/37) · BlendedCost 74.1 (21/39) · ContextWindow 100.0 (9/36) · CopilotArenaOrLMArenaCode 87.6 (10/38) · GDPval 79.7 (9/38) · GPQA_HLE_Reasoning 88.5 (5/37) · GSO 19.4 (15/19) · IFBench 83.0 (15/37) · LMArenaCreativeOrOpenEnded 100.0 (2/39) · LMArenaDocument 9.7 (31/33) · LMArenaSearch 60.9 (12/22) · LMArenaText 100.0 (2/39) · LongContextRecall 87.4 (6/37) · MCPAtlas 18.9 (29/34) · MMLUPro 86.5 (4/28) · OutputSpeed 98.2 (4/33) · SWEAtlasComposite 50.0 (17/39) · SWEBenchMultilingual 92.5 (13/33) · SWEBenchPro 95.0 (7/35) · SWEBenchVerified 88.4 (28/37) · SWEComposite 86.8 (12/39) · SWERebench 77.7 (17/36) · SciCode 80.4 (8/37) · SonarComposite 50.0 (23/39) · TTFT 78.4 (23/33) · Tau2Bench 93.9 (10/37) · TerminalBench 63.7 (19/39) · TerminalBenchHard 48.3 (22/37)
sources: arc_agi · artificial_analysis · lmarena · mcp_atlas · openrouter · overrides
missing: BUILD/BFCL · PLAN/BFCL · SWEAtlasComposite/SWEAtlasQnA · SWEAtlasComposite/SWEAtlasRefactoring · SWEAtlasComposite/SWEAtlasTestWriting · SonarComposite/SonarBugDensity · SonarComposite/SonarFunctionalSkill · SonarComposite/SonarIssueDensity · SonarComposite/SonarVulnerabilityDensity
deepseek-v4-pro (deepseek)
idea 70.3 (down 0.6 since last refresh) · plan 75.6 (down 2.0 since last refresh) · build 68.4 (down 2.2 since last refresh) · review 68.6 (down 1.2 since last refresh)

group breakdown

BUILD 65.8 (18/39) · CRE 67.5 (20/39) · GEN 72.6 (12/39) · LM_ARENA_REVIEW_PROXY 50.0 (10/39) · OPS_long 84.0 (11/39) · OPS_precision 89.3 (5/39) · OPS_review 89.5 (6/39) · PLAN 75.6 (10/39)

metrics

ArtificialAnalysisCoding 71.5 (10/37) · ArtificialAnalysisIntelligence 76.1 (15/37) · ArtificialAnalysisReasoning 74.3 (13/37) · BlendedCost 89.5 (6/39) · ContextWindow 100.0 (5/36) · CopilotArenaOrLMArenaCode 71.6 (15/38) · GPQA_HLE_Reasoning 74.3 (13/37) · IFBench 88.0 (9/37) · LMArenaCreativeOrOpenEnded 67.5 (20/39) · LMArenaText 67.5 (20/39) · LongContextRecall 63.5 (19/37) · MCPAtlas 64.1 (10/34) · MMLUPro 69.1 (11/28) · OutputSpeed 73.6 (20/33) · SWEAtlasComposite 50.0 (14/39) · SWEBenchMultilingual 95.0 (6/33) · SWEBenchPro 95.0 (6/35) · SWEBenchVerified 95.0 (11/37) · SWEComposite 77.0 (24/39) · SciCode 64.4 (13/37) · SonarComposite 50.0 (17/39) · TTFT 97.9 (6/33) · Tau2Bench 95.5 (5/37) · TerminalBench 63.0 (22/39) · TerminalBenchHard 67.5 (11/37)
sources: artificial_analysis · lmarena · openrouter · overrides
missing: BUILD/AALiveCodeBench · BUILD/BFCL · BUILD/GDPval · BUILD/GSO · GEN/ARC_AGI_2 · GEN/ArtificialAnalysisMath · LM_ARENA_REVIEW_PROXY/LMArenaDocument · LM_ARENA_REVIEW_PROXY/LMArenaSearch · PLAN/BFCL · SWEAtlasComposite/SWEAtlasQnA · SWEAtlasComposite/SWEAtlasRefactoring · SWEAtlasComposite/SWEAtlasTestWriting · SWEComposite/SWERebench · SonarComposite/SonarBugDensity · SonarComposite/SonarFunctionalSkill · SonarComposite/SonarIssueDensity · SonarComposite/SonarVulnerabilityDensity
gemini-3-pro (google)
idea 88.1 (down 0.5 since last refresh) · plan 72.6 (down 1.8 since last refresh) · build 64.5 (down 0.7 since last refresh) · review 63.1 (down 1.1 since last refresh)

group breakdown

BUILD 64.8 (20/39) · CRE 97.9 (5/39) · GEN 77.5 (9/39) · LM_ARENA_REVIEW_PROXY 45.2 (24/39) · OPS_long 52.2 (36/39) · OPS_precision 54.3 (34/39) · OPS_review 54.3 (35/39) · PLAN 72.0 (15/39)

metrics

AALiveCodeBench 100.0 (1/17) · ARC_AGI_2 41.7 (9/31) · ArtificialAnalysisCoding 68.2 (13/37) · ArtificialAnalysisIntelligence 64.9 (23/37) · ArtificialAnalysisMath 97.5 (3/17) · ArtificialAnalysisReasoning 80.7 (10/37) · BFCL 81.7 (4/15) · BlendedCost 71.6 (23/39) · CopilotArenaOrLMArenaCode 62.2 (22/38) · GDPval 29.2 (33/38) · GPQA_HLE_Reasoning 80.7 (10/37) · GSO 40.7 (11/19) · IFBench 72.1 (19/37) · LMArenaCreativeOrOpenEnded 97.9 (5/39) · LMArenaDocument 24.9 (27/33) · LMArenaSearch 65.4 (9/22) · LMArenaText 97.9 (5/39) · LongContextRecall 85.7 (9/37) · MCPAtlas 49.0 (18/34) · MMLUPro 100.0 (1/28) · SWEAtlasComposite 50.0 (16/39) · SWEBenchMultilingual 33.5 (25/33) · SWEBenchPro 70.4 (28/35) · SWEBenchVerified 100.0 (1/37) · SWEComposite 73.5 (26/39) · SWERebench 76.2 (20/36) · SciCode 97.0 (3/37) · SonarComposite 50.0 (21/39) · Tau2Bench 69.8 (25/37) · TerminalBench 74.6 (9/39) · TerminalBenchHard 54.7 (20/37)
sources: arc_agi · artificial_analysis · bfcl · gso · lmarena · mcp_atlas · overrides · sonar · swebench · swebench_pro · swerebench · terminal_bench
missing: OPS_long/ContextWindow · OPS_long/OutputSpeed · OPS_long/TTFT · OPS_precision/ContextWindow · OPS_precision/OutputSpeed · OPS_precision/TTFT · OPS_review/ContextWindow · OPS_review/OutputSpeed · OPS_review/TTFT · SWEAtlasComposite/SWEAtlasQnA · SWEAtlasComposite/SWEAtlasRefactoring · SWEAtlasComposite/SWEAtlasTestWriting · SonarComposite/SonarBugDensity · SonarComposite/SonarFunctionalSkill · SonarComposite/SonarIssueDensity · SonarComposite/SonarVulnerabilityDensity
minimax-m3 (minimax)
idea 41.6 · plan 71.9 · build 64.0 · review 65.3

group breakdown

BUILD 61.5 (22/39) · CRE 29.4 (35/39) · GEN 61.3 (19/39) · LM_ARENA_REVIEW_PROXY 40.0 (25/39) · OPS_long 62.2 (30/39) · OPS_precision 76.1 (19/39) · OPS_review 76.8 (17/39) · PLAN 78.3 (9/39)

metrics

ARC_AGI_2 11.9 (20/31) · ArtificialAnalysisCoding 58.2 (19/37) · ArtificialAnalysisIntelligence 87.6 (8/37) · ArtificialAnalysisReasoning 84.6 (8/37) · BlendedCost 89.8 (5/39) · ContextWindow 100.0 (10/36) · CopilotArenaOrLMArenaCode 48.4 (28/38) · GDPval 61.6 (20/38) · GPQA_HLE_Reasoning 84.6 (8/37) · IFBench 100.0 (1/37) · LMArenaCreativeOrOpenEnded 29.4 (35/39) · LMArenaDocument 30.0 (22/33) · LMArenaText 29.4 (35/39) · LongContextRecall 100.0 (2/37) · MCPAtlas 64.1 (11/34) · MMLUPro 68.0 (15/28) · OutputSpeed 36.0 (32/33) · SWEAtlasComposite 14.2 (37/39) · SWEAtlasQnA 10.4 (18/21) · SWEAtlasRefactoring 22.1 (15/19) · SWEAtlasTestWriting 7.5 (20/21) · SWEBenchMultilingual 92.5 (15/33) · SWEBenchPro 95.0 (10/35) · SWEBenchVerified 95.0 (13/37) · SWEComposite 86.8 (13/39) · SWERebench 75.2 (21/36) · SciCode 39.8 (23/37) · SonarBugDensity 62.9 (10/24) · SonarComposite 51.7 (14/39) · SonarFunctionalSkill 40.3 (17/24) · SonarIssueDensity 64.1 (11/24) · SonarVulnerabilityDensity 46.4 (11/24) · TTFT 92.3 (14/33) · Tau2Bench 74.8 (23/37) · TerminalBench 67.0 (15/39) · TerminalBenchHard 56.8 (19/37)
sources: artificial_analysis · lmarena · openrouter · overrides
missing: BUILD/AALiveCodeBench · BUILD/BFCL · BUILD/GSO · GEN/ArtificialAnalysisMath · LM_ARENA_REVIEW_PROXY/LMArenaSearch · PLAN/BFCL
claude-sonnet-4.6 (anthropic)
idea 71.4 (down 0.3 since last refresh) · plan 58.9 (down 1.6 since last refresh) · build 64.0 (down 0.7 since last refresh) · review 64.1 (down 0.9 since last refresh)

group breakdown

BUILD 66.0 (17/39) · CRE 75.1 (17/39) · GEN 65.5 (17/39) · LM_ARENA_REVIEW_PROXY 79.8 (5/39) · OPS_long 64.7 (28/39) · OPS_precision 51.7 (36/39) · OPS_review 61.3 (31/39) · PLAN 55.5 (26/39)

metrics

AALiveCodeBench 31.3 (11/17) · ARC_AGI_2 10.6 (24/31) · ArtificialAnalysisCoding 82.5 (6/37) · ArtificialAnalysisIntelligence 76.8 (14/37) · ArtificialAnalysisMath 75.9 (13/17) · ArtificialAnalysisReasoning 60.3 (18/37) · BFCL 80.1 (8/15) · BlendedCost 62.5 (31/39) · ContextWindow 98.9 (19/36) · CopilotArenaOrLMArenaCode 93.8 (5/38) · GDPval 80.2 (8/38) · GPQA_HLE_Reasoning 60.3 (18/37) · GSO 30.7 (12/19) · IFBench 35.7 (29/37) · LMArenaCreativeOrOpenEnded 75.1 (17/39) · LMArenaDocument 83.9 (4/33) · LMArenaSearch 75.7 (6/22) · LMArenaText 75.1 (17/39) · LongContextRecall 85.7 (8/37) · MCPAtlas 45.6 (20/34) · MMLUPro 71.9 (10/28) · OutputSpeed 78.4 (14/33) · SWEAtlasComposite 55.0 (10/39) · SWEAtlasQnA 64.5 (10/21) · SWEAtlasRefactoring 55.3 (10/19) · SWEAtlasTestWriting 45.0 (10/21) · SWEBenchMultilingual 95.0 (5/33) · SWEBenchPro 68.1 (30/35) · SWEBenchVerified 95.0 (9/37) · SWEComposite 78.1 (20/39) · SWERebench 76.3 (19/36) · SciCode 47.3 (18/37) · SonarBugDensity 52.7 (14/24) · SonarComposite 55.5 (12/39) · SonarFunctionalSkill 86.2 (5/24) · SonarIssueDensity 33.9 (18/24) · SonarVulnerabilityDensity 13.1 (23/24) · TTFT 2.5 (32/33) · Tau2Bench 37.5 (29/37) · TerminalBench 48.0 (25/39) · TerminalBenchHard 86.8 (5/37)
sources: arc_agi · artificial_analysis · lmarena · mcp_atlas · openrouter · overrides · sonar · sweatlas_qna · sweatlas_refactoring · sweatlas_test_writing · swerebench · terminal_bench
missing: none
glm-5 (zai)
idea 63.8 (down 0.5 since last refresh) · plan 59.9 (down 1.6 since last refresh) · build 63.1 (down 0.8 since last refresh) · review 59.3 (down 1.0 since last refresh)

group breakdown

BUILD 62.1 (21/39) · CRE 67.0 (21/39) · GEN 55.1 (24/39) · LM_ARENA_REVIEW_PROXY 47.3 (19/39) · OPS_long 71.9 (21/39) · OPS_precision 75.6 (21/39) · OPS_review 65.8 (28/39) · PLAN 60.9 (20/39)

metrics

ARC_AGI_2 5.1 (26/31) · ArtificialAnalysisCoding 60.8 (17/37) · ArtificialAnalysisIntelligence 69.9 (18/37) · ArtificialAnalysisReasoning 44.2 (29/37) · BlendedCost 85.0 (14/39) · ContextWindow 0.9 (34/36) · CopilotArenaOrLMArenaCode 61.2 (24/38) · GDPval 65.7 (17/38) · GPQA_HLE_Reasoning 44.2 (29/37) · IFBench 77.1 (16/37) · LMArenaCreativeOrOpenEnded 67.0 (21/39) · LMArenaDocument 44.5 (15/33) · LMArenaText 67.0 (21/39) · LongContextRecall 48.1 (30/37) · MCPAtlas 39.4 (21/34) · OutputSpeed 78.7 (13/33) · SWEAtlasComposite 31.7 (31/39) · SWEAtlasQnA 33.2 (12/21) · SWEAtlasRefactoring 31.4 (11/19) · SWEAtlasTestWriting 30.8 (15/21) · SWEBenchMultilingual 51.2 (22/33) · SWEBenchPro 92.5 (22/35) · SWEBenchVerified 84.0 (31/37) · SWEComposite 83.5 (17/39) · SWERebench 83.5 (7/36) · SciCode 44.1 (20/37) · SonarBugDensity 100.0 (1/24) · SonarComposite 86.6 (1/39) · SonarFunctionalSkill 73.1 (11/24) · SonarIssueDensity 100.0 (1/24) · SonarVulnerabilityDensity 82.2 (5/24) · TTFT 99.7 (3/33) · Tau2Bench 100.0 (2/37) · TerminalBench 46.3 (26/39) · TerminalBenchHard 59.0 (17/37)
sources: arc_agi · artificial_analysis · lmarena · openrouter · overrides · sonar · sweatlas_qna · sweatlas_refactoring · sweatlas_test_writing · swebench · swerebench · terminal_bench
missing: BUILD/AALiveCodeBench · BUILD/BFCL · BUILD/GSO · GEN/ArtificialAnalysisMath · GEN/MMLUPro · LM_ARENA_REVIEW_PROXY/LMArenaSearch · PLAN/BFCL
mimo-v2.5-pro · xiaomi · idea 75.4 (down 0.4 since last refresh) · plan 71.1 (down 1.7 since last refresh) · build 63.1 (down 0.8 since last refresh) · review 63.0 (down 1.0 since last refresh)
mimo-v2.5-pro

group breakdown

BUILD 60.3 (23 / 39) · CRE 80.8 (13 / 39) · GEN 65.0 (18 / 39) · LM_ARENA_REVIEW_PROXY 37.6 (29 / 39) · OPS_long 72.5 (19 / 39) · OPS_precision 81.5 (13 / 39) · OPS_review 82.3 (14 / 39) · PLAN 73.6 (12 / 39)

metrics

ARC_AGI_2 20.1 (16 / 31) · ArtificialAnalysisCoding 65.0 (15 / 37) · ArtificialAnalysisIntelligence 84.4 (10 / 37) · ArtificialAnalysisReasoning 66.0 (17 / 37) · BlendedCost 89.5 (7 / 39) · ContextWindow 100.0 (12 / 36) · CopilotArenaOrLMArenaCode 74.6 (13 / 38) · GDPval 60.9 (27 / 38) · GPQA_HLE_Reasoning 66.0 (17 / 37) · IFBench 97.0 (4 / 37) · LMArenaCreativeOrOpenEnded 80.8 (13 / 39) · LMArenaDocument 25.2 (26 / 33) · LMArenaText 80.8 (13 / 39) · LongContextRecall 99.3 (3 / 37) · MCPAtlas 27.6 (26 / 34) · MMLUPro 5.0 (27 / 28) · OutputSpeed 54.9 (30 / 33) · SWEAtlasComposite 22.0 (33 / 39) · SWEAtlasQnA 17.3 (14 / 21) · SWEAtlasRefactoring 25.8 (13 / 19) · SWEAtlasTestWriting 21.8 (17 / 21) · SWEBenchMultilingual 92.5 (18 / 33) · SWEBenchPro 95.0 (18 / 35) · SWEBenchVerified 95.0 (20 / 37) · SWEComposite 84.1 (15 / 39) · SWERebench 68.3 (24 / 36) · SciCode 65.5 (12 / 37) · SonarBugDensity 45.2 (17 / 24) · SonarComposite 37.0 (35 / 39) · SonarFunctionalSkill 12.7 (21 / 24) · SonarIssueDensity 65.6 (10 / 24) · SonarVulnerabilityDensity 43.4 (14 / 24) · TTFT 91.9 (15 / 33) · Tau2Bench 89.7 (15 / 37) · TerminalBench 70.6 (12 / 39) · TerminalBenchHard 59.0 (16 / 37)
sources: artificial_analysis, lmarena, openrouter, overrides
missing: BUILD/AALiveCodeBench, BUILD/BFCL, BUILD/GSO, GEN/ArtificialAnalysisMath, LM_ARENA_REVIEW_PROXY/LMArenaSearch, PLAN/BFCL
gpt-5.2 · openai · idea 41.5 (down 0.4 since last refresh) · plan 48.3 (down 1.8 since last refresh) · build 62.9 (down 0.8 since last refresh) · review 52.0 (down 1.1 since last refresh)
gpt-5.2

group breakdown

BUILD 64.8 (19 / 39) · CRE 36.1 (32 / 39) · GEN 48.6 (29 / 39) · LM_ARENA_REVIEW_PROXY 33.6 (33 / 39) · OPS_long 56.3 (33 / 39) · OPS_precision 58.4 (32 / 39) · OPS_review 61.2 (32 / 39) · PLAN 46.6 (31 / 39)

metrics

AALiveCodeBench 93.6 (3 / 17) · ARC_AGI_2 0.0 (31 / 31) · ArtificialAnalysisCoding 60.8 (16 / 37) · ArtificialAnalysisIntelligence 58.4 (25 / 37) · ArtificialAnalysisMath 99.7 (2 / 17) · ArtificialAnalysisReasoning 48.3 (26 / 37) · BFCL 0.0 (15 / 15) · BlendedCost 71.1 (25 / 39) · ContextWindow 78.1 (25 / 36) · CopilotArenaOrLMArenaCode 49.5 (27 / 38) · GDPval 59.3 (28 / 38) · GPQA_HLE_Reasoning 48.3 (26 / 37) · GSO 64.7 (6 / 19) · IFBench 58.4 (23 / 37) · LMArenaCreativeOrOpenEnded 36.1 (32 / 39) · LMArenaDocument 0.0 (33 / 33) · LMArenaSearch 67.1 (8 / 22) · LMArenaText 36.1 (32 / 39) · LongContextRecall 48.1 (29 / 37) · MMLUPro 57.5 (20 / 28) · SWEAtlasComposite 87.8 (3 / 39) · SWEAtlasQnA 86.2 (3 / 21) · SWEAtlasRefactoring 85.4 (6 / 19) · SWEAtlasTestWriting 92.5 (3 / 21) · SWEBenchMultilingual 0.0 (33 / 33) · SWEBenchPro 32.0 (34 / 35) · SWEBenchVerified 84.0 (30 / 37) · SWEComposite 63.8 (31 / 39) · SWERebench 100.0 (1 / 36) · SciCode 44.1 (19 / 37) · SonarBugDensity 75.3 (7 / 24) · SonarComposite 60.6 (8 / 39) · SonarFunctionalSkill 74.6 (9 / 24) · SonarIssueDensity 7.5 (22 / 24) · SonarVulnerabilityDensity 92.5 (3 / 24) · Tau2Bench 33.3 (32 / 37) · TerminalBench 67.1 (14 / 39) · TerminalBenchHard 59.0 (15 / 37)
sources: arc_agi, artificial_analysis, bfcl, gso, lmarena, mcp_atlas, openrouter, overrides, sonar, swebench, swebench_pro, swerebench, terminal_bench
missing: BUILD/MCPAtlas, OPS_long/OutputSpeed, OPS_long/TTFT, OPS_precision/OutputSpeed, OPS_precision/TTFT, OPS_review/OutputSpeed, OPS_review/TTFT, PLAN/MCPAtlas
deepseek-v4-flash · deepseek · idea 56.6 (down 0.6 since last refresh) · plan 65.5 (down 1.8 since last refresh) · build 62.0 (down 0.8 since last refresh) · review 62.2 (down 1.1 since last refresh)
deepseek-v4-flash

group breakdown

BUILD 58.5 (24 / 39) · CRE 51.3 (27 / 39) · GEN 58.4 (22 / 39) · LM_ARENA_REVIEW_PROXY 47.3 (14 / 39) · OPS_long 91.4 (5 / 39) · OPS_precision 95.0 (1 / 39) · OPS_review 95.1 (1 / 39) · PLAN 65.9 (19 / 39)

metrics

ArtificialAnalysisCoding 42.9 (27 / 37) · ArtificialAnalysisIntelligence 58.0 (26 / 37) · ArtificialAnalysisReasoning 68.1 (16 / 37) · BlendedCost 100.0 (1 / 39) · ContextWindow 100.0 (4 / 36) · CopilotArenaOrLMArenaCode 85.6 (11 / 38) · GDPval 60.9 (23 / 38) · GPQA_HLE_Reasoning 68.1 (16 / 37) · IFBench 95.2 (5 / 37) · LMArenaCreativeOrOpenEnded 51.3 (27 / 39) · LMArenaDocument 44.5 (10 / 33) · LMArenaText 51.3 (27 / 39) · LongContextRecall 46.4 (31 / 37) · MCPAtlas 37.9 (22 / 34) · MMLUPro 61.9 (18 / 28) · OutputSpeed 84.8 (10 / 33) · SWEAtlasComposite 50.0 (13 / 39) · SWEBenchMultilingual 59.1 (21 / 33) · SWEBenchPro 91.6 (23 / 35) · SWEBenchVerified 95.0 (10 / 37) · SWEComposite 77.2 (22 / 39) · SWERebench 62.3 (27 / 36) · SciCode 37.1 (25 / 37) · SonarComposite 50.0 (16 / 39) · TTFT 98.9 (4 / 33) · Tau2Bench 92.2 (12 / 37) · TerminalBench 53.0 (24 / 39) · TerminalBenchHard 37.6 (27 / 37)
sources: artificial_analysis, lmarena, openrouter, overrides
missing: BUILD/AALiveCodeBench, BUILD/BFCL, BUILD/GSO, GEN/ARC_AGI_2, GEN/ArtificialAnalysisMath, LM_ARENA_REVIEW_PROXY/LMArenaSearch, PLAN/BFCL, SWEAtlasComposite/SWEAtlasQnA, SWEAtlasComposite/SWEAtlasRefactoring, SWEAtlasComposite/SWEAtlasTestWriting, SonarComposite/SonarBugDensity, SonarComposite/SonarFunctionalSkill, SonarComposite/SonarIssueDensity, SonarComposite/SonarVulnerabilityDensity
mimo-v2.5 · xiaomi · idea 52.4 (down 0.4 since last refresh) · plan 56.4 (down 1.6 since last refresh) · build 58.4 (down 0.8 since last refresh) · review 55.8 (down 1.0 since last refresh)
mimo-v2.5

group breakdown

BUILD 55.5 (26 / 39) · CRE 50.0 (28 / 39) · GEN 47.8 (30 / 39) · LM_ARENA_REVIEW_PROXY 37.6 (28 / 39) · OPS_long 87.4 (7 / 39) · OPS_precision 91.0 (3 / 39) · OPS_review 91.9 (3 / 39) · PLAN 57.1 (22 / 39)

metrics

ARC_AGI_2 20.1 (15 / 31) · ArtificialAnalysisCoding 54.0 (23 / 37) · ArtificialAnalysisIntelligence 67.0 (21 / 37) · ArtificialAnalysisReasoning 46.0 (28 / 37) · BlendedCost 99.2 (2 / 39) · ContextWindow 100.0 (11 / 36) · CopilotArenaOrLMArenaCode 62.8 (21 / 38) · GDPval 60.9 (26 / 38) · GPQA_HLE_Reasoning 46.0 (28 / 37) · IFBench 63.5 (22 / 37) · LMArenaCreativeOrOpenEnded 50.0 (28 / 39) · LMArenaDocument 25.2 (25 / 33) · LMArenaText 50.0 (28 / 39) · LongContextRecall 44.7 (32 / 37) · MCPAtlas 27.6 (25 / 34) · MMLUPro 5.0 (26 / 28) · OutputSpeed 80.3 (11 / 33) · SWEAtlasComposite 22.0 (32 / 39) · SWEAtlasQnA 17.3 (13 / 21) · SWEAtlasRefactoring 25.8 (12 / 19) · SWEAtlasTestWriting 21.8 (16 / 21) · SWEBenchMultilingual 92.5 (17 / 33) · SWEBenchPro 95.0 (17 / 35) · SWEBenchVerified 92.5 (25 / 37) · SWEComposite 83.7 (16 / 39) · SWERebench 68.3 (23 / 36) · SciCode 27.5 (29 / 37) · SonarBugDensity 45.2 (16 / 24) · SonarComposite 37.0 (34 / 39) · SonarFunctionalSkill 12.7 (20 / 24) · SonarIssueDensity 65.6 (9 / 24) · SonarVulnerabilityDensity 43.4 (13 / 24) · TTFT 91.8 (16 / 33) · Tau2Bench 79.8 (21 / 37) · TerminalBench 66.7 (16 / 39) · TerminalBenchHard 54.7 (21 / 37)
sources: artificial_analysis, lmarena, openrouter, overrides
missing: BUILD/AALiveCodeBench, BUILD/BFCL, BUILD/GSO, GEN/ArtificialAnalysisMath, LM_ARENA_REVIEW_PROXY/LMArenaSearch, PLAN/BFCL
minimax-m2.7 · minimax · idea 36.4 (down 0.5 since last refresh) · plan 54.7 (down 1.8 since last refresh) · build 57.6 (down 0.9 since last refresh) · review 53.4 (down 1.2 since last refresh)
minimax-m2.7

group breakdown

BUILD 56.1 (25 / 39) · CRE 25.8 (36 / 39) · GEN 48.9 (28 / 39) · LM_ARENA_REVIEW_PROXY 37.6 (27 / 39) · OPS_long 71.2 (24 / 39) · OPS_precision 75.5 (22 / 39) · OPS_review 66.2 (26 / 39) · PLAN 55.6 (25 / 39)

metrics

ARC_AGI_2 11.9 (19 / 31) · ArtificialAnalysisCoding 53.3 (24 / 37) · ArtificialAnalysisIntelligence 69.2 (20 / 37) · ArtificialAnalysisReasoning 56.4 (21 / 37) · BlendedCost 90.7 (4 / 39) · ContextWindow 3.1 (31 / 36) · CopilotArenaOrLMArenaCode 48.1 (29 / 38) · GDPval 62.3 (19 / 38) · GPQA_HLE_Reasoning 56.4 (21 / 37) · IFBench 86.1 (13 / 37) · LMArenaCreativeOrOpenEnded 25.8 (36 / 39) · LMArenaDocument 25.2 (24 / 33) · LMArenaText 25.8 (36 / 39) · LongContextRecall 75.4 (16 / 37) · MCPAtlas 27.6 (24 / 34) · MMLUPro 68.0 (14 / 28) · OutputSpeed 77.1 (16 / 33) · SWEAtlasComposite 14.2 (36 / 39) · SWEAtlasQnA 10.4 (17 / 21) · SWEAtlasRefactoring 22.1 (14 / 19) · SWEAtlasTestWriting 7.5 (19 / 21) · SWEBenchMultilingual 95.0 (7 / 33) · SWEBenchPro 95.0 (9 / 35) · SWEBenchVerified 92.5 (23 / 37) · SWEComposite 88.5 (6 / 39) · SWERebench 79.6 (11 / 36) · SciCode 48.3 (17 / 37) · SonarBugDensity 65.2 (9 / 24) · SonarComposite 52.0 (13 / 39) · SonarFunctionalSkill 38.5 (18 / 24) · SonarIssueDensity 66.6 (8 / 24) · SonarVulnerabilityDensity 45.8 (12 / 24) · TTFT 96.4 (9 / 33) · Tau2Bench 63.2 (26 / 37) · TerminalBench 34.2 (30 / 39) · TerminalBenchHard 48.3 (23 / 37)
sources: artificial_analysis, lmarena, openrouter, overrides, sonar, swerebench, terminal_bench
missing: BUILD/AALiveCodeBench, BUILD/BFCL, BUILD/GSO, GEN/ArtificialAnalysisMath, LM_ARENA_REVIEW_PROXY/LMArenaSearch, PLAN/BFCL
gemini-3-flash · google · idea 81.1 (down 0.4 since last refresh) · plan 65.6 (down 1.9 since last refresh) · build 55.2 (down 0.8 since last refresh) · review 54.7 (down 1.2 since last refresh)
gemini-3-flash

group breakdown

BUILD 51.4 (28 / 39) · CRE 85.7 (11 / 39) · GEN 67.9 (15 / 39) · LM_ARENA_REVIEW_PROXY 32.7 (34 / 39) · OPS_long 94.3 (1 / 39) · OPS_precision 90.6 (4 / 39) · OPS_review 92.2 (2 / 39) · PLAN 60.5 (21 / 39)

metrics

AALiveCodeBench 98.7 (2 / 17) · ARC_AGI_2 16.2 (17 / 31) · ArtificialAnalysisCoding 55.6 (22 / 37) · ArtificialAnalysisIntelligence 57.6 (27 / 37) · ArtificialAnalysisMath 100.0 (1 / 17) · ArtificialAnalysisReasoning 73.9 (14 / 37) · BlendedCost 83.4 (16 / 39) · ContextWindow 100.0 (8 / 36) · CopilotArenaOrLMArenaCode 61.7 (23 / 38) · GDPval 31.3 (31 / 38) · GPQA_HLE_Reasoning 73.9 (14 / 37) · GSO 14.0 (17 / 19) · IFBench 92.0 (6 / 37) · LMArenaCreativeOrOpenEnded 85.7 (11 / 39) · LMArenaDocument 2.6 (32 / 33) · LMArenaSearch 62.9 (10 / 22) · LMArenaText 85.7 (11 / 39) · LongContextRecall 63.5 (20 / 37) · MCPAtlas 13.4 (30 / 34) · MMLUPro 92.9 (3 / 28) · OutputSpeed 98.4 (3 / 33) · SWEAtlasComposite 11.4 (38 / 39) · SWEAtlasQnA 0.0 (21 / 21) · SWEAtlasRefactoring 0.0 (19 / 19) · SWEAtlasTestWriting 38.1 (13 / 21) · SWEBenchMultilingual 100.0 (1 / 33) · SWEBenchPro 45.5 (33 / 35) · SWEBenchVerified 95.2 (3 / 37) · SWEComposite 73.2 (27 / 39) · SWERebench 82.6 (9 / 36) · SciCode 67.6 (11 / 37) · SonarComposite 50.0 (20 / 39) · TTFT 84.1 (17 / 33) · Tau2Bench 50.7 (27 / 37) · TerminalBench 66.1 (17 / 39) · TerminalBenchHard 46.2 (24 / 37)
sources: arc_agi, artificial_analysis, gso, lmarena, mcp_atlas, openrouter, overrides, sonar, sweatlas_qna, sweatlas_refactoring, sweatlas_test_writing, swebench, swebench_pro, swerebench, terminal_bench
missing: BUILD/BFCL, PLAN/BFCL, SonarComposite/SonarBugDensity, SonarComposite/SonarFunctionalSkill, SonarComposite/SonarIssueDensity, SonarComposite/SonarVulnerabilityDensity
claude-sonnet-4.5 · anthropic · idea 57.0 (down 0.2 since last refresh) · plan 44.6 (down 1.1 since last refresh) · build 53.9 (down 0.4 since last refresh) · review 44.8 (down 0.7 since last refresh)
claude-sonnet-4.5

group breakdown

BUILD 53.3 (27 / 39) · CRE 59.7 (24 / 39) · GEN 46.4 (31 / 39) · LM_ARENA_REVIEW_PROXY 26.0 (37 / 39) · OPS_long 75.9 (16 / 39) · OPS_precision 76.0 (20 / 39) · OPS_review 78.1 (16 / 39) · PLAN 38.7 (32 / 39)

metrics

AALiveCodeBench 28.0 (12 / 17) · ARC_AGI_2 3.6 (27 / 31) · ArtificialAnalysisCoding 42.6 (28 / 37) · ArtificialAnalysisIntelligence 45.3 (28 / 37) · ArtificialAnalysisMath 80.5 (10 / 17) · ArtificialAnalysisReasoning 27.7 (32 / 37) · BFCL 85.4 (3 / 15) · BlendedCost 62.5 (30 / 39) · ContextWindow 98.9 (18 / 36) · CopilotArenaOrLMArenaCode 42.6 (32 / 38) · GDPval 82.0 (5 / 38) · GPQA_HLE_Reasoning 27.7 (32 / 37) · GSO 27.3 (13 / 19) · IFBench 37.5 (28 / 37) · LMArenaCreativeOrOpenEnded 59.7 (24 / 39) · LMArenaDocument 43.3 (19 / 33) · LMArenaSearch 8.7 (19 / 22) · LMArenaText 59.7 (24 / 39) · LongContextRecall 60.1 (23 / 37) · MCPAtlas 2.7 (33 / 34) · MMLUPro 75.8 (9 / 28) · OutputSpeed 71.2 (24 / 33) · SWEAtlasComposite 50.0 (12 / 39) · SWEBenchMultilingual 3.5 (32 / 33) · SWEBenchPro 71.3 (27 / 35) · SWEBenchVerified 91.5 (27 / 37) · SWEComposite 71.4 (29 / 39) · SWERebench 81.0 (10 / 36) · SciCode 36.0 (26 / 37) · SonarBugDensity 52.7 (13 / 24) · SonarComposite 57.7 (11 / 39) · SonarFunctionalSkill 76.9 (7 / 24) · SonarIssueDensity 53.5 (15 / 24) · SonarVulnerabilityDensity 20.4 (20 / 24) · TTFT 78.1 (24 / 33) · Tau2Bench 44.1 (28 / 37) · TerminalBench 36.5 (28 / 39) · TerminalBenchHard 37.6 (26 / 37)
sources: arc_agi, artificial_analysis, bfcl, gso, lmarena, mcp_atlas, openrouter, overrides, sonar, swebench, swebench_pro, swerebench, terminal_bench
missing: SWEAtlasComposite/SWEAtlasQnA, SWEAtlasComposite/SWEAtlasRefactoring, SWEAtlasComposite/SWEAtlasTestWriting
kimi-k2.5 · moonshot · idea 47.9 (down 0.4 since last refresh) · plan 56.1 (down 1.6 since last refresh) · build 50.6 (down 0.8 since last refresh) · review 50.8 (down 1.0 since last refresh)
kimi-k2.5

group breakdown

BUILD 47.7 (29 / 39) · CRE 44.5 (29 / 39) · GEN 50.7 (27 / 39) · LM_ARENA_REVIEW_PROXY 35.4 (30 / 39) · OPS_long 64.1 (29 / 39) · OPS_precision 74.7 (24 / 39) · OPS_review 70.5 (19 / 39) · PLAN 57.0 (23 / 39)

metrics

ARC_AGI_2 14.8 (18 / 31) · ArtificialAnalysisCoding 45.8 (26 / 37) · ArtificialAnalysisIntelligence 59.1 (24 / 37) · ArtificialAnalysisReasoning 59.9 (19 / 37) · BlendedCost 86.5 (12 / 39) · ContextWindow 55.5 (28 / 36) · CopilotArenaOrLMArenaCode 50.9 (26 / 38) · GDPval 60.9 (25 / 38) · GPQA_HLE_Reasoning 59.9 (19 / 37) · IFBench 71.5 (20 / 37) · LMArenaCreativeOrOpenEnded 44.5 (29 / 39) · LMArenaDocument 20.9 (28 / 33) · LMArenaText 44.5 (29 / 39) · LongContextRecall 58.4 (24 / 37) · MCPAtlas 23.7 (27 / 34) · MMLUPro 69.1 (12 / 28) · OutputSpeed 50.5 (31 / 33) · SWEAtlasComposite 17.1 (35 / 39) · SWEAtlasQnA 11.6 (16 / 21) · SWEAtlasRefactoring 21.5 (16 / 19) · SWEAtlasTestWriting 16.8 (18 / 21) · SWEBenchMultilingual 8.8 (28 / 33) · SWEBenchPro 87.5 (24 / 35) · SWEBenchVerified 76.5 (33 / 37) · SWEComposite 71.6 (28 / 39) · SWERebench 71.5 (22 / 36) · SciCode 59.0 (15 / 37) · SonarBugDensity 44.4 (18 / 24) · SonarComposite 34.8 (36 / 39) · SonarFunctionalSkill 6.2 (23 / 24) · SonarIssueDensity 68.4 (6 / 24) · SonarVulnerabilityDensity 42.2 (15 / 24) · TTFT 96.8 (7 / 33) · Tau2Bench 94.7 (6 / 37) · TerminalBench 31.1 (31 / 39) · TerminalBenchHard 35.5 (29 / 37)
sources: arc_agi, artificial_analysis, lmarena, mcp_atlas, openrouter, overrides, sonar, sweatlas_qna, sweatlas_refactoring, sweatlas_test_writing, swebench, swerebench, terminal_bench
missing: BUILD/AALiveCodeBench, BUILD/BFCL, BUILD/GSO, GEN/ArtificialAnalysisMath, LM_ARENA_REVIEW_PROXY/LMArenaSearch, PLAN/BFCL
minimax-m2.5 · minimax · idea 14.8 (down 0.3 since last refresh) · plan 45.2 (down 1.3 since last refresh) · build 50.2 (down 0.7 since last refresh) · review 49.2 (down 0.9 since last refresh)
minimax-m2.5

group breakdown

BUILD 47.4 (30 / 39) · CRE 0.0 (39 / 39) · GEN 27.7 (35 / 39) · LM_ARENA_REVIEW_PROXY 37.6 (26 / 39) · OPS_long 81.3 (14 / 39) · OPS_precision 78.0 (16 / 39) · OPS_review 70.1 (20 / 39) · PLAN 52.1 (29 / 39)

metrics

ARC_AGI_2 5.1 (25 / 31) · ArtificialAnalysisCoding 38.7 (29 / 37) · ArtificialAnalysisIntelligence 41.4 (30 / 37) · ArtificialAnalysisReasoning 33.9 (31 / 37) · BlendedCost 93.6 (3 / 39) · ContextWindow 3.1 (30 / 36) · CopilotArenaOrLMArenaCode 41.2 (33 / 38) · GDPval 60.9 (24 / 38) · GPQA_HLE_Reasoning 33.9 (31 / 37) · IFBench 75.3 (17 / 37) · LMArenaCreativeOrOpenEnded 0.0 (39 / 39) · LMArenaDocument 25.2 (23 / 33) · LMArenaText 0.0 (39 / 39) · LongContextRecall 61.8 (22 / 37) · MCPAtlas 27.6 (23 / 34) · MMLUPro 68.0 (13 / 28) · OutputSpeed 100.0 (1 / 33) · SWEAtlasComposite 7.9 (39 / 39) · SWEAtlasQnA 3.4 (20 / 21) · SWEAtlasRefactoring 17.2 (17 / 19) · SWEAtlasTestWriting 0.0 (21 / 21) · SWEBenchMultilingual 26.5 (26 / 33) · SWEBenchPro 95.0 (8 / 35) · SWEBenchVerified 95.2 (4 / 37) · SWEComposite 77.3 (21 / 39) · SWERebench 67.8 (25 / 36) · SciCode 24.8 (31 / 37) · SonarBugDensity 52.7 (15 / 24) · SonarComposite 47.4 (33 / 39) · SonarFunctionalSkill 40.8 (16 / 24) · SonarIssueDensity 67.8 (7 / 24) · SonarVulnerabilityDensity 24.0 (17 / 24) · TTFT 82.5 (18 / 33) · Tau2Bench 93.0 (11 / 37) · TerminalBench 30.2 (32 / 39) · TerminalBenchHard 35.5 (28 / 37)
sources: arc_agi, artificial_analysis, lmarena, openrouter, overrides, sonar, sweatlas_qna, sweatlas_refactoring, sweatlas_test_writing, swebench, swerebench, terminal_bench
missing: BUILD/AALiveCodeBench, BUILD/BFCL, BUILD/GSO, GEN/ArtificialAnalysisMath, LM_ARENA_REVIEW_PROXY/LMArenaSearch, PLAN/BFCL
grok-4.3 · xai · idea 44.5 (down 0.4 since last refresh) · plan 57.5 (down 1.4 since last refresh) · build 48.6 (down 0.7 since last refresh) · review 52.7 (down 0.8 since last refresh)
grok-4.3

group breakdown

BUILD 44.3 (33 / 39) · CRE 33.6 (34 / 39) · GEN 54.6 (25 / 39) · LM_ARENA_REVIEW_PROXY 48.1 (13 / 39) · OPS_long 90.9 (6 / 39) · OPS_precision 87.2 (8 / 39) · OPS_review 89.2 (7 / 39) · PLAN 55.2 (28 / 39)

metrics

AALiveCodeBench 63.8 (9 / 17) · ARC_AGI_2 25.0 (10 / 31) · ArtificialAnalysisCoding 31.3 (31 / 37) · ArtificialAnalysisIntelligence 66.3 (22 / 37) · ArtificialAnalysisMath 84.7 (9 / 17) · ArtificialAnalysisReasoning 59.6 (20 / 37) · BFCL 36.9 (11 / 15) · BlendedCost 80.6 (19 / 39) · ContextWindow 98.9 (23 / 36) · CopilotArenaOrLMArenaCode 39.1 (34 / 38) · GDPval 62.8 (18 / 38) · GPQA_HLE_Reasoning 59.6 (20 / 37) · IFBench 100.0 (2 / 37) · LMArenaCreativeOrOpenEnded 33.6 (34 / 39) · LMArenaSearch 46.2 (16 / 22) · LMArenaText 33.6 (34 / 39) · LongContextRecall 56.7 (26 / 37) · MMLUPro 63.2 (17 / 28) · OutputSpeed 94.8 (5 / 33) · SWEAtlasComposite 50.0 (24 / 39) · SWEComposite 47.1 (34 / 39) · SWERebench 42.6 (31 / 36) · SciCode 35.5 (27 / 37) · SonarComposite 50.0 (27 / 39) · TTFT 79.5 (20 / 33) · Tau2Bench 81.4 (20 / 37) · TerminalBench 11.3 (35 / 39) · TerminalBenchHard 22.6 (32 / 37)
sources: artificial_analysis, lmarena, openrouter, overrides
missing: BUILD/GSO, BUILD/MCPAtlas, LM_ARENA_REVIEW_PROXY/LMArenaDocument, PLAN/MCPAtlas, SWEAtlasComposite/SWEAtlasQnA, SWEAtlasComposite/SWEAtlasRefactoring, SWEAtlasComposite/SWEAtlasTestWriting, SWEComposite/SWEBenchMultilingual, SWEComposite/SWEBenchPro, SWEComposite/SWEBenchVerified, SonarComposite/SonarBugDensity, SonarComposite/SonarFunctionalSkill, SonarComposite/SonarIssueDensity, SonarComposite/SonarVulnerabilityDensity
claude-sonnet-4 · anthropic · idea 11.5 (down 0.1 since last refresh) · plan 26.5 (down 0.9 since last refresh) · build 46.1 (down 0.4 since last refresh) · review 37.4 (down 0.6 since last refresh)
claude-sonnet-4

group breakdown

BUILD 45.2 (31 / 39) · CRE 0.0 (38 / 39) · GEN 18.0 (36 / 39) · LM_ARENA_REVIEW_PROXY 29.6 (35 / 39) · OPS_long 76.1 (15 / 39) · OPS_precision 76.3 (18 / 39) · OPS_review 78.3 (15 / 39) · PLAN 25.0 (35 / 39)

metrics

AALiveCodeBench 6.6 (16 / 17) · ARC_AGI_2 0.2 (30 / 31) · ArtificialAnalysisCoding 28.0 (32 / 37) · ArtificialAnalysisIntelligence 29.8 (32 / 37) · ArtificialAnalysisMath 50.1 (14 / 17) · ArtificialAnalysisReasoning 1.7 (35 / 37) · BFCL 80.1 (7 / 15) · BlendedCost 62.5 (29 / 39) · ContextWindow 98.9 (17 / 36) · CopilotArenaOrLMArenaCode 43.7 (30 / 38) · GDPval 80.2 (7 / 38) · GPQA_HLE_Reasoning 1.7 (35 / 37) · GSO 6.0 (18 / 19) · IFBench 30.7 (30 / 37) · LMArenaCreativeOrOpenEnded 0.0 (38 / 39) · LMArenaDocument 44.3 (17 / 33) · LMArenaSearch 14.9 (18 / 22) · LMArenaText 0.0 (38 / 39) · LiveCodeBench 50.0 (1 / 1) · LongContextRecall 54.9 (27 / 37) · MCPAtlas 9.8 (31 / 34) · MMLUPro 38.1 (22 / 28) · OutputSpeed 71.4 (23 / 33) · SWEAtlasComposite 50.0 (11 / 39) · SWEBenchMultilingual 10.4 (27 / 33) · SWEBenchPro 68.7 (29 / 35) · SWEBenchVerified 99.0 (2 / 37) · SWEComposite 63.5 (32 / 39) · SWERebench 59.0 (30 / 36) · SciCode 10.9 (33 / 37) · SonarBugDensity 0.0 (24 / 24) · SonarComposite 25.2 (38 / 39) · SonarFunctionalSkill 34.6 (19 / 24) · SonarIssueDensity 45.4 (16 / 24) · SonarVulnerabilityDensity 0.0 (24 / 24) · TTFT 78.7 (21 / 33) · Tau2Bench 6.0 (35 / 37) · TerminalBench 38.6 (27 / 39) · TerminalBenchHard 24.8 (31 / 37)
sources: arc_agi, artificial_analysis, gso, livecodebench, lmarena, openrouter, sonar, swebench, swebench_pro, swerebench
missing: SWEAtlasComposite/SWEAtlasQnA, SWEAtlasComposite/SWEAtlasRefactoring, SWEAtlasComposite/SWEAtlasTestWriting
glm-4.7 · zai · idea 58.1 (down 0.4 since last refresh) · plan 51.7 (down 1.5 since last refresh) · build 45.8 (down 0.7 since last refresh) · review 47.5 (down 0.9 since last refresh)
glm-4.7

group breakdown

BUILD 42.8 (35 / 39) · CRE 58.4 (25 / 39) · GEN 53.7 (26 / 39) · LM_ARENA_REVIEW_PROXY 50.0 (12 / 39) · OPS_long 72.4 (20 / 39) · OPS_precision 76.3 (17 / 39) · OPS_review 66.4 (25 / 39) · PLAN 46.8 (30 / 39)

metrics

AALiveCodeBench 93.6 (4 / 17) · ArtificialAnalysisCoding 35.1 (30 / 37) · ArtificialAnalysisIntelligence 42.1 (29 / 37) · ArtificialAnalysisMath 96.0 (4 / 17) · ArtificialAnalysisReasoning 47.7 (27 / 37) · BlendedCost 87.0 (10 / 39) · ContextWindow 0.9 (33 / 36) · CopilotArenaOrLMArenaCode 62.8 (20 / 38) · GDPval 28.9 (34 / 38) · GPQA_HLE_Reasoning 47.7 (27 / 37) · IFBench 65.4 (21 / 37) · LMArenaCreativeOrOpenEnded 58.4 (25 / 39) · LMArenaText 58.4 (25 / 39) · LongContextRecall 51.5 (28 / 37) · MCPAtlas 0.0 (34 / 34) · MMLUPro 54.1 (21 / 28) · OutputSpeed 79.3 (12 / 33) · SWEAtlasComposite 50.0 (27 / 39) · SWEBenchMultilingual 5.0 (31 / 33) · SWEBenchVerified 83.9 (32 / 37) · SWEComposite 55.7 (33 / 39) · SWERebench 61.7 (29 / 36) · SciCode 38.2 (24 / 37) · SonarBugDensity 34.0 (22 / 24) · SonarComposite 24.4 (39 / 39) · SonarFunctionalSkill 0.0 (24 / 24) · SonarIssueDensity 58.2 (12 / 24) · SonarVulnerabilityDensity 20.4 (21 / 24) · TTFT 100.0 (1 / 33) · Tau2Bench 94.7 (8 / 37) · TerminalBench 14.8 (33 / 39) · TerminalBenchHard 26.9 (30 / 37)
sources: artificial_analysis, lmarena, mcp_atlas, openrouter, overrides, sonar, swerebench, terminal_bench
missing: BUILD/BFCL, BUILD/GSO, GEN/ARC_AGI_2, LM_ARENA_REVIEW_PROXY/LMArenaDocument, LM_ARENA_REVIEW_PROXY/LMArenaSearch, PLAN/BFCL, SWEAtlasComposite/SWEAtlasQnA, SWEAtlasComposite/SWEAtlasRefactoring, SWEAtlasComposite/SWEAtlasTestWriting, SWEComposite/SWEBenchPro
kimi-k2-0905 · moonshot · idea 8.7 (down 0.1 since last refresh) · plan 17.4 (down 0.6 since last refresh) · build 43.7 (down 0.2 since last refresh) · review 36.5 (down 0.4 since last refresh)
kimi-k2-0905

group breakdown

BUILD 44.6 (32 / 39) · CRE 7.1 (37 / 39) · GEN 4.7 (39 / 39) · LM_ARENA_REVIEW_PROXY 47.3 (15 / 39) · OPS_long 35.8 (39 / 39) · OPS_precision 58.6 (31 / 39) · OPS_review 54.5 (34 / 39) · PLAN 19.8 (36 / 39)

metrics

AALiveCodeBench 0.0 (17 / 17) · ArtificialAnalysisCoding 1.4 (35 / 37) · ArtificialAnalysisIntelligence 1.6 (35 / 37) · ArtificialAnalysisMath 12.4 (16 / 17) · ArtificialAnalysisReasoning 0.0 (36 / 37) · BlendedCost 83.8 (15 / 39) · ContextWindow 55.5 (27 / 36) · CopilotArenaOrLMArenaCode 85.6 (12 / 38) · GDPval 5.0 (37 / 38) · GPQA_HLE_Reasoning 0.0 (36 / 37) · IFBench 0.0 (36 / 37) · LMArenaCreativeOrOpenEnded 7.1 (37 / 39) · LMArenaDocument 44.5 (11 / 33) · LMArenaText 7.1 (37 / 39) · LongContextRecall 0.0 (36 / 37) · MCPAtlas 68.5 (8 / 34) · MMLUPro 11.9 (25 / 28) · OutputSpeed 0.0 (33 / 33) · SWEAtlasComposite 50.0 (18 / 39) · SWEBenchMultilingual 5.0 (29 / 33) · SWEBenchPro 92.5 (20 / 35) · SWEBenchVerified 68.4 (35 / 37) · SWEComposite 68.1 (30 / 39) · SWERebench 62.3 (28 / 36) · SciCode 0.0 (36 / 37) · SonarComposite 50.0 (24 / 39) · TTFT 95.7 (10 / 33) · Tau2Bench 30.8 (33 / 37) · TerminalBench 34.9 (29 / 39) · TerminalBenchHard 3.4 (35 / 37)
sources: artificial_analysis, lmarena, openrouter, overrides
missing: BUILD/BFCL, BUILD/GSO, GEN/ARC_AGI_2, LM_ARENA_REVIEW_PROXY/LMArenaSearch, PLAN/BFCL, SWEAtlasComposite/SWEAtlasQnA, SWEAtlasComposite/SWEAtlasRefactoring, SWEAtlasComposite/SWEAtlasTestWriting, SonarComposite/SonarBugDensity, SonarComposite/SonarFunctionalSkill, SonarComposite/SonarIssueDensity, SonarComposite/SonarVulnerabilityDensity
grok-4-latest · xai · idea 38.5 (down 0.3 since last refresh) · plan 39.5 (down 1.5 since last refresh) · build 42.8 (down 0.6 since last refresh) · review 37.2 (down 0.9 since last refresh)
grok-4-latest

group breakdown

BUILD 43.4 (34 / 39) · CRE 34.7 (33 / 39) · GEN 44.4 (32 / 39) · LM_ARENA_REVIEW_PROXY 25.0 (38 / 39) · OPS_long 46.5 (38 / 39) · OPS_precision 42.9 (38 / 39) · OPS_review 42.9 (39 / 39) · PLAN 35.8 (33 / 39)

metrics

AALiveCodeBench 66.3 (8 / 17) · ARC_AGI_2 20.6 (14 / 31) · ArtificialAnalysisCoding 48.8 (25 / 37) · ArtificialAnalysisIntelligence 39.9 (31 / 37) · ArtificialAnalysisMath 90.9 (7 / 17) · ArtificialAnalysisReasoning 48.9 (25 / 37) · BFCL 34.6 (13 / 15) · BlendedCost 14.5 (37 / 39) · GDPval 6.9 (36 / 38) · GPQA_HLE_Reasoning 48.9 (25 / 37) · IFBench 28.0 (31 / 37) · LMArenaCreativeOrOpenEnded 34.7 (33 / 39) · LMArenaSearch 0.0 (22 / 22) · LMArenaText 34.7 (33 / 39) · LongContextRecall 72.0 (17 / 37) · MMLUPro 65.5 (16 / 28) · SWEAtlasComposite 50.0 (23 / 39) · SWEComposite 46.5 (35 / 39) · SWERebench 41.3 (32 / 36) · SciCode 41.4 (21 / 37) · SonarComposite 50.0 (26 / 39) · Tau2Bench 35.0 (31 / 37) · TerminalBench 4.5 (36 / 39) · TerminalBenchHard 44.0 (25 / 37)
sources: arc_agi, artificial_analysis, bfcl, lmarena, overrides, swerebench, terminal_bench
missing: BUILD/CopilotArenaOrLMArenaCode, BUILD/GSO, BUILD/MCPAtlas, LM_ARENA_REVIEW_PROXY/LMArenaDocument, OPS_long/ContextWindow, OPS_long/OutputSpeed, OPS_long/TTFT, OPS_precision/ContextWindow, OPS_precision/OutputSpeed, OPS_precision/TTFT, OPS_review/ContextWindow, OPS_review/OutputSpeed, OPS_review/TTFT, PLAN/MCPAtlas, SWEAtlasComposite/SWEAtlasQnA, SWEAtlasComposite/SWEAtlasRefactoring, SWEAtlasComposite/SWEAtlasTestWriting, SWEComposite/SWEBenchMultilingual, SWEComposite/SWEBenchPro, SWEComposite/SWEBenchVerified, SonarComposite/SonarBugDensity, SonarComposite/SonarFunctionalSkill, SonarComposite/SonarIssueDensity, SonarComposite/SonarVulnerabilityDensity
gemini-2.5-pro · google · idea 67.3 (down 0.2 since last refresh) · plan 37.1 (down 0.8 since last refresh) · build 42.0 (down 0.4 since last refresh) · review 32.9 (down 0.5 since last refresh)
gemini-2.5-pro

group breakdown

BUILD 39.5 (36 / 39) · CRE 77.8 (14 / 39) · GEN 40.2 (33 / 39) · LM_ARENA_REVIEW_PROXY 6.0 (39 / 39) · OPS_long 86.9 (8 / 39) · OPS_precision 82.8 (11 / 39) · OPS_review 85.4 (11 / 39) · PLAN 28.3 (34 / 39)

metrics

AALiveCodeBench 59.7 (10 / 17) · ARC_AGI_2 3.6 (28 / 31) · ArtificialAnalysisCoding 21.2 (33 / 37) · ArtificialAnalysisIntelligence 15.0 (33 / 37) · ArtificialAnalysisMath 79.8 (11 / 17) · ArtificialAnalysisReasoning 37.0 (30 / 37) · BFCL 77.0 (9 / 15) · BlendedCost 73.9 (22 / 39) · ContextWindow 100.0 (7 / 36) · CopilotArenaOrLMArenaCode 0.0 (37 / 38) · GDPval 30.4 (32 / 38) · GPQA_HLE_Reasoning 37.0 (30 / 37) · GSO 0.0 (19 / 19) · IFBench 14.9 (34 / 37) · LMArenaCreativeOrOpenEnded 77.8 (14 / 39) · LMArenaDocument 11.9 (29 / 33) · LMArenaSearch 0.1 (21 / 22) · LMArenaText 77.8 (14 / 39) · LongContextRecall 61.8 (21 / 37) · MCPAtlas 49.1 (16 / 34) · MMLUPro 61.0 (19 / 28) · OutputSpeed 90.4 (8 / 33) · SWEAtlasComposite 50.0 (15 / 39) · SWEBenchMultilingual 36.0 (23 / 33) · SWEBenchPro 67.3 (31 / 35) · SWEBenchVerified 93.0 (22 / 37) · SWEComposite 41.1 (37 / 39) · SWERebench 0.0 (36 / 36) · SciCode 25.9 (30 / 37) · SonarComposite 50.0 (19 / 39) · TTFT 74.1 (28 / 33) · Tau2Bench 0.0 (37 / 37) · TerminalBench 13.5 (34 / 39) · TerminalBenchHard 12.0 (33 / 37)
sources: arc_agi, artificial_analysis, gso, lmarena, openrouter, swebench, swerebench, terminal_bench
missing: SWEAtlasComposite/SWEAtlasQnA, SWEAtlasComposite/SWEAtlasRefactoring, SWEAtlasComposite/SWEAtlasTestWriting, SonarComposite/SonarBugDensity, SonarComposite/SonarFunctionalSkill, SonarComposite/SonarIssueDensity, SonarComposite/SonarVulnerabilityDensity
grok-code-fast-1 · xai · idea 30.0 (up 0.1 since last refresh) · plan 13.8 (down 0.1 since last refresh) · build 30.0 (up 0.1 since last refresh) · review 23.9
grok-code-fast-1

group breakdown

BUILD 30.5 (37 / 39) · CRE 37.0 (31 / 39) · GEN 11.1 (38 / 39) · LM_ARENA_REVIEW_PROXY 28.8 (36 / 39) · OPS_long 47.0 (37 / 39) · OPS_precision 44.0 (37 / 39) · OPS_review 44.0 (38 / 39) · PLAN 11.2 (38 / 39)

metrics

AALiveCodeBench 7.3 (15 / 17) · ARC_AGI_2 25.0 (11 / 31) · ArtificialAnalysisCoding 0.0 (37 / 37) · ArtificialAnalysisIntelligence 0.0 (37 / 37) · ArtificialAnalysisMath 0.0 (17 / 17) · ArtificialAnalysisReasoning 0.0 (37 / 37) · BFCL 36.9 (12 / 15) · BlendedCost 19.8 (36 / 39) · CopilotArenaOrLMArenaCode 0.0 (38 / 38) · GDPval 5.0 (38 / 38) · GPQA_HLE_Reasoning 0.0 (37 / 37) · IFBench 0.0 (37 / 37) · LMArenaCreativeOrOpenEnded 37.0 (31 / 39) · LMArenaSearch 7.5 (20 / 22) · LMArenaText 37.0 (31 / 39) · LongContextRecall 0.0 (37 / 37) · MMLUPro 0.0 (28 / 28) · SWEAtlasComposite 50.0 (25 / 39) · SWEBenchVerified 73.8 (34 / 37) · SWEComposite 45.2 (36 / 39) · SWERebench 29.0 (34 / 36) · SciCode 0.0 (37 / 37) · SonarComposite 50.0 (28 / 39) · Tau2Bench 37.5 (30 / 37) · TerminalBench 2.2 (37 / 39) · TerminalBenchHard 0.0 (37 / 37)
sources: artificial_analysis, lmarena, overrides, swerebench, terminal_bench
missing: BUILD/GSO, BUILD/MCPAtlas, LM_ARENA_REVIEW_PROXY/LMArenaDocument, OPS_long/ContextWindow, OPS_long/OutputSpeed, OPS_long/TTFT, OPS_precision/ContextWindow, OPS_precision/OutputSpeed, OPS_precision/TTFT, OPS_review/ContextWindow, OPS_review/OutputSpeed, OPS_review/TTFT, PLAN/MCPAtlas, SWEAtlasComposite/SWEAtlasQnA, SWEAtlasComposite/SWEAtlasRefactoring, SWEAtlasComposite/SWEAtlasTestWriting, SWEComposite/SWEBenchMultilingual, SWEComposite/SWEBenchPro, SonarComposite/SonarBugDensity, SonarComposite/SonarFunctionalSkill, SonarComposite/SonarIssueDensity, SonarComposite/SonarVulnerabilityDensity
glm-4.6 · zai · idea 53.1 (down 0.2 since last refresh) · plan 24.6 (down 1.1 since last refresh) · build 28.0 (down 0.5 since last refresh) · review 28.4 (down 0.7 since last refresh)
glm-4.6

group breakdown

BUILD 25.1 (38 / 39) · CRE 62.5 (22 / 39) · GEN 30.0 (34 / 39) · LM_ARENA_REVIEW_PROXY 50.0 (11 / 39) · OPS_long 67.0 (27 / 39) · OPS_precision 72.2 (25 / 39) · OPS_review 62.8 (30 / 39) · PLAN 14.1 (37 / 39)

metrics

AALiveCodeBench 21.1 (14 / 17) · ArtificialAnalysisCoding 13.1 (34 / 37) · ArtificialAnalysisIntelligence 7.4 (34 / 37) · ArtificialAnalysisMath 76.0 (12 / 17) · ArtificialAnalysisReasoning 9.4 (33 / 37) · BFCL 81.1 (5 / 15) · BlendedCost 86.7 (11 / 39) · ContextWindow 0.9 (32 / 36) · CopilotArenaOrLMArenaCode 31.2 (36 / 38) · GDPval 12.0 (35 / 38) · GPQA_HLE_Reasoning 9.4 (33 / 37) · IFBench 0.9 (35 / 37) · LMArenaCreativeOrOpenEnded 62.5 (22 / 39) · LMArenaText 62.5 (22 / 39) · LongContextRecall 2.0 (35 / 37) · MCPAtlas 7.5 (32 / 34) · MMLUPro 23.3 (24 / 28) · OutputSpeed 71.1 (25 / 33) · SWEAtlasComposite 50.0 (26 / 39) · SWEBenchMultilingual 5.0 (30 / 33) · SWEBenchPro 0.0 (35 / 35) · SWEBenchVerified 66.7 (36 / 37) · SWEComposite 26.7 (38 / 39) · SWERebench 40.5 (33 / 36) · SciCode 2.4 (35 / 37) · SonarBugDensity 36.4 (19 / 24) · SonarComposite 28.2 (37 / 39) · SonarFunctionalSkill 7.5 (22 / 24) · SonarIssueDensity 57.0 (14 / 24) · SonarVulnerabilityDensity 24.8 (16 / 24) · TTFT 95.5 (11 / 33) · Tau2Bench 22.6 (34 / 37) · TerminalBench 0.0 (39 / 39) · TerminalBenchHard 7.7 (34 / 37)
sources: artificial_analysis, bfcl, lmarena, openrouter, overrides, swebench, swebench_pro, swerebench, terminal_bench
missing: BUILD/GSO, GEN/ARC_AGI_2, LM_ARENA_REVIEW_PROXY/LMArenaDocument, LM_ARENA_REVIEW_PROXY/LMArenaSearch, SWEAtlasComposite/SWEAtlasQnA, SWEAtlasComposite/SWEAtlasRefactoring, SWEAtlasComposite/SWEAtlasTestWriting
gemini-2.5-flash · google · idea 38.4 · plan 18.2 (down 0.3 since last refresh) · build 25.7 (down 0.2 since last refresh) · review 24.7 (down 0.3 since last refresh)
gemini-2.5-flash

group breakdown

BUILD 21.3 (39 / 39) · CRE 41.6 (30 / 39) · GEN 16.8 (37 / 39) · LM_ARENA_REVIEW_PROXY 35.3 (31 / 39) · OPS_long 93.9 (2 / 39) · OPS_precision 89.2 (6 / 39) · OPS_review 91.4 (5 / 39) · PLAN 8.9 (39 / 39)

metrics

AALiveCodeBench 21.1 (13 / 17) · ARC_AGI_2 0.7 (29 / 31) · ArtificialAnalysisCoding 0.0 (36 / 37) · ArtificialAnalysisIntelligence 0.0 (36 / 37) · ArtificialAnalysisMath 47.9 (15 / 17) · ArtificialAnalysisReasoning 7.1 (34 / 37) · BFCL 1.3 (14 / 15) · BlendedCost 85.8 (13 / 39) · ContextWindow 100.0 (6 / 36) · CopilotArenaOrLMArenaCode 59.9 (25 / 38) · GDPval 32.3 (30 / 38) · GPQA_HLE_Reasoning 7.1 (34 / 37) · GSO 19.4 (14 / 19) · IFBench 19.0 (33 / 37) · LMArenaCreativeOrOpenEnded 41.6 (30 / 39) · LMArenaDocument 9.7 (30 / 33) · LMArenaSearch 60.9 (11 / 22) · LMArenaText 41.6 (30 / 39) · LongContextRecall 39.6 (34 / 37) · MCPAtlas 18.9 (28 / 34) · MMLUPro 26.7 (23 / 28) · OutputSpeed 99.6 (2 / 33) · SWEAtlasComposite 17.2 (34 / 39) · SWEAtlasQnA 7.5 (19 / 21) · SWEAtlasRefactoring 7.5 (18 / 19) · SWEAtlasTestWriting 39.9 (12 / 21) · SWEBenchMultilingual 92.5 (12 / 33) · SWEBenchPro 46.2 (32 / 35) · SWEBenchVerified 0.0 (37 / 37) · SWEComposite 25.4 (39 / 39) · SWERebench 0.0 (35 / 36) · SciCode 7.7 (34 / 37) · SonarComposite 50.0 (18 / 39) · TTFT 77.6 (26 / 33) · Tau2Bench 0.0 (36 / 37) · TerminalBench 0.0 (38 / 39) · TerminalBenchHard 0.0 (36 / 37)
sources: arc_agi, artificial_analysis, bfcl, lmarena, openrouter, swebench, swerebench, terminal_bench
missing: SonarComposite/SonarBugDensity, SonarComposite/SonarFunctionalSkill, SonarComposite/SonarIssueDensity, SonarComposite/SonarVulnerabilityDensity