Top SWE-Bench Pro public dataset score by January 1, 2026

This market predicts what the highest score on the SWE-Bench Pro public dataset leaderboard will be as of January 1, 2026.

Current top performers on the SWE-Bench Pro public dataset (as of September 24, 2025):

  • OpenAI GPT-5: 23.26%

  • Claude Opus 4.1: 22.71%

Resolution Criteria: This market will resolve to the score range that contains the highest score on the official SWE-Bench Pro public dataset leaderboard (https://scale.com/leaderboard/swe_bench_pro_public) as of January 1, 2026.

  • Update 2025-12-12 (PST) (AI summary of creator comment): The market will resolve based on Scale AI's verified scores on the official SWE-Bench Pro public dataset leaderboard, not self-reported scores from model creators.

    • Self-reported scores (like Claude Opus 4.5's 52.0% or GPT 5.2 Thinking's 55.6%) will only count if Scale AI independently verifies them.

    • Example: Claude Opus 4.5 self-reported 52.0%, but Scale AI evaluated it at 45.89%, so the market would resolve to the range containing 45.89%.
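
The resolution rule above can be sketched in code. This is only an illustration, not the market's actual mechanism: the `Entry` type, the `verified` flag, and both function names are hypothetical. The 5-point range edges are also an assumption, since the only answer range named on this page is "55+". The scores are the ones quoted in the update.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Entry:
    model: str
    score: float    # percent resolved on SWE-Bench Pro (public dataset)
    verified: bool  # True only for scores evaluated by Scale AI (hypothetical flag)


def top_verified_score(entries: Sequence[Entry]) -> Optional[float]:
    """Return the highest Scale AI-verified score, ignoring self-reported ones."""
    verified = [e.score for e in entries if e.verified]
    return max(verified) if verified else None


def bucket_label(score: float, edges: Sequence[int] = (25, 30, 35, 40, 45, 50, 55)) -> str:
    """Map a score to an answer range.

    ASSUMPTION: the edges are illustrative; the real answer ranges are not
    listed in the market description (only "55+" is mentioned).
    """
    if score < edges[0]:
        return f"<{edges[0]}"
    for lo, hi in zip(edges, edges[1:]):
        if lo <= score < hi:
            return f"{lo}-{hi}"
    return f"{edges[-1]}+"


# Scores quoted in the 2025-12-12 update.
ENTRIES = [
    Entry("Claude Opus 4.5 (self-reported)", 52.0, verified=False),
    Entry("Claude Opus 4.5 (Scale AI)", 45.89, verified=True),
    Entry("GPT 5.2 Thinking (self-reported)", 55.6, verified=False),
]

top = top_verified_score(ENTRIES)
resolution = bucket_label(top) if top is not None else None
```

With these three entries, only the 45.89% figure counts, so `resolution` lands in the range containing 45.89% under the assumed edges. The 55.6% self-report would move the result to "55+" only if its entry were marked verified.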

bought Ṁ8 Answer #NO

While Claude Opus 4.5 reported 52.0% on SWE-Bench Pro, Scale AI evaluated it at 45.89%. OpenAI reports that GPT 5.2 Thinking got 55.6%, but this will only resolve 55+ if Scale AI verifies it.

sold Ṁ9 Answer #YES

@Jolliest very relevant information, thanks!

Would love a 2027 market of this

Market might be a little scuffed because I'm cheap with mana. Feel free to make another market for SWE-Bench Pro that is more precise