Top SWE-Bench Pro public dataset score by January 1, 2026

Ṁ1121

Jan 1


This market predicts what the highest score on the SWE-Bench Pro public dataset leaderboard will be as of January 1, 2026.

Current top performers on the SWE-Bench Pro public dataset (as of September 24, 2025):

OpenAI GPT-5: 23.26%
Claude Opus 4.1: 22.71%

Resolution Criteria: This market will resolve to the score range that contains the highest score on the official SWE-Bench Pro public dataset leaderboard (https://scale.com/leaderboard/swe_bench_pro_public) as of January 1, 2026.

Update 2025-12-12 (PST) (AI summary of creator comment): The market will resolve based on Scale AI's verified scores on the official SWE-Bench Pro public dataset leaderboard, not self-reported scores from model creators.
- Self-reported scores (like Claude Opus 4.5's 52.0% or GPT 5.2 Thinking's 55.6%) will only count if Scale AI independently verifies them.
- Example: Claude Opus 4.5 reported 52.0%, but Scale AI evaluated it at 45.89%, so the market would resolve to the range containing 45.89%.
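
The resolution rule above can be sketched roughly in Python. This is only an illustration: the `Entry` class, the function names, and the 5-point bucket width are assumptions made for the example, not the market's actual answer ranges. The scores are the ones quoted in the description and the update.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    model: str
    self_reported: Optional[float]   # score claimed by the model creator, if any
    scale_verified: Optional[float]  # score on Scale AI's leaderboard, if any


def top_verified_score(entries: list[Entry]) -> float:
    """Return the highest Scale AI-verified score, ignoring self-reported ones."""
    verified = [e.scale_verified for e in entries if e.scale_verified is not None]
    if not verified:
        raise ValueError("no verified scores available")
    return max(verified)


def score_range(score: float, width: float = 5.0) -> tuple[float, float]:
    """Map a score to a [lower, upper) bucket. The width is a hypothetical choice."""
    lower = (score // width) * width
    return lower, lower + width


entries = [
    Entry("Claude Opus 4.5", self_reported=52.0, scale_verified=45.89),
    Entry("GPT 5.2 Thinking", self_reported=55.6, scale_verified=None),
    Entry("OpenAI GPT-5", self_reported=None, scale_verified=23.26),
    Entry("Claude Opus 4.1", self_reported=None, scale_verified=22.71),
]

best = top_verified_score(entries)
print(best, score_range(best))
```

With these inputs, the unverified 55.6% is ignored and the verified 45.89% decides the range.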

#AI

#Technology

#Technical AI Timelines

#AI Benchmarks

#AGI


5 Comments


bought Ṁ8 Answer #NO

While Claude Opus 4.5 reported 52.0% on SWE-Bench Pro, Scale AI evaluated it at 45.89%. OpenAI reports that GPT 5.2 Thinking scored 55.6%, but this will only resolve to 55+ if Scale AI verifies it.

sold Ṁ9 Answer #YES

@Jolliest very relevant information, thanks!

Created a market for mid-2026: https://manifold.markets/RenanCunha/best-swebench-pro-public-score-by-j

Would love a 2027 market of this

Market might be a little scuffed because I'm cheap with mana. Feel free to make another market for SWE-Bench Pro that is more precise
