Will an AI SWE model score higher than 50% on SWE-bench in 2024?
16
Ṁ470
Resolved NO on Jan 1
🏅 Top traders
# | Name | Total profit |
---|---|---|
1 | | Ṁ41 |
2 | | Ṁ18 |
3 | | Ṁ16 |
4 | | Ṁ15 |
5 | | Ṁ8 |
Traders (I can't tell which mention to use) -- how do you feel about changing this to be SWE-bench Verified explicitly?
https://www.swebench.com/ -- explanation of the differences here:
SWE-bench Lite is a subset of SWE-bench that's been curated to make evaluation less costly and more accessible.
SWE-bench Verified is a human annotator filtered subset that has been deemed to have a ceiling of 100% resolution rate.
If a majority of traders do not want this change, we'll leave it at SWE-bench Full (which does not have a 100% resolution ceiling). And to make it fairer, it should be a majority of people voting NO.
Related questions
What will be the highest score achieved on SWE-Bench Verified in 2025?
AI resolves at least X% on SWE-bench without any assistance, by 2028?
AI resolves at least X% on SWE-bench WITH assistance, by 2028?
Will any model get above human level on the Simple Bench benchmark before September 1st, 2025.
42% chance
Will an autonomous agent resolve 90% of tasks on SWE-bench by 2026?
50% chance
Top SWE-Bench Verified score in 2025?
-
Will any AI model score >80% on Epoch's Frontier Math Benchmark in 2025?
10% chance
Will any AI model score above 95% on GRAB by the end of 2025?
40% chance
Will an AI score over 80% on FrontierMath Benchmark in 2025
10% chance
What will be the best performance on SWE-bench Verified by December 31st 2025?