Will an AI SWE model score higher than 50% on SWE-bench in 2024?
16
Ṁ470
Dec 31
20% chance

Traders (I can't tell which mention to use) -- how do you feel about changing this to be SWE-bench Verified explicitly?

https://www.swebench.com/ -- explanation of the differences here:

SWE-bench Lite is a subset of SWE-bench that's been curated to make evaluation less costly and more accessible.
SWE-bench Verified is a human annotator filtered subset that has been deemed to have a ceiling of 100% resolution rate.

If a majority of traders do not want this change, we'll leave it at SWE-bench Full (which does not have a 100% resolution ceiling). And to make it fairer, it should be a majority of people voting NO.

https://x.com/alistairpullen/status/1822981361608888619

30% on SWE-Bench based on this tweet.