Will SotA on SWE-Lancer (Diamond) reach 400K USD (80%) in 2025?

SWE-Lancer evaluates AI agents' ability to complete real-world freelance software engineering tasks sourced from Upwork, mapping performance directly to monetary value.

This market focuses on the open-sourced SWE-Lancer Diamond evaluation set, which comprises 502 tasks (237 IC SWE, 265 SWE Manager) collectively valued at $500,800 USD.

The current State-of-the-Art (SotA) reported in the paper for total earnings on SWE-Lancer Diamond is $208,050 USD under a pass@1 metric (achieved by Claude 3.5 Sonnet).
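To put these numbers in proportion, the arithmetic behind the "80%" in the title can be sketched as follows. The constants are the figures quoted above; the helper name `share_of_total` is only illustrative.

```python
# Figures quoted in the market description.
DIAMOND_TOTAL_USD = 500_800  # combined value of the 502 Diamond tasks
TARGET_USD = 400_000         # threshold this market asks about
PAPER_SOTA_USD = 208_050     # pass@1 earnings reported in the paper


def share_of_total(earned_usd: float, total_usd: float = DIAMOND_TOTAL_USD) -> float:
    """Return earnings as a fraction of the total Diamond payout."""
    return earned_usd / total_usd


target_share = share_of_total(TARGET_USD)
sota_share = share_of_total(PAPER_SOTA_USD)
gap_usd = TARGET_USD - PAPER_SOTA_USD

print(f"Target share of total payout: {target_share:.1%}")  # 79.9%
print(f"Paper SotA share of total:    {sota_share:.1%}")    # 41.5%
print(f"Gap from paper SotA to target: ${gap_usd:,}")       # $191,950
```

The target is therefore slightly under 80% of the Diamond payout, and reaching it would require roughly doubling the earnings reported in the paper.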

The figures above (taken from the paper) illustrate the key idea behind the benchmark (which spans IC SWE and SWE Manager tasks).

Market Details

Source: This market resolves based on published data from the maintainers of the SWE-Lancer benchmark (e.g., OpenAI researchers listed in the paper or designated successors) or credible third-party evaluations using the official benchmark configuration and SWE-Lancer Diamond set.
Metric: Total Payout Earned (USD) on the SWE-Lancer Diamond set. This is the sum of the real-world payouts associated with each successfully completed task (pass@1) in the Diamond set (across both IC SWE and SWE Manager tasks).
Target Score: Greater than or equal to $400,000.00 USD.
Reporting Window: The score must be achieved by an AI agent and credibly reported (e.g., in a peer-reviewed publication, arXiv preprint, official leaderboard, major AI lab report, public repo) before December 31st, 2025, 23:59 UTC.
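As a rough illustration of the metric defined above, the sketch below sums the payouts of tasks that passed on a single attempt. The `TaskResult` record, its field names, and the toy task IDs and payouts are hypothetical and are not taken from the official benchmark harness.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TaskResult:
    """Hypothetical record for one Diamond task evaluated under pass@1."""
    task_id: str
    category: str       # "IC SWE" or "SWE Manager"
    payout_usd: float   # real-world payout attached to the task
    passed: bool        # True if the single attempt succeeded


def total_payout_earned(results: Iterable[TaskResult]) -> float:
    """Sum payouts over passed tasks only; failed tasks earn nothing."""
    return sum(r.payout_usd for r in results if r.passed)


# Toy data for illustration only, not real benchmark tasks.
example_results = [
    TaskResult("task-a", "IC SWE", 250.0, True),
    TaskResult("task-b", "IC SWE", 1000.0, False),
    TaskResult("task-c", "SWE Manager", 500.0, True),
]

example_total = total_payout_earned(example_results)
print(f"Total payout earned: ${example_total:,.2f}")  # $750.00
```

Because the metric weights each task by its payout, passing a few high-value tasks can move the total more than passing many low-value ones.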

Resolution Criterion

This market resolves to YES if the State-of-the-Art (SotA) Total Payout Earned on the SWE-Lancer Diamond set is credibly reported to have reached or surpassed $400,000.00 USD within the reporting window.

Otherwise, the market resolves to NO.
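The resolution rule can be sketched as a simple check on the reported total and the report date. This is a minimal sketch, not the official resolution procedure: the function name `resolves_yes` is hypothetical, the strict "before" comparison at the deadline is an assumption, and whether a report counts as credible remains a human judgment that code cannot capture.

```python
from datetime import datetime, timezone

TARGET_USD = 400_000.00
# Assumption: a report must be dated strictly before this instant to count.
REPORTING_DEADLINE = datetime(2025, 12, 31, 23, 59, tzinfo=timezone.utc)


def resolves_yes(reported_total_usd: float, reported_at: datetime) -> bool:
    """Return True if a (credible) report meets both the score and date conditions."""
    if reported_at.tzinfo is None:
        raise ValueError("reported_at must be timezone-aware")
    in_window = reported_at < REPORTING_DEADLINE
    return in_window and reported_total_usd >= TARGET_USD


mid_2025 = datetime(2025, 6, 1, tzinfo=timezone.utc)
early_2026 = datetime(2026, 1, 2, tzinfo=timezone.utc)

print(resolves_yes(400_000.00, mid_2025))    # True: exactly at the threshold, in window
print(resolves_yes(399_999.99, mid_2025))    # False: just under the threshold
print(resolves_yes(450_000.00, early_2026))  # False: reported after the window
```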

Market Closing Date

The market will close on January 15, 2026, 00:00 UTC, to allow for potential reporting delays. It may resolve earlier if the YES condition (>= $400,000.00 USD reported) is met and confirmed before this date.

Related questions
