
GPT-5 with scaffolding or access to tools counts as long as GPT-5 is making real decisions.
Resolves No if GPT-5 doesn’t score Bronze or higher, or if it does not exist by then, or if nobody makes an IMO attempt by the end of August 2025.
Update 2025-03-01 (PST): o3 or future o series models do not count as GPT-5 for this question. Models called GPT-4x would not count. If OpenAI abandons the GPT-X naming scheme and comes out with a new flagship model to replace GPT-4 that's not part of the o series, that counts.
Update 2025-08-07 (PST) (AI summary of creator comment): Multiple instances with consensus (like GPT-5 Pro) are allowed. Web search is not allowed.
Update 2025-08-07 (PST) (AI summary of creator comment): GPT-5 must get Bronze the majority of the time, or get an average score of Bronze or higher.
Update 2025-08-14 (PST) (AI summary of creator comment):
- No hints in prompts: Attempts that include any hints in the problem prompt will not count toward resolution.
- Remain open until market close (end of August 2025): The market will stay open until close to allow further attempts with improved prompting/scaffolding; early failed attempts alone will not trigger a No resolution.
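For anyone scoring their own runs, the "majority of the time, or average" rule from the 2025-08-07 update can be sketched in a few lines of Python. This is only an illustration, not the creator's official procedure: the function name `meets_bronze_criterion` is hypothetical, the 19-point Bronze cutoff is the figure quoted in the comments on this market, and each entry in `scores` is assumed to be the total for one full, hint-free attempt at all six problems.

```python
from statistics import mean

# Bronze cutoff as quoted in this market's comments (assumption: 19 points).
BRONZE_CUTOFF = 19


def meets_bronze_criterion(scores, cutoff=BRONZE_CUTOFF):
    """Return True if a strict majority of runs reach the cutoff,
    or if the mean score across runs reaches the cutoff.

    `scores` holds one total score per independent attempt (hypothetical input).
    """
    if not scores:
        # No attempts at all cannot satisfy either condition.
        return False
    bronze_runs = sum(1 for s in scores if s >= cutoff)
    majority = bronze_runs > len(scores) / 2
    average = mean(scores) >= cutoff
    return majority or average
```

Under this reading, a set of runs that all score below 19 fails both conditions, while two Bronze-level runs out of three pass on the majority condition even if the average falls short.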
@DottedCalculator https://matharena.ai/
should resolve no
Thanks, seems like a solid attempt. They averaged 16 points and Bronze is 19 points, so not far off. Sorry the original criteria were unclear, I was expecting that OpenAI would likely have an official attempt. I think I'll keep the market open until market close at the end of August to see if anyone achieves Bronze with better prompting or scaffolding. Though if they include hints in the prompt, that will not count.
@qumeric the people i've seen claiming this have been "kinda cheating", eg giving the model hints as to what direction to attempt to solve the problem from. if you know cases not like that i'd be curious to see them. others just take much more than 4.5 hours per set of 3 problems
o3 or future o series models do not count as GPT-5 for this question. GPT-4x would not count. If OpenAI abandons the GPT-X naming scheme and comes out with a new flagship model to replace GPT-4 that's not part of the o series, that counts.
@ahalekelly If gpt5 can use reasoning models like it uses search or other features, so you can get it to solve the imo through o3, does it count?
Yeah that sounds reasonable