Will OpenAI's next-generation model score 65% or higher on the GPQA benchmark?


13

Ṁ803

Resolved YES on Sep 16


Resolves YES if OpenAI's next-generation language model scores 65% or higher on the GPQA benchmark (extended set).

If an existing OpenAI model reaches 65% or higher through post-training enhancements, that also counts.

There's room for improvement via prompt engineering after the release, but I don't know how long I should wait, so I will resolve this question as soon as OpenAI releases their next model.

GPT-5 Capabilities

#Technology

#Technical AI Timelines


🏅 Top traders

#	Name	Total profit
1		Ṁ168
2		Ṁ68
3		Ṁ23
4		Ṁ22
5		Ṁ11

Related questions

Will there be an AI language model that strongly surpasses ChatGPT and other OpenAI models at the end of 2025?

72% chance (+4% 1d)

Will OpenAI's next major LLM (after GPT-4) achieve over 50% resolution rate on the SWE-bench benchmark?

Will any AI model score above 95% on GRAB by the end of 2025?

Will the gap between open-weights and frontier models on GPQA Diamond be at most 7%?

Will a single model achieve superhuman performance on all OpenAI gym environments by 2025?

Will OpenAI claim that it has achieved AGI in 2025?

Will OpenAI models achieve ≥90% on SimpleBench by the end of 2025?

Will any AI model score >80% on Epoch's Frontier Math Benchmark in 2025?

Will OpenAI announce a new model that EpochAI estimates is at least as large as GPT-4.5, in 2025?

In what year will AI achieve a score of 95% or higher on the GPQA benchmark?
