Will there be a model that has a 75% win rate against the latest iteration of GPT-4 as of January 1st, 2025?

Ṁ19k

Jan 2

58% chance


As per the LMSYS Chatbot Arena leaderboard, the latest iteration of GPT-4 currently has a 77% (0.77) win rate against Mistral Medium, which approximately represents the advantage of GPT-4 over GPT-3.5. As of 2025-01-01, will there be a model that has a 75% or higher win rate against the latest iteration of GPT-4?

Clarifications:

I will look at the ranking of the models in the "Fraction of Model A Wins for All Non-tied A vs. B Battles" section of Chatbot Arena, or an equivalent section, as of Jan 1st, 2025. If a new GPT-4 model is released on (say) Dec 31st, 2024 and is not yet ranked on Chatbot Arena, it will not count for the purposes of this question.
Any model that's named gpt-4-* will count. So gpt-4-turbo-2025-01-01 or gpt-4-hyper-advanced will count as "GPT-4". Something like gpt-4.1-turbo or gpt-5-turbo will not count as "GPT-4".
If Chatbot Arena no longer provides a win % for any GPT-4 models or ceases to exist entirely, this question will resolve as N/A.
If the Chatbot Arena website happens to be down for maintenance or any technical issue on Jan 1st, 2025, I will keep retrying for 7 days. If after 7 days the ranking is still unavailable, I will resolve this to N/A.
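
For readers who want to check the criteria themselves, the naming rule and the 75% threshold can be sketched in Python. This is only an illustration under stated assumptions, not the market creator's actual procedure: the function names (`counts_as_gpt4`, `non_tied_win_fraction`, `meets_threshold`) are hypothetical, model names are assumed to be plain strings like those on the leaderboard, and win/loss counts are assumed to be read off the Arena page by hand.

```python
import re

# Assumption: a name "counts" only if it is literally "gpt-4-" followed by
# something, mirroring the gpt-4-* wording in the clarifications.
GPT4_PATTERN = re.compile(r"gpt-4-.+")


def counts_as_gpt4(model_name: str) -> bool:
    """Return True if the name follows the gpt-4-* scheme."""
    return GPT4_PATTERN.fullmatch(model_name.lower()) is not None


def non_tied_win_fraction(wins: int, losses: int) -> float:
    """Fraction of non-tied battles won; ties are simply left out of both counts."""
    total = wins + losses
    if total == 0:
        raise ValueError("no non-tied battles recorded")
    return wins / total


def meets_threshold(wins: int, losses: int, threshold: float = 0.75) -> bool:
    """Return True if the win fraction is at or above the threshold (75% by default)."""
    return non_tied_win_fraction(wins, losses) >= threshold
```

Under this sketch, `counts_as_gpt4("gpt-4-turbo-2025-01-01")` is true, while `gpt-4.1-turbo`, `gpt-5-turbo`, and `gpt-4o` are rejected. How a bare `gpt-4` should be treated is not specified in the clarifications; the pattern here excludes it, which is an assumption.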

#AI

#Technical AI Timelines

#Chatbot Arena Leaderboard

#GPT-4


15 Comments


New question with a more reasonable threshold.

bought Ṁ250 YES

Very small sample (<30 head-to-head matches), but o1-preview is now at a 75% win rate against both remaining GPT-4 models in the Arena. I'm now quite bullish on this market resolving to Yes.

sold Ṁ24 YES

I fear regression to the mean will wipe out this gap as we get more samples.

@SergeyDavidoff Possible, but 'o1' (non-preview) is supposed to come out by EOY, plus I assume o1-preview is actively being tweaked right now based on user feedback.

gpt-4o-latest is now at a 68.4% win rate against the latest GPT-4.

Gemini 1.5 Pro (experimental) is at 62.7% as of today against the latest GPT-4.

turbo isn't the latest gpt-4.

oh nm - gpt-4o doesn't count

Chatbot Arena is a flawed benchmark anyway, and based on the numbers it seems almost impossible this would happen. GPT-4o, the number 1 model on the leaderboard, only has a 71% win rate in its most lopsided matchup. That matchup is against GLM-4, a model which is terrible in all my evaluations.

Even if the next gen of models is a huge step up over the current SOTA, I don't expect any of them to achieve that high of a win rate.

@JaundicedBaboon Very small sample (<30 head-to-head matches), but o1-preview has now done it :-) The market will resolve on January 1st and things might change with a bigger sample size, but I'm becoming pretty bullish on this resolving to Yes.

GPT-4o has a 61% win rate against GPT-4-turbo. Pretty good but still needs a lot of work to get to 75%.

GPT-4o did not follow the gpt-4-x scheme, so it doesn't count as GPT-4 for the purposes of this question (it would've if it was called gpt-4-o). So if gpt-4o beats gpt-4-latest with a win rate of 75% or higher on the leaderboard, this will resolve to Yes.

Dan opened a Ṁ250 NO at 52% order


Limit up if anyone wants to take it

Why is this market priced lower than the "GPT-5 comes out in 2024" market? Is it the 'GPT-5 will be underwhelming' money?
https://manifold.markets/VictorLJZ/will-gpt5-be-released-before-2025

@JoeandSeth GPT-5 could come out and still fail to have a decisive advantage over GPT-4.

Related questions