Which company has best AI model end of August? (Chatbot Arena Leaderboard)
306
Ṁ220k
Aug 31
Google: 79%
OpenAI: 16%
xAI: 4%
DeepSeek: 1%

bought Ṁ50 OpenAI YES

This might change this market

bought Ṁ200 Google YES

@JoaoPedroSantos gpt5-high is now shown and it's still below Gemini 2.5 Pro.

@BayesianOracle they just renamed gpt-5 to gpt-5-high for transparency

@Bayesian gotcha, ty

One interesting thing is that head-to-head (with style control), GPT-5 loses to Gemini 2.5 about 66% of the time, which is significant (p<0.05). GPT-5 beats some other models at a slightly higher rate than Gemini 2.5 does, but not by much. For example, if we compare the rate at which GPT-5 beats Claude-Sonnet-4-thinking (0.74 with 47 samples) with the rate at which Gemini 2.5 beats Claude-Sonnet-4-thinking (0.68 with 330 samples), the GPT-5 rate is not significantly greater than the Gemini 2.5 rate (Fisher's exact test, p ≈ 0.24).

The 21-point Elo lead with style control seems tenuous, and they are tied in Elo without style control. With more data, Gemini could take the lead there.


(though I also just noticed this data is 4 days out of date. They may have made some changes right after release that change the dynamics)
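
For anyone who wants to redo the Fisher comparison above, here is a minimal sketch in plain Python. It assumes the win counts can be recovered by rounding the quoted rates (0.74 of 47 and 0.68 of 330), and it assumes a one-sided test; the comment does not say which alternative was used, so the result may not match the quoted p-value exactly. The function name `fisher_exact_greater` is made up for this sketch.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is hypergeometric with the table's margins
    held fixed (the chance of seeing at least `a` wins in row 1).
    """
    row1 = a + b          # battles in the first row
    col1 = a + c          # total wins across both rows
    col2 = b + d          # total losses across both rows
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(col2, row1 - k) for k in range(a, upper + 1))
    return tail / comb(col1 + col2, row1)


# Assumption: counts reconstructed by rounding the rates quoted in the thread.
gpt5_battles, gemini_battles = 47, 330
gpt5_wins = round(0.74 * gpt5_battles)        # vs Claude-Sonnet-4-thinking
gemini_wins = round(0.68 * gemini_battles)    # vs Claude-Sonnet-4-thinking

p_value = fisher_exact_greater(
    gpt5_wins, gpt5_battles - gpt5_wins,
    gemini_wins, gemini_battles - gemini_wins,
)
print(f"one-sided p = {p_value:.3f}")
```

With only 47 GPT-5 battles in the first row, the test has little power, which is the point of the comparison: the gap between 0.74 and 0.68 is well within noise.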

updates happen every week or so, and gemini 2.5 pro is leading without style control but yeah this is a curious stat (that gemini crushes head-to-head)

About the bit about resolving proportionally in case of a tie, is that for a tie in rankings? e.g., like how right now Google and OpenAI are both at rank 1 without style control (unless I'm misreading).

@sblaplace No, you can have the same ranking but different arena score, and ties refer to arena score ties only

sold Ṁ27 Google NO

@Bayesian got it, so that's only in case of an exact ELO tie, makes sense ^~^ thanks
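
A rough sketch of the tie rule as described in this exchange: only an exact arena-score tie at the top counts, and shared rank does not. It assumes "proportionally" means an equal split among the tied companies, which the thread does not spell out. The function name `resolve_shares` and the scores below are placeholders for illustration, not leaderboard data.

```python
def resolve_shares(top_scores):
    """Map each company to its resolution share.

    `top_scores` holds each company's best arena score. Only companies
    whose score exactly equals the maximum share the payout; rank is ignored.
    """
    best = max(top_scores.values())
    leaders = [company for company, score in top_scores.items() if score == best]
    share = 1 / len(leaders)  # assumption: equal split among exact ties
    return {company: (share if company in leaders else 0.0) for company in top_scores}


# Placeholder scores, chosen only to show an exact tie at the top.
example = {"Google": 1450, "OpenAI": 1450, "xAI": 1420}
print(resolve_shares(example))
```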

filled a Ṁ250 OpenAI YES at 24% order

@AffineTyped wanna bet more...

opened a Ṁ500 OpenAI YES at 20% order

@AffineTyped oh I didn't see you're turning off style control. Lame

rip, mb, it was previously something like "default settings (with style control off)" bc it was a port from previous months when that was the default

@Bayesian yah it's my reading failure, and for some reason I thought we had all migrated to just whatever the leaderboard says at the end of the month

@AffineTyped regardless of their defaults

Yeah i’m kind of hoping polymarket does this for next year and im planning to do it for next year but yeah arbness is a nice property

Hello

bought Ṁ223 OpenAI YES

Why is this market so down on OpenAI considering they're in the lead?

@ng Google leads with style control removed

bought Ṁ250 OpenAI NO

@Bayesian "without filters" and "without style control" are kind of contradictory, given the default is with style control?

@bens dang i'll make sure to update all the markets to make this clearer. it's meant to track the polymarket market and be without style control, but hm

it may make sense to N/A this, that is a pretty unfortunate development

@Bayesian I mean, I kind of assumed the "without style control" superseded, and I think you can leave it open? but idk

@bens ok, updated and reopened

@Bayesian Is this N/A?

@Trazyn no, will resolve EOM based on lmarena.ai without style control