Until what date will a Google model hold the top position in the Chatbot Arena? (>=week)
12
Ṁ9681
Jun 1
1st February 2025: 85%
1st January 2025: 5%
1st December 2024: 4%
Other: 3%
1st March 2025: 1.3%

https://lmarena.ai/

  • The market will resolve to YES if a Google model is the top chatbot on the specified date and for the entire month prior.

  • The market will resolve to NO if a model from any organization other than Google is the top model during the month before the specified date (and will also resolve NO for subsequent dates).

Example:

If LLaMA becomes the top model on January 5th, 2024, the market will resolve as follows:

  • The market will resolve to NO for all dates after January 5th, 2024, since a model from an organization other than Google has taken the top position.

  • Even if a Google model regains the top spot later, the market will not change back to YES for any subsequent dates, as the condition for NO was met on January 5th, 2024.
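
The rules and the example above can be sketched in code. This is only one possible reading, not the market creator's official procedure: it assumes that "the entire month prior" means the same calendar day one month earlier, and that a loss is permanent for every later option date. The names `one_month_before`, `resolve_option`, `held_since` and `first_loss` are made up for illustration, and the `held_since` value in the usage lines is a hypothetical date.

```python
import calendar
from datetime import date


def one_month_before(d):
    """Same calendar day one month earlier, clamped to the month's length."""
    year, month = (d.year, d.month - 1) if d.month > 1 else (d.year - 1, 12)
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def resolve_option(option, held_since, first_loss=None):
    """Resolve one option date to "YES" or "NO" (assumed reading of the rules).

    held_since: day the tracked organization took first place.
    first_loss: day another organization first took over, or None.
    """
    # The top spot must already be held at the start of the month-long window.
    if held_since > one_month_before(option):
        return "NO"
    # A loss on or before the option date makes this and all later dates NO.
    if first_loss is not None and first_loss <= option:
        return "NO"
    return "YES"


# The example above: another model takes first place on January 5th, 2024.
loss = date(2024, 1, 5)
held = date(2023, 11, 1)  # hypothetical start of the streak
for option in (date(2024, 1, 1), date(2024, 2, 1), date(2024, 3, 1)):
    print(option, resolve_option(option, held, loss))
```

Under these assumptions only the January 1st option resolves YES; February 1st and March 1st resolve NO even if the top spot is regained later, because `first_loss` never resets.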


@mods The author seems to have left the site (there has been no activity in over three months). Could you please resolve this?

  • Feb 1st should clearly resolve to YES. The condition was met with wide margins on either side of that date.

  • It would be fairest to resolve Jan 1st to YES as well: the question specifies "≥ week" and "for the entire month prior". Jan 5th was the first update in January, and Google had held the 1st position between the Dec 5th and Jan 5th updates (in other words, for the entire month prior to the nearest update past Jan 1st). I did not trade for or against this option to avoid appearing biased.

Context is here and in the comment below: https://fixupx.com/lmarena_ai/status/1908612927785230476

Let's try to reconstruct the chronology from the LMArena Twitter where they publish any notable changes in rankings.

  1. 2024-12-06: Gemini-Exp-1206 regains first place after the previous Gemini-Exp release lost it to GPT-4o-1120: https://x.com/lmarena_ai/status/1865080944455225547/photo/1

  2. 2024-12-11: Gemini-Exp-1206 still first: https://x.com/lmarena_ai/status/1866873983569891378/photo/1

  3. 2024-12-19: Gemini-Exp-1206 still first: https://x.com/lmarena_ai/status/1869793847548817563/photo/1

  4. 2024-12-30: Gemini-Exp-1206 still first: https://x.com/lmarena_ai/status/1873695386323566638/photo/1

  5. 2025-01-05: Gemini-Exp-1206 still first: https://x.com/lmarena_ai/status/1876020372329660862/photo/1

  6. 2025-01-22: Gemini-2.0-Flash-Thinking-Exp-01-21 matches it for joint first: https://x.com/lmarena_ai/status/1881848934743904319/photo/1

  7. 2025-01-24: Gemini-Exp-1206 and Gemini-2.0-Flash-Thinking-Exp-01-21 joint first: https://x.com/lmarena_ai/status/1882749951924715578/photo/1

  8. 2025-02-03: Gemini-2.0-Flash-Thinking-Exp-01-21 first, Gemini-Exp-1206 falls behind: https://x.com/lmarena_ai/status/1886481741428482313/photo/1

  9. 2025-02-05: Gemini-2.0-Pro-Exp-02-05 matches Gemini-2.0-Flash-Thinking-Exp-01-21 for joint first: https://x.com/lmarena_ai/status/1887180371219132898/photo/1

  10. 2025-02-06: Both still joint first: https://x.com/lmarena_ai/status/1887595195333812691/photo/1

  11. 2025-02-18: Both overtaken by Grok-3 for the first time: https://x.com/lmarena_ai/status/1891706264800936307/photo/1
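
For anyone who wants to check the dates, here is a rough sketch that turns the list above into data and measures the streak. It assumes the announcement dates are close enough to the actual leaderboard changes (the real switch may have happened between two posts). Labelling Grok-3 as "xAI" is my own annotation, and `SNAPSHOTS` and `google_streak` are names made up for this sketch.

```python
from datetime import date

# (announcement date, organization in first place), transcribed from the list above.
SNAPSHOTS = [
    (date(2024, 12, 6), "Google"),
    (date(2024, 12, 11), "Google"),
    (date(2024, 12, 19), "Google"),
    (date(2024, 12, 30), "Google"),
    (date(2025, 1, 5), "Google"),
    (date(2025, 1, 22), "Google"),
    (date(2025, 1, 24), "Google"),
    (date(2025, 2, 3), "Google"),
    (date(2025, 2, 5), "Google"),
    (date(2025, 2, 6), "Google"),
    (date(2025, 2, 18), "xAI"),  # Grok-3 takes first place
]


def google_streak(snapshots):
    """Return (first Google snapshot, first later non-Google snapshot or None)."""
    start = None
    for day, org in sorted(snapshots):
        if org == "Google":
            if start is None:
                start = day
        elif start is not None:
            return start, day
    return start, None


start, end = google_streak(SNAPSHOTS)
print(start, end, (end - start).days)  # 2024-12-06 2025-02-18 74
print((date(2025, 2, 1) - start).days, (end - date(2025, 2, 1)).days)  # 57 17
```

So by these posts Google was first for 57 days before Feb 1st and for another 17 days after it.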

You could also do this the error-proof way, based on the Git commits, but that takes more effort.

@LCBOB Please resolve Jan 1st and Feb 1st to YES.