Will an open-source LLM under 10B parameters surpass Claude 3.5 Haiku by EOY 2025?
89% chance

Resolves YES if a language model is released before January 1, 2026 that:

1. Has freely accessible weights, meaning the general public can download it and run it locally, regardless of additional restrictions.
2. Is explicitly described as having fewer than 10 billion parameters. (If the actual parameter count is below 10B but is reported rounded up, e.g. as "10B", it does not count as under 10B.)
3. Achieves an Arena Score on http://lmarena.ai/leaderboard higher than that of Claude 3.5 Haiku 2024-10-22, with both scores measured at the same point in time.
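The criteria above boil down to one combined check. The sketch below is a hypothetical illustration, not an official resolution tool: the `Candidate` fields and the `qualifies` helper are made up for this example, and it assumes both Arena Scores were read from the leaderboard at the same moment. Note that it compares the *stated* parameter count, since a model reported as "10B" fails criterion 2 even if its true count is slightly lower.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical constants mirroring the written criteria.
DEADLINE = date(2026, 1, 1)          # released strictly before January 1, 2026
PARAM_LIMIT = 10_000_000_000         # strictly fewer than 10 billion parameters


@dataclass
class Candidate:
    name: str
    open_weights: bool     # criterion 1: public can download and run the weights
    stated_params: int     # criterion 2: the count as explicitly described by the release
    release_date: date
    arena_score: float     # criterion 3: read at the same time as the Haiku score


def qualifies(model: Candidate, haiku_score: float) -> bool:
    """Return True only if the model meets every resolution criterion."""
    return (
        model.release_date < DEADLINE
        and model.open_weights
        and model.stated_params < PARAM_LIMIT
        and model.arena_score > haiku_score
    )
```

For example, with made-up scores, `qualifies(Candidate("hypothetical-9b", True, 9_000_000_000, date(2025, 6, 1), 1240.0), 1236.0)` returns `True`, while the same model described as "10B" (`stated_params=10_000_000_000`) returns `False`.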

If the way Arena Scores are calculated changes significantly, or if that specific Haiku model is no longer ranked before the end of 2025 (I would be very surprised if this happens), I will try to find a suitable replacement criterion that traders agree is fair. If no such criterion can be found, this market will resolve N/A.

bought Ṁ1,000 YES (2mo)

I might be missing something. How could this not happen?

2mo

It's mid, it's from months ago, there are over 10 months left in the year, Haiku is probably under 100B total and under 30B active params, and it's not a reasoning model.

2mo

@Bayesian Hmm, yeah, this puzzles me. Do you think lmsys is not accurately assessing quality here?

2mo

@MingCat lmsys is not assessing quality very well, at least. And LLMs at constant size are getting much better over time.

@Bayesian Yeah, I'm hopeful too. Looks like Reka Core is 67B parameters. So the question is basically just how quickly we'll see small open-source models scale. We've seen AI development show some pretty weird progress overhangs, where the tech theoretically should exist but no one has properly capitalized on it. (It took a while for DeepSeek to come along, for instance.)

bought Ṁ50 YES

@Bayesian Looks like Gemma 3 is moving from 9B to 12B; otherwise it probably would have ended up qualifying.
