An AI model with 100 trillion parameters exists by the end of 2025?
Ṁ5591 · Dec 25 · 20% chance

bought Ṁ200 NO

After the underwhelming 4.5 release, I don't think anyone is going to train another OOM larger for a while. They'll focus on inference-time scaling etc.

For reference, the current record (as far as is publicly known) is M6-10T, at 10 trillion parameters.

This is way too ambiguous. If I merge 1,400 Llama 2 models together, make it an MoE (1,400 experts, routing to 2), and then continue training for a little bit, would that count?

@Sss19971997 Yes, if it reaches 100 trillion (I don't remember the Llama 2 param count, and yes, 1,400 of them would be a shit ton).

bought Ṁ100 NO

GPT-4 was 1.8 trillion. Why does the market think 100 trillion is feasible within the next year and a half?

@WieDan Yes, but B100s are presumably a lot more expensive too, and companies will take a fair bit of time to set up their clusters, especially if they recently set up H100s. And then the training run for a 100-trillion-parameter model takes a lot of time too. Don't think it'll happen.

1.7 years is cutting it close; they might miss it by 6 months, but it's coming soonish regardless.
We'll see how my prediction fares in 2025.

@firstuserhere GPT-1 was 117M, GPT-2 was 1.5B, GPT-3 was 175B (the trend under the old scaling law).

GPT-4 was 1.8T with an MoE setup.

So historically param count has 10x'd per generation.

https://arxiv.org/pdf/2202.01169.pdf

I'm not looking closely at this paper right now, and it predates Chinchilla, but conceptually it vaguely seems like performance boosts from experts saturate past GPT-4 levels, although I'm not sure whether this applies to inference cost/speed.

@firstuserhere You never said it had to be any good. Making a bad model with 100T parameters ought to be rather easy, as long as you have the space to store them (which I do not).

bought Ṁ20 NO

But I'm gonna bet NO, because it's big enough that I think nobody is going to bother doing it as a joke (e.g. a 100T-param MNIST classifier...), and I think it's unlikely to make enough business sense for anyone to do it for real.

@retr0id exactly. 100T might not be heaps, but it's enough to not bother with unless you think you're going to achieve something.

@firstuserhere The model exists before it's done training. It exists as soon as the parameters are initialized.

37%??

bought Ṁ100 NO from 37% to 29%

@AdrianBotez good point.

What would your policy be if the parameter count is not officially released, as is the case with GPT-4?

@Supermaxman Resolves based on the best estimates available. In that case I'll take a poll of AI researchers at the top 3/5 AI labs.

bought Ṁ10 NO from 30% to 28%