An AI model with 100 trillion parameters exists by the end of 2025?
Ṁ5591 · Dec 25 · 20% chance

bought Ṁ200 NO

After the underwhelming 4.5 release, I don't think anyone is going to train another OOM larger for a while. They'll focus on inference-time scaling etc.

For reference, the current record (as far as is publicly known) is M6-10T, at 10 trillion parameters.

This is way too ambiguous. If I merge 1,400 Llama 2 models together, make it an MoE (1,400 experts, routing to 2), and then continue training for a little bit, would that count?

@Sss19971997 Yes, if it reaches 100 trillion (I don't remember the Llama 2 param count, and yes, 1,400 of them would be a shit ton).

bought Ṁ100 NO

GPT-4 was 1.8 trillion. Why does the market think 100 trillion is feasible within the next year and a half?

@WieDan Yes, but B100s are presumably a lot more expensive too, and companies will take a fair bit of time to set up their clusters, especially if they recently set up H100s. And then the training run for a 100-trillion-parameter model takes a lot of time too. Don't think it'll happen.

1.7 years is cutting it close; they might miss it by 6 months, but it's coming soonish regardless.
We'll see how my prediction fares in 2025.

@firstuserhere GPT-1 was 117M, GPT-2 was 1.5B, GPT-3 was 175B (the trend under the old scaling law).

GPT-4 was 1.8T with an MoE setup.

So historically param count has 10x'd per generation.

https://arxiv.org/pdf/2202.01169.pdf

I'm not looking closely at this paper right now, and it predates Chinchilla, but conceptually it vaguely seems like performance boosts from experts saturate past GPT-4 levels, although I'm not sure whether this applies to inference cost/speed.

@firstuserhere You never said it had to be any good. Making a bad model with 100T parameters ought to be rather easy, as long as you have the space to store them (which I do not).

bought Ṁ20 NO

But I'm gonna bet NO, because it's big enough that I think nobody is going to bother doing it as a joke (e.g. a 100T-param MNIST classifier...), and I think it's unlikely to make enough business sense for anyone to do it for real.

@retr0id exactly. 100T might not be heaps, but it's enough to not bother with unless you think you're going to achieve something.

@firstuserhere The model exists before it's done training. It exists as soon as the parameters are initialized.

37%??

bought Ṁ100 NO from 37% to 29%

@AdrianBotez good point.

What would your policy be if the parameter count is not officially released, as is the case with GPT-4?

@Supermaxman Resolves based on the best estimates available. In that case I'll take a poll of AI researchers at the top 3/5 AI labs.

bought Ṁ10 NO from 30% to 28%