Will it cost less than 100k USD to train and run a language model that outperforms GPT-3 175B on all benchmarks by the end of 2024?
54 · Ṁ6104 · Jan 1 · 85% chance

The final model does not have to cost under 100k in total. If a model outperforms GPT-3 before 100k has been spent on training, the market resolves YES, even if the model continues to be trained after that point.

Clarification: all benchmarks in the original GPT-3 paper.


MPT cost $200k to train and is roughly on par with GPT-3.

Is it necessary for the new model to have been tested on all the benchmarks published for the 175B model in the original GPT-3 paper for this to resolve YES?

@meefburger Yes. I will make exceptions for any benchmarks that are/become unavailable, or are otherwise very difficult to access (e.g. very onerous licensing). I may consider making an exception for a model that completely blows GPT-3 out of the water but skips some minor benchmarks. But since the market only resolves yes in the case where the model is quite cheap to use, it seems likely that actually testing it against all the benchmarks will be feasible.

100k nominal or inflation adjusted? If the latter, adjust from what starting point?

@TomCohen Nominal

@vluzko Sweet, thanks for clarifying!

predicts YES

I think it may already be possible. GPT-3 used 3e23 FLOPs. GPT-30B by MosaicML used a third as much, 1e23 FLOPs, and cost $450k. Flan-T5-XXL used a third as much again, 3.3e22 FLOPs, so naively it should cost around $150k, but probably <$100k because Google has access to cheaper hardware.
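The arithmetic above can be sketched as a simple extrapolation, assuming training cost scales roughly linearly with FLOPs on comparable hardware (an assumption, not a claim from the comment; the reference figures are the $450k / 1e23 FLOPs run quoted above):

```python
def est_cost(flops, ref_flops=1e23, ref_cost=450_000):
    """Estimate training cost by linear scaling in FLOPs from a reference run.

    Assumes cost is proportional to compute, which ignores differences in
    hardware pricing and utilization between labs.
    """
    return ref_cost * flops / ref_flops

# Flan-T5-XXL at 3.3e22 FLOPs -> $148,500 naively, before any
# discount for Google's cheaper in-house hardware.
flan_t5_xxl_cost = est_cost(3.3e22)
```

This is only a back-of-the-envelope check; real costs depend heavily on hardware generation and spot vs. on-demand pricing, as other comments note.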

Does Flan-T5-XXL outperform GPT-3 on all benchmarks? I don't know. "All benchmarks" is not even a reasonable definition; you can construct a benchmark that specifically prefers GPT-3 over all current models.

But it is significantly better on MMLU 5-shot (55 vs. 44), which is a strong signal that it might actually be generally better.

I would give 95% that a model reasonably better than GPT-3 will be trained for <$100k by 2024, and maybe 85% that this market resolves YES (the model may not be public, it may be hard to estimate the cost, and it may be hard to say it's clearly better than GPT-3).

@ValeryCherepanov It's specifically all benchmarks in the original GPT-3 paper, not "all benchmarks imaginable"

MosaicML trained a GPT-3-quality LLM for $450k about a month ago.

Stable Diffusion was trained for $600k, arguably ~$200k at aggressive spot pricing.

https://twitter.com/jackclarkSF/status/1563957173062758401

Big labs continue to be terrible at training efficiency (e.g. one paper beat AlphaGo with ~50x less compute via better sampling and architecture). With stability.ai in play **and** their open-source approach, someone might pull this off.

GPT-3 cost ~$10m to train in 2020. It costs ~$1m to train 4.5 years later (cost halves every 18 months). That leaves ~10x improvement in approach to tie the $100k target, and more to exceed GPT-3 across the board. Note that it still costs ~$10m today to train PaLM/Megatron/Chinchilla, with no evidence of training (rather than inference) efficiency gains.
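The cost-decline arithmetic above can be checked directly, assuming the stated model of a fixed starting cost that halves every 18 months (the $10m 2020 figure and the halving period are the comment's numbers, not mine):

```python
def cost_after(start_cost, months, halving_months=18):
    """Project training cost under a fixed exponential decline.

    Assumes cost halves every `halving_months`, per the comment's model.
    """
    return start_cost * 0.5 ** (months / halving_months)

# 4.5 years = 54 months = exactly 3 halvings:
# $10m -> $5m -> $2.5m -> $1.25m, i.e. the "~$1m" figure above.
projected = cost_after(10_000_000, 54)
```

Under this model, reaching $100k from $1.25m by hardware trends alone would take a further ~3.5 halvings (~5 years), which is why the comment points to a ~10x algorithmic improvement instead.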