Will there be an LLM (as good as GPT-4) that was trained with 1/100th the energy consumed to train GPT-4, by 2026?
Resolved YES (Aug 6)

The total energy consumption for training GPT-4 can be estimated at around 50-60 million kWh.

1/10th of this energy = 5-6 million kWh

1/100th of this energy = 0.5-0.6 million kWh

See calculations below:
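A minimal sketch of this threshold arithmetic, assuming only the 50-60 million kWh estimate above:

```python
# Rough threshold arithmetic for the market's resolution criteria.
# The 50-60 million kWh GPT-4 training-energy estimate comes from the
# description above; everything else is straightforward division.

gpt4_energy_kwh = (50e6, 60e6)  # estimated GPT-4 training energy, kWh

for frac, label in [(1 / 10, "1/10th"), (1 / 100, "1/100th")]:
    low, high = (e * frac for e in gpt4_energy_kwh)
    print(f"{label} of GPT-4's training energy: "
          f"{low / 1e6:.1f}-{high / 1e6:.1f} million kWh")

# Output:
# 1/10th of GPT-4's training energy: 5.0-6.0 million kWh
# 1/100th of GPT-4's training energy: 0.5-0.6 million kWh
```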


@mods Resolves as YES (creator deleted). OpenAI just introduced gpt-oss-20b, a model that it trained with far less than 1/100th of the energy used to train GPT-4, while being far superior to GPT-4 (see https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7637/oai_gpt-oss_model_card.pdf). Details:

Training gpt-oss-20b required about 0.21 million H100-hours (page 5 of the model card, see screenshot below).

The market's creator used 8-way systems as a baseline, so training required 26,250 DGX H800-hours (210,000 H100-hours ÷ 8 GPUs per system) at a maximum draw of 10.2 kW per system (https://viperatech.com/product/nvidia-dgx-h800-640gb-sxm5-2tb). The servers therefore consumed at most 0.26775 million kWh. Multiplying by the creator's PUE of 1.18 yields a total of 0.315945 million kWh, which is far less than 1% of the 57,525 MWh (57.525 million kWh) baseline established by the market creator.
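A short sketch of that calculation (the 0.21 million H100-hours, the 10.2 kW per 8-GPU DGX system, and the PUE of 1.18 are taken from the comment above; the 57,525 MWh baseline is the creator's estimate):

```python
# Reproducing the energy estimate for gpt-oss-20b from the figures above.

h100_hours = 0.21e6           # reported H100-hours for gpt-oss-20b training
gpus_per_system = 8           # 8-way DGX baseline used by the market creator
system_power_kw = 10.2        # max power draw of a DGX H800 system, kW
pue = 1.18                    # creator's assumed data-centre PUE
baseline_kwh = 57_525e3       # creator's GPT-4 baseline (57,525 MWh)

system_hours = h100_hours / gpus_per_system         # 26,250 DGX-hours
server_energy_kwh = system_hours * system_power_kw  # 267,750 kWh at the servers
total_energy_kwh = server_energy_kwh * pue          # 315,945 kWh incl. overhead

print(f"System-hours:       {system_hours:,.0f}")
print(f"Server energy:      {server_energy_kwh:,.0f} kWh")
print(f"Total energy (PUE): {total_energy_kwh:,.0f} kWh")
print(f"Share of baseline:  {total_energy_kwh / baseline_kwh:.2%}")
# -> roughly 0.55% of the GPT-4 baseline, well under the 1% threshold
```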


gpt-oss-20b destroys GPT-4 in a direct comparison. It's not even close. GPT-4 still occasionally struggled with primary-school math; gpt-oss-20b aces competition math and programming while performing at PhD level on GPQA (page 10 of the model card, see screenshot below):

@ChaosIsALadder Was such a model just released?

@MaxE Yes, see comment above.

In the title, and in lines f. and i., you mean "energy" where you have written "power".

If such a model is trained on synthetic data generated with a precursor model, does this take into account the energy used to train the precursor and to run inference on it to produce the synthetic data?
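One way to make that accounting question concrete (the function and all figures below are hypothetical illustrations, not part of the market's stated resolution criteria):

```python
# Illustrative energy accounting for a model trained on synthetic data.
# "Narrow" accounting counts only the final training run; "full" accounting
# also charges the precursor model's training and the inference used to
# generate the synthetic data. All names and numbers are placeholders.

def training_energy_kwh(final_run: float,
                        precursor_train: float = 0.0,
                        synthetic_data_inference: float = 0.0,
                        include_precursor: bool = False) -> float:
    """Return the energy attributed to training, in kWh."""
    if include_precursor:
        return final_run + precursor_train + synthetic_data_inference
    return final_run

# Example: a hypothetical 0.3 million kWh final run whose synthetic data
# cost 2 million kWh of precursor training and 1 million kWh of inference.
narrow = training_energy_kwh(0.3e6)
full = training_energy_kwh(0.3e6, 2e6, 1e6, include_precursor=True)
print(f"narrow accounting: {narrow / 1e6:.1f} million kWh")  # 0.3
print(f"full accounting:   {full / 1e6:.1f} million kWh")    # 3.3
```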