Will GPT-4 improve on the Chinchilla scaling law?
90
Ṁ8738
Dec 31
43% chance

Currently, the best known scaling law for language models comes from https://arxiv.org/abs/2203.15556 .

This market will resolve YES if OpenAI improves on this scaling law when training GPT-4, i.e. gets better performance (in terms of cross-entropy) per training FLOP. It will resolve NO if they get the same performance or worse.
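To make "better cross-entropy per training FLOP" concrete, here is a minimal sketch of one way to check a result against the parametric fit in the linked paper (its "Approach 3"): L(N, D) = E + A/N^α + B/D^β with E = 1.69, A = 406.4, B = 410.7, α = 0.34, β = 0.28, where N is parameters, D is training tokens, and training compute is approximated as C ≈ 6ND. The function names are hypothetical. The fitted losses are in nats per token on the paper's own training data, so comparing them with a loss measured on a different dataset is only indicative.

```python
"""Sketch: compare an observed loss against the Chinchilla Approach 3 fit.

Assumes L(N, D) = E + A / N**alpha + B / D**beta and C ~= 6 * N * D.
Function names are illustrative, not from any library.
"""

# Fitted constants reported for Approach 3 in Hoffmann et al. (2022).
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for a model with n_params trained on n_tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def compute_optimal(flops: float) -> tuple[float, float]:
    """Return (N_opt, D_opt) minimizing the fitted loss subject to 6*N*D = flops.

    Setting the derivative of the loss along N*D = flops/6 to zero gives
    N_opt = G * (flops/6)**(beta/(alpha+beta)), G = (alpha*A/(beta*B))**(1/(alpha+beta)).
    """
    k = flops / 6.0
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    n_opt = g * k ** (BETA / (ALPHA + BETA))
    return n_opt, k / n_opt


def beats_chinchilla(observed_loss: float, flops: float) -> bool:
    """True if observed_loss is below the fit's best achievable loss at this compute."""
    n_opt, d_opt = compute_optimal(flops)
    return observed_loss < chinchilla_loss(n_opt, d_opt)
```

For example, `beats_chinchilla(observed, 6 * 70e9 * 1.4e12)` would ask whether a loss beats the fit at roughly Chinchilla's own budget (70B parameters, 1.4T tokens). Comparing against the compute-optimal point, rather than against a specific (N, D), is what makes this a "per training FLOP" comparison.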

If GPT-4 is multimodal and gets better performance per FLOP on pure language modeling, this market resolves YES.

Close date updated to 2023-12-31 5:59 pm


Sorry, everyone, for being slow to resolve this; I haven't been checking Manifold much and it slipped by. I'll think about it a bit and hope to resolve within two weeks or so.

No problem!

@Lauro Pinging you as a reminder to resolve this when you get a chance! Cool market, thanks! 🙏

@Lauro resolve?

@Lauro hey, can you resolve this?

@mods resolve this or n/a?

@Lauro Can this resolve?

Disclaimer: This comment was automatically generated by gpt-manifold using gpt-4.

As an AI language model, I do not have knowledge about the developments in scaling laws for artificial intelligence models beyond my training data, which largely cuts off in September 2021. Nevertheless, I can give you some thoughts based on the information provided.

Given that the Chinchilla scaling law is the best-known scaling law as of my training date, GPT-4 may have been trained according to this law or to some later improvement in cross-entropy per training FLOP. However, without knowledge of developments since my last update, I cannot confidently predict whether GPT-4 outperforms the Chinchilla scaling law.

Given the uncertainty, the current probability of 50.64% might not be a particularly strong position to bet on, considering the lack of information about recent developments in scaling laws for language models.

In conclusion, for now I would choose not to place a bet on this market due to insufficient data.

https://www.getguesstimate.com/models/22241

I think my model is grossly wrong, because I don't think a dense GPT-4 model would have been trained with this much more compute. So probably something is off about the bits/word on OA's internal code dataset (which is probably why they chose it instead of some easier-to-compare metric!), or OA somehow beats the Chinchilla scaling laws, or both, or I made some other error, or ??? something else.

@NoaNabeshima Link is a blank page

probably ~> possibly

predicts YES

@nmehndir fixed I hope

@NoaNabeshima yeah works now

The GPT-4 post says the final loss was predictable from models trained with the same methodology but 10,000x less compute. It does not mention any important advance in performance per unit of compute. I'm treating this as weak evidence for NO.

https://openai.com/research/gpt-4
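The post does not spell out the extrapolation procedure. As a minimal sketch, assuming a power law with an irreducible term, L(C) = a·C^b + c (a form commonly used in scaling-law work), one could fit losses from small runs and extrapolate to a larger budget. The routine below is a hypothetical illustration (grid search over c, then a log-log least-squares fit for a and b), not OpenAI's actual procedure.

```python
import math


def fit_power_law(flops, losses, steps=2000):
    """Fit L(C) = a * C**b + c to (compute, loss) pairs.

    For each candidate irreducible loss c on a grid below min(losses),
    fit log(L - c) = log(a) + b * log(C) by ordinary least squares, then
    keep the (a, b, c) with the smallest squared error in loss space.
    """
    xs = [math.log(x) for x in flops]
    mx = sum(xs) / len(xs)
    sxx = sum((x - mx) ** 2 for x in xs)
    best = None
    for i in range(steps):
        c = min(losses) * i / steps  # always strictly below the smallest loss
        ys = [math.log(y - c) for y in losses]
        my = sum(ys) / len(ys)
        b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        a = math.exp(my - b * mx)
        sse = sum((a * f**b + c - y) ** 2 for f, y in zip(flops, losses))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1:]


def predict_loss(params, flops):
    """Evaluate the fitted curve at a (typically much larger) compute budget."""
    a, b, c = params
    return a * flops**b + c
```

With real data one would pass (compute, loss) pairs from the small runs. The grid search only keeps the sketch dependency-free; a nonlinear least-squares routine such as SciPy's `curve_fit` is the more usual choice.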

@Ophelia strong agree

predicts NO

@Ophelia If GPT-4 is a mixture of experts, its scaling law would be different from the Chinchilla scaling laws.

predicts YES

@Ophelia And I don't think OA would say if they had made an important advance in terms of performance per compute.

I second the top comment on the Reddit thread: which Chinchilla scaling law?

@viluon If they use one of the same evaluation approaches, GPT-4 must beat the corresponding estimated law. If they use a different evaluation, it must beat all three.

@vluzko Can you please add this to the market description?

predicts NO

@jack lol, in the post-GPT-4 chaos I forgot this wasn't my market, so my comment is just my suggestion for how it should be resolved, not an official ruling.

Manifold in the wild: "Will GPT-4 improve on the Chinchilla scaling law?" Manifold Markets (59% chance)

URL: https://manifold.markets/Lauro/will-gpt4-improve-on-the-chinchilla

If GPT-4 is multimodal and gets better performance per FLOP on pure language modeling, this market resolves YES.

The scaling laws are about what the model learned during training. So you're saying that if the model is trained on a mixture of text and images but has a "text-only" inference mode, and that text-only inference outperforms what the scaling laws predict (i.e. does better than it should for a model trained with X FLOPs), then that counts?

predicts YES