Resolves at the end of 2024 to whether GPT-4o is a GPT-4 finetune or a completely new base model, if this is known (through e.g. a public statement or very convincing forensics), and to "unknown" otherwise.
None of the above
I don't seem to be able to add an answer, but I'm guessing that the referenced dramatic decrease in inference costs is the result of a model being distilled.
It may not be a straightforward distillation of some base GPT-4 model; they may have done fancier things to improve how many useful, generalizable patterns end up in the distilled model. Still, it seems clear that a distillation step is among the main techniques that would make GPT-4o importantly distinct from GPT-4.
https://youtu.be/fMtbrKhXMWc?t=6m&si=PldsUGtz4P_KjtXG
(Sam Altman talking about GPT-4o being way cheaper to run)
https://en.wikipedia.org/wiki/Knowledge_distillation
(What I mean roughly by "distillation": using the outputs of a larger model, e.g. its soft predictions, to train a smaller one; though in my mind this is more of a general category of techniques for compressing the trained shape of a larger model into a smaller one.)
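To make that concrete, here's a minimal toy sketch of standard Hinton-style distillation (tiny made-up models and dummy data, nothing to do with whatever OpenAI actually did): the student is trained to match the teacher's softened output distribution alongside the ordinary hard labels.

```python
# Toy knowledge-distillation sketch (hypothetical models, not OpenAI's method).
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))  # larger model
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))    # smaller model

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature, alpha = 2.0, 0.5  # softening temperature and loss-mixing weight

x = torch.randn(128, 32)          # dummy inputs
y = torch.randint(0, 10, (128,))  # dummy hard labels

with torch.no_grad():
    teacher_logits = teacher(x)   # teacher outputs act as "soft labels"

student_logits = student(x)

# KL divergence between the softened teacher and student distributions
distill_loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature ** 2

hard_loss = F.cross_entropy(student_logits, y)  # ordinary supervised loss
loss = alpha * distill_loss + (1 - alpha) * hard_loss

loss.backward()
optimizer.step()
```

The same idea scales up: the smaller student ends up much cheaper to run at inference time while inheriting much of the teacher's behavior, which is why a distilled model is a plausible explanation for a big drop in serving cost.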