On December 31, 2025, will the Hutter prize record be above 8.85?
Mini · 3 · Ṁ133 · 2026 · 67% chance

The Hutter Prize is a prize run by researcher Marcus Hutter (currently employed by DeepMind) since 2006.

The current record is 8.76. To qualify as an improvement, the next successful submission must achieve at least 8.85, which is roughly a 1% improvement over 8.76. So the question is really asking: "Will there be a new winning entry to the Hutter Prize by December 31, 2025?"

Since 2017, there has been about one entry every two years, and the latest entry was on 16 July 2023. So, naively extrapolating that trend, we should expect the next entry to arrive by the end of 2025.

Marcus Hutter is most famous for writing a textbook on Universal AI based on his theory of AGI, the AIXI model. Essentially, AIXI is a predictor that models incoming data using algorithmic information theory. Since prediction and compression are equivalent, Hutter is heavily invested in compression progress as a concrete proxy for progress towards AGI.

Since compressing a corpus is equivalent to driving down the perplexity loss on that corpus, the prize is closely related to the use of perplexity loss in language modeling.
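
To make that equivalence concrete, here is a minimal toy sketch of my own (not anything from Hutter's site): if a model assigns probability p to each next character, an ideal entropy coder such as an arithmetic coder spends about -log2 p bits on it, so the model's total log loss is the compressed size, and perplexity is just that loss exponentiated. The toy_model_prob function below is a deliberately dumb placeholder model.

```python
import math

# Hypothetical, deliberately dumb "language model": assigns the same
# probability to every symbol in a 27-character alphabet (a-z plus space).
def toy_model_prob(context: str, char: str) -> float:
    return 1.0 / 27

text = "hello world"

# An ideal entropy coder (e.g. arithmetic coding) driven by this model
# spends -log2 p(char | context) bits on each character, so the total
# compressed size is simply the model's log loss on the corpus.
total_bits = 0.0
for i, ch in enumerate(text):
    total_bits += -math.log2(toy_model_prob(text[:i], ch))

bits_per_char = total_bits / len(text)
perplexity = 2 ** bits_per_char  # perplexity = exponentiated bits/char

print(f"~{total_bits / 8:.1f} bytes compressed, "
      f"{bits_per_char:.2f} bits/char, perplexity {perplexity:.1f}")
```

A better model would assign higher probability to the characters that actually occur, which lowers bits per character, lowers perplexity, and shrinks the compressed size, all at once.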


That's all the background information I would add. Below I simply quote him on what the prize is about and how it works; more Q&A can be found on his website.

Why the prize?

The contest is motivated by the fact that compression ratios can be regarded as intelligence measures. Wikipedia is an extensive snapshot of Human Knowledge. If you can compress the first 1GB of Wikipedia better than your predecessors, your (de)compressor likely has to be smart(er). The intention of this prize is to encourage development of intelligent compressors/programs as a path to AGI.

How does the prize work?

If your compressor compresses the 1GB [10^9 bytes] file enwik9 x% better than the current record, you'll receive x% of the prize [currently at 500k euros].
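
To put rough numbers on that payout rule and on the 8.85 threshold in this question, here is a back-of-the-envelope sketch. The byte figures are my own approximations derived from the ratios above (8.76 record, 8.85 threshold), not official prize data.

```python
# Back-of-the-envelope numbers using only the ratios stated in this
# question and the payout rule quoted above.
ENWIK9_BYTES = 10**9
PRIZE_POOL_EUR = 500_000

current_record_bytes = ENWIK9_BYTES / 8.76  # ~114.2 MB (assumed from ratio)
threshold_bytes = ENWIK9_BYTES / 8.85       # ~113.0 MB (assumed from ratio)

def payout(new_size_bytes: float) -> float:
    """Compressing x% better than the record pays out x% of the pool."""
    improvement = (current_record_bytes - new_size_bytes) / current_record_bytes
    return max(improvement, 0.0) * PRIZE_POOL_EUR

print(f"record ~{current_record_bytes / 1e6:.1f} MB, "
      f"8.85 threshold ~{threshold_bytes / 1e6:.1f} MB")
print(f"payout for just clearing 8.85: ~{payout(threshold_bytes):,.0f} EUR")
```

So clearing the 8.85 bar corresponds to shaving roughly 1.2 MB off a ~114 MB archive, worth on the order of 5,000 euros under the quoted rule.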

Why not use Perplexity?

[Perplexity is] essentially an exponentiated compression ratio... That is, perplexity per se as a performance measure is fine. The problem is that it is usually applied to a test set and the size of the language model itself is ignored. Online compression would be most appropriate for large models.
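
Here is a made-up illustration of that last point: a model can reach a lower perplexity (fewer bits per character) on a test set and yet be a worse compressor once its own size is counted. Both the model sizes and the bits-per-character figures below are invented, and hutter_style_ratio is only a rough stand-in for the actual prize metric.

```python
# Invented numbers illustrating why model size matters for compression.
ENWIK9_BYTES = 10**9

def hutter_style_ratio(bits_per_char: float, decompressor_bytes: float) -> float:
    """Raw size divided by (entropy-coded data + self-contained decompressor)."""
    coded_bytes = ENWIK9_BYTES * bits_per_char / 8
    return ENWIK9_BYTES / (coded_bytes + decompressor_bytes)

# Small model with modest perplexity vs. huge model with better perplexity:
print(hutter_style_ratio(bits_per_char=0.95, decompressor_bytes=20e6))  # ~7.2
print(hutter_style_ratio(bits_per_char=0.80, decompressor_bytes=10e9))  # ~0.1
```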

