Will scaling transformers lead to a 60% score on ARC-AGI-2?
18 traders · Ṁ675 · closes 2030 · 66% chance

Will any plain transformer model achieve 60% or more on ARC-AGI-2 by 2030?

The inference cost to achieve this result does not matter.

The model that achieves this result must use the same "transformer recipe" common between 2023 and 2025: techniques like RLHF, RLAIF, CoT, RAG, and vision encoders are allowed, but any specialized components must themselves be made of vanilla transformer blocks. Any new inductive biases, such as tree search or neurosymbolic logic, would not qualify.
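For concreteness, here is a minimal sketch of the kind of component the "transformer recipe" clause refers to: a pre-norm attention-plus-MLP block and nothing else. The layer sizes and layout are illustrative assumptions, not a description of any particular model.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """A plain pre-norm transformer block: self-attention + MLP, nothing else.
    Hyperparameters are illustrative, not tied to any particular model."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention with a residual connection
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # Position-wise MLP with a residual connection
        x = x + self.mlp(self.norm2(x))
        return x
```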

The result must be verified by at least one reputable, unaffiliated org (ARC, Epoch, OpenAI Evals, an academic lab, etc.) or demonstrated by a publicly re-runnable result (e.g., a notebook on Kaggle).

Resolution uses the ARC-AGI-2 evaluation set and scoring script as published on arcprize.org on the day this market opens. Later revisions are ignored.
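For illustration, this is roughly what ARC-style exact-match scoring looks like, assuming the published pass@2 convention (up to two attempts per test output, and a task counts only if every test output is matched exactly). This is a hypothetical sketch, not the official arcprize.org script.

```python
from typing import List

Grid = List[List[int]]

def task_solved(attempts: List[List[Grid]], truths: List[Grid]) -> bool:
    """A task counts as solved only if every test output is matched exactly
    by at least one of the (up to two) attempts for that output."""
    return all(
        any(attempt == truth for attempt in output_attempts)
        for output_attempts, truth in zip(attempts, truths)
    )

def score(all_attempts: List[List[List[Grid]]], all_truths: List[List[Grid]]) -> float:
    """Overall score: fraction of evaluation-set tasks solved."""
    solved = sum(task_solved(a, t) for a, t in zip(all_attempts, all_truths))
    return solved / len(all_truths)
```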

  • Update 2025-10-12 (PST) (AI summary of creator comment): PPO, GRPO, and RLVR are allowed training methods.
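For readers unfamiliar with the methods named in the update above, here is a toy sketch of the core GRPO idea (group-normalized advantages) paired with an RLVR-style verifiable reward. Everything here is illustrative and not tied to any published implementation.

```python
import statistics
from typing import List

def verifiable_reward(predicted_grid, target_grid) -> float:
    """RLVR-style reward: 1.0 for an exact grid match, 0.0 otherwise."""
    return 1.0 if predicted_grid == target_grid else 0.0

def grpo_advantages(rewards: List[float]) -> List[float]:
    """GRPO normalizes rewards within a group of samples drawn from the
    same prompt: advantage = (reward - group mean) / group std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]
```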

Generating synthetic data using other models to train the transformer is allowed, as long as the final model follows the common transformer recipe.
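A toy illustration of the allowed synthetic-data path: a teacher of any architecture labels inputs to produce training pairs, and only the final student model is constrained by the transformer-recipe rule. The `solve` callable is a hypothetical placeholder.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]

def generate_synthetic_pairs(
    solve: Callable[[Grid], Grid],  # hypothetical teacher: any model, any architecture
    inputs: List[Grid],
) -> List[Tuple[Grid, Grid]]:
    """Build (input, output) training pairs by labeling inputs with a teacher.
    The teacher is unconstrained; only the final trained model must follow
    the plain transformer recipe."""
    return [(x, solve(x)) for x in inputs]
```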


@lumi are PPO or GRPO, or I guess RLVR, allowed?

also, is it allowed to generate synthetic data (using other models) to train this model on, as long as training follows the common transformer recipe?

@Bayesian both are allowed

bought Ṁ110 YES from 48% to 81%

@CraigDemel wanna bet more on this around market price? I can do a lot more volume

or anyone else; ping me

@Bayesian good for now, thanks!

Honestly, I don't believe a score of 60% or more on ARC-AGI-2 constitutes AGI in any meaningful sense:

Humans can score 100%, not 60.

It's a single benchmark that doesn't really test the full breadth of capabilities. It's definitely possible to have a system that's good at this benchmark while being useless at other tasks.

I propose renaming the question.