Will the state-of-the-art AI model use latent space to reason by 2026?
Ṁ24k · 2026 · 20% chance

Meta's Coconut paper describes a new way to train AI models so that they reason in latent space: unlike reasoning models such as OpenAI's o1, a Coconut model doesn't have to explicitly write its thoughts out in natural language.

Abstract from the paper:

Large language models (LLMs) are restricted to reason in the "language space", where they typically express the reasoning process with a chain-of-thought (CoT) to solve a complex reasoning problem. However, we argue that language space may not always be optimal for reasoning. For example, most word tokens are primarily for textual coherence and not essential for reasoning, while some critical tokens require complex planning and pose huge challenges to LLMs. To explore the potential of LLM reasoning in an unrestricted latent space instead of using natural language, we introduce a new paradigm Coconut (Chain of Continuous Thought). We utilize the last hidden state of the LLM as a representation of the reasoning state (termed "continuous thought"). Rather than decoding this into a word token, we feed it back to the LLM as the subsequent input embedding directly in the continuous space. Experiments show that Coconut can effectively augment the LLM on several reasoning tasks. This novel latent reasoning paradigm leads to emergent advanced reasoning patterns: the continuous thought can encode multiple alternative next reasoning steps, allowing the model to perform a breadth-first search (BFS) to solve the problem, rather than prematurely committing to a single deterministic path like CoT. Coconut outperforms CoT in certain logical reasoning tasks that require substantial backtracking during planning, with fewer thinking tokens during inference. These findings demonstrate the promise of latent reasoning and offer valuable insights for future research.

In January of 2026, this market will resolve YES if the state-of-the-art (SotA) reasoning model uses some latent space representation of its cognitive state to reason across multiple iterations before giving its final answer.

It doesn't count if the model merely manipulates latent space within a single forward pass (since all LLMs already do this). Loosely speaking, the model has to use its weights to get a latent vector, then reuse those same weights to process that latent at least once without generating any natural language tokens in between. If it uses some mix of latents and natural language in its reasoning, this still counts as using latent space.
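
For concreteness, here's a minimal sketch of the kind of loop that would count (my own toy illustration, not Meta's code; an off-the-shelf GPT-2 hasn't been trained to consume its own hidden states, so the output is meaningless, but the mechanics are what matter): the final hidden state at the last position is fed back in as the next input embedding, and the same weights run again without any token being generated in between.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Embed the prompt once, using the model's own token-embedding matrix.
embeds = model.transformer.wte(tok("2 + 2 =", return_tensors="pt").input_ids)

with torch.no_grad():
    for _ in range(4):  # four latent iterations, zero tokens emitted
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        # The last position's final hidden state is the "continuous thought"...
        thought = out.hidden_states[-1][:, -1:, :]
        # ...and it is fed straight back in as the next input embedding.
        embeds = torch.cat([embeds, thought], dim=1)

# Only after the latent loop does the model decode a natural-language token.
print(tok.decode(out.logits[:, -1].argmax(dim=-1)))
```

A model that made only a single forward call before decoding would not count; it's the repeated calls that reuse the latent, with no tokens in between, that would satisfy the criterion above.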

I will primarily be looking at reasoning-centric evaluations such as FrontierMath and GPQA to determine which model is the SotA. Ultimately, the resolution will be based on my best judgement. I will not trade in this market.

bought Ṁ350 YES

Gemini tech lead making some pretty bullish comments about this here:

https://open.spotify.com/episode/0g5FyGVGWgbq2SYwlrMmop?si=8kHM_r4fTA6Sc03JzFl1Ug&context=spotify%3Ashow%3A6yHyok3M3BjqzR0VB5MSyk

See the 44-minute mark: he "doesn't want to taboo it", likes the capabilities it can yield, and handwaves the interpretability problem, acknowledging we'd need interpretability but saying future capabilities will probably make it easier.

Scaling "Thinking": Gemini 2.5 Tech Lead Jack Rae on Reasoning, Long Context, & the Path to AGI
From the episode description: host Nathan Labenz speaks with Jack Rae, principal research scientist at Google DeepMind and technical lead on Google's thinking and inference-time scaling work, about the breakthroughs behind Gemini 2.5 Pro. The relevant chapter is (44:29) Reasoning in Latent Space.

Hopefully not.