Will a major AI lab claim to use activation steering in its main chat assistant by EOY 2025?
31% chance

Also includes methods inspired by activation steering, as long as they don't use any gradient descent step.

Only includes announcements about main chat assistants (e.g. Claude, ChatGPT, Bard, ...) of a major AI lab (OpenAI, Google DeepMind, Anthropic, Meta, Inflection or Mistral).

Does not include fine-tuning API endpoints.

Anthropic is running a demo of an activation-steered Claude obsessed with the Golden Gate Bridge: https://www.reddit.com/r/singularity/comments/1cz7kuh/claude_golden_gate_bridge_is_now_available_bridge/ (Context: https://www.anthropic.com/research/mapping-mind-language-model )