Will "large reasoning models" significantly outperform traditional LLMs in creative writing by 2026?
Will "large reasoning models" significantly outperform traditional LLMs in creative writing by 2026?
➕
Plus
12
Ṁ1014
2026
63%
chance

While o1-preview outperformed previous models in math and programming, it wasn't preferred over GPT-4o in the "personal writing" domain:

However, it seems like models that think before they respond should be able to improve their writing significantly, in principle. By analogy, if I had to write a poem off the top of my head, it would be much worse than a poem that I took an hour to write. It seems that o1 and o3 are primarily trained to solve problems with a definitive correct answer, but I don't think this will be the case forever.

For the purposes of this question, "large reasoning models" (LRMs) are LLMs that are trained to generate intermediate outputs that are not part of the final answer. This could include chain-of-thought training or "continuous thought" techniques like Meta's COCONUT.

Creative writing could include poetry, short stories, novel outlines, comedy routines, and similar things.

Things I might look for to determine a resolution:

  • The "Creative Writing" category in https://lmarena.ai/?leaderboard. If it's clear that LRMs outperform other models, the market will likely resolve YES. However, if it's simply the case that all "serious" models are LRMs and the "base LLM" version of those models isn't even available, then this wouldn't count unless I think the reasoning is really the cause of the performance gap.

  • If an LRM has the option to modulate the length of its reasoning, I'd look for metrics showing that increased test-time compute corresponds to a marked increase in response quality in the creative writing domain.

  • In the absence of good metrics, I'd rely on general vibes based on the experience of myself and others.

Ultimately, I'm thinking about a scenario where LRMs are clearly better for creative writing than traditional LLMs, in the same way that o1 is clearly better than GPT-4o at (well-specified) coding problems. If that bar hasn't been reached, this market should resolve NO.

I won't trade in this market.

Get Ṁ1,000 play money


Sort by:
sold Ṁ29 NO2mo

anecdotally, DeepSeek r1 is in my opinion a vastly better writer than v3. I'm reversing my prediction based on this

@MingCat Yeah, I also think this is probably true. DeepSeek r1 is really good at certain writing tasks in my experience. This also checks out when you look at the LMSYS leaderboard for creative writing:

I'm not in a rush to resolve this market, but I think it's quite likely to resolve to YES. I'll wait until there's a greater universal consensus about this.

What is this?

What is Manifold?
Manifold is a social prediction market with real-time odds on wide ranging news such as politics, tech, sports and more!
Participate for free in sweepstakes markets to win sweepcash which can be withdrawn for real money!
Are our predictions accurate?
Yes! Manifold is very well calibrated, with forecasts on average within 4 percentage points of the true probability. Our probabilities are created by users buying and selling shares of a market.
In the 2022 US midterm elections, we outperformed all other prediction market platforms and were in line with FiveThirtyEight’s performance. Many people who don't like trading still use Manifold to get reliable news.
Why should I trade?
Trading contributes to accurate answers of important, real-world questions and helps you stay more accountable as you make predictions.
Trade with S Sweepcash (𝕊) for a chance to win withdrawable cash prizes.
Get started for free! No credit card required.
What are sweepstakes markets?
There are two types of markets on Manifold: play money and sweepstakes.
By default all markets are play money and use mana. These markets allow you to win more mana but do not award any prizes which can be cashed out.
Selected markets will have a sweepstakes toggle. These require sweepcash to participate and allow winners to withdraw any sweepcash won to real money.
As play money and sweepstakes markets are independent of each other, they may have different odds even though they share the same question and comments.
Learn more.