Will AI be passable at answering Magic: The Gathering rules questions before 2030?
2030 · 88% chance

Asking GPT-3 MTG rules questions returns some rather nonsensical answers. For example:

This answer makes no sense, and those cited rules don't even exist.

This was from a prompt where I supplied it with a list of other rules questions and correct answers to them, so it does "know" that it's supposed to be answering coherently and correctly. I can also tell from other experimentation that card text and the Magic Comprehensive Rules document were a part of GPT-3's training data. GPT-3 is clearly not powerful enough to properly understand such a complicated technical system.

This market resolves to YES if, by the beginning of 2030, I have access to a system that can give me correct answers and explanations to Magic rules questions in natural English text. Specifically:

I will supply it with 20 completely random unreleased questions from RulesGuru. (Plus card text if necessary.) Over those 20 questions, it must have at least a 90% success rate on giving the right answer, and at least a 50% success rate on providing an explanation that clearly and correctly explains why it works that way. A correct explanation can leave out a small detail here or there, but it must be good enough to help a human understand the material, and avoid anything blatantly wrong like referencing parts of the rules that are irrelevant or don't exist.
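To make the scoring concrete, here is a minimal sketch of that check in Python. The function name and data layout are my own illustration; only the thresholds (20 questions, ≥90% correct answers, ≥50% correct explanations) come from the criteria above.

```python
def passes_resolution(results):
    """results: list of (answer_correct: bool, explanation_correct: bool)."""
    assert len(results) == 20, "the test uses exactly 20 RulesGuru questions"
    answer_rate = sum(a for a, _ in results) / len(results)
    explanation_rate = sum(e for _, e in results) / len(results)
    return answer_rate >= 0.9 and explanation_rate >= 0.5

# Example: 19 correct answers (95%) and 11 correct explanations (55%) passes;
# a third wrong answer would drop the answer rate below 90% and fail.
example = [(True, True)] * 11 + [(True, False)] * 8 + [(False, False)]
print(passes_resolution(example))  # True
```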

For a harder version of this question, see: Will AI be superhuman at MTG rules by the end of 2030? (79%)

  • Update 2025-02-21 (PST) (AI summary of creator comment): New Resolution Criteria:

    • The resolution criteria have been updated to be stricter than those originally described.

    • The detailed, updated criteria can be found at the linked page and replace the previous criteria.



1mo

The resolution criteria for this market are pretty lax, so I've made a stricter one here:

1mo

Doesn't seem to be getting better...

@IsaacKing https://chatgpt.com/share/67b8ead7-33c0-8012-a542-ddc300e3233c
I tried 5 questions from RulesGuru.
Can you confirm the evaluation?
1. -
2. Answer +, Explanation -
3. Answer +, Explanation -
4. -
5. Answer +, Explanation +

1mo
  1. Completely wrong answer, doesn't even understand the question. It claims that the Emrakul trigger is controlled by Noelle, when the question states it was Aiden who discarded it, thus Aiden controls the trigger. Opening the reasoning trace shows multiple other nonsensical statements, like that Noelle cast Emrakul during Aiden's cleanup step, when the question clearly states that happened last turn. The reasoning trace also directly contradicts the final answer in multiple places, such as stating in the last sentence that Noelle's trigger resolves second, when the final answer states it resolves first.

Seems like it would be a waste of my time to go through the other 4; ChatGPT is obviously not remotely capable of handling even intro-level questions. (Which this one is; the average experienced Magic player, even with no judge training at all, would be able to answer it correctly.)

1mo

(Or am I misunderstanding your question?)

1mo

@MikhailDoroshenko OK, hold on, I think I get what you're asking now. The "-"s in your list indicate a wrong answer overall, and you want me to confirm that it got the other three correct, with a wrong explanation for 2 & 3 and a correct one for 5.

Sorry, you had said below that you think this is already solved, so I thought you were presenting this conversation as ChatGPT succeeding at the desired level.

Let me see...

1mo
  1. Answer wrong.

  2. Answer correct, explanation correct.

  3. Answer correct, explanation a little iffy but correct.

  4. Answer wrong, but this is likely because you provided incorrect card text.

  5. Answer correct, explanation correct.

That's not bad, but all of these questions are on the easier end; I think it got lucky.

@IsaacKing Yes, sorry, I should have specified that better. I had hoped it would succeed, but after running some tests I realized it isn't at the necessary level yet, and also that it's hard for me to tell when an explanation is correct. I just wanted your confirmation to see how far the models are from the specified bar.

bought Ṁ10 YES · 5mo

2030 is a long way away; this should be higher.

@DavidOman I think it is already solved, but I don't know when it will be resolved.

1y

The current state of the art: https://nissa.planeswalkercompanion.com/

1y

Just do the fun ones rather than the site. Humility + Opal, Season + Arbiter, Volrath's Shapeshifter being - well - the card that it is, Panglacial Wurm being the card that it is, whether the Gitrog interaction counts as slowplay (since it's technically not a loop per Toby Elliot's fantastic horsemyths post), etc etc.

2y

I'm not that into predictions that reach this far into the future, but I've studied both AI and MTG somewhat, and IMO this problem is too complex, and not valuable enough, for someone to want to massage an AI into an accurate rules query engine. (We already have an excellent rules engine, of course, in MTG Arena, but doing both queries and responses in plain text is quite a feat.)

predicts NO 2y

@TylerColeman Arena's rules engine is much simpler than the full Magic rules engine, since it's restricted to only recent cards that have been designed to work in Arena. And even then it's not perfect. For example, finding a legal declaration of blockers is NP-hard, so my understanding is that Arena uses a heuristic algorithm that may not always return perfect results. And there are other bugs here and there.

I plan to spend a few days trying to fine-tune GPT-X to answer rules questions, or train a much smaller dedicated model to do so.
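As a rough illustration of what that could look like, here's a hypothetical sketch of turning RulesGuru-style Q&A pairs into fine-tuning data. The JSONL chat-message layout mirrors common fine-tuning APIs; the field names, system prompt, and file name are my own assumptions, not anything from RulesGuru.

```python
import json

def to_training_record(question_text, card_text, answer_text):
    """Build one JSONL record from a rules question, optional card text, and its answer."""
    user_content = question_text if not card_text else f"{card_text}\n\n{question_text}"
    return json.dumps({
        "messages": [
            {"role": "system", "content": "You are a Magic: The Gathering rules expert. "
                                          "Cite only rules that exist in the Comprehensive Rules."},
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": answer_text},
        ]
    })

# Each line of the output file is one training example.
# with open("mtg_rules_finetune.jsonl", "w") as f:
#     for q in rulesguru_questions:
#         f.write(to_training_record(q["question"], q.get("cardText", ""), q["answer"]) + "\n")
```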

2y

I just realized a problem with this resolution process, which is that the AI system may have access to the internet and simply be able to read the answers off RulesGuru.

If there are no objections, I will change the process to use 20 unreleased questions on RulesGuru instead. (With their wording fixed up so it's clear what's being asked.)

predicts YES 2y

@IsaacKing You could also change the names of the players I think, although that wouldn't slow down a reasonably general intelligence.

predicts YES 2y

@IsaacKing A bigger problem might be that RulesGuru might not exist in 2030.

predicts NO 2y

> You could also change the names of the players I think, although that wouldn't slow down a reasonably general intelligence.

Yeah, even GPT-3 can already do that.

> A bigger problem might be that RulesGuru might not exist in 2030.

It's my site, so as long as I'm still around and I haven't lost all the files and their multiple backups in some catastrophic accident, it'll be available. (May not still be online, but I'll have the files somewhere.)

2y

Did the above example include any prompt engineering to let the engine know that it's supposed to be impersonating someone who knows something about MTG rather than the most likely idiot on the internet?

predicts NO 2y

Yes, I included several examples of questions answered correctly. In the past I've also tried different prompts, none of which worked significantly better.

Feel free to try it out yourself. Even if you don't know anything about Magic, you can grab questions from RulesGuru and check any rule citations that GPT-3 provides against the rules document here. Even ignoring whether the rest of the answer makes sense, if you can get GPT-3 to cite only rules that exist, that would be a marked improvement. :)
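A quick sketch of that citation check, assuming you've saved a plain-text copy of the Comprehensive Rules locally. The filename and the rule-number regex are my own; the target format is citations like 601.2a or 104.3.

```python
import re

RULE_NUMBER = re.compile(r"\b\d{3}\.\d+[a-z]?\b")

def known_rule_numbers(comprehensive_rules_text):
    """Collect every rule number that appears in the rules document."""
    return set(RULE_NUMBER.findall(comprehensive_rules_text))

def invented_citations(model_answer, known_rules):
    """Return rule numbers cited in the answer that don't exist in the rules."""
    return [r for r in RULE_NUMBER.findall(model_answer) if r not in known_rules]

# Usage sketch:
# rules_text = open("MagicCompRules.txt", encoding="utf-8").read()
# print(invented_citations(gpt_answer, known_rule_numbers(rules_text)))
```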

predicts YES 2y

@IsaacKing I'm sorry, I see now that the answer to my question was in the description.
