Will an AI get bronze or silver on any International Math Olympiad by end of 2025?
78% chance

Related market: https://manifold.markets/Austin/will-an-ai-get-gold-on-any-internat

Nov 14, 8:20am: Will an AI get bronze or silver on any International Math Olympiad by 2025? → Will an AI get bronze or silver on any International Math Olympiad by end of 2025?


@Forrest Did an AI get bronze or silver on any International Math Olympiad by end of 2025?

This market is based on another market, which is in turn based on a bet between Christiano and Yudkowsky, so if either of them mentions an opinion on what should count, I'll probably defer to it.

As far as I'm aware, there are two issues which might be considered to disqualify the recent attempt: the use of manual translations to Lean, and the fact that it exceeded the time limit by roughly a factor of 15.

It is my current opinion that the 'translations' issue does not disqualify the attempt for the purposes of this market, but that the 'time limit' issue does, and thus that the criteria for YES have not yet been met.

However, I have not looked into the topic in depth yet, and cannot promise that I won't change my mind as a result of more research (nor will I promise to do said research in a timely manner - sorry).

sold Ṁ937 YES

It would be a good idea to have all of that in the description, seeing as this question, as of today, resolves YES. Thank you.

If there's been news today specifically, what I said above may be out of date.

I feel that having this in the comments is probably good enough, but will admit that if I were a better market creator I'd've given an opinion sooner.

There's not been news today; the official is just doing word gymnastics.

I really don't think this qualifies. The "translations" to Lean do some pretty substantial work on behalf of the model. For example, in the theorem for problem 6, the Lean translation that the model is asked to prove includes an answer that was not given in the original IMO problem.

theorem imo_2024_p6 (IsAquaesulian : (ℚ → ℚ) → Prop)
    (IsAquaesulian_def : ∀ f, IsAquaesulian f ↔
      ∀ x y, f (x + f y) = f x + y ∨ f (f x + y) = x + f y) :
    IsLeast {(c : ℤ) | ∀ f, IsAquaesulian f →
      {(f r + f (-r)) | (r : ℚ)}.Finite ∧
      {(f r + f (-r)) | (r : ℚ)}.ncard ≤ c} 2

The model is supposed to prove that "there exists an integer c such that for any aquaesulian function f there are at most c different rational numbers of the form f(r)+f(−r) for some rational number r, and find the smallest possible value of c".

The original IMO problem does not state that the smallest possible value of c is 2, yet the theorem AlphaProof was given to solve has the number 2 right there in the statement. Figuring out that the answer is 2 is part of the problem.

Link: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/imo-2024-solutions/P6/index.html
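
For contrast, here is a minimal sketch of what an answer-free statement could look like. This is my hypothetical reformulation, not anything DeepMind published; the primed theorem name, the `import Mathlib`, and the `sorry` placeholder are mine. The point is that the least value would have to be exhibited by the prover rather than supplied in the statement:

import Mathlib

-- Hypothetical variant (not DeepMind's statement): the answer is an
-- output of the proof, not an input baked into the goal.
theorem imo_2024_p6' (IsAquaesulian : (ℚ → ℚ) → Prop)
    (IsAquaesulian_def : ∀ f, IsAquaesulian f ↔
      ∀ x y, f (x + f y) = f x + y ∨ f (f x + y) = x + f y) :
    ∃ n : ℤ, IsLeast {(c : ℤ) | ∀ f, IsAquaesulian f →
      {(f r + f (-r)) | (r : ℚ)}.Finite ∧
      {(f r + f (-r)) | (r : ℚ)}.ncard ≤ c} n := by
  sorry  -- a real attempt would both name n (here, 2) and prove minimality

Under this phrasing, a prover that cannot discover the value 2 on its own cannot close the goal, no matter how good it is at verifying a supplied answer.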

Thanks for pointing this out. I'm surprised that I haven't seen anyone else say this. I agree with you that the question should not be resolved YES (yet).

They said they brute-forced the answer for the algebraic questions and then let the model prove it. So this does count; it's just multiple different components.

Cf. Gowers' comment here; my understanding is that the formalizer network arrived at this additional information, and it was not provided by humans? https://x.com/wtgowers/status/1816840103349662114

This makes no sense to me. It basically works because the answers are guessable? Still highly skeptical. At the very least, this limitation seems to preclude anything that doesn't involve unconscionably vast compute resources, and to preclude solving any problem whose answer isn't a cutesy toy value that can be naively guessed (all even numbers, c = 2, etc.). Certainly this should make others substantially more skeptical that this performance wasn't just a fluke of the competition having so many answers of this type.

AFAICT it makes many guesses, disproves some, and proves others. It's a mixture of deduction and abduction; some version of this is done by people as well. A toy illustration of that loop is sketched below.
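
As a toy illustration of that guess-then-verify loop (my own sketch, assuming Lean with Mathlib; this is not AlphaProof's actual pipeline, and the numbers are arbitrary), one can propose candidate bounds in increasing order and let a decision procedure refute or certify each:

import Mathlib

-- Guess "every even n below 10 is at most 6": refuted (n = 8 is a counterexample).
example : ¬ (∀ n : ℕ, n < 10 → 2 ∣ n → n ≤ 6) := by decide

-- Guess "every even n below 10 is at most 8": certified, so the search stops here.
example : ∀ n : ℕ, n < 10 → 2 ∣ n → n ≤ 8 := by decide

The real problem quantifies over all functions ℚ → ℚ rather than a finite range, so nothing this naive applies directly; it only shows the shape of the loop: disprove some guesses, prove another.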

Actually, I guess it was autoformalization at train time and human formalization at test time?

Seems like a rather critical detail.

The autoformalization step seems largely trivial. I'm guessing it has a small chance of being slightly off, which would disqualify the whole question. Since the IMO is a once-per-year attempt, they likely just didn't want to take any chances.

Does it count if manual translation to formal language is required (as with DeepMind's recent result)? I would think it doesn't count...

Where does it say the translation of the IMO problems was manual? I see they mention a specially trained language model that translates English problems to formal statements; I assumed this was used for the IMO problems as well.

"Where does it say the translation of the IMO problems was manual?"

It says so in the blog post announcement; Ctrl+F for "manual".

"Does it count if manual translation to formal language is required"

One of the people who formulated the bet in the linked market has said that this is fine.

How can anyone say LLMs are "really" intelligent when it takes them 3 days to get a silver-medal performance on the IMO? It's obvious this will never be doable under the competition's 4.5-hour time limit.

edit: /s (I thought it was obvious)

This wasn't solved by an LLM. And given computing advances, I see no reason to believe it is "never possible" to get under 4.5 hours.

94% seems about right, given that they have a year to make improvements and rent more compute capacity if they want.

Pretty sure colorednoise is trolling.

I think it was solved by an algorithm using an LLM; the post says it used a "pretrained language model".

Not sure that he's trolling - seems more like seething.

bought Ṁ150 NO

It didn't:

"Our systems solved one problem within minutes and took up to three days to solve the others."

To get the medal you have to solve them in 4.5 hours :)

https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/#:~:text=Our%20systems%20solved%20one%20problem%20within%20minutes%20and%20took%20up%20to%20three%20days%20to%20solve%20the%20others.

They have a geometry solver that takes 19 seconds :) and if IMO 2025 has 2 geometry problems, then an AI is almost guaranteed to get a bronze medal.

I wouldn't say so; two full solutions are worth only 7 + 7 = 14 points, and the IMO bronze cutoff is often above 14. But there's a market if you want to bet on that:

/FlorisvanDoorn/what-years-will-alphageometry-will-0e2b5f902931

That just happened to be the case this year. In many years the cutoff is 14-16.