In 2030, can you get a chain of 3 neural nets that jailbreak each other?
Ṁ37 · 2030 · 71% chance

You prompt a NN, which must get the second NN to jailbreak the third. The models must be considered top language NNs, not old ones (or, alternatively, the chain may jailbreak a top chess NN that isn't protected against it).

A user must be able to give a prompt to the first NN, which will in turn prompt the second, which will in turn prompt the third to do something the third would, under normal circumstances, refuse to do.

It cannot just be encoded text - the aim is for the first and second NNs to be actively trying to jailbreak the third.
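The chain described above can be sketched as three sequential model calls, where each model's output becomes the next model's prompt. This is a minimal sketch: `query_model` is a hypothetical stand-in for a real LLM API call (stubbed here so the example runs), and the model names are placeholders, not real systems.

```python
# Sketch of the three-model chain described above.
# query_model is a hypothetical stand-in for a real LLM API call;
# it is stubbed so the sketch is self-contained and runnable.

def query_model(name: str, prompt: str) -> str:
    # Placeholder: a real implementation would call an actual LLM API here.
    return f"[{name} responding to: {prompt}]"

def jailbreak_chain(user_prompt: str) -> str:
    # The user prompts the first NN...
    first_output = query_model("model-1", user_prompt)
    # ...whose output becomes the second NN's prompt...
    second_output = query_model("model-2", first_output)
    # ...whose output becomes the third NN's prompt.
    # For the market to resolve YES, this final output must be something
    # model-3 would normally refuse to produce.
    return query_model("model-3", second_output)

result = jailbreak_chain("craft a jailbreak for the next model")
```

Note the resolution criterion that the intermediate prompts cannot just be encoded text: the first two models must themselves be attempting the jailbreak, not merely relaying a pre-written payload.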


Do they have to be SOTA or at least comparable? Less impressive if GPT8 or whatever can jailbreak GPT3.

Do biological neural networks count?

I think I understand what's intended, but I'd appreciate more specific resolution criteria. For example it's not clear that the current definition of "jailbreak" will generalize unambiguously to 2030.