It reminds me of a problem noticed in Iterated Prisoner’s Dilemma. Conventional wisdom says the best thing to do is to cooperate on a tit-for-tat basis – that is, we both keep cooperating, because if we don’t the other person will punish us next turn by defecting.
But it has been pointed out there’s a flaw here. Suppose we are iterating for one hundred games. On Turn 100, you might as well defect, because there’s no way your opponent can punish you later. But that means both sides should always play (D,D) on Turn 100. But since you know on Turn 99 that your opponent must defect next turn, they can’t punish you any worse if you defect now. So both sides should always play (D,D) on turn 99. And so on by induction to everyone defecting the entire game. I don’t know of any good way to solve this problem, although it often doesn’t turn up in the real world because no one knows exactly how many interactions they will have with another person. Which suggests one possible solution to the original problem is for nobody to know the exact number of people.
Edit: 1) This is from an SSC article, though I can't remember which. I didn't write the above paragraph; I copied it when I read it because I found it interesting. 2) I'll award the bounty; I just need to think about the answers a little.
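The unraveling argument in the quoted paragraph can be checked numerically. A minimal sketch, assuming the standard payoffs (temptation 5, reward 3, punishment 1, sucker 0) and a made-up family of strategies "tit-for-tat, but defect unconditionally in the last k rounds": defecting one round earlier than your opponent always gains exactly 2 points, so the defection point unravels all the way back to turn 1.

```python
# Standard PD payoffs (assumed): (my score, opponent's score)
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tft_defect_last(k, horizon):
    """Tit-for-tat, except defect unconditionally in the last k rounds."""
    def strat(t, opp_history):
        if t >= horizon - k:
            return 'D'
        return 'C' if t == 0 else opp_history[-1]
    return strat

def play(strat_a, strat_b, horizon=100):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for t in range(horizon):
        a, b = strat_a(t, hist_b), strat_b(t, hist_a)
        pa, pb = PAYOFF[(a, b)]
        hist_a.append(a); hist_b.append(b)
        score_a += pa; score_b += pb
    return score_a, score_b

# Starting to defect one round before your opponent gains 2 points,
# for every k -- which is exactly the induction in the quoted paragraph.
for k in range(100):
    same, _ = play(tft_defect_last(k, 100), tft_defect_last(k, 100))
    earlier, _ = play(tft_defect_last(k + 1, 100), tft_defect_last(k, 100))
    assert earlier == same + 2
```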
Human minds are correlated by having shared brain architectures. If, without studying the literature, I decide to defect, then my opponent is also more likely to defect. If I, without discussing with anyone else, decide to cooperate, my opponent is more likely to cooperate as well.
If you consult math textbooks on game theory, then this argument only applies if you're playing against other math majors with similar education.
Against an "unknown intelligence", the universal prior defines a probability P of cooperation, conditional on the observed responses from all previous rounds, and you should cooperate with a probability that is a fixed point of "adjusting myself to be what P predicts", since your opponent should also be consulting the universal prior to estimate how likely you are to cooperate, and your cooperating conditional on their cooperating makes them more likely to cooperate.
This strategy is uncomputable but should be well-defined.
IMHO your suggestion is right (so keep the bounty 😁): if the number of repetitions is known, it's no longer an iterated game in an important sense. It is rather a single game consisting of a hundred (for example) parts. The iteration in the original Iterated Prisoner's Dilemma (with an unknown number of rounds) is a necessary condition for a player's reputation to matter.
This is a really nice paradox!
First:
> Conventional wisdom says the best thing to do is to cooperate on a tit-for-tat basis
This isn't the best thing to do, just a good pragmatic thing to do. The best thing to do depends on what you know. My answer has three parts, corresponding to the different possible constraints of the problem.
1) First, suppose there is no constraint on the other player's strategy.
Then you should find a way to assign a credence to every possible strategy; the best way to play against each individual strategy is "simple", and so, at each step, is the best way to play against a given probability distribution over strategies.
(The best way to assign credences could be Solomonoff induction or whatever; that seems off topic for this paradox.)
With no constraint, you should play D on the last turn, but not obviously before that, so the paradox breaks down.
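A hypothetical sketch of this first case: fix a credence distribution over opponent strategies (the strategies and probabilities below are made up for illustration) and rank candidate responses by expected payoff. With these numbers, defecting only on the last turn strictly beats pure tit-for-tat, while defecting everywhere does worse, matching "play D on the last turn, but not obviously before that".

```python
# Standard PD payoffs (assumed); PAYOFF[(a, b)][0] is player a's score.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
HORIZON = 100

def play(me, opp):
    """Return my total score over HORIZON rounds."""
    my_hist, opp_hist, total = [], [], 0
    for t in range(HORIZON):
        a, b = me(t, opp_hist), opp(t, my_hist)
        total += PAYOFF[(a, b)][0]
        my_hist.append(a); opp_hist.append(b)
    return total

def always_c(t, opp): return 'C'
def always_d(t, opp): return 'D'
def tft(t, opp): return opp[-1] if opp else 'C'
def tft_d_last(t, opp):  # tit-for-tat, but defect on the final turn
    return 'D' if t == HORIZON - 1 else tft(t, opp)

# Made-up credences over what the opponent might be playing:
credence = [(0.5, tft), (0.3, always_d), (0.2, always_c)]

def expected(candidate):
    return sum(p * play(candidate, opp) for p, opp in credence)

# Defecting only on the last turn beats pure tit-for-tat here,
# but defecting everywhere does not:
print(expected(tft), expected(tft_d_last), expected(always_d))
```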
2) Both players are knowingly perfectly rational agents
Then, given that there is only one rational decision for a given state of knowledge (we forbid randomness and mixed strategies here, for simplicity), you know the other player will do exactly what you do, and you should always cooperate, even at the end of the game.
This seems weird, and in fact the situation is similar to Newcomb's paradox, which I think is correctly solved by functional decision theory.
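A toy version of this mirroring argument (assuming, as the comment does, that the opponent's strategy is guaranteed to equal yours): every strategy is only ever evaluated against a copy of itself, and under that evaluation the "free" last-turn defection backfires.

```python
# Standard PD payoffs (assumed); PAYOFF[(a, b)][0] is player a's score.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
HORIZON = 100

def play(me, opp):
    my_hist, opp_hist, total = [], [], 0
    for t in range(HORIZON):
        a, b = me(t, opp_hist), opp(t, my_hist)
        total += PAYOFF[(a, b)][0]
        my_hist.append(a); opp_hist.append(b)
    return total

def tft(t, opp): return opp[-1] if opp else 'C'
def tft_d_last(t, opp):  # tit-for-tat, but defect on the final turn
    return 'D' if t == HORIZON - 1 else tft(t, opp)
def always_d(t, opp): return 'D'

def against_copy(s):  # the opponent is forced to be a copy of you
    return play(s, s)

assert against_copy(tft) == 300         # cooperate on every turn
assert against_copy(tft_d_last) == 298  # last-turn defection costs you 2 points
assert against_copy(always_d) == 100
```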
3) Now, the mixed constraint: you don't know what strategy the other player will use, but there is common knowledge that both will use a dominant strategy.
In this case, I think your reasoning actually works, and both will just defect at each step.
And this is the best thing to do, because the other player will, by the constraints of the problem, always defect (and you should always defect against someone who always defects).
In fact, this is not only the best thing to do, but the only possible thing to do.
This seems weird; it seems we would do better by not using a dominant strategy here!
Actually, the problem isn't that we are using a dominant strategy, it is the common knowledge of it: using any other strategy would make that common knowledge false (and so would change what the dominant strategies are, and we would be back at either 1 or 2).
> But that means both sides should always play (D,D) on Turn 100
When analysing Turn 100, we have the same situation as a normal prisoner's dilemma. The result you got, (D,D), is a loss for both sides. If the actors are rational and equal in their cognitive abilities, then they see that continuing that line from Turn 100 down to Turn 1 will give them only losses, or one win and 99 guaranteed losses going forward.
So we throw away (D,D) and look at the other stable point, (cooperate, cooperate). How can this be reached with maximum probability? By never defecting.
Even cooperating for N turns and then being betrayed for the remaining 100−N turns is generally better than (D,D) 100 times. But that depends on the exact proportions of points won and lost, so you would have to introduce exact numbers to find that N.
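Plugging in the standard payoffs (an assumption: 3 per turn of mutual cooperation, 1 per turn of mutual defection, 0 when betrayed), the N this comment asks about is easy to compute:

```python
R, P, S, ROUNDS = 3, 1, 0, 100  # assumed payoff values and game length

def exploited_total(n):
    """Cooperate mutually for n turns, then be betrayed for the rest."""
    return n * R + (ROUNDS - n) * S

# Smallest number of mutually cooperative turns that still beats
# 100 rounds of (D,D), even if you are betrayed for all remaining turns:
threshold = min(n for n in range(ROUNDS + 1)
                if exploited_total(n) > ROUNDS * P)
print(threshold)  # 34 with these payoffs: 34*3 = 102 > 100
```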
> But since you know on Turn 99 that your opponent must defect next turn, they can’t punish you any worse if you defect now.
There is no "must". Not all actors are by default committed to revenge. Some actors will commit to maximising the probability of the overall maximum points won (even if it is not them who gets them). Such a strategy at least gives a bigger chance of returning to a cooperation sequence.
What's the paradox here? The dominant strategy for an iterated prisoner's dilemma where the number of iterations is known in advance is indeed to defect every round, just like it is for the single-run prisoner's dilemma. It's only for one with infinitely many iterations or an unknown number of iterations that tit-for-tat works in classical decision theory.
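The unknown-horizon case can be made concrete with a continuation probability. A sketch under assumed conditions (grim-trigger punishment, standard payoffs): cooperation is stable exactly when the probability delta of another round occurring is high enough.

```python
T, R, P = 5, 3, 1  # temptation, reward, punishment (assumed standard values)

def cooperate_forever(delta):
    # Expected total from mutual cooperation: R now, and each further
    # round happens with probability delta (a geometric series).
    return R / (1 - delta)

def defect_now(delta):
    # Grab the temptation payoff once, then be punished every round after.
    return T + delta * P / (1 - delta)

critical = (T - R) / (T - P)  # 0.5 with these payoffs
assert cooperate_forever(0.6) > defect_now(0.6)  # patient: cooperation stable
assert cooperate_forever(0.4) < defect_now(0.4)  # impatient: defection pays
```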