“Two members of a criminal gang are arrested and imprisoned. Each prisoner is in solitary confinement with no means of speaking to or exchanging messages with the other. The police admit they don’t have enough evidence to convict the pair on the principal charge. They plan to sentence both to a year in prison on a lesser charge. Simultaneously, the police offer each prisoner a Faustian bargain. If he testifies against his partner, he will go free while the partner will get three years in prison on the main charge. If both prisoners testify against each other, both will be sentenced to two years in jail.”
–Albert W. Tucker
What he described there is now well known as the ‘prisoner’s dilemma’. It is an important part of game theory. A detailed explanation is as follows. From the first criminal A’s perspective, if he knew that the second one, B, would keep his mouth closed, A would choose to testify because that way he walks free instead of serving a year. If A knew that B would testify, A would not even hesitate to testify because otherwise he would have to bear very severe consequences (three years in jail instead of two). This works exactly the same for B. In the Prisoner’s Dilemma game, “defect” (testify) is a strategy that dominates “cooperate” (stay silent). This is an important idea behind this game: the dominating strategy. A strategy dominates another strategy of a player if it always gives a better payoff to that player, regardless of what the other players are doing. It weakly dominates the other strategy if it is always at least as good. Strict domination is very rare, but it does happen in this prisoners’ case.
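To make the dominance argument concrete, here is a minimal sketch in Python. It assumes the sentence lengths from the quote above and treats a payoff as the negated number of years in prison (that conversion is my assumption; the quote only gives the sentences). The names `YEARS`, `payoff_a`, `strictly_dominates` and `weakly_dominates` are made up for this illustration, and `weakly_dominates` follows the "at least as good" wording used above.

```python
# Sentences from the quoted story, as (A's years, B's years).
# Keys are (A's move, B's move).
YEARS = {
    ("silent", "silent"): (1, 1),
    ("silent", "testify"): (3, 0),
    ("testify", "silent"): (0, 3),
    ("testify", "testify"): (2, 2),
}
STRATEGIES = ("silent", "testify")


def payoff_a(a_move, b_move):
    """A's payoff: fewer years is better, so negate the sentence (assumption)."""
    return -YEARS[(a_move, b_move)][0]


def strictly_dominates(s, t):
    """True if s gives A a strictly better payoff than t whatever B does."""
    return all(payoff_a(s, b) > payoff_a(t, b) for b in STRATEGIES)


def weakly_dominates(s, t):
    """True if s is at least as good as t for A whatever B does."""
    return all(payoff_a(s, b) >= payoff_a(t, b) for b in STRATEGIES)


print(strictly_dominates("testify", "silent"))  # True
print(strictly_dominates("silent", "testify"))  # False
```

Because the game is symmetric, the same check with B's column of the table gives the same answer for B.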
Experts like to use a payoff matrix to illustrate the gains and losses, so we might as well do the same. My ability to create such a matrix is very limited, so I found this one on the web. The numbers are quite different from what Albert described, but they share the same idea.
A payoff is a number, also called utility, that reflects the desirability of an outcome to a player, for whatever reason. When the outcome is random, payoffs are usually weighted with their probabilities. The expected payoff incorporates the player’s attitude towards risk. John Nash believed that, in a game, each player, if rational enough, would do the best he could, given what other people are doing, to achieve the highest payoff. This is known as ‘the best response’. No rational player will choose a dominated strategy, since the player will always be better off switching to the strategy that dominates it. The unique outcome in this game, as recommended to utility-maximizing players, is therefore (confess, confess).
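The idea of weighting payoffs by probabilities and then picking a best response can be sketched as follows. This is only an illustration under my own assumptions: A is risk-neutral, his payoff is the negated number of years from Albert's story, and `p_b_testifies` is a hypothetical number standing for A's belief that B will testify ("testify" here is the same move as "confess").

```python
# A's sentence in years for each (A's move, B's move), from the quoted story.
YEARS_A = {
    ("silent", "silent"): 1,
    ("silent", "testify"): 3,
    ("testify", "silent"): 0,
    ("testify", "testify"): 2,
}
STRATEGIES = ("silent", "testify")


def expected_payoff_a(a_move, p_b_testifies):
    """Probability-weighted payoff for A (payoff = minus years, an assumption)."""
    expected_years = (
        (1 - p_b_testifies) * YEARS_A[(a_move, "silent")]
        + p_b_testifies * YEARS_A[(a_move, "testify")]
    )
    return -expected_years


def best_response_a(p_b_testifies):
    """The move that maximises A's expected payoff given his belief about B."""
    return max(STRATEGIES, key=lambda move: expected_payoff_a(move, p_b_testifies))


print(expected_payoff_a("testify", 0.5))  # -1.0
print(expected_payoff_a("silent", 0.5))   # -2.0
print(best_response_a(0.5))               # testify
```

Whatever belief A holds, testifying comes out exactly one expected year ahead of staying silent, which is the dominance argument seen from the expected-payoff side.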
A Nash equilibrium is reached now. It is also called a strategic equilibrium, which is essentially a list of strategies, one for each player, with the property that no player can unilaterally change his strategy and get a better payoff. Pareto efficiency is, however, not reached in this case. An outcome is Pareto efficient only when no one can be made better off without making someone else worse off. At (confess, confess) they both serve the same amount of time in jail, but this is not the best outcome for them as a whole. Their payoff is less than when they both choose (don’t confess, don’t confess): if A and B both stayed silent, neither of them would have to stay behind bars for such a long time. They could go home in just one year’s time. This, obviously, is a much better outcome for them both, but it may not happen in reality. The game helps demonstrate how distrust and competition between the individual parties may force the worst joint outcome upon them.
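Both claims can be checked by brute force over the four possible outcomes. The sketch below again assumes the sentences from Albert's story with payoff equal to minus the years served, and writes "testify" for "confess" and "silent" for "don't confess". The helper names `is_nash` and `is_pareto_efficient` are my own.

```python
from itertools import product

# Sentences as (A's years, B's years); keys are (A's move, B's move).
YEARS = {
    ("silent", "silent"): (1, 1),
    ("silent", "testify"): (3, 0),
    ("testify", "silent"): (0, 3),
    ("testify", "testify"): (2, 2),
}
STRATEGIES = ("silent", "testify")


def payoffs(profile):
    """Payoffs (A, B) as negated years in prison (assumption)."""
    a_years, b_years = YEARS[profile]
    return (-a_years, -b_years)


def is_nash(profile):
    """No player can gain by changing only his own move."""
    a, b = profile
    pa, pb = payoffs(profile)
    a_stays = all(payoffs((alt, b))[0] <= pa for alt in STRATEGIES)
    b_stays = all(payoffs((a, alt))[1] <= pb for alt in STRATEGIES)
    return a_stays and b_stays


def is_pareto_efficient(profile):
    """No other outcome helps one player without hurting the other."""
    pa, pb = payoffs(profile)
    for other in product(STRATEGIES, repeat=2):
        oa, ob = payoffs(other)
        if oa >= pa and ob >= pb and (oa > pa or ob > pb):
            return False
    return True


profiles = list(product(STRATEGIES, repeat=2))
print([p for p in profiles if is_nash(p)])
# [('testify', 'testify')]
print([p for p in profiles if is_pareto_efficient(p)])
# [('silent', 'silent'), ('silent', 'testify'), ('testify', 'silent')]
```

The only equilibrium is also the only outcome that is not Pareto efficient, which is exactly what makes the game a dilemma.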
What is going to happen if the game is repeated? Will we obtain the same result? The answer is no. In such a setting, where both players know the parameters of the game, they will start to realise that cooperation is a better strategy for them both. They will hopefully adopt this new tactic of remaining silent and become better off. If they have enough credibility to be trusted, they may enjoy a better outcome. However, are people trustworthy?
A famous paradox appears through the use of backward induction. It is a technique to solve a game of perfect information. It first considers the moves that are the last in the game and determines the best move for the player in each case. Then, taking these as given future actions, it proceeds backwards in time, again determining the best move for the respective player, until the beginning of the game is reached. If players are well aware that, sadly, this game will come to an end, which is the case for all games, they will start to ponder: on the very last round, I could betray, since no serious consequences await me anyway. But then, on the second-to-last round, we both know that the last round is not going to go well, so I might as well confess during that round too. This deduction goes on and on until, unfortunately, every single round ends in betrayal. Cooperation simply cannot exist in this world, and people are doomed to deviate from the best payoff forever. There are many applications of this game in the real world, as can be seen in my next post.
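The unravelling argument can be sketched in a few lines of Python. This is a simplified illustration rather than a full solver. It assumes a known, finite number of rounds, the sentences from Albert's story with payoff equal to minus the years served, and that what happens in later rounds does not depend on the current move (which is what the induction itself establishes, since each later round has only one equilibrium). Since the prisoners move simultaneously, each round is solved as a small one-shot game instead of move by move. The function names are my own.

```python
from itertools import product

# Sentences as (A's years, B's years); keys are (A's move, B's move).
YEARS = {
    ("silent", "silent"): (1, 1),
    ("silent", "testify"): (3, 0),
    ("testify", "silent"): (0, 3),
    ("testify", "testify"): (2, 2),
}
STRATEGIES = ("silent", "testify")


def stage_equilibria(continuation):
    """Pure equilibria of one round, with the already-solved future added on."""
    def pay(profile):
        a_years, b_years = YEARS[profile]
        return (-a_years + continuation[0], -b_years + continuation[1])

    found = []
    for a, b in product(STRATEGIES, repeat=2):
        pa, pb = pay((a, b))
        a_stays = all(pay((alt, b))[0] <= pa for alt in STRATEGIES)
        b_stays = all(pay((a, alt))[1] <= pb for alt in STRATEGIES)
        if a_stays and b_stays:
            found.append((a, b))
    return found


def backward_induction(rounds):
    """Solve the last round first, then walk back to the first round."""
    continuation = (0, 0)  # nothing follows the final round
    plan = []
    for _ in range(rounds):
        equilibria = stage_equilibria(continuation)
        assert len(equilibria) == 1  # holds for these payoffs
        profile = equilibria[0]
        plan.append(profile)
        a_years, b_years = YEARS[profile]
        continuation = (continuation[0] - a_years, continuation[1] - b_years)
    plan.reverse()  # report rounds in playing order
    return plan, continuation


plan, total = backward_induction(5)
print(plan)   # five times ('testify', 'testify')
print(total)  # (-10, -10)
```

Adding the same future payoff to every cell of a round never changes which move is best in that round, so betrayal carries all the way back to round one.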