## Flawed analysis of “one child is a boy” problem?

A mathematical puzzle has reappeared over the last year as the topic of discussion in various blogs and I have not seen any discussion suggesting that the analysis appearing in blogs contains a fundamental flaw.

The problem is as follows: I have two children and at least one of them is a boy; what is the probability that I have two boys? (A variant of this problem specifies whether the boy was born first or last and has a noncontroversial answer).

Most people’s (mine included) off-the-top-of-the-head answer is 1/2 (the other child can be a girl or a boy, assuming equal birth probabilities {which is a very good approximation to reality}).

The analysis that I believe to be incorrect goes something like the following: The possible birth orders are `gb`, `bg`, `bb` or `gg` and based on the information given we can rule out girl/girl, leaving the probability of `bb` as 1/3. Surprise!
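For readers who want to see exactly what that enumeration computes, here is a minimal Python sketch. It assumes the four birth orders are equally likely and simply discards girl/girl; the variable names are mine, not taken from any of the blogs.

```python
from fractions import Fraction
from itertools import product

# The four birth orders (first-born, second-born), assumed equally likely.
families = list(product("bg", repeat=2))

# Keep only the families consistent with "at least one is a boy".
with_boy = [f for f in families if "b" in f]
two_boys = [f for f in with_boy if f == ("b", "b")]

p_two_boys = Fraction(len(two_boys), len(with_boy))
print(p_two_boys)  # 1/3
```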

A variant of this puzzle asks for the probability of boy/boy given that we know one of the children is a boy born on a Tuesday. Here the answer is claimed to be 13/27 (brief analysis or using more complicated maths). Even greater surprise!
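The 13/27 figure comes from the same style of counting, extended to sex and weekday. The sketch below assumes all 14 sex/weekday combinations are equally likely for each child and keeps every family containing a Tuesday boy; the index used for Tuesday is an arbitrary choice of mine.

```python
from fractions import Fraction
from itertools import product

DAYS = range(7)   # seven weekdays; which index stands for Tuesday is arbitrary
TUESDAY = 1

children = list(product("bg", DAYS))           # 14 equally likely kinds of child
families = list(product(children, repeat=2))   # 196 equally likely families

def is_tuesday_boy(child):
    return child == ("b", TUESDAY)

# Families consistent with "one of the children is a boy born on a Tuesday".
matching = [f for f in families if any(is_tuesday_boy(c) for c in f)]
both_boys = [f for f in matching if all(sex == "b" for sex, _ in f)]

p_both_boys = Fraction(len(both_boys), len(matching))
print(p_both_boys)  # 13/27
```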

I think the above analysis is incorrect, which seems to put me in a minority (ok, Wikipedia says the answer could sometimes be 1/2). Perhaps a reader of this blog can tell me where I am going wrong in the following analysis.

Let’s call the known boy `B`, the possible boy `b` and the possible girls `g`. The sequences of birth events that can occur are:

`Bg gB bB Bb gg`

There are four sequences that contain at least one boy, two of which contain two boys. So the probability of two boys is 1/2. No surprise.

All of the blog-based analyses I have seen treat the ordering of a known boy+girl birth sequence as significant but do not treat the ordering of a known boy+boy sequence as significant. This leads them to calculate the incorrect probability of 1/3.

The same analysis can be applied to the “boy born on a Tuesday” problem to get the probability 14/28 (i.e., 1/2).

Those of you who like to code up a problem might like to consider the use of a language like Prolog which I would have thought would be less susceptible to hidden assumptions than a Python solution.

> Lets call the known boy B, the possible boy b and the possible girls g.
> The sequence of birth events that can occur are:
>
> Bg gB bB Bb gg

I think you are trying to challenge your readers to provide a convincing refutation without believing this yourself. But anyway, let me try to put it this way:

First, the trick of enumerating combinations and dividing the number of matching ones by the total only works for equiprobable possibilities, and the way to avoid applying this trick wrongly is to enumerate first and to decide later whether each combination matches. The equiprobable combinations are gg, gb, bg, bb. Your five possibilities are not equiprobable (and they introduce a new unknown: how much preference the all-seeing chooser has for the elder son when he has a choice between two boys).

Regarding the “born on a Tuesday” apparent paradox, it’s quite the same thing in reverse: the day of the week seems arbitrary (and indeed it is), but the important thing is that the day is picked first, and then the question “knowing only that in this family, one child is a boy born on this day of the week, then …?” is asked. The probability in this case differs from the correct 1/3 from the simpler case because families with two boys have more chances to have one of them born on a Tuesday.

Well, it depends on whether there is “a known child” and on the process used to get the statement from the father.

If you ask the father “tell me something about one of your children”, the answer is 1/2. If you ask the father “choose one of your children and let me know his/her sex”, the answer is 1/2.

If you ask “is at least one of your children a boy?” the problem becomes equivalent to “throw two coins; if both are tails, throw them again until at least one is heads; what’s the probability that there are two heads?” There is no “known head”; you examine the set.

“At least one of the children is a boy” is not a statement about “a known boy” but about the set of children. When the statement is modified to talk about a definite child, the answer is 1/2.

A modified version of the “Tuesday problem” (only to be used on people who have answered 1/3 to the original and 13/27 to the Tuesday variation) is “At least one of my children is a boy…. this child I talked about… er, no, nothing”. The answer is 1/2 (and there is not even an “irrelevant” datum added, as with the Tuesday thing). When the father stops talking about the set of children and starts talking about “that child”, the result changes from 1/3 to 1/2.

Excuse my poor English.

As a program to check this, I would go with some way to generate random strings of gg, gb, bg, bb (one per line, in a file called cases) and then:

`grep b cases | wc -l` (at least one boy)

`grep bb cases | wc -l` (two boys)

and then divide.
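The same check can be sketched in Python as a small simulation. This is only an illustration: the function name, the sample size and the fixed seed are my own choices, and each child is assumed to be a boy or a girl with equal probability.

```python
import random

def simulate(n_families=100_000, seed=0):
    """Estimate P(two boys | at least one boy) by counting random families."""
    rng = random.Random(seed)
    cases = ["".join(rng.choice("bg") for _ in range(2)) for _ in range(n_families)]
    at_least_one_boy = sum("b" in c for c in cases)   # like: grep b cases | wc -l
    two_boys = sum(c == "bb" for c in cases)          # like: grep bb cases | wc -l
    return two_boys / at_least_one_boy

estimate = simulate()
print(round(estimate, 2))
```

With this many families the estimate should land close to 1/3, because the counting step is exactly the "examine the set" filter described above.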

Thanks to Pascal Cuoq and H for commenting so quickly (H your English is very good).

Both of your comments made sense to me and my own ideas continued to make sense. Where was the problem? The Wikipedia article cited Martin Gardner and I managed to unearth my copy of “The Colossal Book of Mathematics” to read Gardner’s explanation of a 1/2 solution; then I understood.

The different answers come about because of differences in the construction of the underlying distribution from which the family is drawn.

If we randomly select a parent from the population of all families in the world with two children and stop selecting when we encounter a parent who answers yes to the question “Do you have at least one son?”, then the probability of them having two sons is 1/3. This is how the blogs I have read, Pascal Cuoq and H view the problem.

If a person walks up to me and says they have two children and at least one of them is a boy, that person has self-selected the problem statement (as Gardner points out, a person with two girls would have to say “at least one of my children is a girl”). This is how I have been viewing the problem, hence my use of the term ‘known-boy’, and the probability of two boys is 1/2.
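The two constructions can be compared exactly by weighting each family by the chance that it produces the statement “at least one is a boy”. The sketch below is a hedged illustration: the function names are mine, and for the self-selected case I am assuming the parent picks one of their two children at random and reports that child’s sex, which is one way of modelling Gardner’s point.

```python
from fractions import Fraction
from itertools import product

families = list(product("bg", repeat=2))   # four equally likely families
prior = Fraction(1, len(families))

def says_boy_when_asked(family):
    # We ask "Do you have at least one son?" and keep only the yeses.
    return Fraction(1) if "b" in family else Fraction(0)

def says_boy_when_volunteering(family):
    # Assumption: the parent reports the sex of a randomly chosen child.
    return Fraction(family.count("b"), 2)

def p_two_boys(p_says_boy):
    # Bayes: weight each family by prior * P(statement | family).
    weight = {f: prior * p_says_boy(f) for f in families}
    return weight[("b", "b")] / sum(weight.values())

print(p_two_boys(says_boy_when_asked))         # 1/3
print(p_two_boys(says_boy_when_volunteering))  # 1/2
```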

The use of permutations on this problem doesn’t sit right with me. The 1/3 answer often comes from the wording that allows the boy to be either the elder or younger, but is it acceptable to add irrelevant age information to the list of permutations? Why not add more information, such as what day of the week they were born on, like the Tuesday problem, or hair color? I’m no expert, but my opinion is that the age-irrelevant wording should have an age-irrelevant permutation set:

bb bg

Thus, the answer is 1/2.

The issues of whether there is a “known child,” or if the boy is the older or younger child, are both red herrings. I don’t mean the answers to those questions can’t influence the answer – they can – but they do not address why the answers are different.

Change the problem to an experiment you can repeat with coins. Say I flip two coins, a dime and a penny, and tell you “at least one landed on (heads or tails)” afterwards. The answer to “what is the probability that both landed the same way?” depends on the strategy I use for choosing what to say.

1) If I decided ahead of time that I would only tell you if one landed on heads, and re-flip the coins if neither does, then the answer is 1/3.

2) If I decided ahead of time that I would prefer to tell you if one landed on heads, and will tell you “one landed on tails” only if both do, then the answer is 1/3 if I say “heads,” and 1 (yes, that’s 100%) if I say “tails.”

3) If I decided to always tell you how the penny landed, and ignore the dime, the answer is 1/2. This compares to the “known child” case.

4) If I decided to tell you about the one that stopped spinning first (or landed closer to me, or any other random factor), the answer is again 1/2.

If you treat the person who said “I have two children and at least one of them is a boy” as a randomly-chosen person, any of these interpretations of why he chose to say that is possible. But the first two are very improbable, and require that we assume this random person had a bias of some sort. And in fact, if you assume such a bias, you can’t know whether #1 or #2 is right, so the answer could be 100%. The most reasonable interpretation is #4, and it is not the same as #3 even though it gets the same answer.

The reason the answer changes is that the event we should use to divide the cases into “possible” and “not possible” is based on what I tell you, not necessarily what is true. In cases #3 and #4 there are times where “at least one landed on heads” is true, but I will tell you “at least one landed on tails.” In case #1, that is not possible; and in case #2, we actually get two different answers and you have no way to know which is correct.
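The four strategies can be checked by exact enumeration, conditioning on what is said rather than on what is true. This is a minimal sketch under my own naming: each strategy maps a (dime, penny) outcome to the probability of each possible statement, and an empty dictionary stands for “re-flip”.

```python
from fractions import Fraction
from itertools import product

flips = list(product("HT", repeat=2))   # (dime, penny), four equally likely outcomes
ONE, HALF = Fraction(1), Fraction(1, 2)

def only_heads(dime, penny):      # 1) say "heads" or re-flip
    return {"H": ONE} if "H" in (dime, penny) else {}

def prefer_heads(dime, penny):    # 2) say "tails" only if both are tails
    return {"H": ONE} if "H" in (dime, penny) else {"T": ONE}

def always_penny(dime, penny):    # 3) always report the penny
    return {penny: ONE}

def random_coin(dime, penny):     # 4) report a randomly chosen coin
    return {dime: ONE} if dime == penny else {dime: HALF, penny: HALF}

def p_same(strategy, statement):
    """P(both coins landed the same way | I made this statement)."""
    total = same = Fraction(0)
    for dime, penny in flips:
        w = strategy(dime, penny).get(statement, Fraction(0)) / len(flips)
        total += w
        if dime == penny:
            same += w
    return same / total

print(p_same(only_heads, "H"))                               # 1/3
print(p_same(prefer_heads, "H"), p_same(prefer_heads, "T"))  # 1/3 1
print(p_same(always_penny, "H"), p_same(random_coin, "H"))   # 1/2 1/2
```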

You can make a similar comparison to the “boy born on Tuesday” problem. Say I start with seven of each coin, each minted in a different year in the range 2001-2007. I pick a random dime and a random penny, and flip them.

1) If I decided ahead of time that I would only tell you if a 2005 coin landed on heads, and re-flip the coins if neither does, then the answer is 13/27.

2) If I decided ahead of time that I would prefer to tell you if one landed on heads, and add the earliest year I could, then the answer is 13/27 if I say “2001 heads,” 11/25 if I say “2002 heads,” 9/23 if I say “2003 heads,” etc.

3) If I decided to always tell you about the penny, and ignore the dime, the answer is 1/2 just as it was before.

4) If I decided to tell you about a random coin, the answer is again 1/2 in all cases.
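The dated-coin numbers can be reproduced the same way. The sketch below covers strategies 1, 2 and 4, with function names of my own; for strategy 2 I am assuming that when neither coin shows heads, the earliest tails is reported instead, which does not affect the “heads” answers.

```python
from fractions import Fraction
from itertools import product

YEARS = range(2001, 2008)
coins = list(product(YEARS, "HT"))       # 14 equally likely (year, face) results per coin
flips = list(product(coins, repeat=2))   # (dime, penny): 196 equally likely outcomes
ONE, HALF = Fraction(1), Fraction(1, 2)

def only_2005_heads(dime, penny):        # 1) say "2005 heads" or re-flip
    return {(2005, "H"): ONE} if (2005, "H") in (dime, penny) else {}

def earliest_heads(dime, penny):         # 2) prefer heads, with the earliest year
    heads = [c for c in (dime, penny) if c[1] == "H"]
    # Assumption: with no heads at all, report the earliest tails instead.
    return {min(heads or (dime, penny)): ONE}

def random_coin(dime, penny):            # 4) report a randomly chosen coin
    return {dime: ONE} if dime == penny else {dime: HALF, penny: HALF}

def p_same_face(strategy, statement):
    """P(both coins show the same face | I made this statement)."""
    total = same = Fraction(0)
    for dime, penny in flips:
        w = strategy(dime, penny).get(statement, Fraction(0)) / len(flips)
        total += w
        if dime[1] == penny[1]:
            same += w
    return same / total

print(p_same_face(only_2005_heads, (2005, "H")))   # 13/27
for year in (2001, 2002, 2003):
    print(year, p_same_face(earliest_heads, (year, "H")))   # 13/27, 11/25, 9/23
print(p_same_face(random_coin, (2005, "H")))       # 1/2
```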

The fact that everybody expects the information about the date to have no effect proves that they really expect the coin (or child) to be a specifically-chosen one, as in #4, and the information about it to be whatever is true for that child.