### Archive

Archive for November, 2010

## Flawed analysis of “one child is a boy” problem?

A mathematical puzzle has reappeared over the last year as the topic of discussion in various blogs and I have not seen any discussion suggesting that the analysis appearing in blogs contains a fundamental flaw.

The problem is as follows: I have two children and at least one of them is a boy; what is the probability that I have two boys? (A variant of this problem specifies whether the boy was born first or last and has a noncontroversial answer).

Most peoples (me included) off-the-top-of-the-head answer is 1/2 (the other child can be a girl or a boy, assuming equal birth probabilities {which is very a good approximation to reality}).

The analysis that I believe to be incorrect goes something like the following: The possible birth orders are `gb`, `bg`, `bb` or `gg` and based on the information given we can rule out girl/girl, leaving the probability of `bb` as 1/3. Surprise!

A variant of this puzzle asks for the probability of boy/boy given that we know one of the children is a boy born on a Tuesday. Here the answer is claimed to be 13/27 (brief analysis or using more complicated maths). Even greater surprise!

I think the above analysis is incorrect, which seems to put me in a minority (ok, Wikipedia says the answer could sometimes be 1/2). Perhaps a reader of this blog can tell me where I am going wrong in the following analysis.

Lets call the known boy `B`, the possible boy `b` and the possible girls `g`. The sequence of birth events that can occur are:

`Bg gB bB Bb gg`

There are four sequences that contain at least one boy, two of which contain two boys. So the probability of two boys is 1/2. No surprise.

All of the blog based analysis I have seen treat the ordering of a known boy+girl birth sequence as being significant but do not to treat the ordering of a known boy+boy sequence as significant. This leads them to calculate the incorrect probability of 1/3.

The same analysis can be applied to the “boy born on a Tuesday” problem to get the probability 14/28 (i.e., 1/2).

Those of you who like to code up a problem might like to consider the use of a language like Prolog which I would have thought would be less susceptible to hidden assumptions than a Python solution.

Tags:

Oracle have specified some details in their ‘Java’ copyright claims against Google. One item that caught my eye was the source of a class copyrighted by Oracle and the corresponding class being used by Google. An experiment I ran at the 2005 and 2008 ACCU conferences studied how developers create/organize data structures from a given specification and sheds some light on developer behavior in this area (I am not a lawyer and have never been involved in a copyright case, so I will not say anything about legal issues).

Two of the hypothesis investigated by the ACCU experiment were, 1) within aggregate types (e.g., structs or classes) members having the same type tend to be placed together (e.g., all the `char`s followed by all the `int`s followed by all the `float`s), and 2) the relative ordering of members follows the order of the corresponding items in the specification (e.g., if the specification documents the date and then the location and this information occurs within the same structure the corresponding `date` member will appear before the `location` member).

The first hypothesis was investigated by analyzing C source source and found that members having the same type are very likely to be grouped together. Looking at the private definitions in the Oracle code we see that members having a `HashSet` or `boolean` type are not grouped together with other members of the same type; one possibility was that the class grew over time and new members were appended rather than intermixed with members already present. The corresponding Google code also has the `HashSet` or `boolean` intermixed with members having different types. At least one implementation exists where members having the same type are grouped together.

The second hypothesis was investigated by running an experiment which asked developers to create an API from a specification, with everybody seeing the same specification except the ordering of the information was individually randomized for each developer. The results showed that when creating a struct or class (each subject was allowed to create the API in a language of their choice, with C, C++ and Java actually being used) developers tended to list members in the same relative order as the information appeared in the specification.

If the Apache implementors of the code used by Google based their specification on the Oracle code, it is to be expected that the ordering of the members would be very similar, which is it.

There has been a lot of talk of clean room implementation being used for the Java-like code within Dalvik. Did somebody sit down and write a specification of the Oracle `PolicyNode` class that was then given to somebody else to implement?

The empirical evidence (i.e., members having the same type are not adjacent but intermixed with other types and relative member ordering is the same) suggests that the implementation of at least the private members of the `PolicyNode` class being used by Google was based directly on the Oracle code.

These members are defined to be `private` so Google cannot argue that they are part of an API (various people are claiming that being part of an API has some impact on copyright, I have no idea if this is true).

Can Google get around any copyright issue simply by reordering the member definitions? I have no idea.

It would appear that like the patent lawsuit this copyright lawsuit could prove to be very interesting to watch.

Tags: