A while ago I wrote about Simpson’s Paradox in “Models, models, models…”. Today, after some reading, I can provide the solution that I think is correct. I originally wrote it in the comment section of that article, and I am pasting it here as well:
An appropriate answer is:
Although Sleve is more productive than Mark in each year separately, a high proportion of the products were produced by Mark in Year 2, when his productivity was high (at 70); in contrast, most of the products were produced by Sleve in Year 1, when his productivity was low (50). This means that many products were “shifted” (or “assigned”) from Sleve to Mark in Year 2, when overall productivity improved (say, due to technical advances), so that most of Mark’s products were produced at the high productivity rate. In the aggregate, this raises Mark’s overall productivity far more than Sleve’s.
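To make the mechanism concrete, here is a small sketch with made-up numbers. Only the 70 (Mark, Year 2) and the 50 (Sleve, Year 1) come from the answer above; the other two productivity figures and all of the product counts are hypothetical, chosen only so that Sleve leads in each year. I am also assuming that overall productivity is the product-weighted average of the yearly figures.

```python
# (productivity, number of products) per person and year.
# Only 70 and 50 are from the text; everything else is a hypothetical illustration.
data = {
    "Mark":  {"Year 1": (40, 10), "Year 2": (70, 90)},
    "Sleve": {"Year 1": (50, 90), "Year 2": (80, 10)},
}

def pooled(person):
    """Overall productivity: yearly productivity weighted by products handled."""
    rows = list(data[person].values())
    total_products = sum(n for _, n in rows)
    return sum(rate * n for rate, n in rows) / total_products

for person in data:
    print(person, pooled(person))
```

With these numbers Sleve is ahead in both years (50 vs 40, 80 vs 70), yet the pooled figures come out at 67 for Mark and 53 for Sleve, because most of Mark's products fall in the high-productivity year.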
To answer the question: if there is no particular reason why more products were assigned to Mark in Year 2 (i.e. products were shifted arbitrarily), it is better to look at each year individually, and the correct way to calculate overall productivity should involve some weighting scheme. However, if (for example) the technological advances affect how many products Sleve or Mark can produce, then it is better to use the aggregated data (i.e. both years’ data combined).
In this particular case, since there is no other information on the shift of the products, we could assume products were assigned in a purely arbitrary manner, and therefore the conclusion is that Sleve is more productive.
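One possible weighting scheme, sketched here as an assumption rather than as the only correct choice, is to give both people the same weight for each year, so that the arbitrary split of products no longer affects the comparison. As before, only the 70 and the 50 come from the text; the other productivity figures and the common weights are hypothetical.

```python
# Yearly productivity per person (40 and 80 are hypothetical placeholders).
rates = {
    "Mark":  {"Year 1": 40, "Year 2": 70},
    "Sleve": {"Year 1": 50, "Year 2": 80},
}

# Hypothetical common weights: the same workload for everyone in each year.
common_weights = {"Year 1": 100, "Year 2": 100}

def standardized(person, weights):
    """Weighted average of yearly productivity using weights shared by everyone."""
    total = sum(weights.values())
    return sum(rates[person][year] * w for year, w in weights.items()) / total

for person in rates:
    print(person, standardized(person, common_weights))
```

Under the same weights for both people, Sleve comes out ahead (65 vs 55 with these numbers), which agrees with the year-by-year comparison.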
This should make sense now…