Fitting a model can sometimes be easy, sometimes difficult or even frustrating. This is especially true for the models I use: full Bayesian models. Yes, I know they are extremely flexible and powerful, and can do almost anything. The problem is that they are inconvenient and slow (no surprise, given the simulation methods involved). The funny part is that they often give results similar to those of simpler approaches, even for the complex models that are the very reason to use Bayesian methods. So the question is: is it worth the effort?

For example, suppose you want to model count data. You could use Poisson, negative binomial (NB), or Poisson-lognormal with a CAR structure; a first-order or second-order neighbour structure; fixed or random time effects, or even autocorrelation… Maybe NB is good enough. There are lots of researchers around the world playing with and presenting the “complex” models, though I suspect they initially used NB to test different parameter combinations before finalising the models. Then they “found” that the results are similar for the complex and less complex models. But that is expected; it is the very reason they used the simpler, faster models for testing, isn’t it? I am not saying this is purely “showing off”, but as one presenter stated at the TRB 2009 meeting, full Bayes is only worthwhile if the data consist of small units that are correlated, and if you have a good PhD student (like me :)) who is willing to do the analysis…

Still, there are numerous problems with the models. Sometimes the problem seems very simple and basic. Consider the following example my supervisor gave me the other day:

**Simpson’s Paradox**:

Let’s say there are two people, Sleve and Mark, who work to produce some “products”:

|       | Year 1      | Year 2      |
|-------|-------------|-------------|
| Sleve | 500/10 = 50 | 320/4 = 80  |
| Mark  | 270/6 = 45  | 700/10 = 70 |

where 500, 320, etc. are the numbers of products, and they are divided by, say, months worked, to get an “outcome per month” that can be compared between the two.

So Sleve claims that he is more productive than Mark, because his values (50 and 80) are indeed higher than Mark’s (45 and 70) in each year.

Then Mark says: hold on, I am more productive than you. Why? Look at the whole period: over the two years combined, Sleve got (500+320)/(10+4) = 58.6, while Mark got (270+700)/(6+10) = 60.6. Clearly Mark is more productive than Sleve.

So whose claim is correct? Note that this has nothing to do with sample size, since we are comparing ratios.
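The reversal above can be checked in a few lines. This is a minimal sketch using the numbers from the table; the variable names (`sleve`, `mark`, `pooled`) are my own, not from the post:

```python
# The (products, months) figures from the table above.
sleve = {"year1": (500, 10), "year2": (320, 4)}
mark = {"year1": (270, 6), "year2": (700, 10)}

def rate(products, months):
    """Products per month for a single year."""
    return products / months

def pooled(person):
    """Products per month aggregated over both years."""
    total_products = sum(p for p, m in person.values())
    total_months = sum(m for p, m in person.values())
    return total_products / total_months

# Per-year comparison: Sleve is ahead in both years.
assert rate(*sleve["year1"]) > rate(*mark["year1"])  # 50 > 45
assert rate(*sleve["year2"]) > rate(*mark["year2"])  # 80 > 70

# Pooled comparison: the direction reverses, and Mark is ahead.
print(round(pooled(sleve), 1), round(pooled(mark), 1))  # 58.6 60.6
```

Both claims are arithmetically correct; they simply answer different questions, which is exactly the paradox.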

That said, models and data are not easy to understand. Sometimes the question is as simple as this: your model gives you a “*more police, more crime*” relationship. Is it that police presence encourages crime, or that crime actually decreases while recorded crime increases because more police are in place?

Maybe we should always keep this in mind when playing with the models: “All models are wrong, but some are useful.” – George E. P. Box.

So in practice, we always prefer the averaging method, as it is more stable.

I mean, I would say that over the two years overall, Mark is more productive.

Accountants help companies do their books: if you want a profit, they produce a profit; if you want a loss, they produce a loss, provided you pay, of course. Models serve the client: as long as the client pays, you can always build a model proving that everyone in the world envies North Korea. As for a PhD student’s models, well, it mainly depends on who the supervisor is…

@Tao: A model showing that everyone in the world envies North Korea would not count as “useful”. It is like university rankings: a model that ranks Dalian University above Tsinghua University is not very useful, whereas under a useful model we could actually see whether Dalian University of Technology or Renmin University of China ranks higher. The key is the complexity of the data analysis itself and how to interpret it. And of course, it has to stand up to scrutiny by researchers worldwide.

PS. The “Condor Heroes” couple shows up again :)

I didn’t use any models in my master’s thesis, simply because I realised they are not suitable for practical issues, even though models might bring higher marks… And yes, I don’t think the ones who apply models really understand what they are using…

So your research also uses statistics and econometrics. I am learning them now too, though I suspect that even by graduation I won’t get anywhere near this deep.

I want to add a remark here, as this phenomenon, sometimes called the Yule–Simpson effect, is very common in probability theory yet rather paradoxical. Using the probability triple $(\Omega_i, \mathcal{F}_i, P_i)$, where $i = 1, 2$ indexes Sleve and Mark respectively, it is easy to see that Sleve performs better. However, when you use the overall average, you are creating a new probability triple $(\Omega, \mathcal{F}, P)$, where $\Omega = \Omega_1 \cup \Omega_2$ but $\mathcal{F}$ is much larger than $\mathcal{F}_1 \cup \mathcal{F}_2$, with a corresponding $P$. As a result, what $P$ measures is totally different. Furthermore, the influence function (in fact, the Fréchet derivative) of the common mean shows that the common mean varies greatly when $\mathcal{F}$ grows or shrinks too much; this indicates that the common mean is not a good indicator for such a situation… Maybe you can try another estimator, such as the Winsorized mean or the trimean…
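For readers unfamiliar with the robust estimators mentioned in the comment above, here is a minimal pure-Python sketch of a trimmed mean, a Winsorized mean, and Tukey's trimean; the sample data are hypothetical, chosen only to show how one outlier inflates the plain mean:

```python
def trimmed_mean(xs, k):
    """Mean after dropping the k smallest and k largest values."""
    s = sorted(xs)[k:len(xs) - k]
    return sum(s) / len(s)

def winsorized_mean(xs, k):
    """Mean after clamping the k extreme values on each side inward."""
    s = sorted(xs)
    s = [s[k]] * k + s[k:len(s) - k] + [s[-k - 1]] * k
    return sum(s) / len(s)

def trimean(xs):
    """Tukey's trimean: (Q1 + 2*median + Q3) / 4, with linear interpolation."""
    s = sorted(xs)
    def q(p):
        idx = p * (len(s) - 1)
        lo = int(idx)
        frac = idx - lo
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + frac * (s[hi] - s[lo])
    return (q(0.25) + 2 * q(0.5) + q(0.75)) / 4

data = [45, 50, 70, 80, 300]      # one large outlier
print(sum(data) / len(data))      # plain mean: 109.0
print(winsorized_mean(data, 1))   # 66.0
print(trimean(data))              # 67.5
```

The robust estimators stay close to the bulk of the data while the plain mean is pulled far toward the outlier, which is the comment's point about the common mean being sensitive to changes in the underlying collection.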

An appropriate answer is:

Although Sleve is more productive than Mark in each year separately, a high proportion of the products were produced by Mark in Year 2, when his productivity was high (70); conversely, most of Sleve’s products were produced in Year 1, when his productivity was low (50). In other words, many products were “shifted” (or “assigned”) from Sleve to Mark in Year 2, when overall productivity had improved (say, due to technical advances), so most products were produced by Mark at high productivity rates. This boosts Mark’s pooled productivity relative to Sleve’s.

To answer the question: if there is no particular reason why more products were assigned to Mark in Year 2 (i.e. products were shifted arbitrarily), it is better to look at each year individually, and the correct way to calculate overall productivity would then involve some weighting scheme. However, if, for example, technological advances affect how many products Sleve and Mark can produce, then it is better to use the aggregated (two-year) data.
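One way to make the weighting point concrete: the pooled rate is just a months-weighted average of the yearly rates, while giving each year equal weight leads to the opposite conclusion. A small sketch (the function name `weighted_rate` and the equal-weights choice are my own illustration, not from the post):

```python
def weighted_rate(rates, weights):
    """Weighted average of per-year productivity rates."""
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)

sleve_rates, sleve_months = [50.0, 80.0], [10, 4]
mark_rates, mark_months = [45.0, 70.0], [6, 10]

# Equal weight per year (compare the years on equal footing): Sleve ahead.
print(weighted_rate(sleve_rates, [1, 1]))  # 65.0
print(weighted_rate(mark_rates, [1, 1]))   # 57.5

# Months as weights reproduces the pooled figures: Mark ahead.
print(round(weighted_rate(sleve_rates, sleve_months), 1))  # 58.6
print(round(weighted_rate(mark_rates, mark_months), 1))    # 60.6
```

Which weighting is “correct” depends on exactly the question discussed above: whether the allocation of work across years carries real information or is arbitrary.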

In this particular case, since there is no other information about the shift of products, we may assume they were assigned in a purely arbitrary manner, and therefore conclude that Sleve is more productive.

Pingback: A solution to the Simpson’s Paradox | Chao's home