Wednesday, 12 June 2013

A little more about Degrees of Freedom


As my previous posts suggest, choosing the correct number of degrees of freedom (df) is fundamental, because it determines the significance of the results. It is crucial when we calculate the weighted mean. In my previous post, when we applied the Bland and Kerry procedure, I said: “The most immediate thought is k-1 df, which is 2 in our case. However, we should take into account that each experiment has n samples (n-1 df). Therefore, in my opinion the correct way to take into account of df is: df=(k-1)*(n-1) if n is the same in all the experiments”.

This is logically correct, but it has an important limitation: what is df if n varies across experiments? Can we find df in a more general way, starting from the formulae we used?

Consider this simple case: we have one experiment with three replicates (1, 2, 4) and another with three replicates (2, 3, 4). Exp1: Mean=2.33 (SD: 1.53); Exp2: Mean=3 (SD: 1).

Let’s calculate the weighted mean.

Each weight is n/SD²: W1=3/2.34=1.28; W2=3/1=3 =>

Weighted mean = (1.28*2.33 + 3*3)/(1.28+3) = 2.80
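The same calculation can be checked with a few lines of Python. This is only a minimal sketch: the function name `weighted_mean` is my own, and it assumes the weights are n/SD² with the sample SD (n-1 in the denominator), as in the numbers above.

```python
from statistics import mean, stdev

def weighted_mean(experiments):
    """Weighted mean of the experiment means, each weighted by n / SD^2."""
    # Assumption: SD is the sample standard deviation (n-1 denominator).
    weights = [len(x) / stdev(x) ** 2 for x in experiments]
    means = [mean(x) for x in experiments]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)

exp1 = [1, 2, 4]
exp2 = [2, 3, 4]

result = weighted_mean([exp1, exp2])
print(round(result, 2))  # 2.8
```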

How can I change my data without varying the weighted mean? How many constraints do we have?

Keeping each mean fixed, I can vary the replicates in n-1 ways (two per experiment), but such a change also affects the SD we use in the formula! For example:

To keep 2.33 as the mean in Exp1, I have to change at least two replicates, e.g. 0.5, 2.5, 4. However, the SD becomes 1.76 (W1=3/3.08=0.97) and the weighted mean changes as well:

(0.97*2.33 + 3*3)/(0.97+3)=2.84 =>

To keep the weighted mean unaltered, I also have to keep the same SD in each experiment, and I can do that only by varying all three replicates => IN EACH EXPERIMENT we have two constraints, mean and SD => each experiment has n-2 df.

To generalize, if I have k experiments and experiment i has ni replicates, the total number of df is:
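A quick way to see the two constraints at work is to compare two replacements for Exp1: one that keeps only the mean, and one that keeps both mean and SD. The sketch below is only illustrative; the helper `weighted_mean` and the "mirrored" replicates (each value reflected around the mean) are my own choices, and the weights are again assumed to be n/SD² with the sample SD.

```python
from statistics import mean, stdev

def weighted_mean(experiments):
    """Weighted mean of the experiment means, each weighted by n / SD^2."""
    weights = [len(x) / stdev(x) ** 2 for x in experiments]
    means = [mean(x) for x in experiments]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)

exp1 = [1, 2, 4]
exp2 = [2, 3, 4]

# Same mean as Exp1 (7/3) but a different SD: the weighted mean moves.
same_mean_only = [0.5, 2.5, 4]

# Reflecting every replicate around the mean keeps both mean and SD,
# but all three values have to change.
m = mean(exp1)
mirrored = [2 * m - x for x in exp1]

original = weighted_mean([exp1, exp2])
shifted = weighted_mean([same_mean_only, exp2])   # about 2.84 at full precision
unchanged = weighted_mean([mirrored, exp2])

print(round(original, 2), round(shifted, 2), round(unchanged, 2))
```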

df = Σ_{i=1}^{k} (n_i - 2) (MATHEMATIC WAY)

In the example from the previous post (k=3 experiments, each with 5 replicates), df=(5-2)+(5-2)+(5-2) = 9 instead of the 8 calculated with the “LOGIC METHOD”. The significance of the difference would be 0.0001 instead of 0.0002.

It should be noted that the LOGIC METHOD gives more df than the MATHEMATIC one when k is high and n is low, and fewer when k is low and n is high, although with many df (e.g. >20-30 per condition) the significance is only slightly affected by this choice.

WHICH is BEST? The MATHEMATIC method starts from the formula used to calculate the weighted mean and is more general; therefore it is statistically more rigorous.
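To compare the two ways of counting df for any design, something like the following can be used. It is just a sketch with function names of my own (`logic_df`, `mathematic_df`); the first requires the same n in every experiment, the second accepts a different n per experiment.

```python
def logic_df(k, n):
    """LOGIC METHOD: (k-1)*(n-1), valid only when every experiment has n replicates."""
    return (k - 1) * (n - 1)

def mathematic_df(replicates):
    """MATHEMATIC METHOD: sum of (n_i - 2) over all experiments."""
    return sum(n - 2 for n in replicates)

# Example from the previous post: k=3 experiments with 5 replicates each.
print(logic_df(3, 5), mathematic_df([5, 5, 5]))     # 8 9

# Many experiments with few replicates: the LOGIC METHOD gives more df.
print(logic_df(10, 3), mathematic_df([3] * 10))     # 18 10

# Unequal n: only the MATHEMATIC METHOD applies.
print(mathematic_df([3, 5, 8]))                     # 10
```

With equal n, the difference between the two counts is k-n+1, so the LOGIC METHOD gives more df whenever k is at least n.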
