As my previous posts suggest, choosing the correct number of degrees of
freedom (df) is fundamental, because it determines the significance of the
results. It is crucial when we calculate the weighted mean. In my previous post,
when we applied the Bland and Kerry procedure, I said: “The most immediate
thought is k-1 df, which is 2 in our case. However, we should take into account
that each experiment has n samples (n-1 df). Therefore, in my opinion the
correct way to take into account of df is: df=(k-1)*(n-1) if n is the same in
all the experiments”.
This is logically sound, but it has an important limitation: what is df if n
varies across experiments? Can we find df in a more general way, starting from
the formulae we used?
Consider this simple case: one experiment with three replicates (1, 2, 4) and
another with three replicates (2, 3, 4). Exp1: Mean=2.33 (SD: 1.53); Exp2:
Mean=3 (SD: 1).
Let’s calculate the weighted mean, weighting each experiment by n/SD²:
W1 = 3/2.33 = 1.29;
W2 = 3/1 = 3 =>
Weighted mean = (1.29*2.33 + 3*3)/(1.29+3) = 2.80
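The calculation above can be reproduced with a minimal sketch like the following. It assumes the weights are n divided by the sample variance (SD squared), which is what the numbers in the example imply; the function name `weighted_mean` is only illustrative.

```python
from statistics import mean, variance


def weighted_mean(experiments):
    """Weighted mean of the experiment means.

    Assumption: each experiment is weighted by n / variance,
    where variance is the sample variance (SD squared).
    """
    weights = [len(x) / variance(x) for x in experiments]
    means = [mean(x) for x in experiments]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)


exp1 = [1.0, 2.0, 4.0]
exp2 = [2.0, 3.0, 4.0]

print(round(weighted_mean([exp1, exp2]), 2))  # 2.8
```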
How can I change my data without varying the weighted mean? How many
constraints do we have?
With the mean of each experiment fixed, n-1 replicates (two per experiment) are
free to vary, but such a change also alters the SD we use in the formula! For
example:
To keep 2.33 as the mean in Exp1, I have to change at least two replicates,
e.g. 0.5, 2.5, 4. However, the SD becomes 1.76 and the weighted mean changes
too: W1 = 3/3.08 = 0.97, so
(0.97*2.33 + 3*3)/(0.97+3) = 2.84 =>
To keep the weighted mean unaltered, I also have to keep the same SD in each
experiment, and I can do that only by varying all three replicates => IN EACH
EXPERIMENT we have two constraints, mean and SD => each experiment has n-2
df.
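The two-constraint argument can be checked numerically. The sketch below is only an illustration, again assuming weights of n divided by the sample variance. It compares the original data with a set that keeps the mean of Exp1 but not its SD, and with a set that keeps both. The mirrored data set is a hypothetical construction, used because reflecting the replicates around their mean preserves both the mean and the SD.

```python
from statistics import mean, variance


def weighted_mean(experiments):
    # Assumption: weight of each experiment = n / sample variance
    weights = [len(x) / variance(x) for x in experiments]
    means = [mean(x) for x in experiments]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)


exp1 = [1.0, 2.0, 4.0]
exp2 = [2.0, 3.0, 4.0]

# Same mean as exp1 (7/3), but a different SD
same_mean = [0.5, 2.5, 4.0]

# Same mean AND same SD as exp1: mirror every replicate around the mean
m1 = mean(exp1)
same_mean_and_sd = [2 * m1 - x for x in exp1]

original = weighted_mean([exp1, exp2])
mean_only = weighted_mean([same_mean, exp2])
mean_and_sd = weighted_mean([same_mean_and_sd, exp2])

print(original)     # reference value
print(mean_only)    # shifts away from the reference value
print(mean_and_sd)  # matches the reference value (up to rounding error)
```

Only the data set that preserves both the mean and the SD leaves the weighted mean where it was.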
To generalize, if I have k experiments and experiment i has n_i replicates,
the total number of df is:
$$df = \sum_{i=1}^{k} (n_i - 2)$$ (MATHEMATIC WAY)
In the example of the previous post (k=3 experiments, each with 5 replicates),
DF=(5-2)+(5-2)+(5-2) = 9 instead of the 8 calculated with the “LOGIC METHOD”.
The significance of the difference would be 0.0001 instead of 0.0002.
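As a quick check of the two counts, here is a small sketch of both rules. The function names `df_logic` and `df_mathematic` are hypothetical labels for the two methods discussed in this post, and the unequal sample sizes in the last line are invented purely for illustration.

```python
def df_logic(k, n):
    """LOGIC METHOD: (k - 1) * (n - 1), defined only when every experiment has n replicates."""
    return (k - 1) * (n - 1)


def df_mathematic(sizes):
    """MATHEMATIC METHOD: sum of (n_i - 2) over experiments; sizes may differ."""
    return sum(n - 2 for n in sizes)


print(df_logic(3, 5))            # 8
print(df_mathematic([5, 5, 5]))  # 9
print(df_mathematic([3, 5, 8]))  # 10 (hypothetical unequal sample sizes)
```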
It should be noted that the LOGIC METHOD gives more df than the MATHEMATIC one
when k is high and n is low, and fewer when k is low and n is high, although
with many df (e.g. >20-30 per condition) the significance is only slightly
influenced by this choice.
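For equal n, the comparison between the two methods can be made explicit by subtracting one count from the other. This is just algebra on the two formulae already given:

$$(k-1)(n-1) - k(n-2) = kn - k - n + 1 - kn + 2k = k - n + 1$$

So the LOGIC METHOD gives more df when k > n-1, the two methods agree when k = n-1, and the MATHEMATIC one gives more df when k < n-1. With k=3 and n=5 the difference is 3-5+1 = -1, which matches the 8 versus 9 found above.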
WHICH is BEST? The MATHEMATIC method starts from the formula used to calculate
the weighted mean and is more general; therefore it is statistically more
rigorous.