Tuesday, 23 July 2013

TOPIC 2: Interaction between/among substances in combined experiments


In the coming months, I will cover the mathematical and statistical models used to study the interactions between/among substances in combined experiments. In other words, I will introduce methods for treating the data when two or more toxic compounds are used to expose cellular/animal models.

I will revisit some concepts that I have already published, but I will also revise everything in light of concerns that some scientists have raised about my approach. I will give you all the references necessary for a full understanding of the topic.

This topic is very complex, and therefore my blog will require some time…

Stay tuned!!

Wednesday, 3 July 2013

To SUM UP

First of all, we cannot choose our best test exclusively on the basis of significance (e.g. best = most significant). The reason is that we can make two types of error (called type I and type II): we see significance when the observed effect is actually not significant, or we fail to see significance when the observed effect actually is significant. Therefore, we should choose our test based on its reasonableness and statistical correctness:

1) Semi-clinical approach (12-APR): its use makes sense if each experiment represents a different subject, and replicates are only a measure of intra-subject variability. The test does not consider the replicates, only the mean values. For this reason, it tends to underestimate the significance of the differences and makes the replicates a secondary point;

2) Simple approach (22-APR): it may be acceptable with few experiments, as it considers all the replicates together without taking the experiments into account. It may be influenced by outlier experiments, which can inflate the sample variance;

3) The experiment as random factor (9-MAY): a very elegant method to take into account the inter-experiment variability; with few experiments involving the same cells and a couple of replicates, it appears a GOOD CHOICE. With several experiments, its interpretation may be difficult and may induce statistical artifacts;

4) An approximated method (13-MAY): not the best choice, as it tends to underestimate the significance, gives no weight to the experiments, and is statistically debatable. However, it is very simple to perform, with few simple calculations;

5) Weighted means PART 1 (5-JUN): a good choice, particularly with a number of experiments (e.g. k > 3-5). Its generalization leads to Weighted means PART 2 (6-JUN), based on the method proposed by Bland and Kerry. Attention should be paid to the degrees of freedom used for the statistical analysis; n-2 df for each experiment appears a sound choice. The method works with both few and several experiments, but it appears a GOOD CHOICE with k > 3-5 experiments.

Although in the peer-review process these aspects are often not considered by the reviewers, and a study is often accepted without these details, I RECOMMEND indicating how both experiments and replicates are handled in the statistical analysis. It is an appreciable benefit.

Finally, another important consideration: without multiple-comparison corrections, the use of Student's t/Mann-Whitney tests to compare pairs of data with >2 groups IS INCORRECT! TAKE CARE! In some cases, it leads to rejection of the article!!

Wednesday, 12 June 2013

A little more about Degrees of Freedom


From my posts, it should be clear that the correct choice of the number of degrees of freedom (df) is fundamental, as it determines the significance of the results. It is crucial when we calculate the weighted mean. In my previous post, applying the Bland and Kerry procedure, I said: “The most immediate thought is k-1 df, which is 2 in our case. However, we should take into account that each experiment has n samples (n-1 df). Therefore, in my opinion the correct way to take df into account is: df=(k-1)*(n-1) if n is the same in all the experiments”.

It is logically correct, but with an important limitation: what about df if n varies across experiments? Can we derive df in a more general way, starting from the formulae we used?

Suppose this simple case: we have an experiment with three replicates (1, 2, 4) and another with three more replicates (2, 3, 4). Exp1: mean=2.33 (SD: 1.53) – Exp2: mean=3 (SD: 1).

Let’s calculate the weighted mean.

W1=3/1.53²=3/2.34=1.28; W2=3/1²=3 =>

Weighted mean = (1.28*2.33 + 3*3)/(1.28+3) = 2.80
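As a quick check, the weighted mean above can be reproduced with a few lines of Python (a minimal standard-library sketch; the weights are Wi = ni/SDi², i.e. 1/SEi²):

```python
from statistics import mean, stdev

def weighted_mean(experiments):
    """Weighted mean of experiment means, with weights Wi = 1/SEi^2 = ni/SDi^2."""
    num = den = 0.0
    for reps in experiments:
        n, m, sd = len(reps), mean(reps), stdev(reps)
        w = n / sd**2            # 1/SE^2
        num += w * m
        den += w
    return num / den

print(round(weighted_mean([[1, 2, 4], [2, 3, 4]]), 2))  # → 2.8
```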

How can I change my data without varying the weighted mean? How many constraints do we have?

Within each experiment, I can vary n-1 replicates (two per experiment) while keeping the mean fixed, but such a change influences the SD we use in the formula! To give an example:

To maintain 2.33 as the mean of Exp1, I have to change at least two replicates, e.g. 0.5, 2.5, 4. However, the SD becomes 1.76, and the weighted mean varies too:

W1=3/1.76²=0.97 => (0.97*2.33 + 3*3)/(0.97+3)=2.84 =>

To keep the weighted mean unaltered, I have to keep the SD of each experiment unchanged as well, and I can do this only by varying all three replicates => IN EACH EXPERIMENT we have two constraints: mean and SD => each experiment has n-2 df.

To generalize, if I have k experiments and experiment i has ni replicates, the total number of df is:

df = Σ(i=1..k) (ni − 2)   (MATHEMATIC WAY)

In the example of the previous post (k=3 experiments, each with 5 replicates), df = (5-2)+(5-2)+(5-2) = 9 instead of the 8 calculated with the “LOGIC METHOD”. The significance of the difference would be 0.0001 instead of 0.0002.
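The two df rules are trivial to compare in code (a small sketch; the function names are mine):

```python
def df_mathematic(ns):
    """MATHEMATIC WAY: two constraints (mean and SD) per experiment."""
    return sum(n - 2 for n in ns)

def df_logic(k, n):
    """LOGIC METHOD: (k-1)*(n-1), defined only when n is the same in all experiments."""
    return (k - 1) * (n - 1)

print(df_mathematic([5, 5, 5]), df_logic(3, 5))  # → 9 8
```

Unlike the LOGIC METHOD, `df_mathematic` also handles unequal replicate counts, e.g. `df_mathematic([3, 4, 5])`.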

It should be noted that the LOGIC METHOD has more df than the MATHEMATIC one when k is high and n is low, and the contrary when k is low and n is high, although with several df (e.g. >20-30 per condition) the significances are only slightly influenced by this choice.

WHAT is BEST? The MATHEMATIC method starts from the formula actually used to calculate the weighted mean and is more general; therefore, it is statistically more rigorous.

Thursday, 6 June 2013

Weighted Means: Part 2


Let’s look at the Bland and Kerry procedure, given that we have calculated the weighted mean as reported in the previous post. We will see the direct application of the procedure, without theoretical details, which can be read directly in the bibliographic reference.

Suppose that we have the same data recalled in the previous post. We use only the difference between the Controls and C1, which has the lowest significance.

The weighted mean of controls is 69.8, that of C1 is 89.6.

Firstly, we calculate a term called the “weighted sum of observations squared” (s²), which depends on the number of experiments (k):

s² = k * [Σ(i=1..k) Xi² * Wi] / [Σ(i=1..k) Wi]

In our case:

s²(C) = 3*(62²*0.0682 + 72.6²*0.0693 + 83.6²*0.0243)/(0.0682+0.0693+0.0243) = 14782.25

s²(C1) = 3*(85.6²*0.0266 + 86.8²*0.109 + 99.6²*0.0425)/(0.0266+0.109+0.0425) = 24218.09

Then we calculate the correction term, Corr, simply obtained as:

Corr = k * (Xweighted)²

Corr(C) = 3*69.8² = 14616.12 and Corr(C1) = 3*89.6² = 24084.48

Then, we define the “sum of squares about the mean” (S²) as:

S² = s² − Corr =>

S²(C) = 166.13 and S²(C1) = 133.61

The weighted estimate of the SD of each condition is simply:

SDbest = sqrt[S²/(k−1)] =>

SDbest (C) = 9.11 and SDbest(C1) = 8.17
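The whole procedure condenses into one small function working on the per-experiment summaries (a sketch; note that with unrounded weights SDbest(C) comes out closer to 9.3 — the small gap from 9.11 is due to the rounded intermediate values used in the worked example above):

```python
from math import sqrt

def bland_kerry_sd(means, sds, ns):
    """Weighted SD estimate (Bland & Kerry procedure) from per-experiment means, SDs, ns."""
    k = len(means)
    w = [n / sd**2 for sd, n in zip(sds, ns)]                    # Wi = 1/SEi^2
    xw = sum(m * wi for m, wi in zip(means, w)) / sum(w)         # weighted mean
    s2 = k * sum(m**2 * wi for m, wi in zip(means, w)) / sum(w)  # weighted sum of squares
    corr = k * xw**2                                             # correction term
    return sqrt((s2 - corr) / (k - 1))                           # SDbest

# Controls: per-experiment means, SDs and n from the previous post
sd_c = bland_kerry_sd([62, 72.6, 83.6], [8.57, 8.50, 14.33], [5, 5, 5])
```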

At this point, some comments are in order:

1) The values are independent of the number of replicates, but depend on the SD values of the single experiments, since the result is a weighted mean of the SDs;

2) The ratio between the SDs calculated here and those calculated with the previously presented method is approximately sqrt(5) [with some rounding differences];

3) How many degrees of freedom do we have? The most immediate thought is k-1 df, which is 2 in our case. However, we should take into account that each experiment has n samples (n-1 df). Therefore, in my opinion the correct way to account for df is: df=(k-1)*(n-1) if n is the same in all the experiments. In our case, df=8. Why the multiplication? Because we want each experiment to act as a constraint: we introduce an AND among the assumptions on our experiments, and therefore the terms combine multiplicatively. This also addresses point 1.

Therefore, we have to compare C=69.8 (SD: 9.11) and C1=89.6 (SD: 8.17) with 8 df per group, which implies a significance of p=0.0002, lower than that obtained with the calculations reported in my previous post. On the contrary, if we hold that n-1 should not be counted in the df, we may use only k-1 df (2 in this case), with a much lower significance (p=0.049). In this case, the number of experiments should be well above 3 to control the beta error.
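These p-values can be reproduced with a plain two-sample t statistic, provided each weighted summary is treated as a pseudo-sample of df + 1 observations (an assumption of mine, but one consistent with the numbers above; the p-values then follow from a t table with the pooled df):

```python
from math import sqrt

def t_two_sample(m1, sd1, m2, sd2, n):
    """Two-sample t statistic for two summaries with n (pseudo-)observations each."""
    return (m2 - m1) / sqrt(sd1**2 / n + sd2**2 / n)

# 8 df per group -> pseudo-sample size 9: t ≈ 4.85 (p ≈ 0.0002 with 16 df)
t_8 = t_two_sample(69.8, 9.11, 89.6, 8.17, 9)
# k-1 = 2 df per group -> pseudo-sample size 3: t ≈ 2.80 (p ≈ 0.049 with 4 df)
t_2 = t_two_sample(69.8, 9.11, 89.6, 8.17, 3)
```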

In the next post, we will summarize all the results.

Wednesday, 5 June 2013

Weighted Means: Part 1


The general idea is that each experiment, with n replicates that can vary from experiment to experiment, is performed in exactly the same way, but, as already said, random factors may influence it (the researcher’s state of mind, a different hand performing it, atmospheric conditions, etc.). However, differently from the case in which we used the experiment number as a random factor, here I consider the experiment, and not the replicate, as the statistical unit. Therefore, the statistical power increases with the number of performed experiments (I suggest at least 5, although we will apply the method to our data from three experiments).

Suppose we have k experiments, each with a number of replicates that may vary (n1, n2, …, nk). Each experiment has its own mean (X1, …, Xk), its own standard deviation (s1, …, sk) and therefore its own standard error (SE), calculated as SE1=s1/sqrt(n1), …, SEk=sk/sqrt(nk). When we calculate the weighted mean, we want to “weight” the overall mean of means by the SE of each experiment, giving more weight to the experiments with the lowest SE (lower SD, higher n, or both). Therefore, we can calculate a weight for each experiment:

Wi = 1/SEi², i = 1, …, k

And use this weight in the calculation of mean of means:

Xbest = [Σ(i=1..k) Xi * Wi] / [Σ(i=1..k) Wi]

We can also calculate the general SE starting from the standard errors of the experiments:

SE = 1 / sqrt[Σ(i=1..k) Wi]

Let’s look at our example (not normalized data, with n=5 replicates for each experiment):

 
Controls:
SE1=8.57/sqrt(5)=3.83; SE2=8.50/sqrt(5)=3.80; SE3=14.33/sqrt(5)=6.41 =>
W1=1/SE1^2=0.0682; W2=1/SE2^2=0.0693; W3=1/SE3^2=0.0243 =>
Xbest=(62*0.0682+72.6*0.0693+83.6*0.0243)/(0.0682+0.0693+0.0243)=11.29/0.1618=69.8
The value differs from the crude mean of the three means (72.7), as the third experiment has higher variability and therefore less weight in the weighted mean.
Let’s calculate SE:
1/sqrt(0.0682+0.0693+0.0243)=1/0.402=2.488
Note that the statistical unit is the experiment, and therefore SD=SE*sqrt(k)=2.488*sqrt(3)=4.309
Again, a value completely different from the mean of the SDs of the experiments (10.47) =>
We conclude that controls have a mean of 69.8 (SD: 4.31) with k-1 = 2 degrees of freedom (df) for each condition.
Other conditions:
C1=89.6 (SD: 4.08)
C2=108.8 (SD: 3.77)
C3=131.8 (SD: 4.10)
C4=139.8 (SD: 4.20)
On these data, we may apply ANOVA and post hoc tests.
IMPORTANT: one may think that the method depends only on the number of experiments and not on the replicates. Not true. The number of replicates determines the SD of the weighted means (high n values greatly reduce the SD), and therefore the model TAKES BOTH INTO ACCOUNT.
To see the comparisons (Student's t with Bonferroni’s correction, significance at p=0.0125):
C1 vs C p=0.0045
C2 vs C p<0.001
C3 vs C p<0.001
C4 vs C p<0.001
Substantially in line with what we found when we used the experiment as a random factor or put all the experiments together, indicating that the method is efficient even with few experiments.
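All of the above fits in a few lines of standard-library Python (a sketch working directly on the summary statistics quoted in this post):

```python
from math import sqrt

means = [62.0, 72.6, 83.6]    # per-experiment control means
sds   = [8.57, 8.50, 14.33]   # per-experiment control SDs
n     = 5                     # replicates per experiment

w  = [n / sd**2 for sd in sds]                         # Wi = 1/SEi^2
xw = sum(m * wi for m, wi in zip(means, w)) / sum(w)   # weighted mean ≈ 69.8
se = 1 / sqrt(sum(w))                                  # combined SE ≈ 2.49
sd = se * sqrt(len(means))                             # SD ≈ 4.31 (experiment = statistical unit)
```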
Another possible calculation of SD: it has no solid statistical basis but may be logically reasonable. As we calculated a weighted mean, we can calculate a weighted SD as:
SDbest = [Σ(i=1..k) SDi * Wi] / [Σ(i=1..k) Wi] = [Σ(i=1..k) ni/SDi] / [Σ(i=1..k) Wi]
SD of controls=(5/8.57 + 5/8.50 + 5/14.33)/0.1618 = 9.39
SD of C1 =(5/13.07+5/6.76+5/10.85)/0.181 = 8.75
Therefore, we have to compare C=69.8 (SD: 9.39) and C1=89.6 (SD: 8.75); but in this case, the weighted mean (SD) being a sort of “gold standard” experiment (see the approximated method), the number of df is n-1=5-1=4 for each condition.
Making the comparison: p=0.0087, again significant but less so than in the previous case, despite n>k.
NOTE: in this last case, each experiment should have the same n to define the df unambiguously.
Next time we will see the method by Bland and Kerry to obtain a weighted estimate of the SD, comparing the results with those found today.


Monday, 27 May 2013

The Weighted Means: Essential Bibliography

In the next weeks I will treat the last and most complicated method to analyze the data: the weighted mean. The general hypothesis is that a relatively high number of experiments is performed (my suggestion: n>5, although this is not so restrictive), and so the method is complementary to those already treated (simple approach, experiment as a random factor).

I have partially treated the argument in:

Goldoni M, Tagliaferri S. Dose-response or dose-effect curves in in vitro experiments and their use to study combined effects of neurotoxicants. Methods Mol Biol. 2011;758:415-34. doi: 10.1007/978-1-61779-170-3_28. Review. PubMed PMID: 21815082  

But I will re-treat the argument step by step to fully explain all the passages. Most of them are available in this essential bibliography:

1)      JR Taylor. An introduction to error analysis:  the study of uncertainties in physical measurements. University Science Books, 1997.

I will adapt the passages proposed for physical measurements to toxicology (chapter entitled “Weighted Means”, chapter 7 in the 2nd edition).

2)      Bland, J. M., and Kerry, S. M. (1998) Weighted comparison of means, BMJ. 316, 129.

This is a very brief but interesting paper from the clinical field. http://www.bmj.com/content/316/7125/129

Good Luck!!

Monday, 13 May 2013

An Approximated Method


This method is quite simple to perform, but it does not take into account the number of experiments, and it is statistically debatable. It may be used in all conditions and presents some similarities with the semi-clinical approach (see one of my previous posts).

If we have k experiments with n replicates (n being the same for all the experiments), we may calculate the best possible experiment by giving all the experiments the same weight:

-the mean is the mean of k experiments;

-the SD is the mean of SD of k experiments.

In this way, we extrapolate a sort of “mean experiment”, where the number of degrees of freedom (df) remains n-1 for each condition. This method is “dangerous” if we have a low number of replicates and several experiments, and it is statistically debatable because it does not take into account all the possible df of our system and gives all the experiments the same weight.

Let’s look at the data (crude values):

 
 
The last line represents the values we have to compare, considering n=5 replicates for each condition. In statistical software there is generally no option to compare mean (SD) values with ANOVA knowing only those values and n, but some websites perform such a calculation. See for example:
 

Alternatively, some software packages perform Student's t tests. We can assess the differences between pairs of conditions using t tests, but we have to apply Bonferroni’s correction to the p value (the true significance level is 0.05/m, where m is the number of multiple comparisons performed). The results:
 

To perform the comparisons between exposed conditions and control (software OPENSTAT, available at http://www.statprograms4u.com/, or the previous site with pairs of columns):
0 vs 1 – p=0.0253
0 vs 2 – p=0.0003
0 vs 3 – p<0.0001
0 vs 4 – p<0.0001
Considering a significance level of p=0.05/4=0.0125, condition 1 is the only non-significant condition, and therefore the results are intermediate between A SIMPLE APPROACH with and without random factors (all significant) and the SEMI-CLINICAL approach (no significances). However, this method, although very simplified and debatable, controls the beta error fairly well.
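The “mean experiment” is easy to reproduce from the per-experiment summaries quoted in the weighted-means posts (a sketch for the 0 vs 1 comparison; looking up t ≈ 2.74 with 8 df indeed gives p ≈ 0.025):

```python
from math import sqrt
from statistics import mean

# Per-experiment summaries (k = 3 experiments), from the weighted-means posts
ctrl_means, ctrl_sds = [62.0, 72.6, 83.6], [8.57, 8.50, 14.33]
c1_means,   c1_sds   = [85.6, 86.8, 99.6], [13.07, 6.76, 10.85]
n = 5   # replicates per experiment

# "Mean experiment": mean of the k means, mean of the k SDs, df = n-1 per condition
m0, s0 = mean(ctrl_means), mean(ctrl_sds)
m1, s1 = mean(c1_means), mean(c1_sds)

t = (m1 - m0) / sqrt(s0**2 / n + s1**2 / n)   # ≈ 2.74
```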

Thursday, 9 May 2013

THE EXPERIMENT AS RANDOM FACTOR: OUTPUT with SPSS

Let’s proceed with the analysis, using the variable “experiment” as a random factor.

 
Note that, as for a classical two-way ANOVA, we will have the effects of group alone, experiment alone and the INTERACTION between the factors. The interaction evaluates whether the trend among experiments is parallel or not. I will show you the meaning with a graph.
Let’s go to the output, looking at the most important tables. The reader may repeat the analysis with normalized data.
 
It is interesting to note that: (1) the significance of the factor group is confirmed (p<0.001); (2) the factor experiment is significant (p=0.002), meaning that the experiments are not properly homogeneous when looking at crude values; (3) the interaction is not significant (p=0.185), meaning that the trend in the three experiments is substantially parallel; (4) the use of the random factor influences the results of Dunnett’s test, slightly increasing the significance of the differences (as evident in the 1 vs 0 group comparison). Graphically, the trend is the following:
 
It is quite evident that experiment 3 is always the highest, and that experiment 1 shows the major deviations from parallelism, although not significant (e.g. a higher relative effect, to be tested with the variable norm1).
CONCLUSION: all the concentrations of the toxicant are significantly effective compared to controls, with p<0.001, but the experiments are not perfectly homogeneous (experiment 3 always has higher values than the others), although the trend is overall parallel.
This analysis takes into account both the number of experiments and the replicates, and it is particularly efficient with a low number of experiments (3-5). With a higher number of experiments, we may consider other possibilities, as the random factor then has too many levels and therefore a great influence on the statistical analysis, with the risk of creating artifacts. I will show you other methods in the next posts.
Note: there is no “gold standard” number of experiments that makes this approach efficient; this is only my opinion. The conclusions may vary depending on the number of replicates and the exposure conditions.

Monday, 6 May 2013

FIXED AND RANDOM FACTORS

When you perform ANOVA with SPSS, you may have noticed that in General Linear Model => Univariate… we have put the variable “group”, which represents the dose of exposure, under FIXED FACTOR(s), but there is another option: RANDOM FACTOR(s).



What is the difference, in very simplified terms? A FIXED FACTOR is a factor which is modified according to the researcher's design. In other words, the researcher FIXES a difference between each category of the factor, for example the concentration of exposure. On the other hand, a RANDOM FACTOR is a grouping variable on which the researcher cannot act, but which can randomly influence the results. To give an example: if we perform a multicentric clinical trial, the category representing each center is a random factor. The same protocol is used in all the centers, but we cannot know a priori whether there are uncontrollable differences that may influence the results… YES, when we perform the same experiment N times under the same nominal conditions, we introduce a RANDOM FACTOR, as we cannot exclude some type of uncontrollable confounder which may influence the experimental trend.
Therefore, if we introduce a variable “experiment”, indicating the number of the experiment, we may use it as a random factor and perform ANOVA with one fixed factor and one random factor.

 
 


Note that post-hoc tests cannot be performed for random factors, as they are irrelevant for the analysis. THE RANDOM FACTOR MAY INFLUENCE THE ANOVA RESULTS, BUT IT PROPERLY TAKES INTO ACCOUNT THE FACT THAT SEVERAL EXPERIMENTS ARE PERFORMED INSTEAD OF ONE. It is particularly efficient with a relatively low number of experiments (e.g. 3-5). For a higher number of experiments, other methods may be used (see the next discussions).
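For readers without SPSS, the key point can be sketched in plain Python: in a balanced mixed model (group fixed, experiment random), the F for the fixed factor uses the interaction mean square, not the residual, as its denominator. A minimal sketch with toy data (function and variable names are mine):

```python
from statistics import mean

def mixed_anova_F(data):
    """F for the fixed factor with the experiment as random factor.
    data[i][j] = replicates of group i in experiment j (balanced design)."""
    a, b, n = len(data), len(data[0]), len(data[0][0])
    grand = mean(x for row in data for cell in row for x in cell)
    row_m = [mean(x for cell in row for x in cell) for row in data]      # group means
    col_m = [mean(x for row in data for x in row[j]) for j in range(b)]  # experiment means
    cell_m = [[mean(cell) for cell in row] for row in data]
    ms_a = b * n * sum((m - grand)**2 for m in row_m) / (a - 1)
    ms_ab = n * sum((cell_m[i][j] - row_m[i] - col_m[j] + grand)**2
                    for i in range(a) for j in range(b)) / ((a - 1) * (b - 1))
    return ms_a / ms_ab   # denominator = interaction MS (mixed model)

toy = [[[1, 1], [3, 3]], [[5, 5], [9, 9]]]   # 2 groups x 2 experiments x 2 replicates
F = mixed_anova_F(toy)
```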

In the next post, I will show you the Output of such an analysis.

Monday, 22 April 2013

A SIMPLE APPROACH


For in vitro models, this approach is very simple to apply and does not require any particular statistical knowledge. In my opinion, the conditions for using it are:

1) The same cell line for each experiment. No differences due to the cell source or origin (i.e. a commercially available cell line).

2)      Each experiment is exactly the same as the others in all the experimental details.

The number of experiments may also be low (n=2-3).

Under these assumptions, each replicate inside each experiment is a cell sample made of n cells, and there is no reason not to consider the total number of replicates − 1 as the total number of degrees of freedom, as there is no reason why the experiments should differ. In this view, a different number of replicates per experiment may also be used.

This may have some consequences that we will see, but as a first step we may consider all the replicates together, as if there were a single experiment. It may be applied with each type of normalization. Therefore, to compare the effects of a toxicant we have to use one-way ANOVA for independent measures, as reported in the figure with non-normalized (crude) and normalized 1 (norm1) data. Note that the grouping variable has the following meaning: 0=control; 1=C1; 2=C2; 3=C3; 4=C4. Only 0-1 are reported for reasons of space.



Let’s go with SPSS. There are two ways to perform one-way ANOVA with SPSS; I will show you one method, as it will be used later for more complicated models.

Go to Analyze… => General Linear Model… => Univariate… and select the variable group as fixed factor and crude/norm1 as dependent variable.

Then go to Post Hoc… and move the variable group into the “post-hoc tests for…” window. Note that there are several possible post-hoc tests, whose use depends on several factors, among which heteroscedasticity (i.e. different variance among groups, tested by Levene’s test during the analysis), different group sizes, etc. To simplify the analysis, we will use Dunnett’s test to compare only the exposed groups with the control (the reference group is the first).

The most important results:

ANOVA is highly significant: F=58.99, p<0.001 (crude) – F=42.38, p<0.001 (norm1), and all the groups are significantly higher than the control, independently of the type of normalization. Note that in this case the normalized variable HAS an SD, due to the fact that it is calculated on the replicates.
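Outside SPSS, the pooled one-way ANOVA is straightforward to sketch; the replicate-level values live in the figure, so the groups in the call below are hypothetical, purely to show the pooling:

```python
from statistics import mean

def one_way_anova(groups):
    """One-way independent ANOVA on pooled replicates: returns (F, df_between, df_within)."""
    grand = mean(x for g in groups for x in g)
    k, N = len(groups), sum(len(g) for g in groups)
    ss_b = sum(len(g) * (mean(g) - grand)**2 for g in groups)   # between-group SS
    ss_w = sum((x - mean(g))**2 for g in groups for x in g)     # within-group SS
    return (ss_b / (k - 1)) / (ss_w / (N - k)), k - 1, N - k

# Hypothetical replicates for control, C1, C2 (the real values are in the figure)
F, df_b, df_w = one_way_anova([[62, 70, 58], [80, 88, 78], [105, 110, 98]])
```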
 

Therefore THE RESULT IS COMPLETELY DIFFERENT FROM THE SEMI-CLINICAL APPROACH… and in this case it makes sense!

There are two important limitations: (1) the result may be very sensitive to an outlier experiment or replicate; therefore AN EXPERIMENT DIFFERENT FROM THE OTHERS MAY ALTER THE RESULTS and cause some problems with normality. (2) A reader may think that performing different experiments is completely useless. Why not a single experiment with more replicates? In reality, the use of more than one experiment aims to verify the inter-experiment variability, which may be an important factor in determining the repeatability of the results. Therefore, THE USE OF >1 EXPERIMENT IS DESIRABLE. We are apparently in contradiction: this approach cannot distinguish one experiment with several replicates from more experiments with few replicates. But it is not impossible to take this into account, as we will see next time.

NOTE: normalization 1 should be used cautiously in this case, because a normalization experiment by experiment is not completely in agreement with the assumptions made and is not completely justifiable, as we do not expect a different susceptibility in different experiments.

Wednesday, 17 April 2013

A PROBLEM OF NOMENCLATURE


It is possible that the term "REPEATED MEASURES ANOVA" is not familiar to everyone. In some statistical programs and texts it is called "TWO-WAY ANOVA", creating a confusion of terms. Classically, two-way ANOVA is not intended as a test for repeated measures: it is a typical independent ANOVA with two fixed factors instead of one. It implies a main effect for each factor, plus an interaction term (see for example http://www.statisticshell.com/docs/twoway.pdf). In my opinion, it is better to keep the two tests separate and to clearly indicate when repeated measures are used.

Monday, 15 April 2013

VIEWING OUR DATA



The graphical representation of the data nicely summarizes the differences between the two types of normalization. In each graph, we report the mean of the three means of the three experiments (E=Effect) as a function of the concentration of a generic toxicant. Two points are evident:

(1) After normalization 1 (GREEN), the control has no error bar, because all the data are normalized to 1 experiment by experiment. After normalization 2 (BLUE), the error bar on the controls has the same relative amplitude as in the non-normalized data (RED).
(2) The dispersion of the data is higher after normalization 1 than after normalization 2. This also justifies the difference in significances observed in our previous post. Why? The reason is that the elimination of the variability of the controls, done with normalization 1, may increase the gap among experiments at the same condition, since it measures a relative effect with respect to the control. And this may affect the dispersion around the mean. This approach emphasizes differences in subjective susceptibility to a toxicant.
It should be noted that NORMALIZATION 1 is not WRONG a priori. However, the researcher should clearly justify this approach. As already said, it may be used when each experiment represents a different subject/animal, and a background difference between experimental units may be canceled to assess the individual susceptibility to a toxicant. Other approaches may be used, and I will show them in the next episodes.


Friday, 12 April 2013

A SEMI-CLINICAL APPROACH

This method has the important limitation that IT DOES NOT TAKE INTO ACCOUNT the number of replicates for each condition. Therefore, it may underestimate the significance of the differences (type II error), and its use should be justified. MY SUGGESTION: its use makes sense if every experiment represents a specific case; in other words, for example, the same cell line extracted from different subjects/animals. In such a case, despite the replicates, we may think that the inter-subject variability is higher than the intra-subject variability (i.e. the difference among subjects is higher than the variability of the replicates). Furthermore, the number of experiments should be quite high, at least higher than the number of conditions. I suggest a number of experiments/subjects >= 10-15. I will show you the limitations with few experiments, using our data.

THE PROCEDURE: we may use both normalized and non-normalized data. We calculate the mean value for each condition in each experiment. With non-normalized and normalized data (example of the previous comments), with this method we do not use the SD arising from the replicates:



Each experiment being a specific case, we need a repeated-measures test. In our case, we have 5 conditions and we need REPEATED MEASURES ANOVA. Note that after normalization the SD of the controls is 0, and therefore an independent ANOVA cannot be performed. Let’s do it with SPSS. To visualize all the steps and to have a complete explanation, see the great work of Andy Field (http://www.statisticshell.com/docs/repeatedmeasures.pdf). Data will be inserted as follows:



To do repeated measures ANOVA: Analyze => General Linear Model => Repeated Measures… We define factor1 with 5 levels, and then ADD… Once added, we can proceed with DEFINE. We select the variables C-C4 and move them to the right panel. We have, however, the limitation that POST-HOC tests are not selectable, although there is a way to do them. IN SPSS, A POST-HOC TEST WHICH COMPARES ALL THE EXPOSURE CONDITIONS WITH THE CONTROL, LIKE DUNNETT’S TEST, IS NOT SELECTABLE. I will show you a way to bypass this limitation. Anyway, to do a full factorial post-hoc test, we click on OPTIONS, select factor1 and move it to the DISPLAY MEANS FOR window. Now we click on COMPARE MAIN EFFECTS and select a test (e.g. Bonferroni, the most classical one). Then Continue and OK. From the OUTPUT, it appears quite evident that there are some concerns about the data due to the low sample size (n=3 points per condition). Mauchly’s sphericity test cannot be calculated, although the program performs ANOVA with the following result:



Despite the sphericity issue, the test is significant anyway. However, with Bonferroni’s post hoc test we have bad news: the differences between pairs are in most cases not significant (except C3 vs C4, p=0.038 with non-normalized data and p=0.035 with normalized data). We can be less conservative and compare C1-C2-C3-C4 with C, performing only 4 comparisons (always with Bonferroni). How? We perform single paired-sample t-tests, but the significance threshold becomes p=0.05/4=0.0125 for each test (SPSS: Analyze => Compare Means => Paired-Samples T Test… and select the pairs of variables). Results:



Only C-C4 is at the limit of significance (non-normalized data). Why this result, despite the fact that the differences among conditions appear evident? N=3 experiments is a small number with a very low statistical power. This is the reason why I suggest at least 10-15 experiments/subjects/animals. There is another order of problems: the normalization of the data changes the results, and this may be critical when we are near 0.05. It should be noted that the normalization we have done makes the inter-experiment variability at the background concentration NULL; it is not an OVERALL normalization of the results. Therefore, the method applied finds the differences between the conditions APART FROM such variability, while the analysis on non-normalized data does not. The results are therefore different. The normalization which retains this variability is very simple: take the three non-normalized control values (62, 72.6, 83.6), calculate the mean (72.733), and divide all the columns by this value, obtaining the following results:



The results with normalized data 2 are the same as with non-normalized data. Therefore, the choice of the type of normalization is very critical and should always be made explicit and justified.
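The two normalizations can be written out explicitly (a sketch using the per-experiment control means from this post and, for illustration, the C1 means quoted in the weighted-means posts):

```python
ctrl = [62.0, 72.6, 83.6]    # per-experiment control means
c1   = [85.6, 86.8, 99.6]    # per-experiment C1 means

# Normalization 1: each experiment scaled to its own control (every control becomes 1)
norm1_c1 = [x / c for x, c in zip(c1, ctrl)]

# Normalization 2: everything scaled to the mean of the controls (72.733)
grand_c = sum(ctrl) / len(ctrl)
norm2_c1   = [x / grand_c for x in c1]
norm2_ctrl = [c / grand_c for c in ctrl]   # control variability is preserved
```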

To sum up:
1) NORMALIZATION OF EVERY SINGLE EXPERIMENT TO 1 ELIMINATES THE BACKGROUND VARIABILITY OF THE CONTROL, AND THE RELATIVE EFFECT OF THE EXPOSURE CONDITIONS MAY BE EVALUATED.

2) NO NORMALIZATION, OR NORMALIZATION OF THE MEAN OF CONTROLS TO 1, TAKES INTO ACCOUNT ALL THE EXPERIMENTAL VARIABILITY.

We will consider the differences in the results, if present, case by case. Note that in case 2 it is possible to use independent-measures ANOVA if there is no reason to think that the exposure conditions are influenced by the experiment number (e.g. no differences in susceptibility are expected).
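The paired comparisons used above can be sketched with the standard library (per-experiment means as quoted in these posts; the C1 values are taken from the weighted-means post for illustration):

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(a, b):
    """Paired-samples t statistic on the per-experiment means (df = n - 1)."""
    d = [y - x for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

ctrl = [62.0, 72.6, 83.6]   # per-experiment control means
c1   = [85.6, 86.8, 99.6]   # per-experiment C1 means
t = paired_t(ctrl, c1)      # ≈ 6.23, but with only 2 df it misses the 0.0125 threshold
```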


A BRIEF NOTE: "repeated measures" means that, with in vitro models, each cell line arises from a different subject/animal. It is not applicable to a traditional in vivo experiment, unless the same subject/animal tests all the proposed conditions. Generally, the in vivo approach is different: we have only one experiment with n animals/subjects per condition. Therefore, we have to use a classical independent ANOVA (the number of ways may vary depending on the study) and not a repeated-measures ANOVA.

Wednesday, 3 April 2013

The meaning of "REPLICATE"

When we use the term "replicate" in scientific language, we can mean different things: 1) EXPERIMENTAL REPLICATE: each sample is an exact copy of the others. All the experimental procedures are repeated exactly in all the replicates of the same condition. For example, we may expose 10,000 cells per replicate to a given concentration of a toxicant; another example is each animal receiving a given dose of a substance. The variability of the replicates mainly depends on biological variability and susceptibility. 2) ANALYTICAL REPLICATE: we test a condition only once, but we divide the sample into n aliquots to be measured. In this case, the replicate is used to test the stability of an experimental signal rather than the biological repeatability of the experiment. We expect the analytical replicate variability to be lower than that of the experimental replicates. Generally, when we mention an experiment with n replicates, we mean case 1, and therefore I will always refer to it.