Monday, 22 April 2013

A SIMPLE APPROACH


For in vitro models, this approach is very simple to apply and does not require any particular statistical knowledge. In my opinion, the conditions for using it are:

1)      The same cell line for each experiment: no differences due to the cell source or origin (e.g. a commercially available cell line).

2)      Each experiment is carried out exactly like the others in all experimental details.

The number of experiments may also be low (n=2-3).

Under these assumptions, each replicate within each experiment is a cell sample made of n cells, and since there is no reason to expect the experiments to differ, there is no reason not to use the total number of replicates minus 1 as the total number of degrees of freedom. In this view, a different number of replicates per experiment may also be used.

This may have some consequences that we will see, but as a first step we may consider all the replicates together, as if they came from a single experiment. This can be applied with any type of normalization. Therefore, to compare the effects of a toxicant we use one-way ANOVA for independent measures, as reported in the figure with non-normalized (crude) and normalized 1 (norm1) data. Note that the grouping variable has the following meaning: 0=control; 1=C1; 2=C2; 3=C3; 4=C4. Only 0-1 are reported for reasons of space.
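As a sketch of this pooled analysis outside SPSS, the same one-way ANOVA for independent measures can be run, for example, with SciPy. The replicate values below are purely hypothetical illustrations, not the post's data:

```python
# One-way ANOVA treating all replicates from all experiments as one pooled
# sample per group. All values are hypothetical, not the post's data.
from scipy.stats import f_oneway

control = [68, 70, 72, 69, 71, 70]          # pooled control replicates
c1 = [88, 90, 92, 89, 91, 90]               # exposed group C1
c2 = [108, 110, 112, 109, 111, 110]         # exposed group C2
c3 = [128, 130, 132, 129, 131, 130]         # exposed group C3
c4 = [148, 150, 152, 149, 151, 150]         # exposed group C4

f, p = f_oneway(control, c1, c2, c3, c4)
print(f, p)  # with clearly separated groups, the F test is highly significant
```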



Let’s go with SPSS. There are two ways to perform one-way ANOVA in SPSS; I will show you one method, as it will also be used later for more complicated models.

Go to Analyze => General Linear Model => Univariate… and select the variable group as fixed factor and crude/norm1 as dependent variable.

Then go to Post Hoc… and move the variable group into the "Post Hoc Tests for" window. Note that there are several possible post-hoc tests, whose use depends on several factors, among which heteroscedasticity (i.e. different variances among the groups, tested by Levene's test during the analysis), different group sizes, etc. To simplify the analysis, we will use Dunnett's test to compare only the exposed groups with the control (the reference group is the first).

The most important results:

ANOVA is highly significant: F=58.99, p<0.001 (crude) and F=42.38, p<0.001 (norm1), and all the groups are significantly higher than the control, independently of the type of normalization. Note that in this case the normalized variable HAS a SD, because it is calculated on the replicates.
 

Therefore THE RESULT is COMPLETELY DIFFERENT FROM THE SEMI-CLINICAL APPROACH… and in this case it makes sense!

There are two important limitations: (1) the result may be very sensitive to an outlying experiment or replicate; therefore AN EXPERIMENT DIFFERENT FROM THE OTHERS MAY ALTER THE RESULTS and cause some problems with normality. (2) A reader may think that performing different experiments is completely useless: why not a single experiment with more replicates? In reality, using more than one experiment serves to verify the inter-experiment variability, which may be an important factor in determining the repeatability of the results. Therefore, THE USE OF >1 EXPERIMENT IS DESIRABLE. We are apparently in contradiction: this approach cannot distinguish between one experiment with several replicates and several experiments with few replicates. But it is not impossible to take this into account, as we will see next time.

NOTE: Normalization 1 should be used cautiously in this case, because normalizing experiment by experiment is not completely in agreement with the assumptions made and is not completely justifiable, as we do not expect a different susceptibility in different experiments.

Wednesday, 17 April 2013

A PROBLEM OF NOMENCLATURE


It is possible that the term "REPEATED MEASURES ANOVA" is not familiar to everyone. In some statistical programs and texts it is called "TWO-WAY ANOVA", creating a confusion of terms. Classically, two-way ANOVA is not intended as a test for repeated measures: it is a typical independent ANOVA with two fixed factors instead of one. It implies a main effect for each factor and an interaction term (see for example http://www.statisticshell.com/docs/twoway.pdf). In my opinion, it is better to keep the two tests separate and to clearly indicate when repeated measures are used.

Monday, 15 April 2013

VIEWING OUR DATA



The graphical representation of the data summarizes well the differences between the two types of normalization. In each graph, we report the mean of the means of the three experiments (E=Effect) as a function of the concentration of a generic toxicant. Two points are evident:

(1)   After normalization 1 (GREEN), the control has no error bar, because all the data are normalized to 1 experiment by experiment. After normalization 2 (BLUE), the error bar on the controls has the same relative amplitude as the non-normalized data (in RED).
(2)   The dispersion of the data is higher after normalization 1 than after normalization 2. This also justifies the difference in significance observed in our previous post. Why? Eliminating the variability of the controls, as normalization 1 does, may increase the gap among experiments at the same condition, because it measures a relative effect with respect to the control; this can affect the dispersion around the mean. This approach emphasizes differences in individual susceptibility to a toxicant.
It should be noted that NORMALIZATION 1 is not WRONG a priori. However, the researcher should clearly justify this approach. As already said, it may be used when each experiment represents a different subject/animal, and a background difference between experimental units may be canceled to assess the individual susceptibility to a toxicant. Other approaches may be used, and I will show them in the next episodes.
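The two normalizations can be sketched numerically in NumPy. In the hypothetical matrix below, each row is one experiment and each column one condition; the control column uses the raw control values quoted in the semi-clinical post (62, 72.6, 83.6), while the exposed values are invented for illustration:

```python
# Normalization 1 vs normalization 2, as described in the post.
# Rows = experiments, columns = conditions C, C1..C4.
# Control column from the post; exposed values are hypothetical.
import numpy as np

data = np.array([
    [62.0,  90.0, 110.0, 130.0, 150.0],
    [72.6,  95.0, 118.0, 138.0, 158.0],
    [83.6, 100.0, 125.0, 145.0, 165.0],
])

# Normalization 1: divide each experiment by ITS OWN control ->
# every control becomes exactly 1, so the controls lose their error bar.
norm1 = data / data[:, [0]]

# Normalization 2: divide everything by the MEAN of the controls ->
# the controls keep the same relative dispersion as the raw data.
norm2 = data / data[:, 0].mean()

print(norm1[:, 0])  # all exactly 1
print(norm2[:, 0])  # spread around 1, same relative amplitude as raw controls
```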


Friday, 12 April 2013

A SEMI-CLINICAL APPROACH

This method has the important limitation that IT DOES NOT TAKE INTO ACCOUNT the number of replicates for each condition. Therefore, it may underestimate the significance of the differences (type 2 error) and its use should be justified. MY SUGGESTION: its use makes sense if every experiment represents a specific case; in other words, with an example, the same cell line extracted from different subjects/animals. In such a case, regardless of the replicates, we may assume that the inter-experiment variability is higher than the intra-experiment variability (i.e. the difference among subjects is higher than the variability of the replicates). Furthermore, the number of experiments should be quite high, at least higher than the number of conditions: I suggest a number of experiments/subjects >= 10-15. I will show you the limitations with few experiments, considering our data.

THE PROCEDURE: we may use both normalized and non-normalized data. We calculate the mean value for each condition in each experiment. With non-normalized and normalized data (the example of the previous posts), this method does not use the SD arising from the replicates:
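In scripted form, this per-experiment, per-condition averaging is a simple group-by. The replicate-level values below are hypothetical, chosen only so that the control means reproduce the 62, 72.6 and 83.6 used later in this post:

```python
# Mean per condition per experiment, from hypothetical replicate-level data
# (3 experiments x 2 replicates for two conditions, C and C1).
import pandas as pd

df = pd.DataFrame({
    "experiment": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "condition":  ["C", "C", "C1", "C1"] * 3,
    "effect":     [61, 63, 89, 91, 72, 73.2, 94, 96, 83, 84.2, 99, 101],
})

# One mean per (experiment, condition) cell; the replicate SD is discarded.
means = df.groupby(["experiment", "condition"])["effect"].mean().unstack()
print(means)
```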



Since each experiment represents a specific subject, we need a repeated measures test. In our case, we have 5 conditions, so we need REPEATED MEASURES ANOVA. Note that after normalization the SD of the controls is 0, and therefore an independent ANOVA cannot be performed. Let's do it with SPSS. To visualize all the steps and to have a complete explanation, see the great work of Andy Field (http://www.statisticshell.com/docs/repeatedmeasures.pdf). Data will be entered as follows:



To run repeated measures ANOVA: Analyze => General Linear Model => Repeated Measures… We define factor1 with 5 levels, then ADD. Once added, we proceed with DEFINE. We select the variables C-C4 and move them to the right panel. We have, however, the limitation that POST-HOC tests are not selectable, although there is a way to do them. IN SPSS, HOWEVER, A POST-HOC TEST THAT COMPARES ALL THE EXPOSURE CONDITIONS WITH THE CONTROL, LIKE DUNNETT'S TEST, IS NOT SELECTABLE. I will show you a way to bypass this limitation. Anyway, to run a full factorial post-hoc test, we click on OPTIONS, select factor1 and move it to DISPLAY MEANS FOR. Now we tick COMPARE MAIN EFFECTS and select a test (e.g. Bonferroni, the most classical one). Then Continue and OK. From the OUTPUT, it appears quite evident that there are some concerns about the data due to the low sample size (n=3 points per condition): Mauchly's test of sphericity cannot be calculated, although the program performs the ANOVA with the following result:
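Outside SPSS, the same repeated measures ANOVA can be sketched with statsmodels' `AnovaRM` on long-format data (one row per experiment/condition pair). The per-experiment means below are hypothetical, except for the control column (62, 72.6, 83.6, from this post):

```python
# Repeated measures ANOVA with each experiment treated as a "subject".
# Control means are those quoted in the post; exposed means are hypothetical.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.DataFrame({
    "experiment": [1, 2, 3] * 5,
    "condition":  ["C"] * 3 + ["C1"] * 3 + ["C2"] * 3 + ["C3"] * 3 + ["C4"] * 3,
    "effect": [62, 72.6, 83.6,     # C (controls)
               90, 95, 100,        # C1
               110, 118, 125,      # C2
               130, 138, 145,      # C3
               150, 158, 165],     # C4
})

res = AnovaRM(df, depvar="effect", subject="experiment",
              within=["condition"]).fit()
print(res.anova_table)  # F, degrees of freedom and p-value for "condition"
```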



Despite the sphericity issue, the test is significant anyway. However, with Bonferroni's post-hoc test we have bad news: the differences between pairs are in most cases not significant (except C3 vs C4, p=0.038 with non-normalized data and C3 vs C5, p=0.035 with normalized data). We can be less conservative and compare C1-C2-C3-C4 with C, performing only 4 comparisons (still with Bonferroni). How? We perform single paired-sample t-tests, but the significance threshold becomes p=0.05/4=0.0125 for each test (SPSS: Analyze => Compare Means => Paired Samples T Test… and select the pairs of variables). Results:
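These four Bonferroni-corrected paired comparisons can be scripted with SciPy. The per-experiment means below are hypothetical, except for the controls (62, 72.6, 83.6) taken from this post; with only n=3 pairs, even large differences can miss the corrected threshold:

```python
# Four paired t-tests, each exposed condition vs control, judged against a
# Bonferroni-corrected threshold of 0.05/4. Exposed means are hypothetical.
from scipy.stats import ttest_rel

control = [62, 72.6, 83.6]
conditions = {
    "C1": [90, 95, 100],
    "C2": [110, 118, 125],
    "C3": [130, 138, 145],
    "C4": [150, 158, 165],
}

alpha = 0.05 / 4  # 0.0125 per comparison
pvalues = {}
for name, values in conditions.items():
    t, p = ttest_rel(values, control)
    pvalues[name] = p
    print(name, "p =", round(p, 4), "significant" if p < alpha else "n.s.")
```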



Only C vs C4 is at the limit of significance (non-normalized data). Why this result, despite the fact that the differences among conditions appear evident? N=3 experiments is a small number with very low statistical power. This is the reason why I suggest at least 10-15 experiments/subjects/animals. There is another order of problems: the normalization of the data changes the results, and this may be critical when we are near 0.05. It should be noted that the normalization we performed makes the inter-experiment variability at the background concentration NULL; it is not an OVERALL normalization of the results. Therefore, the method we applied finds the differences between the conditions APART FROM such variability, while the analysis on non-normalized data does not; the results are therefore different. The normalization that also contains this variability is very simple: take the three non-normalized control values (62, 72.6, 83.6), calculate the mean (72.733) and divide all the columns by this value, obtaining the following results:
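The arithmetic of this second normalization takes only a few lines (the control values are those from the post):

```python
# Normalization 2: divide every value by the mean of the raw controls.
controls = [62, 72.6, 83.6]

mean_control = sum(controls) / len(controls)        # 72.733...
norm2_controls = [c / mean_control for c in controls]

print(round(mean_control, 3))                       # 72.733
print([round(v, 3) for v in norm2_controls])        # controls keep their spread
```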



The results with normalized data 2 are the same as with non-normalized data. Therefore, the choice of the type of normalization is very critical and should always be justified.

To sum up:
1) NORMALIZATION OF EVERY SINGLE EXPERIMENT TO 1 ELIMINATES THE BACKGROUND VARIABILITY OF THE CONTROL, AND THE RELATIVE EFFECT OF THE EXPOSURE CONDITIONS MAY BE EVALUATED.

2) NO NORMALIZATION, OR NORMALIZATION OF THE MEAN OF THE CONTROLS TO 1, TAKES INTO ACCOUNT ALL THE EXPERIMENTAL VARIABILITY.

We will consider the differences in the results, if present, case by case. Note that in case 2 it is possible to use independent measures ANOVA if there is no reason to think that the exposure conditions are influenced by the experiment number (e.g. no differences in susceptibility are expected).


A BRIEF NOTE: "repeated measures" means that, with in vitro models, each cell line arises from a different subject/animal. It is not applicable to a traditional in vivo experiment, unless the same subject/animal tests all the proposed conditions. Generally, the in vivo approach is different: we have only one experiment with n animals/subjects per condition. Therefore, we have to use a classical independent ANOVA (the number of ways may vary depending on the study) and not a repeated measures ANOVA.

Wednesday, 3 April 2013

The meaning of "REPLICATE"

When we use the term "replicate" in scientific language, we can mean different things:

1) EXPERIMENTAL REPLICATE: each sample is an exact copy of the others. All the experimental procedures are repeated identically in all the replicates of the same condition. To give an example, we may expose 10,000 cells per replicate to a given concentration of a toxicant. Another example: each animal receiving a given dose of a substance. The variability of the replicates mainly depends on biological variability and susceptibility.

2) ANALYTICAL REPLICATE: we test a condition only once, but we divide the sample into n aliquots to be measured. In this case, the replicate is used to test the stability of an experimental signal rather than the biological repeatability of the experiment. We expect the variability of analytical replicates to be lower than that of experimental replicates.

Generally, when we mention an experiment with n replicates, we mean case 1, and therefore I will always refer to it.

Tuesday, 2 April 2013

An overall glance at the data

In the figure, I report the mean and standard deviation (SD) for each "native" and normalized condition. It should be noted that the normalization of the controls to 1 makes the relative effect of a toxicant clearer, and that some differences exist among experiments (e.g. experiment 1 has a more pronounced concentration-effect relationship).



Why SD and not standard error (SE)? SE is often reported because the variability of a sample seems lower, but in my opinion it is not the best way to report the dispersion. SE=SD/sqrt(N), where N is the number of data points in a sample (in our case n=5 per experiment and per condition). In sampling theory, SE represents the variability of the sample mean around the mean of the means; it is therefore an estimate of the distance between the observed experimental mean and the expected mean, not the dispersion of a sample. Since SE depends on N, if N varies among samples, SE does not have the same "weight". Therefore, although reporting SE in tables/graphs gives the impression that the experimental data have low dispersion and high reproducibility, I think SD is preferable: SD is the best parameter to describe the dispersion of the data objectively.

Although several methods exist to test normality, a first descriptive sign of problems with normality is a clear difference between mean and median. In the presence of non-normality (although not assessable here with only N=5 replicates), we have two ways to report the data:

1) Log-normal distribution (i.e. normality of the log-transformed variable): we can calculate mean and SD on the logarithms and take the anti-logarithm (e.g. 10^ or e^) of the values, obtaining the geometric mean and geometric SD. Note that the geometric SD should always be reported as (geometric SD) and not as +-, because it is not an "error" on a linear scale.

2) Other non-normal distributions (excluding some specific cases, such as Poisson or Lorentz distributions, not treated here): the best way to report the data is the median (with the 25th-75th percentiles, called the interquartile range, or another percentile range). The data may also be reported as a 95% confidence interval, calculated in different ways depending on the distribution, but this topic won't be treated in detail, as it is not functional to our discussion.
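These definitions fit in a few lines of Python (the five replicate values are hypothetical):

```python
# SD vs SE, and geometric mean/SD for log-normal data.
# The five replicate values are hypothetical, not the post's data.
import math
import statistics

x = [62.0, 58.5, 65.1, 60.2, 63.9]

sd = statistics.stdev(x)            # sample SD: dispersion of the data
se = sd / math.sqrt(len(x))         # SE = SD/sqrt(N): uncertainty of the mean

# Log-normal case: mean and SD on the logs, then back-transform.
logs = [math.log(v) for v in x]
geo_mean = math.exp(statistics.mean(logs))
geo_sd = math.exp(statistics.stdev(logs))  # multiplicative factor, not a +- error

print(sd, se, geo_mean, geo_sd)
```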
In the next weeks, we will see different methods to treat the data, with advantages and disadvantages.