Tuesday, 2 April 2013

An overall glance at the data

In the figure, I report the mean and standard deviation (SD) for each "native" and normalized condition. Normalizing the controls to 1 makes the relative effect of a toxicant clearer. Some differences exist among experiments; for example, experiment 1 shows a more pronounced concentration-effect relationship.
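A minimal Python sketch of the two kinds of summary in the figure follows. It assumes that "normalizing the controls to 1" means dividing every replicate by the mean of the control replicates within the same experiment. The readings and the condition names (`control`, `low`, `high`) are hypothetical and were invented only for illustration. They are not the data shown in the figure.

```python
import statistics

# Hypothetical raw readings (arbitrary units), n = 5 replicates per condition.
# Invented for illustration only; NOT the data shown in the figure.
raw = {
    "control": [100.0, 96.0, 104.0, 98.0, 102.0],
    "low":     [90.0, 85.0, 95.0, 88.0, 92.0],
    "high":    [60.0, 55.0, 65.0, 58.0, 62.0],
}

def summarize(values):
    """Return (mean, sample SD) for one condition."""
    return statistics.mean(values), statistics.stdev(values)

def normalize_to_control(data, control_key="control"):
    """Divide every replicate by the control mean, so the control mean becomes 1.

    Assumption: normalization is done against the controls of the same experiment.
    """
    ref = statistics.mean(data[control_key])
    return {cond: [v / ref for v in vals] for cond, vals in data.items()}

normalized = normalize_to_control(raw)

for cond in raw:
    m_raw, sd_raw = summarize(raw[cond])
    m_norm, sd_norm = summarize(normalized[cond])
    print(f"{cond:8s} native {m_raw:6.1f} (SD {sd_raw:4.1f})   "
          f"normalized {m_norm:5.2f} (SD {sd_norm:4.2f})")
```

Every value is divided by the same constant. As a result, the normalized SD equals the native SD divided by the control mean. Normalization changes the scale and makes experiments comparable, but it does not change the relative dispersion within a condition.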



Why SD and not standard error (SE)? SE is often reported because it makes the variability of a sample look smaller. In my opinion, however, it is not the best way to report dispersion. SE = SD/sqrt(N), where N is the number of data points in a sample (here, n = 5 per experiment and per condition). In sampling theory, SE is the standard deviation of the sample mean. It describes how the means of repeated samples would spread around the mean of those means, so it estimates how far the experimental mean is likely to lie from the expected (true) mean. It is not the dispersion of the sample itself. Moreover, SE depends on N, so SE values from samples of different sizes do not carry the same "weight" and are not directly comparable. Reporting SE in tables and graphs suggests that the experimental data have low dispersion and high reproducibility. Even so, I think SD is preferable, because it is the parameter that describes the dispersion of the data objectively.

Several formal tests of normality exist. A first descriptive warning sign of non-normality, however, is a clear difference between the mean and the median. This check is not performed here, since there are only N = 5 replicates. In the presence of non-normality, there are two ways to report the data:

1) Log-normal distribution (i.e. the log-transformed variable is normally distributed): calculate the mean and SD of the logarithms, then take the antilogarithm (10^x or e^x, matching the base used) of both values. This gives the geometric mean and the geometric SD. The geometric SD should always be reported in parentheses, as geometric mean (geometric SD), and never with ±. It is a multiplicative factor, not an additive "error" on the linear scale.

2) Other non-normal distributions (excluding some specific cases not treated here, such as Poisson or Lorentz distributions): the best way to report the data is the median with the 25th-75th percentiles (the interquartile range) or another percentile range.

The data may also be reported as a 95% confidence interval, calculated in different ways depending on the distribution. This topic will not be treated in detail, because it is not essential to our discussion.
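The descriptive quantities discussed above can be sketched in Python using only the standard library. This is a minimal illustration, not the procedure used for the figure. The sample values are hypothetical and deliberately right-skewed, so that mean and median differ. With n = 5, the geometric and percentile summaries only illustrate the arithmetic. The function names `describe`, `geometric_summary` and `median_iqr` are invented for this example. Natural logarithms are used here. Base 10 gives the same geometric mean and geometric SD, provided the antilogarithm uses the same base.

```python
import math
import statistics

def describe(values):
    """Basic descriptive statistics for one sample (one condition of one experiment)."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    return {
        "n": n,
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": sd,                 # dispersion of the data themselves
        "se": sd / math.sqrt(n),  # SE = SD / sqrt(N): uncertainty of the mean, shrinks as N grows
    }

def geometric_summary(values):
    """Geometric mean and geometric SD for (assumed) log-normal data.

    Mean and SD are computed on ln(x) and back-transformed with exp().
    Requires strictly positive values. The geometric SD is a multiplicative
    factor (>= 1), so it is reported in parentheses, not as +/-.
    """
    logs = [math.log(v) for v in values]
    return math.exp(statistics.mean(logs)), math.exp(statistics.stdev(logs))

def median_iqr(values):
    """Median with 25th and 75th percentiles (interquartile range).

    statistics.quantiles uses the 'exclusive' method by default; other
    software may use another percentile definition and give slightly
    different quartiles for small samples.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q1, q3

# Hypothetical, right-skewed replicates (invented for illustration, n = 5).
sample = [1.2, 1.5, 1.9, 2.4, 9.8]

d = describe(sample)
print(f"mean {d['mean']:.2f} +/- SD {d['sd']:.2f} (SE {d['se']:.2f}), median {d['median']:.2f}")

gm, gsd = geometric_summary(sample)
print(f"geometric mean {gm:.2f} (geometric SD {gsd:.2f})")

med, q1, q3 = median_iqr(sample)
print(f"median {med:.2f} ({q1:.2f}-{q3:.2f})")

# Same SD, different N: the SE changes, the SD does not.
for n in (5, 20):
    print(f"SD 10.0, N={n}: SE {10.0 / math.sqrt(n):.2f}")
```

The last loop shows why SE values from samples of different sizes carry different "weight". With the same SD, quadrupling N halves the SE, even though the dispersion of the data is unchanged.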
In the coming weeks, we will look at different methods for treating the data, with their advantages and disadvantages.
