# 95% confidence interval

Welcome to the Wikiversity learning project about 95% confidence intervals. Many clinical trials for medical treatments report increased or decreased risks in treated and control groups. Depending on the number of patients in the study and the variability of the results, such differences may be judged statistically significant or not. Typically, a result is reported as statistically significant at the 95% confidence level if the observed difference would be expected to arise by chance with a probability of less than 5%.

## Conceptual introduction

Clinical research studies are often concerned with making correlations between treatments and patient outcomes. For example, clinical studies have been performed to investigate if use of cholesterol-lowering drugs among patients with high serum cholesterol can reduce the risk of heart attacks (Pignone et al., 2000).

Pignone et al. performed a meta-analysis of four randomized, placebo-controlled studies conducted in the 1980s and 1990s involving over 20,000 study participants. On average, there was a 30% reduction in the incidence of heart disease among the cholesterol-lowering drug users compared to the placebo control groups. How can we decide whether this 30% reduction is representative of what would happen if millions of people started using cholesterol-lowering drugs?

In addition to the incidence of heart attacks, the clinical research studies reviewed by Pignone et al. reported the measured cholesterol levels of the study participants. Measurement data of this kind are probably the most common subject of statistical analysis. In many cases, measurements from a group of treated study participants must be compared to measurements from a placebo-treated group. If the measured blood cholesterol levels were plotted on the horizontal axis and the number of people with each level on the vertical axis, the result might be so-called "normal" curves with shapes such as those shown in Figure 1. How can we determine whether there was a statistically significant difference in cholesterol levels between the treated and control groups? In addition to calculating the difference between the means, we need a standardized numerical method for determining the amount of overlap between the two distributions being compared.

Figure 1. Normal distributions. When comparing two averages (treated vs. placebo) we need to know whether the variation (average deviation from the mean) is large or small.
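The comparison of two means can be sketched numerically. The following Python snippet uses made-up cholesterol values (illustrative only, not data from Pignone et al. or any cited study) to compute the difference between two group means and a 95% confidence interval for that difference, using the normal approximation with the familiar multiplier of 1.96:

```python
import math
import statistics

# Hypothetical cholesterol measurements (mg/dL) -- illustrative only,
# not data from any study cited in this article.
treated = [182, 190, 175, 168, 195, 188, 172, 180, 177, 185]
placebo = [205, 198, 210, 215, 202, 220, 208, 199, 212, 206]

def ci_for_difference(a, b, z=1.96):
    """95% CI for the difference of two means (normal approximation)."""
    diff = statistics.mean(a) - statistics.mean(b)
    # Standard error of the difference: combine the per-group
    # sample variances, each divided by its sample size.
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return diff - z * se, diff + z * se

lo, hi = ci_for_difference(treated, placebo)
print(f"difference of means: {statistics.mean(treated) - statistics.mean(placebo):.1f} mg/dL")
print(f"95% CI: ({lo:.1f}, {hi:.1f})")
```

Because the whole interval lies below zero, these (hypothetical) treated participants have significantly lower cholesterol at the 5% level; if the interval had straddled zero, the difference would not be statistically significant.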

## Time-to-event methods

A second major type of clinical-trial data comes from studies in which the time until a particular event is recorded. The studies reviewed by Pignone et al. recorded the time from the start of the study until a participant had a heart attack. When the end point is death, such studies make use of statistical survival analysis to test whether a particular treatment reduces the death rate.

In the study by Pignone et al., statistical analysis indicated that, because of chance alone, the true effect could differ from the observed 30% reduction in heart attacks, but there was only a 1 in 20 chance that the true reduction in heart disease was less than 21% or greater than 38%.

For many scientific disciplines, it is traditional to seek study populations large enough that reported correlations between treatments and outcomes are unlikely to have arisen just by chance, with 95% confidence (that is, there should be less than a 1 in 20 chance that random variation in the study outcomes could account for the observed correlation). Confidence intervals are usually given either as percentages or as hazard (odds) ratios. (Note: for statisticians, there is a difference between hazard ratios and odds ratios. Conceptually they are similar, so I will not make a distinction here.) The results of Pignone et al. indicate that the hazard ratio for heart disease was 0.70 (the incidence of heart disease in the treated group divided by that in the control group).

The 95% confidence interval for the hazard ratio was 0.62 to 0.79. If the 95% confidence interval for a study includes 1.0, then there is a better than 1 in 20 chance that random variation in outcome incidence between the study groups (treated and control) produced the observed correlation between treatment and outcome. In such circumstances (P > 0.05), it is common practice to describe the observed odds ratio as, at most, a trend toward a statistically significant correlation. If the 95% confidence interval does not include 1.0, then there is less than a 1 in 20 chance that random variation can account for the observed correlation, and it is traditional to describe the correlation as statistically significant with greater than 95% confidence.
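A common textbook method computes a ratio's confidence interval on the log scale and then exponentiates. The sketch below uses hypothetical event counts chosen so the ratio comes out near the 0.70 reported by Pignone et al. (the actual per-trial counts are not given in this article); it is a risk-ratio calculation, which stands in here for the hazard ratios produced by survival models:

```python
import math

def ratio_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Point estimate and 95% CI for a risk ratio, computed on the
    log scale (normal approximation), then exponentiated back."""
    rr = (events_t / n_t) / (events_c / n_c)
    # Standard textbook standard error of log(risk ratio).
    se_log = math.sqrt(1/events_t - 1/n_t + 1/events_c - 1/n_c)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi

# Hypothetical counts: 350 events among 10,000 treated participants,
# 500 events among 10,000 controls -- not the actual Pignone et al. data.
rr, lo, hi = ratio_ci(350, 10000, 500, 10000)
print(f"ratio = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
print("significant at the 5% level" if hi < 1.0 or lo > 1.0
      else "not significant (CI includes 1.0)")
```

The final test is exactly the rule described above: the correlation is called statistically significant only when the entire interval lies on one side of 1.0.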

Often the calculated probability (P) that there is no real correlation between treatment and outcome is given in the study results. For example, Shepherd et al. performed a study of over 6,000 men to evaluate the use of pravastatin for prevention of heart disease (Shepherd et al., 1995). They reported 248 coronary events (nonfatal myocardial infarction or death from coronary heart disease) in the placebo group and 174 in the pravastatin group. This corresponds to a calculated 31 percent reduction in the rate of heart attack in the treated group, with a 95 percent confidence interval of 17 to 43 percent.

The calculated probability that this correlation between pravastatin use and reduced risk of heart attack was due to chance was stated as P < 0.001. In contrast, they also evaluated total mortality in the treated and control groups. They observed a 22 percent reduction in the risk of death in the pravastatin group, but because deaths were less frequent than heart attacks, the confidence interval was wider (95 percent confidence interval, 0 to 40 percent). The calculated probability that there was a real difference in death rate between the treated and control groups was only P = 0.051.

In general, if the 95% confidence interval, when expressed as a percentage, includes 0.0%, then P will be greater than 0.05 and there will be a better than 1 in 20 chance that random variation is responsible for the observed difference between treated and control groups. Thus, Shepherd et al. could conclude that there was a statistically significant benefit from pravastatin for preventing heart attacks, but they could only conclude that there was a trend towards a statistically significant (95% confidence) reduction in death due to use of the drug.
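The link between a confidence interval and a P value can be made concrete: under the same normal approximation used to build the interval, a reported 95% CI can be worked backwards into an approximate two-sided P. The sketch below applies this to Shepherd et al.'s mortality result (22% reduction, CI 0 to 40%, i.e. a risk ratio of 0.78 with CI 0.60 to 1.00); because the published figures are rounded, the recovered P is only approximate:

```python
import math

def p_from_ratio_ci(rr, lo, hi):
    """Approximate two-sided P for the null hypothesis ratio = 1,
    recovered from a 95% CI by working on the log scale."""
    # The 95% CI spans about 2 * 1.96 standard errors of log(ratio).
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
    z = abs(math.log(rr)) / se
    # Two-sided P from the standard normal CDF, 0.5 * (1 + erf(z / sqrt(2))).
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

# Shepherd et al.'s mortality result: ratio 0.78, 95% CI 0.60 to 1.00.
p = p_from_ratio_ci(0.78, 0.60, 1.00)
print(f"P ~ {p:.3f}")  # in the neighborhood of the reported P = 0.051
```

A CI whose upper limit sits exactly at 1.00 (no effect) corresponds to P right around 0.05, which is why Shepherd et al. could report only a trend for total mortality.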