Tuesday, October 1, 2013

95% Confidence Intervals

In the area of clinical medicine, there is considerable interest in statistics - and considerable abuse of them. Statistics are not always easy to understand. The easier type of statistic is the descriptive statistic. A cardiac surgery unit can look at all its patients and all the deaths related to surgery and produce a statistic for surgical mortality. This would be a descriptive statistic.
Often statistics are used to tell us about things where we can't reasonably look at everyone - a classic example is an opinion poll. We might want to know how many people will vote Labour at the next election. So we take a random sample of the population and ask them how they will vote at the next election, and see what percentage respond "Labour". 
Purely by chance, a random sample is unlikely to vote exactly like the population we're trying to study. So we calculate a 95% confidence interval to show how far off the sample might be. For example, our survey might find that 45% of people will vote Labour, with a calculated 95% confidence interval of 40-50%. What does this mean? 
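Loosely, it means we can be 95% confident that the actual percentage of the population that will vote Labour (known as the parameter) is between 40 and 50%. Strictly speaking, the 95% describes the method rather than this one interval: if we repeated the survey many times and calculated an interval each time, about 95% of those intervals would contain the true percentage, and about 5% would miss it.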
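To make the 40-50% figure concrete, here is a minimal Python sketch of one common way such an interval is calculated, the normal-approximation (Wald) interval for a proportion. The sample size of 380 respondents is an assumption chosen so that the numbers match the example; the post does not give one. The function names `proportion_ci` and `coverage` are made up for illustration. The second function simulates many polls and counts how often the interval contains the true percentage.

```python
import random
from statistics import NormalDist

# z-value that leaves 2.5% in each tail of the normal curve (about 1.96)
Z_95 = NormalDist().inv_cdf(0.975)


def proportion_ci(successes, n, z=Z_95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin


def coverage(true_p, n, trials, seed=0):
    """Fraction of simulated polls whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each simulated respondent says "Labour" with probability true_p
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = proportion_ci(successes, n)
        hits += low <= true_p <= high
    return hits / trials


# Assumed poll: 171 of 380 respondents (45%) say "Labour"
low, high = proportion_ci(171, 380)
print(f"95% CI: {low:.1%} to {high:.1%}")

# Assumed true population value of 45%, polled 2000 times
print(f"Share of intervals containing the truth: {coverage(0.45, 380, 2000):.3f}")
```

With these assumed numbers the interval comes out very close to 40-50%, and the simulated share of intervals containing the true value should land near 0.95.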
Another example might be if we want to find out if people with John Smith Syndrome have a higher serum rhubarb than the general population. The normal range for serum rhubarb is 950-1050, the average being 1000. We take a sample of people with John Smith Syndrome and find their average serum rhubarb is 1075, with a 95% CI of 1051-1099. Because the whole interval lies above 1050, we can be 95% confident that the average serum rhubarb in John Smith Syndrome is above the normal range. However, this statistical significance doesn't mean that the rise in serum rhubarb is clinically significant. It might be that serum rhubarb has to be higher than 1200 to cause a problem. 
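The serum rhubarb example can be sketched the same way. The standard deviation (100) and sample size (67) below are assumptions picked so that the interval matches 1051-1099; the post gives neither. The sketch uses the normal approximation, whereas a small sample would normally call for a t-distribution. The name `mean_ci` is made up for illustration.

```python
from statistics import NormalDist


def mean_ci(mean, sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a sample mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sd / n ** 0.5
    return mean - margin, mean + margin


UPPER_NORMAL = 1050        # top of the normal range in the example
CLINICAL_THRESHOLD = 1200  # level at which a problem might start, per the example

# sd=100 and n=67 are assumed values, not taken from the post
low, high = mean_ci(mean=1075, sd=100, n=67)

# Statistically significant: the whole interval sits above the normal range
statistically_above_normal = low > UPPER_NORMAL

# Clinically significant (in this example): the whole interval sits above 1200
clinically_significant = low > CLINICAL_THRESHOLD

print(f"95% CI: {low:.0f} to {high:.0f}")
print("Above normal range:", statistically_above_normal)
print("Above clinical threshold:", clinically_significant)
```

The two checks are kept separate on purpose: the interval can clear the normal range and still fall well short of the level that matters to patients.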
There are some caveats. If our sample isn't random, then we cannot draw these conclusions. If we know about the respondents' social class and age, we may be able to correct mathematically for certain biases in selection. Also, if the respondents do not answer accurately about their voting intentions (they may be loath to admit voting for a particular party), then again we cannot draw these conclusions.
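One simple form such a mathematical correction can take is to reweight each group of respondents to its known share of the population (often called post-stratification). The sketch below is only an illustration of that idea: the age groups, the counts and the 50/50 population split are all invented, and real polling corrections are more involved.

```python
def raw_share(sample):
    """Unweighted share of supporters across all respondents."""
    respondents = sum(n for n, _ in sample.values())
    supporters = sum(s for _, s in sample.values())
    return supporters / respondents


def poststratified_share(sample, population_shares):
    """Weight each group's support by its share of the real population."""
    total = 0.0
    for group, (n, supporters) in sample.items():
        total += population_shares[group] * supporters / n
    return total


# Invented sample: (respondents, Labour supporters) per age group.
# Under-40s are under-represented: 25% of the sample, 50% of the population.
sample = {"under_40": (100, 60), "40_plus": (300, 120)}
population_shares = {"under_40": 0.5, "40_plus": 0.5}

raw = raw_share(sample)
adjusted = poststratified_share(sample, population_shares)
print(f"Raw: {raw:.0%}, adjusted: {adjusted:.0%}")
```

In this invented case the raw figure understates Labour support because the group that favours Labour was under-sampled. The correction only helps with biases we know about and can measure; it does nothing for respondents who answer inaccurately.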
So these methods may tell us, with 95% confidence, that the blood pressure of one population is higher than that of another. What they cannot tell us is why there is this difference.