Confidence Intervals
Confidence Intervals
Estimating the mean from a sample is going to have some fluctuation defined by the standard error. We can define a range or confidence interval which we expect to contain the true mean. Often we report a 95% confidence interval. This interval is defined by plus or minus 1.96 times the standard error:
\[ -1.96\frac{\sigma}{\sqrt{N}} \le \mu \le +1.96\frac{\sigma}{\sqrt{N}}\]
Term Definitions
- \(\sigma\) Standard deviation of the sample
- \(N\) Number of observations in the sample
- \(\mu\) sample mean
Let's try this with our sample:
library(tidyverse)
# load meta data
meta <- read.table("./data/gbm_cptac_2021/data_clinical_patient.txt",
header = T,
sep="\t")
## defined some population of ages
ages <- sample(meta$AGE,20)
## data frame of results
summary <- data.frame(
Sample_Mean=mean(ages,na.rm = T),
Standard_Error=sd(ages,na.rm = T)/sqrt(length(ages[!is.na(ages)])),
Lower_Bound_CI = mean(ages,na.rm = T) - 1.96*(sd(ages,na.rm = T)/sqrt(length(ages[!is.na(ages)]))),
Upper_Bound_CI = mean(ages,na.rm = T) + 1.96*(sd(ages,na.rm = T)/sqrt(length(ages[!is.na(ages)])))
)
summary
Sample_Mean Standard_Error Lower_Bound_CI Upper_Bound_CI
1 61.35 2.316162 56.81032 65.88968
Note
So we are 95% confident that the true mean lies somewhere between 56.81032
and 65.88968