

When we want to assess an underlying population, we usually cannot measure every member, so we take a sample of that population instead. Let's take a sample using the sample() function in R:

# load meta data
meta <- read.table("./data/gbm_cptac_2021/data_clinical_patient.txt",
                   header = TRUE,
                   sep = "\t")  # assumption: tab-delimited file; the original arguments were cut off here

## draw a random sample of 20 ages from the population
ages <- sample(meta$AGE,20)

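If we want to draw the same random sample again, we can set the same seed with the set.seed() function before each draw:

## grab the same sample (the seed value 123 is arbitrary)
set.seed(123)
ages1 <- sample(meta$AGE,20)
set.seed(123)
ages2 <- sample(meta$AGE,20)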
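To check that seeding really reproduces a draw, here is a minimal self-contained sketch. The vector `population_ages` and the seed value 42 are made up for illustration and stand in for `meta$AGE`; they are not part of the CPTAC data.

```r
# Hypothetical stand-in for meta$AGE so the example runs on its own
population_ages <- c(34, 45, 51, 58, 60, 62, 65, 67, 70, 72, 74, 79)

# Resetting the seed before each draw gives the same sample
set.seed(42)
draw1 <- sample(population_ages, 5)
set.seed(42)
draw2 <- sample(population_ages, 5)
identical(draw1, draw2)  # TRUE

# Without resetting the seed, the next draw continues the random stream
draw3 <- sample(population_ages, 5)
```

Note that the seed has to be set immediately before each draw you want to reproduce, since every call to sample() advances the random number generator.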

Sampling Error

Not every sample is going to be a true approximation of the underlying population. The difference between a sample statistic and the corresponding population value is known as the sampling error. Let's assess our sample and see how it stacks up against our population:

data.frame(
  Sample_Mean=mean(ages,na.rm = TRUE),
  Population_Mean=mean(meta$AGE,na.rm = TRUE)
)
  Sample_Mean Population_Mean
1        57.4        57.88889
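Sampling error changes from one draw to the next. The sketch below illustrates this with a simulated population; the values passed to rnorm() (mean 58, standard deviation 12, 1000 people) are assumptions chosen only to resemble an age distribution and are not taken from the CPTAC data.

```r
set.seed(1)

# Hypothetical population of 1000 ages (simulated, not the CPTAC data)
population <- round(rnorm(1000, mean = 58, sd = 12))
pop_mean <- mean(population)

# Draw 5 independent samples of 20 and record each sample mean
sample_means <- replicate(5, mean(sample(population, 20)))

# Sampling error = sample mean minus population mean
data.frame(
  Sample_Mean = sample_means,
  Population_Mean = pop_mean,
  Sampling_Error = sample_means - pop_mean
)
```

Each row is a different sample from the same population, so the Sampling_Error column shows how much the estimate moves purely by chance.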

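Here we note that the sample mean is similar to the true mean of our meta data, but not identical. In practice we usually don't know the population mean, and each new sample would give a slightly different estimate, so we get a whole range (or distribution) of sample means. The standard error of the mean measures the spread of that sampling distribution:

\[SE = \frac{\sigma}{\sqrt{N}}\]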


Explanation of Terms

  • \(\sigma\) Standard deviation of the sample
  • \(N\) Number of observations in the sample
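Base R has no built-in function for the standard error of the mean, so one way to compute it is a small helper like the one sketched here. The function name `sem` and the vector `ages_example` are hypothetical and used only for illustration.

```r
# Hypothetical helper: standard error of the mean, dropping NAs first
sem <- function(x) {
  x <- x[!is.na(x)]          # N should count only observed values
  sd(x) / sqrt(length(x))    # sample standard deviation over sqrt(N)
}

# Hypothetical sample of ages with one missing value
ages_example <- c(45, 52, 58, 61, 63, NA, 66, 70, 74)
sem(ages_example)
```

Removing missing values before counting matters: calling length() on the raw vector would include the NA and make the standard error look smaller than it is.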

Math Tip

Increasing the size of the sample decreases the standard error of the mean, because the standard error is inversely proportional to the square root of \(N\).
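A quick way to see this effect is to hold the standard deviation fixed and vary only the sample size. The value of `sigma` and the sample sizes in `n` are assumed for illustration.

```r
sigma <- 12                     # assumed standard deviation, held fixed
n <- c(10, 20, 50, 100, 500)    # increasing sample sizes

# Standard error of the mean for each sample size
se <- sigma / sqrt(n)

data.frame(N = n, SE = round(se, 2))
```

Because of the square root, the gain is not linear: the sample size has to be quadrupled to halve the standard error.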
