Skip to content

Quantitative Variables

Quantitaive variables are numerical data: so variables such as height, weight, age. We can describe these variables with the following terms:

Explanation of Terms

  • Minimum : smallest value in your variable
  • Median : middle value in your variable
  • Mean : average value of your variable
  • Max : largest value in your variable
  • Count : how many values are in your variable
  • Standard deviation : measure of the spread of your variable

Let's see how to do this in our code:

library(tidyverse)
meta <- read.table("./data/gbm_cptac_2021/data_clinical_patient.txt",
                     header = T,
                   sep="\t")

height.sum <- meta %>%
  summarise( 
    minimum = min(HEIGHT, na.rm = T),
    median = median(HEIGHT, na.rm = T),
    mean = mean(HEIGHT, na.rm = T),
    max = max(HEIGHT, na.rm = T),
    count = length(HEIGHT[!is.na(HEIGHT)]),
    SD = sd(HEIGHT, na.rm = T))

height.sum
  minimum median     mean max count       SD
1     150    170 169.6768 196    99 9.875581

Note

You'll note here that we explicitly remove NA values to calculate these descriptive statistics.

Now that we know how to calculate our descriptive statistics, let's try and visualize our numeric data:

ggplot(meta, aes(x=HEIGHT)) + 
  geom_histogram(fill="lightpink")+
  theme_bw()+
  labs(
    x="Height",
    y="Frequency",
    title=" Histogram of Height"
  )

References