Summary statistics
R comes with several functions (in the included `{base}` and `{stats}` packages) that are used to calculate summary statistics from numeric arrays. The greatest hits of R summary statistics functions include the following:
| Function | What it calculates |
|---|---|
| `mean()` | The arithmetic mean (average) of all values |
| `median()` | The median of all values |
| `sd()` | The standard deviation of all values |
| `sum()` | The sum of all values |
| `max()` | The maximum value among all values |
| `min()` | The minimum value among all values |
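Note that R has no built-in function to calculate the statistical mode (the most commonly appearing value) of a numeric array. The existing `mode()` function does something different: it reports how an object is stored (for example, `"numeric"`).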
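Because no built-in function exists for this, one possible workaround is to write a small helper yourself. The sketch below is only an illustration: the name `stat_mode()` is hypothetical (it is not part of `{base}` or `{stats}`), and returning all tied values is a design choice made here, not a standard convention.

```r
# Hypothetical helper (not part of base R): returns the most frequent value(s)
stat_mode <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  # Nothing to count in an empty array
  if (length(x) == 0) {
    return(x)
  }
  # Unique values, in order of first appearance
  unique_values <- unique(x)
  # Count how many times each unique value appears in x
  counts <- tabulate(match(x, unique_values))
  # Keep every value tied for the highest count
  unique_values[counts == max(counts)]
}

stat_mode(c(3, 5, 5, 7, 7, 7, 9))
## [1] 7

# When two values are tied, both are returned
stat_mode(c(1, 1, 2, 2, 3))
## [1] 1 2
```

Like the functions in the table, this sketch accepts `na.rm = TRUE` to drop missing values before counting. Without it, `NA` is counted like any other value.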
By default, if the array you want to summarize contains any `NA` values, all of the functions above will return `NA`, because R is very conservative when it encounters missing data. To ignore `NA` values during calculations, provide the additional argument `na.rm = TRUE` when using any of these functions.
mean(array to summarize)
mean(array with NAs to summarize, na.rm = TRUE)
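As a minimal sketch of the usage pattern above, here is a small array with one missing value. The values in `x` are made up for illustration and do not come from any dataset.

```r
# Made-up example values; the third one is missing
x <- c(2, 4, NA, 8)

# Count the missing values before deciding how to handle them
sum(is.na(x))
## [1] 1

# Without na.rm = TRUE, the single NA makes the whole result NA
mean(x)
## [1] NA

# With na.rm = TRUE, the NA is dropped and the mean is taken over 2, 4, and 8
mean(x, na.rm = TRUE)
## [1] 4.666667
```

Checking `sum(is.na(x))` first is optional, but it shows how many values `na.rm = TRUE` will silently drop.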
Some examples below use the `carnivores` dataset. Learn more about this dataset with `get_help("carnivores")`.
# Show the carnivores dataset
carnivores
## # A tibble: 9 × 4
## name genus awake brainwt
## <chr> <fct> <dbl> <dbl>
## 1 Arctic fox Vulpes 11.5 0.0445
## 2 Cheetah Acinonyx 11.9 NA
## 3 Dog Canis 13.9 0.07
## 4 Gray seal Haliochoerus 17.8 0.325
## 5 Jaguar Panthera 13.6 0.157
## 6 Lion Panthera 10.5 NA
## 7 Northern fur seal Callorhinus 15.3 NA
## 8 Red fox Vulpes 14.2 0.0504
## 9 Tiger Panthera 8.2 NA
# Summarize an array directly
mean( c(100, 125, 145, 167) )
## [1] 134.25
median( c(100, 125, 145, 167) )
## [1] 135
sd( c(100, 125, 145, 167) )
## [1] 28.55842
sum( c(100, 125, 145, 167) )
## [1] 537
min( c(100, 125, 145, 167) )
## [1] 100
max( c(100, 125, 145, 167) )
## [1] 167
# Calculate various summary statistics from the `awake` column in `carnivores`
# This column does not contain NAs
mean(carnivores$awake)
## [1] 12.98889
median(carnivores$awake)
## [1] 13.6
sd(carnivores$awake)
## [1] 2.821544
sum(carnivores$awake)
## [1] 116.9
min(carnivores$awake)
## [1] 8.2
max(carnivores$awake)
## [1] 17.8
# Calculate various summary statistics from the `brainwt` column in `carnivores`
# This column _does_ contain NAs, so we need to use the argument `na.rm = TRUE`.
mean(carnivores$brainwt, na.rm = TRUE)
## [1] 0.12938
median(carnivores$brainwt, na.rm = TRUE)
## [1] 0.07
sd(carnivores$brainwt, na.rm = TRUE)
## [1] 0.11832
sum(carnivores$brainwt, na.rm = TRUE)
## [1] 0.6469
min(carnivores$brainwt, na.rm = TRUE)
## [1] 0.0445
max(carnivores$brainwt, na.rm = TRUE)
## [1] 0.325
# If you do not use the `na.rm = TRUE` argument when NAs are present, all calculations will be NA
mean(carnivores$brainwt)
## [1] NA
median(carnivores$brainwt)
## [1] NA
sd(carnivores$brainwt)
## [1] NA
sum(carnivores$brainwt)
## [1] NA
min(carnivores$brainwt)
## [1] NA
max(carnivores$brainwt)
## [1] NA
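If you want all six statistics at once, one possible approach is to loop over the functions instead of calling each one separately. This is only a sketch: the names `summary_functions` and `brainwt_summary` are made up for illustration, and the `brainwt` values are typed in by hand from the `carnivores` output shown earlier so that the example runs on its own.

```r
# The `brainwt` values copied from the carnivores output shown earlier
brainwt <- c(0.0445, NA, 0.07, 0.325, 0.157, NA, NA, 0.0504, NA)

# Named list of the summary functions covered on this page
summary_functions <- list(
  mean = mean, median = median, sd = sd,
  sum = sum, min = min, max = max
)

# Call each function on `brainwt`, passing na.rm = TRUE to every call
brainwt_summary <- sapply(
  summary_functions,
  function(f) f(brainwt, na.rm = TRUE)
)

# A named numeric vector with one entry per function
brainwt_summary
```

The values in `brainwt_summary` should match the individual `na.rm = TRUE` results above. Dropping `na.rm = TRUE` from the call would make every entry `NA`.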