Summary statistics


Description

R comes with several functions (in the included {base} and {stats} packages) that are used to calculate summary statistics from numeric arrays. The greatest hits of R summary statistics functions include the following:

Function    What it calculates
mean()      The arithmetic mean (average) of all values
median()    The median of all values
sd()        The standard deviation of all values
sum()       The sum of all values
max()       The maximum value among all values
min()       The minimum value among all values

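Note that R has no built-in function to calculate the statistical mode (the most frequently occurring value) of a numeric array. The existing mode() function is unrelated: it reports how an object is stored (for example, "numeric").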
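If you do need the most common value, one possible approach is to count how often each distinct value appears and keep the value(s) with the highest count. The sketch below is only an illustration: stat_mode() is a hypothetical helper name, not a function from {base}, {stats}, or any package. It returns every tied value when there is more than one most common value.

```r
# Hypothetical helper (not part of R): most frequent value(s) in an array
stat_mode <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]         # optionally drop missing values first
  if (length(x) == 0) return(x)        # nothing to count
  ux <- unique(x)                      # the distinct values
  counts <- tabulate(match(x, ux))     # how many times each distinct value appears
  ux[counts == max(counts)]            # keep the value(s) with the highest count
}

stat_mode(c(2, 3, 3, 5, 5, 5, 8))
## [1] 5
stat_mode(c(1, 1, 2, 2, 3))
## [1] 1 2
```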


By default, if the array you want to summarize contains any NA values, every function above returns NA, because R does not guess what the missing values might have been. To ignore NA values during the calculation, add the named argument na.rm = TRUE when calling any of these functions.
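As a small illustration of what na.rm = TRUE does, the sketch below uses a made-up array x (not taken from any dataset). It shows that removing the NA values by hand gives the same answer, and how to count the missing values so you know how many were ignored.

```r
# Made-up example array with one missing value
x <- c(4, NA, 10, 7)

mean(x)
## [1] NA
mean(x, na.rm = TRUE)
## [1] 7

# Same result as dropping the NA values yourself before summarizing
mean(x[!is.na(x)])
## [1] 7

# How many values were missing (and therefore ignored)?
sum(is.na(x))
## [1] 1
```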

Conceptual Usage

mean(array to summarize)

mean(array with NAs to summarize, na.rm = TRUE)

Examples

Some examples below use the carnivores dataset. Learn more about this dataset with get_help("carnivores").

# Show the carnivores dataset
carnivores
## # A tibble: 9 × 4
##   name              genus        awake brainwt
##   <chr>             <fct>        <dbl>   <dbl>
## 1 Arctic fox        Vulpes        11.5  0.0445
## 2 Cheetah           Acinonyx      11.9 NA     
## 3 Dog               Canis         13.9  0.07  
## 4 Gray seal         Haliochoerus  17.8  0.325 
## 5 Jaguar            Panthera      13.6  0.157 
## 6 Lion              Panthera      10.5 NA     
## 7 Northern fur seal Callorhinus   15.3 NA     
## 8 Red fox           Vulpes        14.2  0.0504
## 9 Tiger             Panthera       8.2 NA


# Summarize an array directly
mean( c(100, 125, 145, 167) )
## [1] 134.25
median( c(100, 125, 145, 167) )
## [1] 135
sd( c(100, 125, 145, 167) )
## [1] 28.55842
sum( c(100, 125, 145, 167) )
## [1] 537
min( c(100, 125, 145, 167) )
## [1] 100
max( c(100, 125, 145, 167) )
## [1] 167
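To see what sd() is doing with the same four values, the sketch below works through the calculation by hand. sd() returns the sample standard deviation, which divides by one less than the number of values (n - 1) rather than by n. The variable names x, n, and deviations are chosen here only for illustration.

```r
x <- c(100, 125, 145, 167)

n <- length(x)               # number of values
deviations <- x - mean(x)    # distance of each value from the mean

# Square the deviations, add them up, divide by n - 1, take the square root
sqrt(sum(deviations^2) / (n - 1))
## [1] 28.55842
```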


# Calculate various summary statistics from the `awake` column in `carnivores`
# This column does not contain NAs

mean(carnivores$awake) 
## [1] 12.98889
median(carnivores$awake)
## [1] 13.6
sd(carnivores$awake)
## [1] 2.821544
sum(carnivores$awake)
## [1] 116.9
min(carnivores$awake)
## [1] 8.2
max(carnivores$awake)
## [1] 17.8


# Calculate various summary statistics from the `brainwt` column in `carnivores`
# This column does contain NAs, so we need to use the argument `na.rm = TRUE`.

mean(carnivores$brainwt, na.rm = TRUE) 
## [1] 0.12938
median(carnivores$brainwt, na.rm = TRUE)
## [1] 0.07
sd(carnivores$brainwt, na.rm = TRUE)
## [1] 0.11832
sum(carnivores$brainwt, na.rm = TRUE)
## [1] 0.6469
min(carnivores$brainwt, na.rm = TRUE)
## [1] 0.0445
max(carnivores$brainwt, na.rm = TRUE)
## [1] 0.325


# If you do not use the `na.rm = TRUE` argument when NAs are present, every calculation returns NA
mean(carnivores$brainwt)
## [1] NA
median(carnivores$brainwt)
## [1] NA
sd(carnivores$brainwt)
## [1] NA
sum(carnivores$brainwt)
## [1] NA
min(carnivores$brainwt)
## [1] NA
max(carnivores$brainwt)
## [1] NA
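If you want all six statistics in one step, one option is to store the functions in a named list and apply each of them to the same array. This is only a sketch: the brainwt values are re-typed from the printed table above, which shows rounded numbers, so the results may differ slightly from the output shown earlier. The name stats is chosen here for illustration.

```r
# Values re-typed from the printed carnivores table (rounded for display)
brainwt <- c(0.0445, NA, 0.07, 0.325, 0.157, NA, NA, 0.0504, NA)

# A named list of the summary functions covered on this page
stats <- list(mean = mean, median = median, sd = sd,
              sum = sum, min = min, max = max)

# Apply each function to the array, ignoring NA values every time
results <- sapply(stats, function(f) f(brainwt, na.rm = TRUE))
results
```

The result is a named numeric array, so a single statistic can be pulled out by name, for example results["median"].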