dplyr::summarize()
   get_help() docs


Description

The summarize() function is part of the {dplyr} package, which is part of the {tidyverse}.

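It is used to collapse a tibble (data frame) into summary statistics. The resulting tibble contains one new column for each summary you request and is smaller than the input: without grouping it typically has a single row, and columns that are neither grouping columns nor summaries are dropped. This function is often used with group_by() to compute the same summary for every group at once, typically giving one row per group.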

To use this function, either load the {dplyr} package first or call the function with the dplyr:: prefix every time, as in dplyr::summarize().

# Load the library
library(dplyr)
# Or, load the full tidyverse:
library(tidyverse)

# Or, use :: notation
dplyr::summarize()

Conceptual Usage

tibble %>% 
  summarize(name_of_new_column = summary statistic calculation)

tibble %>% 
  group_by(grouping column) %>% 
  summarize(name_of_new_column = summary statistic calculation)
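
As a concrete instance of the two patterns above, here is a minimal sketch on a small made-up tibble. The `pets` data and its column names are hypothetical and exist only for illustration.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
pets <- tibble(
  species = c("cat", "cat", "dog", "dog"),
  weight  = c(4, 6, 10, 20)
)

# Without grouping: one summary row for the whole tibble
overall <- pets %>%
  summarize(mean_weight = mean(weight))

# With grouping: one summary row per species
by_species <- pets %>%
  group_by(species) %>%
  summarize(mean_weight = mean(weight))
```

Here `overall` has a single row and a single column, while `by_species` keeps the grouping column `species` next to the new `mean_weight` column.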

Examples

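The examples below use the msleep dataset, which ships with {ggplot2} and is therefore available after library(tidyverse) or library(ggplot2). Learn more about this dataset with get_help("msleep").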

# Show the msleep dataset with head()
head(msleep)
## # A tibble: 6 × 11
##   name  genus vore  order conservation sleep_total sleep_rem sleep_cycle awake  brainwt  bodywt
##   <chr> <chr> <chr> <chr> <chr>              <dbl>     <dbl>       <dbl> <dbl>    <dbl>   <dbl>
## 1 Owl … Aotus omni  Prim… <NA>                17         1.8      NA       7    0.0155    0.48 
## 2 Moun… Aplo… herbi Rode… nt                  14.4       2.4      NA       9.6 NA         1.35 
## 3 Grea… Blar… omni  Sori… lc                  14.9       2.3       0.133   9.1  0.00029   0.019
## 4 Cow   Bos   herbi Arti… domesticated         4         0.7       0.667  20    0.423   600    
## 5 Thre… Brad… herbi Pilo… <NA>                14.4       2.2       0.767   9.6 NA         3.85 
## 6 Nort… Call… carni Carn… vu                   8.7       1.4       0.383  15.3 NA        20.5


# Calculate the mean time spent awake
msleep %>% 
  summarize(mean_awake = mean(awake))
## # A tibble: 1 × 1
##   mean_awake
##        <dbl>
## 1       13.6


# Calculate the mean time spent awake for each vore group
msleep %>% 
  group_by(vore) %>%
  summarize(mean_awake = mean(awake))
## # A tibble: 5 × 2
##   vore    mean_awake
##   <chr>        <dbl>
## 1 carni        14.3 
## 2 herbi        14.9 
## 3 insecti       7.48
## 4 omni         13.0 
## 5 <NA>         13.4


# Calculate the mean time spent awake for each combination of vore and conservation groups
msleep %>% 
  group_by(vore, conservation) %>%
  summarize(mean_awake = mean(awake))
## # A tibble: 20 × 3
## # Groups:   vore [5]
##    vore    conservation mean_awake
##    <chr>   <chr>             <dbl>
##  1 carni   cd                21.4 
##  2 carni   domesticated      12.7 
##  3 carni   lc                 9.67
##  4 carni   vu                17.9 
##  5 carni   <NA>              16.0 
##  6 herbi   cd                22.1 
##  7 herbi   domesticated      17.7 
##  8 herbi   en                 9.7 
##  9 herbi   lc                13.8 
## 10 herbi   nt                10.6 
## 11 herbi   vu                17.0 
## 12 herbi   <NA>              11.7 
## 13 insecti en                 5.9 
## 14 insecti lc                 9.95
## 15 insecti <NA>               4.1 
## 16 omni    domesticated      14.9 
## 17 omni    lc                12.1 
## 18 omni    <NA>              13.4 
## 19 <NA>    lc                16.2 
## 20 <NA>    <NA>              11.6
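
The "# Groups:   vore [5]" line in the output above indicates that the result is still grouped by vore: when there are several grouping columns, summarize() by default removes only the last one. If an ungrouped result is wanted, one option (assuming dplyr 1.0.0 or later) is the `.groups = "drop"` argument. The sketch below assumes msleep is loaded from {ggplot2}, and the name `ungrouped` is only illustrative.

```r
library(dplyr)
library(ggplot2)  # provides the msleep dataset

# Same summary as above, but ask summarize() to drop all grouping afterwards
ungrouped <- msleep %>%
  group_by(vore, conservation) %>%
  summarize(mean_awake = mean(awake), .groups = "drop")

# Check whether the result still carries grouping information
is_grouped_df(ungrouped)
```

Calling ungroup() after summarize() is an alternative that leaves the summarize() call itself unchanged.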


# Calculate the median brainwt
# Caution! The brainwt column contains NAs, so median() returns NA
# unless we pass the argument `na.rm = TRUE`
msleep %>% 
  summarize(median_brainwt = median(brainwt, na.rm = TRUE))
## # A tibble: 1 × 1
##   median_brainwt
##            <dbl>
## 1         0.0118
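

To see why the na.rm argument matters in the example above, the following sketch computes the median both ways in a single summarize() call and also counts the missing values. It assumes msleep is loaded from {ggplot2}. The new column names are illustrative.

```r
library(dplyr)
library(ggplot2)  # provides the msleep dataset

brainwt_summary <- msleep %>%
  summarize(
    median_with_na = median(brainwt),                # NA, because brainwt has missing values
    median_no_na   = median(brainwt, na.rm = TRUE),  # missing values are dropped first
    n_missing      = sum(is.na(brainwt))             # number of missing brainwt values
  )
```

Several summaries can be requested at once by separating them with commas, and each becomes its own column in the result.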