dplyr::summarize()
The summarize() function is part of the {dplyr} package, which is part of the {tidyverse}.
It is used to summarize tibbles (data frames). The resulting tibble contains the new summary column(s), plus any grouping columns, and is smaller: typically one row per group, or a single row when there are no groups. Only what was needed for summarizing is retained. This function is often used with group_by() to quickly summarize across different groupings at once.
To use this function, you need to either first load the {dplyr} library, or always use the function with dplyr::summarize() notation.
# Load the library
library(dplyr)
# Or, load the full tidyverse:
library(tidyverse)
# Or, use :: notation
dplyr::summarize()
tibble %>%
  summarize(name_of_new_column = summary statistic calculation)
tibble %>%
  group_by(grouping column) %>%
  summarize(name_of_new_column = summary statistic calculation)
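As a rough illustration of the two templates above, here is a minimal sketch that fills them in with a small hand-made tibble. The `scores` data, its column names, and the `mean_points` column are all hypothetical and exist only for this example.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration only
scores <- tibble(
  team   = c("a", "a", "b", "b"),
  points = c(1, 3, 10, 20)
)

# Template 1: summarize the whole tibble into a single row
overall <- scores %>%
  summarize(mean_points = mean(points))

# Template 2: summarize within each group (one row per team)
by_team <- scores %>%
  group_by(team) %>%
  summarize(mean_points = mean(points))

overall
by_team
```

Here `overall` has a single row with only the `mean_points` column, while `by_team` has one row per team and keeps the `team` grouping column alongside `mean_points`.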
The examples below use the msleep dataset. Learn more about this dataset with get_help("msleep").
# Show the msleep dataset with head()
head(msleep)
## # A tibble: 6 × 11
## name genus vore order conservation sleep_total sleep_rem sleep_cycle awake brainwt bodywt
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Owl … Aotus omni Prim… <NA> 17 1.8 NA 7 0.0155 0.48
## 2 Moun… Aplo… herbi Rode… nt 14.4 2.4 NA 9.6 NA 1.35
## 3 Grea… Blar… omni Sori… lc 14.9 2.3 0.133 9.1 0.00029 0.019
## 4 Cow Bos herbi Arti… domesticated 4 0.7 0.667 20 0.423 600
## 5 Thre… Brad… herbi Pilo… <NA> 14.4 2.2 0.767 9.6 NA 3.85
## 6 Nort… Call… carni Carn… vu 8.7 1.4 0.383 15.3 NA 20.5
# Calculate the mean time spent awake
msleep %>%
  summarize(mean_awake = mean(awake))
## # A tibble: 1 × 1
## mean_awake
## <dbl>
## 1 13.6
# Calculate the mean time spent awake for each vore group
msleep %>%
  group_by(vore) %>%
  summarize(mean_awake = mean(awake))
## # A tibble: 5 × 2
## vore mean_awake
## <chr> <dbl>
## 1 carni 14.3
## 2 herbi 14.9
## 3 insecti 7.48
## 4 omni 13.0
## 5 <NA> 13.4
# Calculate the mean time spent awake for each combination of vore and conservation groups
msleep %>%
  group_by(vore, conservation) %>%
  summarize(mean_awake = mean(awake))
## # A tibble: 20 × 3
## # Groups: vore [5]
## vore conservation mean_awake
## <chr> <chr> <dbl>
## 1 carni cd 21.4
## 2 carni domesticated 12.7
## 3 carni lc 9.67
## 4 carni vu 17.9
## 5 carni <NA> 16.0
## 6 herbi cd 22.1
## 7 herbi domesticated 17.7
## 8 herbi en 9.7
## 9 herbi lc 13.8
## 10 herbi nt 10.6
## 11 herbi vu 17.0
## 12 herbi <NA> 11.7
## 13 insecti en 5.9
## 14 insecti lc 9.95
## 15 insecti <NA> 4.1
## 16 omni domesticated 14.9
## 17 omni lc 12.1
## 18 omni <NA> 13.4
## 19 <NA> lc 16.2
## 20 <NA> <NA> 11.6
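The "Groups: vore [5]" line in the output above appears because, when a tibble is grouped by more than one column, summarize() by default removes only the last grouping column and leaves the result grouped by the rest. The sketch below shows one way to get an ungrouped result using the `.groups` argument of summarize(). It uses a small hypothetical tibble (`animals`, with made-up values) rather than msleep so that it runs with only {dplyr} loaded.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration only
animals <- tibble(
  diet   = c("plants", "plants", "meat", "meat"),
  status = c("lc", "vu", "lc", "lc"),
  awake  = c(10, 14, 12, 16)
)

# Default behavior: only the last grouping column (status) is dropped,
# so the result is still grouped by diet
still_grouped <- animals %>%
  group_by(diet, status) %>%
  summarize(mean_awake = mean(awake))

# .groups = "drop" returns a tibble with no grouping at all
ungrouped <- animals %>%
  group_by(diet, status) %>%
  summarize(mean_awake = mean(awake), .groups = "drop")

group_vars(still_grouped)
group_vars(ungrouped)
```

Dropping the grouping can help avoid surprises when further verbs are chained after summarize(), since those verbs would otherwise keep operating within each remaining group.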
# Calculate the median brainwt
## Caution! Because the brainwt column contains NAs,
## we need to use the _argument_ `na.rm = TRUE` with `median()`
msleep %>%
  summarize(median_brainwt = median(brainwt, na.rm = TRUE))
## # A tibble: 1 × 1
## median_brainwt
## <dbl>
## 1 0.0118
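To see why `na.rm = TRUE` matters, the following minimal sketch compares the two calls on a tiny hypothetical tibble (`weights`, with made-up values) that contains one NA. It is only meant to illustrate the behavior of median() inside summarize(), not to reproduce the msleep result.

```r
library(dplyr)

# Hypothetical toy data with one missing value
weights <- tibble(brainwt = c(0.5, NA, 1.5))

# Without na.rm = TRUE, a single NA makes the whole summary NA
with_na <- weights %>%
  summarize(median_brainwt = median(brainwt))

# With na.rm = TRUE, NAs are removed before the median is computed
without_na <- weights %>%
  summarize(median_brainwt = median(brainwt, na.rm = TRUE))

with_na
without_na
```

The first summary comes back as NA, while the second is the median of the two non-missing values. The same `na.rm = TRUE` argument is available in other base R summary functions such as mean(), sum(), min(), and max().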