{tidyselect}()
helper functions
get_help()
docs
The {tidyselect}
package is part of the {tidyverse}
, and it provides a useful set of helper functions for conveniently (de-)selecting columns from tibbles (data frames). In fact, it is functions in {tidyselect}
which enable much {dplyr}
and {tidyr}
functionality, including selecting columns with dplyr::select()
or specifying columns to pivot with tidyr::pivot_longer()
, and as a consequence all {tidyselect}
functionality is by-definition included when you load core {tidyverse}
packages like {dplyr}
and {tidyr}
. The {introverse}
lead author, notably, prefers to call this functionality “select magic.”
We generally use these helper functions as part of our arguments to functions that deal with column selection, such as dplyr::select()
, dplyr::pull()
, or tidyr::pivot_longer()
. Helper functions include the following:
By adding a minus sign to all function arguments, you can make them deselect instead of selecting the given columns
Function | What it selects | |
---|---|---|
everything() |
All columns, except those already explicitly stated | |
contains() |
All columns that contain a given string argument | |
starts_with() |
All columns that start with a given string, i.e. prefix | |
ends_with() |
All columns that end with a given string, i.e. suffix | |
last_col() |
The last column in the tibble (data frame) | |
matches() |
All columns that match a regular expression (a special type of pattern-matching string) |
To use these function, you need to either first load the {dplyr}
(the {dplyr}
library conveniently includes the {tidyselect}
functionality!) library, or always use the function with dplyr::()
notation.
# Load a core library like dplyr ot tidyr
library(dplyr)
# Or, load the full tidyverse:
library(tidyverse)
# Or, use :: notation, for example with the everything() function
::everything() dplyr
The first examples below use the msleep
dataset. Learn more about this dataset with get_help("msleep")
.
# Show the msleep dataset with head()
head(msleep)
## # A tibble: 6 × 11
## name genus vore order conservation sleep_total sleep_rem sleep_cycle awake brainwt bodywt
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Owl … Aotus omni Prim… <NA> 17 1.8 NA 7 0.0155 0.48
## 2 Moun… Aplo… herbi Rode… nt 14.4 2.4 NA 9.6 NA 1.35
## 3 Grea… Blar… omni Sori… lc 14.9 2.3 0.133 9.1 0.00029 0.019
## 4 Cow Bos herbi Arti… domesticated 4 0.7 0.667 20 0.423 600
## 5 Thre… Brad… herbi Pilo… <NA> 14.4 2.2 0.767 9.6 NA 3.85
## 6 Nort… Call… carni Carn… vu 8.7 1.4 0.383 15.3 NA 20.5
# Use everything() to move the column brainwt to the front
# This code says: first select brainwt, and then _everything else_
%>%
msleep select(brainwt, everything())
## # A tibble: 61 × 11
## brainwt name genus vore order conservation sleep_total sleep_rem sleep_cycle awake
## <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 0.0155 Owl monkey Aotus omni Prim… <NA> 17 1.8 NA 7
## 2 NA Mountain be… Aplo… herbi Rode… nt 14.4 2.4 NA 9.6
## 3 0.00029 Greater sho… Blar… omni Sori… lc 14.9 2.3 0.133 9.1
## 4 0.423 Cow Bos herbi Arti… domesticated 4 0.7 0.667 20
## 5 NA Three-toed … Brad… herbi Pilo… <NA> 14.4 2.2 0.767 9.6
## 6 NA Northern fu… Call… carni Carn… vu 8.7 1.4 0.383 15.3
## 7 0.07 Dog Canis carni Carn… domesticated 10.1 2.9 0.333 13.9
## 8 0.115 Goat Capri herbi Arti… lc 5.3 0.6 NA 18.7
## 9 0.0055 Guinea pig Cavis herbi Rode… domesticated 9.4 0.8 0.217 14.6
## 10 NA Grivet Cerc… omni Prim… lc 10 0.7 NA 14
## # … with 51 more rows, and 1 more variable: bodywt <dbl>
# Use contains() to keep only columns that contain the letter "w"
%>%
msleep select(contains("w"))
## # A tibble: 61 × 3
## awake brainwt bodywt
## <dbl> <dbl> <dbl>
## 1 7 0.0155 0.48
## 2 9.6 NA 1.35
## 3 9.1 0.00029 0.019
## 4 20 0.423 600
## 5 9.6 NA 3.85
## 6 15.3 NA 20.5
## 7 13.9 0.07 14
## 8 18.7 0.115 33.5
## 9 14.6 0.0055 0.728
## 10 14 NA 4.75
## # … with 51 more rows
# Use contains() to REMOVE columns that contain the letter "w", with a minus sign
%>%
msleep select(-contains("w"))
## # A tibble: 61 × 8
## name genus vore order conservation sleep_total sleep_rem sleep_cycle
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Owl monkey Aotus omni Prim… <NA> 17 1.8 NA
## 2 Mountain beaver Aplod… herbi Rode… nt 14.4 2.4 NA
## 3 Greater short-tailed shrew Blari… omni Sori… lc 14.9 2.3 0.133
## 4 Cow Bos herbi Arti… domesticated 4 0.7 0.667
## 5 Three-toed sloth Brady… herbi Pilo… <NA> 14.4 2.2 0.767
## 6 Northern fur seal Callo… carni Carn… vu 8.7 1.4 0.383
## 7 Dog Canis carni Carn… domesticated 10.1 2.9 0.333
## 8 Goat Capri herbi Arti… lc 5.3 0.6 NA
## 9 Guinea pig Cavis herbi Rode… domesticated 9.4 0.8 0.217
## 10 Grivet Cerco… omni Prim… lc 10 0.7 NA
## # … with 51 more rows
# Use starts_with() to keep only columns that start with "sleep"
%>%
msleep select(starts_with("sleep"))
## # A tibble: 61 × 3
## sleep_total sleep_rem sleep_cycle
## <dbl> <dbl> <dbl>
## 1 17 1.8 NA
## 2 14.4 2.4 NA
## 3 14.9 2.3 0.133
## 4 4 0.7 0.667
## 5 14.4 2.2 0.767
## 6 8.7 1.4 0.383
## 7 10.1 2.9 0.333
## 8 5.3 0.6 NA
## 9 9.4 0.8 0.217
## 10 10 0.7 NA
## # … with 51 more rows
# Use starts_with() to REMOVE columns that start with "sleep", with a minus sign
%>%
msleep select(-starts_with("sleep"))
## # A tibble: 61 × 8
## name genus vore order conservation awake brainwt bodywt
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Owl monkey Aotus omni Primates <NA> 7 0.0155 0.48
## 2 Mountain beaver Aplodontia herbi Rodentia nt 9.6 NA 1.35
## 3 Greater short-tailed shrew Blarina omni Soricomo… lc 9.1 0.00029 0.019
## 4 Cow Bos herbi Artiodac… domesticated 20 0.423 600
## 5 Three-toed sloth Bradypus herbi Pilosa <NA> 9.6 NA 3.85
## 6 Northern fur seal Callorhinus carni Carnivora vu 15.3 NA 20.5
## 7 Dog Canis carni Carnivora domesticated 13.9 0.07 14
## 8 Goat Capri herbi Artiodac… lc 18.7 0.115 33.5
## 9 Guinea pig Cavis herbi Rodentia domesticated 14.6 0.0055 0.728
## 10 Grivet Cercopithecus omni Primates lc 14 NA 4.75
## # … with 51 more rows
# Use ends_with() to keep only columns that end with "wt"
%>%
msleep select(ends_with("wt"))
## # A tibble: 61 × 2
## brainwt bodywt
## <dbl> <dbl>
## 1 0.0155 0.48
## 2 NA 1.35
## 3 0.00029 0.019
## 4 0.423 600
## 5 NA 3.85
## 6 NA 20.5
## 7 0.07 14
## 8 0.115 33.5
## 9 0.0055 0.728
## 10 NA 4.75
## # … with 51 more rows
# Use ends_with() to REMOVE columns that end with "wt", with a minus sign
%>%
msleep select(-ends_with("sleep"))
## # A tibble: 61 × 11
## name genus vore order conservation sleep_total sleep_rem sleep_cycle awake brainwt
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Owl monkey Aotus omni Prim… <NA> 17 1.8 NA 7 0.0155
## 2 Mountain be… Aplo… herbi Rode… nt 14.4 2.4 NA 9.6 NA
## 3 Greater sho… Blar… omni Sori… lc 14.9 2.3 0.133 9.1 0.00029
## 4 Cow Bos herbi Arti… domesticated 4 0.7 0.667 20 0.423
## 5 Three-toed … Brad… herbi Pilo… <NA> 14.4 2.2 0.767 9.6 NA
## 6 Northern fu… Call… carni Carn… vu 8.7 1.4 0.383 15.3 NA
## 7 Dog Canis carni Carn… domesticated 10.1 2.9 0.333 13.9 0.07
## 8 Goat Capri herbi Arti… lc 5.3 0.6 NA 18.7 0.115
## 9 Guinea pig Cavis herbi Rode… domesticated 9.4 0.8 0.217 14.6 0.0055
## 10 Grivet Cerc… omni Prim… lc 10 0.7 NA 14 NA
## # … with 51 more rows, and 1 more variable: bodywt <dbl>
# Tidyselect allows us to use colons `:` to select a range of columns
# Select vore through awake
%>%
msleep select(vore:awake)
## # A tibble: 61 × 7
## vore order conservation sleep_total sleep_rem sleep_cycle awake
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 omni Primates <NA> 17 1.8 NA 7
## 2 herbi Rodentia nt 14.4 2.4 NA 9.6
## 3 omni Soricomorpha lc 14.9 2.3 0.133 9.1
## 4 herbi Artiodactyla domesticated 4 0.7 0.667 20
## 5 herbi Pilosa <NA> 14.4 2.2 0.767 9.6
## 6 carni Carnivora vu 8.7 1.4 0.383 15.3
## 7 carni Carnivora domesticated 10.1 2.9 0.333 13.9
## 8 herbi Artiodactyla lc 5.3 0.6 NA 18.7
## 9 herbi Rodentia domesticated 9.4 0.8 0.217 14.6
## 10 omni Primates lc 10 0.7 NA 14
## # … with 51 more rows
# DE-select vore through awake, with a minus sign
%>%
msleep select(-(vore:awake))
## # A tibble: 61 × 4
## name genus brainwt bodywt
## <chr> <chr> <dbl> <dbl>
## 1 Owl monkey Aotus 0.0155 0.48
## 2 Mountain beaver Aplodontia NA 1.35
## 3 Greater short-tailed shrew Blarina 0.00029 0.019
## 4 Cow Bos 0.423 600
## 5 Three-toed sloth Bradypus NA 3.85
## 6 Northern fur seal Callorhinus NA 20.5
## 7 Dog Canis 0.07 14
## 8 Goat Capri 0.115 33.5
## 9 Guinea pig Cavis 0.0055 0.728
## 10 Grivet Cercopithecus NA 4.75
## # … with 51 more rows
# Use last_col() to keep only the last-appearing column in msleep
%>%
msleep select(last_col())
## # A tibble: 61 × 1
## bodywt
## <dbl>
## 1 0.48
## 2 1.35
## 3 0.019
## 4 600
## 5 3.85
## 6 20.5
## 7 14
## 8 33.5
## 9 0.728
## 10 4.75
## # … with 51 more rows
# Use last_col() to keep the columns `sleep_cycle` through the last-appearing column in msleep
%>%
msleep select(sleep_cycle:last_col())
## # A tibble: 61 × 4
## sleep_cycle awake brainwt bodywt
## <dbl> <dbl> <dbl> <dbl>
## 1 NA 7 0.0155 0.48
## 2 NA 9.6 NA 1.35
## 3 0.133 9.1 0.00029 0.019
## 4 0.667 20 0.423 600
## 5 0.767 9.6 NA 3.85
## 6 0.383 15.3 NA 20.5
## 7 0.333 13.9 0.07 14
## 8 NA 18.7 0.115 33.5
## 9 0.217 14.6 0.0055 0.728
## 10 NA 14 NA 4.75
## # … with 51 more rows
# ADVANCED EXAMPLE:
# Use matches() to keep only columns that match the regex `\\w+_\\w+`
%>%
msleep select(matches("\\w+_\\w+"))
## # A tibble: 61 × 3
## sleep_total sleep_rem sleep_cycle
## <dbl> <dbl> <dbl>
## 1 17 1.8 NA
## 2 14.4 2.4 NA
## 3 14.9 2.3 0.133
## 4 4 0.7 0.667
## 5 14.4 2.2 0.767
## 6 8.7 1.4 0.383
## 7 10.1 2.9 0.333
## 8 5.3 0.6 NA
## 9 9.4 0.8 0.217
## 10 10 0.7 NA
## # … with 51 more rows
The remaining examples use the wide dataset billboard
, which comes from the {tidyr}
package and contains song rankings for the Billboard top 100 songs in the year 2000.
# Show billboard dataset
billboard
## # A tibble: 317 × 79
## artist track date.entered wk1 wk2 wk3 wk4 wk5 wk6 wk7 wk8 wk9 wk10 wk11
## <chr> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 Pac Baby… 2000-02-26 87 82 72 77 87 94 99 NA NA NA NA
## 2 2Ge+her The … 2000-09-02 91 87 92 NA NA NA NA NA NA NA NA
## 3 3 Door… Kryp… 2000-04-08 81 70 68 67 66 57 54 53 51 51 51
## 4 3 Door… Loser 2000-10-21 76 76 72 69 67 65 55 59 62 61 61
## 5 504 Bo… Wobb… 2000-04-15 57 34 25 17 17 31 36 49 53 57 64
## 6 98^0 Give… 2000-08-19 51 39 34 26 26 19 2 2 3 6 7
## 7 A*Teens Danc… 2000-07-08 97 97 96 95 100 NA NA NA NA NA NA
## 8 Aaliyah I Do… 2000-01-29 84 62 51 41 38 35 35 38 38 36 37
## 9 Aaliyah Try … 2000-03-18 59 53 38 28 21 18 16 14 12 10 9
## 10 Adams,… Open… 2000-08-26 76 76 74 69 68 67 61 58 57 59 66
## # … with 307 more rows, and 65 more variables: wk12 <dbl>, wk13 <dbl>, wk14 <dbl>, wk15 <dbl>,
## # wk16 <dbl>, wk17 <dbl>, wk18 <dbl>, wk19 <dbl>, wk20 <dbl>, wk21 <dbl>, wk22 <dbl>,
## # wk23 <dbl>, wk24 <dbl>, wk25 <dbl>, wk26 <dbl>, wk27 <dbl>, wk28 <dbl>, wk29 <dbl>,
## # wk30 <dbl>, wk31 <dbl>, wk32 <dbl>, wk33 <dbl>, wk34 <dbl>, wk35 <dbl>, wk36 <dbl>,
## # wk37 <dbl>, wk38 <dbl>, wk39 <dbl>, wk40 <dbl>, wk41 <dbl>, wk42 <dbl>, wk43 <dbl>,
## # wk44 <dbl>, wk45 <dbl>, wk46 <dbl>, wk47 <dbl>, wk48 <dbl>, wk49 <dbl>, wk50 <dbl>,
## # wk51 <dbl>, wk52 <dbl>, wk53 <dbl>, wk54 <dbl>, wk55 <dbl>, wk56 <dbl>, wk57 <dbl>, …
# Use tidyr::pivot_longer and starts_with to pivot all columns that start with "wk"
%>%
billboard pivot_longer(starts_with("wk"),
names_to = "week",
values_to = "song_ranking")
## # A tibble: 24,092 × 5
## artist track date.entered week song_ranking
## <chr> <chr> <date> <chr> <dbl>
## 1 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk1 87
## 2 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk2 82
## 3 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk3 72
## 4 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk4 77
## 5 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk5 87
## 6 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk6 94
## 7 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk7 99
## 8 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk8 NA
## 9 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk9 NA
## 10 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk10 NA
## # … with 24,082 more rows
# Use ends_with to select artist, track, and columns ending in the number 5
%>%
billboard select(artist, track, ends_with("5"))
## # A tibble: 317 × 10
## artist track wk5 wk15 wk25 wk35 wk45 wk55 wk65 wk75
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
## 1 2 Pac Baby Don't Cry (Keep... 87 NA NA NA NA NA NA NA
## 2 2Ge+her The Hardest Part Of ... NA NA NA NA NA NA NA NA
## 3 3 Doors Down Kryptonite 66 38 6 4 17 NA NA NA
## 4 3 Doors Down Loser 67 72 NA NA NA NA NA NA
## 5 504 Boyz Wobble Wobble 17 78 NA NA NA NA NA NA
## 6 98^0 Give Me Just One Nig... 26 47 NA NA NA NA NA NA
## 7 A*Teens Dancing Queen 100 NA NA NA NA NA NA NA
## 8 Aaliyah I Don't Wanna 38 61 NA NA NA NA NA NA
## 9 Aaliyah Try Again 21 2 13 NA NA NA NA NA
## 10 Adams, Yolanda Open My Heart 68 59 NA NA NA NA NA NA
## # … with 307 more rows