{tidyselect}() helper functions
   get_help() docs


Description

The {tidyselect} package is part of the {tidyverse}, and it provides a useful set of helper functions for conveniently (de-)selecting columns from tibbles (data frames). In fact, it is functions in {tidyselect} which enable much {dplyr} and {tidyr} functionality, including selecting columns with dplyr::select() or specifying columns to pivot with tidyr::pivot_longer(), and as a consequence all {tidyselect} functionality is by-definition included when you load core {tidyverse} packages like {dplyr} and {tidyr}. The {introverse} lead author, notably, prefers to call this functionality “select magic.”

We generally use these helper functions as part of our arguments to functions that deal with column selection, such as dplyr::select(), dplyr::pull(), or tidyr::pivot_longer(). Helper functions include the following:

By adding a minus sign to all function arguments, you can make them deselect instead of selecting the given columns

Function What it selects
everything() All columns, except those already explicitly stated
contains() All columns that contain a given string argument
starts_with() All columns that start with a given string, i.e. prefix
ends_with() All columns that end with a given string, i.e. suffix
last_col() The last column in the tibble (data frame)
matches() All columns that match a regular expression (a special type of pattern-matching string)

To use these function, you need to either first load the {dplyr} (the {dplyr} library conveniently includes the {tidyselect} functionality!) library, or always use the function with dplyr::() notation.

# Load a core library like dplyr ot tidyr
library(dplyr)
# Or, load the full tidyverse:
library(tidyverse)

# Or, use :: notation, for example with the everything() function
dplyr::everything()

Examples

The first examples below use the msleep dataset. Learn more about this dataset with get_help("msleep").

# Show the msleep dataset with head()
head(msleep)
## # A tibble: 6 × 11
##   name  genus vore  order conservation sleep_total sleep_rem sleep_cycle awake  brainwt  bodywt
##   <chr> <chr> <chr> <chr> <chr>              <dbl>     <dbl>       <dbl> <dbl>    <dbl>   <dbl>
## 1 Owl … Aotus omni  Prim… <NA>                17         1.8      NA       7    0.0155    0.48 
## 2 Moun… Aplo… herbi Rode… nt                  14.4       2.4      NA       9.6 NA         1.35 
## 3 Grea… Blar… omni  Sori… lc                  14.9       2.3       0.133   9.1  0.00029   0.019
## 4 Cow   Bos   herbi Arti… domesticated         4         0.7       0.667  20    0.423   600    
## 5 Thre… Brad… herbi Pilo… <NA>                14.4       2.2       0.767   9.6 NA         3.85 
## 6 Nort… Call… carni Carn… vu                   8.7       1.4       0.383  15.3 NA        20.5


# Use everything() to move the column brainwt to the front
# This code says: first select brainwt, and then _everything else_
msleep %>% 
  select(brainwt, everything())
## # A tibble: 61 × 11
##     brainwt name         genus vore  order conservation sleep_total sleep_rem sleep_cycle awake
##       <dbl> <chr>        <chr> <chr> <chr> <chr>              <dbl>     <dbl>       <dbl> <dbl>
##  1  0.0155  Owl monkey   Aotus omni  Prim… <NA>                17         1.8      NA       7  
##  2 NA       Mountain be… Aplo… herbi Rode… nt                  14.4       2.4      NA       9.6
##  3  0.00029 Greater sho… Blar… omni  Sori… lc                  14.9       2.3       0.133   9.1
##  4  0.423   Cow          Bos   herbi Arti… domesticated         4         0.7       0.667  20  
##  5 NA       Three-toed … Brad… herbi Pilo… <NA>                14.4       2.2       0.767   9.6
##  6 NA       Northern fu… Call… carni Carn… vu                   8.7       1.4       0.383  15.3
##  7  0.07    Dog          Canis carni Carn… domesticated        10.1       2.9       0.333  13.9
##  8  0.115   Goat         Capri herbi Arti… lc                   5.3       0.6      NA      18.7
##  9  0.0055  Guinea pig   Cavis herbi Rode… domesticated         9.4       0.8       0.217  14.6
## 10 NA       Grivet       Cerc… omni  Prim… lc                  10         0.7      NA      14  
## # … with 51 more rows, and 1 more variable: bodywt <dbl>


# Use contains() to keep only columns that contain the letter "w"
msleep %>%
  select(contains("w"))
## # A tibble: 61 × 3
##    awake  brainwt  bodywt
##    <dbl>    <dbl>   <dbl>
##  1   7    0.0155    0.48 
##  2   9.6 NA         1.35 
##  3   9.1  0.00029   0.019
##  4  20    0.423   600    
##  5   9.6 NA         3.85 
##  6  15.3 NA        20.5  
##  7  13.9  0.07     14    
##  8  18.7  0.115    33.5  
##  9  14.6  0.0055    0.728
## 10  14   NA         4.75 
## # … with 51 more rows


# Use contains() to REMOVE columns that contain the letter "w", with a minus sign
msleep %>%
  select(-contains("w"))
## # A tibble: 61 × 8
##    name                       genus  vore  order conservation sleep_total sleep_rem sleep_cycle
##    <chr>                      <chr>  <chr> <chr> <chr>              <dbl>     <dbl>       <dbl>
##  1 Owl monkey                 Aotus  omni  Prim… <NA>                17         1.8      NA    
##  2 Mountain beaver            Aplod… herbi Rode… nt                  14.4       2.4      NA    
##  3 Greater short-tailed shrew Blari… omni  Sori… lc                  14.9       2.3       0.133
##  4 Cow                        Bos    herbi Arti… domesticated         4         0.7       0.667
##  5 Three-toed sloth           Brady… herbi Pilo… <NA>                14.4       2.2       0.767
##  6 Northern fur seal          Callo… carni Carn… vu                   8.7       1.4       0.383
##  7 Dog                        Canis  carni Carn… domesticated        10.1       2.9       0.333
##  8 Goat                       Capri  herbi Arti… lc                   5.3       0.6      NA    
##  9 Guinea pig                 Cavis  herbi Rode… domesticated         9.4       0.8       0.217
## 10 Grivet                     Cerco… omni  Prim… lc                  10         0.7      NA    
## # … with 51 more rows


# Use starts_with() to keep only columns that start with "sleep"
msleep %>%
  select(starts_with("sleep"))
## # A tibble: 61 × 3
##    sleep_total sleep_rem sleep_cycle
##          <dbl>     <dbl>       <dbl>
##  1        17         1.8      NA    
##  2        14.4       2.4      NA    
##  3        14.9       2.3       0.133
##  4         4         0.7       0.667
##  5        14.4       2.2       0.767
##  6         8.7       1.4       0.383
##  7        10.1       2.9       0.333
##  8         5.3       0.6      NA    
##  9         9.4       0.8       0.217
## 10        10         0.7      NA    
## # … with 51 more rows


# Use starts_with() to REMOVE columns that start with "sleep", with a minus sign
msleep %>%
  select(-starts_with("sleep"))
## # A tibble: 61 × 8
##    name                       genus         vore  order     conservation awake  brainwt  bodywt
##    <chr>                      <chr>         <chr> <chr>     <chr>        <dbl>    <dbl>   <dbl>
##  1 Owl monkey                 Aotus         omni  Primates  <NA>           7    0.0155    0.48 
##  2 Mountain beaver            Aplodontia    herbi Rodentia  nt             9.6 NA         1.35 
##  3 Greater short-tailed shrew Blarina       omni  Soricomo… lc             9.1  0.00029   0.019
##  4 Cow                        Bos           herbi Artiodac… domesticated  20    0.423   600    
##  5 Three-toed sloth           Bradypus      herbi Pilosa    <NA>           9.6 NA         3.85 
##  6 Northern fur seal          Callorhinus   carni Carnivora vu            15.3 NA        20.5  
##  7 Dog                        Canis         carni Carnivora domesticated  13.9  0.07     14    
##  8 Goat                       Capri         herbi Artiodac… lc            18.7  0.115    33.5  
##  9 Guinea pig                 Cavis         herbi Rodentia  domesticated  14.6  0.0055    0.728
## 10 Grivet                     Cercopithecus omni  Primates  lc            14   NA         4.75 
## # … with 51 more rows


# Use ends_with() to keep only columns that end with "wt"
msleep %>%
  select(ends_with("wt"))
## # A tibble: 61 × 2
##     brainwt  bodywt
##       <dbl>   <dbl>
##  1  0.0155    0.48 
##  2 NA         1.35 
##  3  0.00029   0.019
##  4  0.423   600    
##  5 NA         3.85 
##  6 NA        20.5  
##  7  0.07     14    
##  8  0.115    33.5  
##  9  0.0055    0.728
## 10 NA         4.75 
## # … with 51 more rows


# Use ends_with() to REMOVE columns that end with "wt", with a minus sign
msleep %>%
  select(-ends_with("sleep"))
## # A tibble: 61 × 11
##    name         genus vore  order conservation sleep_total sleep_rem sleep_cycle awake  brainwt
##    <chr>        <chr> <chr> <chr> <chr>              <dbl>     <dbl>       <dbl> <dbl>    <dbl>
##  1 Owl monkey   Aotus omni  Prim… <NA>                17         1.8      NA       7    0.0155 
##  2 Mountain be… Aplo… herbi Rode… nt                  14.4       2.4      NA       9.6 NA      
##  3 Greater sho… Blar… omni  Sori… lc                  14.9       2.3       0.133   9.1  0.00029
##  4 Cow          Bos   herbi Arti… domesticated         4         0.7       0.667  20    0.423  
##  5 Three-toed … Brad… herbi Pilo… <NA>                14.4       2.2       0.767   9.6 NA      
##  6 Northern fu… Call… carni Carn… vu                   8.7       1.4       0.383  15.3 NA      
##  7 Dog          Canis carni Carn… domesticated        10.1       2.9       0.333  13.9  0.07   
##  8 Goat         Capri herbi Arti… lc                   5.3       0.6      NA      18.7  0.115  
##  9 Guinea pig   Cavis herbi Rode… domesticated         9.4       0.8       0.217  14.6  0.0055 
## 10 Grivet       Cerc… omni  Prim… lc                  10         0.7      NA      14   NA      
## # … with 51 more rows, and 1 more variable: bodywt <dbl>


# Tidyselect allows us to use colons `:` to select a range of columns
# Select vore through awake
msleep %>%
  select(vore:awake)
## # A tibble: 61 × 7
##    vore  order        conservation sleep_total sleep_rem sleep_cycle awake
##    <chr> <chr>        <chr>              <dbl>     <dbl>       <dbl> <dbl>
##  1 omni  Primates     <NA>                17         1.8      NA       7  
##  2 herbi Rodentia     nt                  14.4       2.4      NA       9.6
##  3 omni  Soricomorpha lc                  14.9       2.3       0.133   9.1
##  4 herbi Artiodactyla domesticated         4         0.7       0.667  20  
##  5 herbi Pilosa       <NA>                14.4       2.2       0.767   9.6
##  6 carni Carnivora    vu                   8.7       1.4       0.383  15.3
##  7 carni Carnivora    domesticated        10.1       2.9       0.333  13.9
##  8 herbi Artiodactyla lc                   5.3       0.6      NA      18.7
##  9 herbi Rodentia     domesticated         9.4       0.8       0.217  14.6
## 10 omni  Primates     lc                  10         0.7      NA      14  
## # … with 51 more rows


# DE-select vore through awake, with a minus sign
msleep %>%
  select(-(vore:awake))
## # A tibble: 61 × 4
##    name                       genus          brainwt  bodywt
##    <chr>                      <chr>            <dbl>   <dbl>
##  1 Owl monkey                 Aotus          0.0155    0.48 
##  2 Mountain beaver            Aplodontia    NA         1.35 
##  3 Greater short-tailed shrew Blarina        0.00029   0.019
##  4 Cow                        Bos            0.423   600    
##  5 Three-toed sloth           Bradypus      NA         3.85 
##  6 Northern fur seal          Callorhinus   NA        20.5  
##  7 Dog                        Canis          0.07     14    
##  8 Goat                       Capri          0.115    33.5  
##  9 Guinea pig                 Cavis          0.0055    0.728
## 10 Grivet                     Cercopithecus NA         4.75 
## # … with 51 more rows


# Use last_col() to keep only the last-appearing column in msleep
msleep %>%
  select(last_col())
## # A tibble: 61 × 1
##     bodywt
##      <dbl>
##  1   0.48 
##  2   1.35 
##  3   0.019
##  4 600    
##  5   3.85 
##  6  20.5  
##  7  14    
##  8  33.5  
##  9   0.728
## 10   4.75 
## # … with 51 more rows


# Use last_col() to keep the columns `sleep_cycle` through the last-appearing column in msleep
msleep %>%
  select(sleep_cycle:last_col())
## # A tibble: 61 × 4
##    sleep_cycle awake  brainwt  bodywt
##          <dbl> <dbl>    <dbl>   <dbl>
##  1      NA       7    0.0155    0.48 
##  2      NA       9.6 NA         1.35 
##  3       0.133   9.1  0.00029   0.019
##  4       0.667  20    0.423   600    
##  5       0.767   9.6 NA         3.85 
##  6       0.383  15.3 NA        20.5  
##  7       0.333  13.9  0.07     14    
##  8      NA      18.7  0.115    33.5  
##  9       0.217  14.6  0.0055    0.728
## 10      NA      14   NA         4.75 
## # … with 51 more rows


# ADVANCED EXAMPLE:
# Use matches() to keep only columns that match the regex `\\w+_\\w+`
msleep %>%
  select(matches("\\w+_\\w+"))
## # A tibble: 61 × 3
##    sleep_total sleep_rem sleep_cycle
##          <dbl>     <dbl>       <dbl>
##  1        17         1.8      NA    
##  2        14.4       2.4      NA    
##  3        14.9       2.3       0.133
##  4         4         0.7       0.667
##  5        14.4       2.2       0.767
##  6         8.7       1.4       0.383
##  7        10.1       2.9       0.333
##  8         5.3       0.6      NA    
##  9         9.4       0.8       0.217
## 10        10         0.7      NA    
## # … with 51 more rows


The remaining examples use the wide dataset billboard, which comes from the {tidyr} package and contains song rankings for the Billboard top 100 songs in the year 2000.

# Show billboard dataset
billboard
## # A tibble: 317 × 79
##    artist  track date.entered   wk1   wk2   wk3   wk4   wk5   wk6   wk7   wk8   wk9  wk10  wk11
##    <chr>   <chr> <date>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 2 Pac   Baby… 2000-02-26      87    82    72    77    87    94    99    NA    NA    NA    NA
##  2 2Ge+her The … 2000-09-02      91    87    92    NA    NA    NA    NA    NA    NA    NA    NA
##  3 3 Door… Kryp… 2000-04-08      81    70    68    67    66    57    54    53    51    51    51
##  4 3 Door… Loser 2000-10-21      76    76    72    69    67    65    55    59    62    61    61
##  5 504 Bo… Wobb… 2000-04-15      57    34    25    17    17    31    36    49    53    57    64
##  6 98^0    Give… 2000-08-19      51    39    34    26    26    19     2     2     3     6     7
##  7 A*Teens Danc… 2000-07-08      97    97    96    95   100    NA    NA    NA    NA    NA    NA
##  8 Aaliyah I Do… 2000-01-29      84    62    51    41    38    35    35    38    38    36    37
##  9 Aaliyah Try … 2000-03-18      59    53    38    28    21    18    16    14    12    10     9
## 10 Adams,… Open… 2000-08-26      76    76    74    69    68    67    61    58    57    59    66
## # … with 307 more rows, and 65 more variables: wk12 <dbl>, wk13 <dbl>, wk14 <dbl>, wk15 <dbl>,
## #   wk16 <dbl>, wk17 <dbl>, wk18 <dbl>, wk19 <dbl>, wk20 <dbl>, wk21 <dbl>, wk22 <dbl>,
## #   wk23 <dbl>, wk24 <dbl>, wk25 <dbl>, wk26 <dbl>, wk27 <dbl>, wk28 <dbl>, wk29 <dbl>,
## #   wk30 <dbl>, wk31 <dbl>, wk32 <dbl>, wk33 <dbl>, wk34 <dbl>, wk35 <dbl>, wk36 <dbl>,
## #   wk37 <dbl>, wk38 <dbl>, wk39 <dbl>, wk40 <dbl>, wk41 <dbl>, wk42 <dbl>, wk43 <dbl>,
## #   wk44 <dbl>, wk45 <dbl>, wk46 <dbl>, wk47 <dbl>, wk48 <dbl>, wk49 <dbl>, wk50 <dbl>,
## #   wk51 <dbl>, wk52 <dbl>, wk53 <dbl>, wk54 <dbl>, wk55 <dbl>, wk56 <dbl>, wk57 <dbl>, …


# Use tidyr::pivot_longer and starts_with to pivot all columns that start with "wk"
billboard %>%
  pivot_longer(starts_with("wk"),
               names_to = "week", 
               values_to = "song_ranking")
## # A tibble: 24,092 × 5
##    artist track                   date.entered week  song_ranking
##    <chr>  <chr>                   <date>       <chr>        <dbl>
##  1 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk1             87
##  2 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk2             82
##  3 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk3             72
##  4 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk4             77
##  5 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk5             87
##  6 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk6             94
##  7 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk7             99
##  8 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk8             NA
##  9 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk9             NA
## 10 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk10            NA
## # … with 24,082 more rows


# Use ends_with to select artist, track, and columns ending in the number 5
billboard %>%
  select(artist, track, ends_with("5"))
## # A tibble: 317 × 10
##    artist         track                     wk5  wk15  wk25  wk35  wk45  wk55  wk65 wk75 
##    <chr>          <chr>                   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
##  1 2 Pac          Baby Don't Cry (Keep...    87    NA    NA    NA    NA    NA    NA NA   
##  2 2Ge+her        The Hardest Part Of ...    NA    NA    NA    NA    NA    NA    NA NA   
##  3 3 Doors Down   Kryptonite                 66    38     6     4    17    NA    NA NA   
##  4 3 Doors Down   Loser                      67    72    NA    NA    NA    NA    NA NA   
##  5 504 Boyz       Wobble Wobble              17    78    NA    NA    NA    NA    NA NA   
##  6 98^0           Give Me Just One Nig...    26    47    NA    NA    NA    NA    NA NA   
##  7 A*Teens        Dancing Queen             100    NA    NA    NA    NA    NA    NA NA   
##  8 Aaliyah        I Don't Wanna              38    61    NA    NA    NA    NA    NA NA   
##  9 Aaliyah        Try Again                  21     2    13    NA    NA    NA    NA NA   
## 10 Adams, Yolanda Open My Heart              68    59    NA    NA    NA    NA    NA NA   
## # … with 307 more rows