Lecture


R lab

The Tidyverse

The tidyverse is a “meta-package” made up of several R packages. As its creator, Hadley Wickham, puts it: “The tidyverse is an opinionated collection of R packages designed for data science.” Its core packages include:

  • ggplot2, for plotting.
  • dplyr, for transforming and summarizing data frame content.
  • tidyr, for transforming (“tidying”) data frame structure.
  • purrr, an upgrade of base R's functional programming tools. (Beyond the scope of this class.)
  • tibble, an improved data frame. (We will use it implicitly.)

Other packages accompany the Tidyverse but are not technically bundled with it. The reference documentation for ggplot2, dplyr, and tidyr covers virtually all of their functionality. Most of that material is beyond the scope of this class, but everything you might need is there.

This class will focus on dplyr and ggplot2, as well as R Markdown. Next week we will continue to explore ggplot2 and introduce tidyr.

dplyr Functions

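Commands can be chained together, in order, with the pipe operator %>%. The pipe passes the result on its left as the first argument of the function on its right, so x %>% f(y) is equivalent to f(x, y).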
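A minimal sketch of what the pipe does, using the built-in iris data set. The object names a, b, nested, and piped are only illustrative. Loading dplyr alone is enough here, because dplyr re-exports %>% from the magrittr package:

```r
library(dplyr)  # re-exports the %>% pipe from magrittr

# The pipe passes the object on its left as the first argument
# of the call on its right, so these two lines do the same thing.
a <- filter(iris, Species == "virginica")
b <- iris %>% filter(Species == "virginica")
identical(a, b)  # TRUE

# With several steps, nested calls read inside-out,
# while the piped version reads top to bottom in execution order.
nested <- head(filter(iris, Sepal.Length > 7.5), 3)
piped  <- iris %>% filter(Sepal.Length > 7.5) %>% head(3)
identical(nested, piped)  # TRUE
```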

Function      Use
filter()      Keep the rows that meet one or more conditions
select()      Keep columns by name (or drop them with a minus sign)
mutate()      Add a new column (or overwrite an existing one)
group_by()    Establish a grouping for downstream operations; remove it with ungroup()
tally()       Count the number of rows, per group if the data are grouped
summarize()   Compute summary statistics of columns, one row per group; also spelled summarise()
arrange()     Sort rows by one or more columns; wrap a column in desc() for descending order

dplyr Examples


### Install the packages ####
install.packages("tidyverse") # Only do this one time

### Load the packages ###
library(tidyverse) # Do this for every R session where you use the package(s)


###### Picking rows with filter() #######

### Base R equivalent
iris[iris$Species == "virginica",]

### using dplyr::filter()
filter(iris, Species == "virginica") 
	    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
	1           6.3         3.3          6.0         2.5 virginica
	2           5.8         2.7          5.1         1.9 virginica
	3           7.1         3.0          5.9         2.1 virginica
	4           6.3         2.9          5.6         1.8 virginica
	5           6.5         3.0          5.8         2.2 virginica
	6           7.6         3.0          6.6         2.1 virginica
	7           4.9         2.5          4.5         1.7 virginica
	8           7.3         2.9          6.3         1.8 virginica
	9           6.7         2.5          5.8         1.8 virginica
# Equivalent code with %>% pipe
iris %>% filter(Species == "virginica")
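The base R and dplyr versions agree on iris because it has no missing values. They can differ when the condition evaluates to NA: base R indexing returns an all-NA row for it, whereas filter() keeps only the rows where the condition is TRUE. A small sketch with a made-up data frame named toy (hypothetical data, not part of the lecture):

```r
library(dplyr)

# Hypothetical toy data with one missing value
toy <- data.frame(x = c(1, NA, 3))

# toy$x > 1 evaluates to c(FALSE, NA, TRUE)
base_rows  <- toy[toy$x > 1, , drop = FALSE]  # 2 rows: an all-NA row and x = 3
dplyr_rows <- toy %>% filter(x > 1)           # 1 row: x = 3

nrow(base_rows)   # 2
nrow(dplyr_rows)  # 1
```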


#### Separate "and" conditions with a comma ####
iris %>% filter(Species == "virginica", Sepal.Length > 7.5)
	  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
	1          7.6         3.0          6.6         2.1 virginica
	2          7.7         3.8          6.7         2.2 virginica
	3          7.7         2.6          6.9         2.3 virginica
	4          7.7         2.8          6.7         2.0 virginica
	5          7.9         3.8          6.4         2.0 virginica
	6          7.7         3.0          6.1         2.3 virginica
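To check that the comma really means "and", the same filter can be written with &. For "or" conditions, combine them with | inside a single argument. A short sketch (the object names are illustrative):

```r
library(dplyr)

# A comma between conditions means "and": both must be TRUE.
with_comma <- iris %>% filter(Species == "virginica", Sepal.Length > 7.5)
with_and   <- iris %>% filter(Species == "virginica" & Sepal.Length > 7.5)
identical(with_comma, with_and)  # TRUE

# For "or", use | so a row is kept if either condition is TRUE.
either <- iris %>% filter(Sepal.Length > 7.5 | Sepal.Width > 4)
nrow(either)
```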
	


###### Picking columns with select() #######
iris %>% select(Species, Petal.Length)
	       Species Petal.Length
	1       setosa          1.4
	2       setosa          1.4
	3       setosa          1.3
	4       setosa          1.5
	5       setosa          1.4
	6       setosa          1.7
	
###### Removing columns with select(-) ######
iris %>% select(-Species)
	    Sepal.Length Sepal.Width Petal.Length Petal.Width
	1            5.1         3.5          1.4         0.2
	2            4.9         3.0          1.4         0.2
	3            4.7         3.2          1.3         0.2
	4            4.6         3.1          1.5         0.2
	5            5.0         3.6          1.4         0.2
	6            5.4         3.9          1.7         0.4
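As with filter(), select() has a base R equivalent: index the columns by name. The sketch below also uses starts_with(), one of the tidyselect helpers that dplyr re-exports, which goes slightly beyond the examples above. Object names are illustrative:

```r
library(dplyr)

picked <- iris %>% select(Species, Petal.Length)
names(picked)  # "Species" "Petal.Length"

# Base R equivalent: index the columns by name
base_picked <- iris[, c("Species", "Petal.Length")]

# A minus sign drops a column instead of keeping it
names(iris %>% select(-Species))
# "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"

# Selection helpers such as starts_with() match columns by name
names(iris %>% select(starts_with("Petal")))  # "Petal.Length" "Petal.Width"
```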
	

##### Combining filter() and select() with %>% #########
iris %>% 
	filter(Species == "virginica", Sepal.Width > 3.5) %>% 
	select(Petal.Width)
	  Petal.Width
	1         2.5
	2         2.2
	3         2.0
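The order of the piped steps matters. filter() must run while the columns it tests still exist; once select() has dropped them, the filter fails. A sketch, assuming no object named Species or Sepal.Width exists in your workspace (the names ok and reversed_ok are illustrative):

```r
library(dplyr)

# Filtering first works: Species and Sepal.Width still exist when filter() runs.
ok <- iris %>%
  filter(Species == "virginica", Sepal.Width > 3.5) %>%
  select(Petal.Width)

# Reversing the steps fails: select() has already dropped
# Species and Sepal.Width, so filter() cannot find them.
reversed_ok <- tryCatch({
  iris %>%
    select(Petal.Width) %>%
    filter(Species == "virginica", Sepal.Width > 3.5)
  TRUE
}, error = function(e) FALSE)

reversed_ok  # FALSE: the reversed pipeline raised an error
```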
	
	
##### Creating new columns with mutate() #####
iris %>% mutate(Sepal.Area = Sepal.Width * Sepal.Length) 
	    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species Sepal.Area
	1            5.1         3.5          1.4         0.2     setosa      17.85
	2            4.9         3.0          1.4         0.2     setosa      14.70
	3            4.7         3.2          1.3         0.2     setosa      15.04
	4            4.6         3.1          1.5         0.2     setosa      14.26
	5            5.0         3.6          1.4         0.2     setosa      18.00
	6            5.4         3.9          1.7         0.4     setosa      21.06
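mutate() can create several columns in one call, and a column created earlier in the call can be used immediately by a later one. Note that mutate() returns a new data frame; iris itself is unchanged unless you assign the result. A sketch (Petal.Area and Area.Ratio are illustrative column names):

```r
library(dplyr)

with_areas <- iris %>%
  mutate(
    Sepal.Area = Sepal.Width * Sepal.Length,
    Petal.Area = Petal.Width * Petal.Length,
    # A column created earlier in the same mutate() can be used right away
    Area.Ratio = Sepal.Area / Petal.Area
  )

ncol(with_areas)  # 8: the original 5 columns plus 3 new ones
ncol(iris)        # 5: iris is unchanged because the result was not assigned to it
```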
	
#### Combining verbs: Which flowers have sepal areas less than 15, ordered by area? ####
iris %>% 
	mutate(Sepal.Area = Sepal.Width * Sepal.Length) %>% 
	filter(Sepal.Area < 15) %>% 
	arrange(Sepal.Area)
	   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species Sepal.Area
	1           5.0         2.0          3.5         1.0 versicolor      10.00
	2           4.5         2.3          1.3         0.3     setosa      10.35
	3           5.0         2.3          3.3         1.0 versicolor      11.50
	4           4.9         2.4          3.3         1.0 versicolor      11.76
	5           4.9         2.5          4.5         1.7  virginica      12.25
	6           5.5         2.3          4.0         1.3 versicolor      12.65
	7           5.1         2.5          3.0         1.1 versicolor      12.75

#### Combining verbs: Which flowers have sepal areas less than 15, ordered by descending area? ####
iris %>% 
	mutate(Sepal.Area = Sepal.Width * Sepal.Length) %>% 
	filter(Sepal.Area < 15) %>% 
	arrange(desc(Sepal.Area))
	   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species Sepal.Area
	1           4.8         3.1          1.6         0.2     setosa      14.88
	2           5.7         2.6          3.5         1.0 versicolor      14.82
	3           4.6         3.2          1.4         0.2     setosa      14.72
	4           4.9         3.0          1.4         0.2     setosa      14.70
	5           6.3         2.3          4.4         1.3 versicolor      14.49
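arrange() also accepts several columns; later columns break ties in earlier ones. For a factor such as Species, rows are sorted in the order of the factor levels. A sketch that groups the flowers by species and, within each species, sorts by descending sepal area (the name sorted is illustrative):

```r
library(dplyr)

# Sort by Species first (in factor-level order), then, within each
# species, by descending sepal area.
sorted <- iris %>%
  mutate(Sepal.Area = Sepal.Width * Sepal.Length) %>%
  filter(Sepal.Area < 15) %>%
  arrange(Species, desc(Sepal.Area))

head(sorted)
```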


##### Combining verbs: How many flowers have sepal areas less than 15? ####
iris %>% 
	mutate(Sepal.Area = Sepal.Width * Sepal.Length) %>% 
	filter(Sepal.Area < 15) %>% 
	tally()
	  n
	1 29
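tally() is one of several ways to get the same count. The sketch below compares it with summarize() plus n(), which counts rows inside summarize(), and with a base R version that sums a logical vector (each TRUE counts as 1). Object names are illustrative:

```r
library(dplyr)

small <- iris %>%
  mutate(Sepal.Area = Sepal.Width * Sepal.Length) %>%
  filter(Sepal.Area < 15)

n_tally   <- small %>% tally()             # one-column data frame named n
n_summary <- small %>% summarize(n = n())  # same count via summarize() and n()
n_base    <- sum(iris$Sepal.Width * iris$Sepal.Length < 15)  # base R: count the TRUEs

c(n_tally$n, n_summary$n, n_base)  # all three counts agree
```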



##### Combining verbs: How many flowers of each species have sepal areas less than 15? ####
iris %>% 
	mutate(Sepal.Area = Sepal.Width * Sepal.Length) %>% 
	filter(Sepal.Area < 15) %>% 
	group_by(Species) %>% 
	tally()
	     Species     n
	1     setosa    11
	2 versicolor    15
	3  virginica     3
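dplyr also provides count(), which combines the group_by() and tally() steps into one call. A sketch comparing the two (object names are illustrative):

```r
library(dplyr)

small <- iris %>%
  mutate(Sepal.Area = Sepal.Width * Sepal.Length) %>%
  filter(Sepal.Area < 15)

by_tally <- small %>% group_by(Species) %>% tally()
# count() groups by the named column(s) and counts rows in one step
by_count <- small %>% count(Species)

identical(by_tally$n, by_count$n)  # TRUE
```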
	


##### Summarizing data ####
iris %>% summarize(mean.sepal.width = mean(Sepal.Width))
	  mean.sepal.width
	1         3.057333
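summarize() can compute several statistics at once; each named argument becomes one column of the one-row result. A sketch (sd.sepal.width and n are illustrative names; n() counts the rows being summarized):

```r
library(dplyr)

sepal_stats <- iris %>%
  summarize(
    mean.sepal.width = mean(Sepal.Width),
    sd.sepal.width   = sd(Sepal.Width),
    n                = n()  # number of rows summarized
  )

sepal_stats
```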

iris %>% 
	group_by(Species) %>% 
	summarize(mean.sepal.width = mean(Sepal.Width))
	     Species mean.sepal.width
	1     setosa            3.428
	2 versicolor            2.770
	3  virginica            2.974
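The function table notes that a grouping is removed with ungroup(). This matters when group_by() is followed by mutate() instead of summarize(): every original row is kept, but the calculation happens within each group, and the grouping stays attached until you remove it. A sketch (Sepal.Width.Centered is an illustrative column name):

```r
library(dplyr)

centered <- iris %>%
  group_by(Species) %>%
  # mean(Sepal.Width) is the mean of each row's own species
  mutate(Sepal.Width.Centered = Sepal.Width - mean(Sepal.Width)) %>%
  ungroup()  # drop the grouping so later steps act on the whole table

is_grouped_df(centered)  # FALSE
nrow(centered)           # 150: mutate() keeps every row
```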


R Markdown