Lecture


R lab

Base R Functions

Function       Use
summary()      Returns the minimum, 1st quartile, median, mean, 3rd quartile, and maximum of a numeric vector, and a count per category for a factor or logical vector (for a character vector it reports only the length, class, and mode)
mean()         Returns the mean of a numeric vector
median()       Returns the median of a numeric vector
log()          Returns the natural logarithm of a number or numeric vector
min()/max()    Returns the minimum/maximum value of a numeric vector
exp()          Returns the exponential e^x for an argument x
sum()          Returns the sum of a numeric vector
sqrt()         Returns the square root of a number or numeric vector
length()       Returns the length (number of elements) of a vector
c()            Combines (concatenates) values into a vector
typeof()       Returns the internal storage type of an object, e.g. "double" or "character"
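One quick way to check the table above is to call each function on a small vector. This is a minimal sketch; the vector `x` is made-up example data, chosen so the results are easy to verify by hand.

```r
# Hypothetical example data, built with c()
x <- c(2, 4, 4, 5, 10)

summary(x)    # Min. 2, 1st Qu. 4, Median 4, Mean 5, 3rd Qu. 5, Max. 10
mean(x)       # 5
median(x)     # 4
min(x)        # 2
max(x)        # 10
sum(x)        # 25
length(x)     # 5
sqrt(16)      # 4
log(exp(2))   # 2: log() is the natural log, so it undoes exp()
typeof(x)     # "double"
typeof("a")   # "character"

# For a factor, summary() returns a count per level
summary(factor(c("a", "b", "a")))   # a: 2, b: 1
```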

Data frame importing, exploring, and indexing


###### Read in CSV data frame ######
> iris <- read.csv("iris.csv", stringsAsFactors = TRUE) # Store Species as a factor (the default is FALSE since R 4.0.0)
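If you don't have `iris.csv` on hand, one way to follow along is to write R's built-in `iris` data set to a temporary CSV file and read it back. This is a sketch under that assumption; `csv_path` and `iris_csv` are hypothetical names. `stringsAsFactors = TRUE` matters here: without it, `read.csv()` in R 4.0.0 and later keeps `Species` as character, and `summary()` would then show only its length, class, and mode instead of the species counts.

```r
# Assumption: no iris.csv on disk, so write the built-in data to a temporary file
csv_path <- tempfile(fileext = ".csv")
write.csv(datasets::iris, csv_path, row.names = FALSE)

# Read it back; stringsAsFactors = TRUE turns the Species text column into a factor
iris_csv <- read.csv(csv_path, stringsAsFactors = TRUE)

str(iris_csv)   # 150 obs. of 5 variables; Species is a factor with 3 levels
```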

###### Explore the data frame ######
> names(iris)   # Show the column names
	[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     

> nrow(iris) # How many rows?
	[1] 150
> ncol(iris) # How many columns?
	[1] 5

> head(iris)    # Show the first 6 rows of data
	  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
	1          5.1         3.5          1.4         0.2  setosa
	2          4.9         3.0          1.4         0.2  setosa
	3          4.7         3.2          1.3         0.2  setosa
	4          4.6         3.1          1.5         0.2  setosa
	5          5.0         3.6          1.4         0.2  setosa
	6          5.4         3.9          1.7         0.4  setosa

> summary(iris) # Run the summary() function on every data frame column
	  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
	 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
	 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
	 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
	 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
	 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
	 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
	       Species  
	 setosa    :50  
	 versicolor:50  
	 virginica :50  
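A few other base R calls answer the same exploration questions. This sketch assumes `iris` is R's built-in data set (the CSV read with `stringsAsFactors = TRUE` should have the same column classes).

```r
dim(iris)             # rows and columns in one call: 150 5
sapply(iris, class)   # class of every column
table(iris$Species)   # number of rows per species
tail(iris, 3)         # last 3 rows, the counterpart of head()
```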


###### Extract a column from the data frame ######
> iris$Sepal.Length
	  [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
	 [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
	 [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
	 [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
	 [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
	 [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
	[109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
	[127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
	[145] 6.7 6.7 6.3 6.5 6.2 5.9
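`$` is one of several ways to pull a column out of a data frame. The sketch below, assuming `iris` is loaded as above (the built-in copy works too), checks that three common forms return the same numeric vector. The variable names are illustrative only.

```r
by_dollar  <- iris$Sepal.Length         # by name with $
by_bracket <- iris[["Sepal.Length"]]    # by name with [[ ]]
by_index   <- iris[, 1]                 # by position: all rows, first column

identical(by_dollar, by_bracket)   # TRUE
identical(by_dollar, by_index)     # TRUE

# [[ ]] also accepts a column name stored in a variable, which $ does not
col_name <- "Petal.Width"
head(iris[[col_name]])
```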

###### Summarize a column ######
> summary(iris$Sepal.Length)
	   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
	  4.300   5.100   5.800   5.843   6.400   7.900 
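Each number in that summary can be reproduced with a single function from the table at the top, which is a useful check on what `summary()` reports. A minimal sketch, assuming `iris` is the built-in data set; `sl` is just a shorthand name.

```r
sl <- iris$Sepal.Length

min(sl)                       # 4.3
quantile(sl, c(0.25, 0.75))   # 1st and 3rd quartiles: 5.1 and 6.4
median(sl)                    # 5.8
mean(sl)                      # 5.843333 (summary() prints it as 5.843)
max(sl)                       # 7.9
```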
  
###### Use logical indexing on a data frame column ######
> iris$Sepal.Length[iris$Species == "setosa"] # sepal lengths for all setosa irises
	 [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1 5.7
	[20] 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0 5.5 4.9
	[39] 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0
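The expression inside the square brackets is itself a logical vector with one `TRUE`/`FALSE` per row, and indexing keeps the positions that are `TRUE`. A short sketch, assuming the built-in `iris`; `is_setosa` is an illustrative name.

```r
is_setosa <- iris$Species == "setosa"   # one TRUE/FALSE per row

length(is_setosa)                      # 150, same length as the column being indexed
sum(is_setosa)                         # 50: TRUE counts as 1, FALSE as 0
mean(iris$Sepal.Length[is_setosa])     # mean setosa sepal length: 5.006
```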

> iris$Sepal.Length[iris$Sepal.Length >= 5] # only show sepal lengths that are >=5 
	  [1] 5.1 5.0 5.4 5.0 5.4 5.8 5.7 5.4 5.1 5.7 5.1 5.4 5.1 5.1 5.0 5.0 5.2 5.2
	 [19] 5.4 5.2 5.5 5.0 5.5 5.1 5.0 5.0 5.1 5.1 5.3 5.0 7.0 6.4 6.9 5.5 6.5 5.7
	 [37] 6.3 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1 6.3 6.1 6.4
	 [55] 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5 5.5 6.1 5.8
	 [73] 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 7.3 6.7 7.2 6.5 6.4
	 [91] 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2 6.2 6.1 6.4 7.2
	[109] 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8 6.7 6.7 6.3 6.5
	[127] 6.2 5.9
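Because `TRUE` counts as 1, wrapping the same condition in `sum()` or `mean()` gives a count or a proportion instead of the values themselves. A sketch, assuming the built-in `iris`.

```r
sum(iris$Sepal.Length >= 5)    # how many sepal lengths are >= 5: 128
mean(iris$Sepal.Length >= 5)   # proportion of rows meeting the condition: 0.8533333
```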

> iris$Sepal.Length[iris$Species == "setosa" & iris$Sepal.Length >= 5] # only show setosa sepal lengths that are >=5 
	 [1] 5.1 5.0 5.4 5.0 5.4 5.8 5.7 5.4 5.1 5.7 5.1 5.4 5.1 5.1 5.0 5.0 5.2 5.2 5.4
	[20] 5.2 5.5 5.0 5.5 5.1 5.0 5.0 5.1 5.1 5.3 5.0
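`&` keeps rows where both conditions hold; `|` keeps rows where at least one holds. Use the single-character forms for vectors: `&&` and `||` expect single values, and R 4.3.0 and later raise an error when they get longer vectors. A sketch, assuming the built-in `iris`; `both` and `either` are illustrative names.

```r
both   <- iris$Species == "setosa" & iris$Sepal.Length >= 5
either <- iris$Species == "setosa" | iris$Sepal.Length >= 5

sum(both)     # 30
sum(either)   # 148: all 50 setosa plus 98 other irises with sepal length >= 5
which(both)   # row numbers where both conditions hold
```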

Basic R plotting


##### Histograms #####
## Sepal lengths *with axis labels* 
hist(iris$Sepal.Length, xlab = "Sepal lengths (cm)", ylab = "Count", main = "")
## Sepal lengths *with axis labels and color* 
hist(iris$Sepal.Length, xlab = "Sepal lengths (cm)", ylab = "Count", main = "", col = "purple")
# Setosa sepal lengths (uses logical indexing)
hist(iris$Sepal.Length[iris$Species == "setosa"], xlab = "Setosa Sepal lengths (cm)", ylab = "Count", main = "")
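`hist()` can also return the bins it would draw without plotting them, which helps when choosing bin widths. A sketch, assuming the built-in `iris`; `h` and `h2` are illustrative names, and the 0.25 cm bins are an arbitrary choice.

```r
# plot = FALSE computes the histogram without drawing it
h <- hist(iris$Sepal.Length, plot = FALSE)
h$breaks   # bin edges chosen by the default (Sturges) rule
h$counts   # number of values in each bin

# breaks = sets the bin edges yourself; here 0.25 cm wide bins from 4 to 8
h2 <- hist(iris$Sepal.Length, breaks = seq(4, 8, by = 0.25), plot = FALSE)
length(h2$counts)   # 16 bins
```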

##### Boxplots #####

## Boxplot of iris sepal lengths *with labels* 
boxplot(iris$Sepal.Length, ylab = "Sepal lengths (cm)",  main = "")
## Boxplot of iris sepal lengths *with labels and color!*
boxplot(iris$Sepal.Length, ylab = "Sepal lengths (cm)",  main = "", col="orange")

## Boxplots across a categorical variable
boxplot(Sepal.Length ~ Species, data = iris, xlab = "Species", ylab = "Sepal lengths (cm)", main = "")
## Boxplots across a categorical variable with many colors
boxplot(Sepal.Length ~ Species, data = iris, xlab = "Species", ylab = "Sepal lengths (cm)", main = "", col = c("forestgreen", "limegreen", "chartreuse"))
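Like `hist()`, `boxplot()` accepts `plot = FALSE` and returns the numbers behind each box. This sketch, assuming the built-in `iris` (`b` is an illustrative name), cross-checks the box medians against `tapply()`, which applies a function to each group.

```r
# plot = FALSE returns the statistics without drawing the boxes
b <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)
b$stats   # one column per species: lower whisker, lower hinge, median, upper hinge, upper whisker
b$n       # observations per species
b$out     # values drawn individually as outliers

tapply(iris$Sepal.Length, iris$Species, median)   # same medians: 5.0, 5.9, 6.5
```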


##### Barplots #####
# Barplot of iris species counts
# First, create a table of the data
species.table <- table(iris$Species)
# Plot the table
barplot(species.table, ylab = "Count", xlab = "Species", col = c("forestgreen", "limegreen", "chartreuse"))

# Barplot of iris species counts where petal lengths are < 1.5
petal.table <- table(iris$Species[iris$Petal.Length < 1.5])
barplot(petal.table, ylab = "Count", xlab = "Species", col = c("forestgreen", "limegreen", "chartreuse"))
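Printing the tables shows what each barplot draws. Because `Species` is a factor, `table()` keeps every level, even one with no matching rows, so versicolor and virginica still get a (zero-height) bar; with a character column those species would be dropped. A sketch, assuming the built-in `iris`.

```r
species.table <- table(iris$Species)
petal.table <- table(iris$Species[iris$Petal.Length < 1.5])

species.table                 # 50 of each species
petal.table                   # only setosa has petals shorter than 1.5 cm
petal.table["versicolor"]     # 0: the level is kept even with no matches
prop.table(species.table)     # proportions instead of counts
```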


##### Scatterplot ####
# Setosa sepal lengths against setosa petal lengths
# For ease, create setosa-only data frame first:
iris.setosa <- iris[iris$Species == "setosa",] ## Leaving the column index after the comma empty selects all columns
# Now plot
plot(iris.setosa$Petal.Length, iris.setosa$Sepal.Length, xlab = "Setosa petal length (cm)", ylab = "Setosa sepal length (cm)")
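Two follow-ups to the setosa-only data frame, sketched under the assumption that `iris` is the built-in data set. First, subsetting rows keeps all three factor levels, so `table()` or `barplot()` on `iris.setosa$Species` would still list versicolor and virginica; `droplevels()` removes them. Second, `cor()` puts a number (the Pearson correlation, its default) on the relationship the scatterplot shows; `subset()` is shown as an alternative way to build the same rows. `setosa2` and `r` are illustrative names.

```r
iris.setosa <- iris[iris$Species == "setosa", ]

nlevels(iris.setosa$Species)   # still 3: unused levels are kept
iris.setosa$Species <- droplevels(iris.setosa$Species)
nlevels(iris.setosa$Species)   # 1

# subset() selects the same 50 rows using a condition on column names
setosa2 <- subset(iris, Species == "setosa")
nrow(setosa2)                  # 50

# Pearson correlation between the two plotted variables
r <- cor(iris.setosa$Petal.Length, iris.setosa$Sepal.Length)
r
```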