Lecture
R lab
Base R Functions
| Function | Use |
| --- | --- |
| summary() | Returns the five-number summary plus the mean for a numeric vector, and a count table for a factor or logical vector (a character vector reports only its length, class, and mode) |
| mean() | Returns the mean of a numeric vector |
| median() | Returns the median of a numeric vector |
| log() | Returns the natural logarithm of a number or of each element of a numeric vector |
| min() / max() | Returns the minimum/maximum value of a numeric vector |
| exp() | Returns the exponential e^x for an argument x |
| sum() | Returns the sum of a numeric vector |
| sqrt() | Returns the square root of a number or of each element of a numeric vector |
| length() | Returns the number of elements in a vector |
| c() | Concatenates values into a vector |
| typeof() | Returns the storage type of a variable (for example "double" or "character") |
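As a quick illustration of the functions in the table, the sketch below applies them to a small vector. The vector `x` and its values are made up for this example and are not part of the lab data.

```r
# Hypothetical example data: five made-up measurements
x <- c(2, 4, 4, 5, 10)

length(x)       # number of elements: 5
sum(x)          # 25
mean(x)         # 5
median(x)       # 4
min(x); max(x)  # 2 and 10
sqrt(x)         # element-wise square roots
log(exp(1))     # 1, because log() is the natural logarithm
typeof(x)       # "double"
summary(x)      # minimum, quartiles, median, mean, maximum
```

Note that log(), exp(), and sqrt() work element by element and return a vector of the same length, while sum(), mean(), and median() collapse the vector to a single number.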
Data frame importing, exploring, and indexing
###### Read in CSV data frame ######
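> iris <- read.csv("iris.csv", stringsAsFactors = TRUE) # Read text columns as factors (needed in R >= 4.0 for the Species counts shown below)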
###### Explore the data frame ######
> names(iris) # Show the column names
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
> nrow(iris) # How many rows?
[1] 150
> ncol(iris) # How many columns?
[1] 5
> head(iris) # Show the first 6 rows of data
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> summary(iris) # Run the summary() function on every data frame column
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
 Median :5.800   Median :3.000   Median :4.350   Median :1.300
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
       Species
 setosa    :50
 versicolor:50
 virginica :50
###### Index column from data frame ######
> iris$Sepal.Length
[1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
[19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
[37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
[55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
[73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
[91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
[109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
[127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
[145] 6.7 6.7 6.3 6.5 6.2 5.9
###### Summarize a column ######
> summary(iris$Sepal.Length)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  4.300   5.100   5.800   5.843   6.400   7.900
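The column summary above can be cross-checked against the individual base R functions. This is a minimal sketch that assumes R's built-in copy of the iris data (loaded with data(iris)) instead of the CSV file; the variable name `s` is chosen only for this example.

```r
# Assumption: use the built-in iris data so no CSV file is needed
data(iris)

s <- summary(iris$Sepal.Length)

# Each named entry of the summary matches the corresponding function
s[["Min."]];   min(iris$Sepal.Length)
s[["Median"]]; median(iris$Sepal.Length)
s[["Mean"]];   mean(iris$Sepal.Length)
s[["Max."]];   max(iris$Sepal.Length)
```

The printed summary rounds to a few significant digits, so mean() may show more decimal places than the "Mean" entry in the summary output.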
###### Use logical indexing on data frame ######
> iris$Sepal.Length[iris$Species == "setosa"] # sepal lengths for all setosa irises
[1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1 5.7
[20] 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0 5.5 4.9
[39] 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0
> iris$Sepal.Length[iris$Sepal.Length >= 5] # only show sepal lengths that are >=5
[1] 5.1 5.0 5.4 5.0 5.4 5.8 5.7 5.4 5.1 5.7 5.1 5.4 5.1 5.1 5.0 5.0 5.2 5.2
[19] 5.4 5.2 5.5 5.0 5.5 5.1 5.0 5.0 5.1 5.1 5.3 5.0 7.0 6.4 6.9 5.5 6.5 5.7
[37] 6.3 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1 6.3 6.1 6.4
[55] 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5 5.5 6.1 5.8
[73] 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 7.3 6.7 7.2 6.5 6.4
[91] 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2 6.2 6.1 6.4 7.2
[109] 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8 6.7 6.7 6.3 6.5
[127] 6.2 5.9
> iris$Sepal.Length[iris$Species == "setosa" & iris$Sepal.Length >= 5] # only show setosa sepal lengths that are >=5
[1] 5.1 5.0 5.4 5.0 5.4 5.8 5.7 5.4 5.1 5.7 5.1 5.4 5.1 5.1 5.0 5.0 5.2 5.2 5.4
[20] 5.2 5.5 5.0 5.5 5.1 5.0 5.0 5.1 5.1 5.3 5.0
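To see why logical indexing works, it can help to store the comparison on its own before using it as an index. The sketch below assumes R's built-in copy of the iris data; the names `is.setosa`, `keep`, and `long.setosa` are made up for this example.

```r
# Assumption: use the built-in iris data so no CSV file is needed
data(iris)

# The comparison itself returns one TRUE/FALSE value per row
is.setosa <- iris$Species == "setosa"
typeof(is.setosa)   # "logical"
length(is.setosa)   # same as nrow(iris)

# TRUE counts as 1 in arithmetic, so sum() counts the matching rows
sum(is.setosa)

# Combine conditions with & (and) or | (or), then index with the result
keep <- is.setosa & iris$Sepal.Length >= 5
long.setosa <- iris$Sepal.Length[keep]
length(long.setosa) # number of values kept, equal to sum(keep)
```

Only the positions where the index is TRUE are returned, which is why the result is shorter than the original column.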
Basic R plotting
##### Histograms #####
## Sepal lengths *with axis labels*
hist(iris$Sepal.Length, xlab = "Sepal lengths (cm)", ylab = "Count", main = "")
## Sepal lengths *with axis labels and color*
hist(iris$Sepal.Length, xlab = "Sepal lengths (cm)", ylab = "Count", main = "", col = "purple")
# Setosa sepal lengths (uses logical indexing)
hist(iris$Sepal.Length[iris$Species == "setosa"], xlab = "Setosa Sepal lengths (cm)", ylab = "Count", main = "")
##### Boxplots #####
## Boxplot of iris sepal lengths *with labels*
boxplot(iris$Sepal.Length, ylab = "Sepal lengths (cm)", main = "")
## Boxplot of iris sepal lengths *with labels and color!*
boxplot(iris$Sepal.Length, ylab = "Sepal lengths (cm)", main = "", col="orange")
## Boxplots across a categorical variable
boxplot(data = iris, Sepal.Length ~ Species, xlab = "Species", ylab = "Sepal lengths (cm)", main = "")
## Boxplots across a categorical variable with many colors
boxplot(data = iris, Sepal.Length ~ Species, xlab = "Species", ylab = "Sepal lengths (cm)", main = "", col = c("forestgreen", "limegreen", "chartreuse"))
##### Barplots #####
# Barplot of iris species counts
# First, create a table of the data
species.table <- table(iris$Species)
# Plot the table
barplot(species.table, ylab = "Count", xlab = "Species", col = c("forestgreen", "limegreen", "chartreuse"))
# Barplot of iris species counts where petal lengths are < 1.5
petal.table <- table(iris$Species[iris$Petal.Length < 1.5])
barplot(petal.table, ylab = "Count", xlab = "Species", col = c("forestgreen", "limegreen", "chartreuse"))
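The filtered barplot above depends on what table() returns, which can be inspected without plotting. This sketch assumes Species is a factor, as it is in R's built-in iris data; a column read by read.csv() in R 4.0 or later is a character vector unless stringsAsFactors = TRUE is given, and the table would then list only the species that actually occur.

```r
# Assumption: built-in iris data, where Species is a factor
data(iris)

petal.table <- table(iris$Species[iris$Petal.Length < 1.5])

names(petal.table)  # a factor keeps all three species, even with zero counts
petal.table         # counts per species after filtering
sum(petal.table)    # total number of irises with petal length < 1.5
```

Because the factor keeps every level, the barplot still draws one position per species, so the three colors line up with the same species as in the unfiltered plot.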
##### Scatterplots #####
# Setosa sepal lengths against setosa petal lengths
# For ease, create setosa-only data frame first:
iris.setosa <- iris[iris$Species == "setosa",] ## Trailing comma means take all columns
# Now plot
plot(iris.setosa$Petal.Length, iris.setosa$Sepal.Length, xlab = "Setosa petal length (cm)", ylab = "Setosa sepal length (cm)")
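The row-and-column indexing used to build iris.setosa can be checked with dim(), which reports the number of rows and columns. This sketch assumes R's built-in copy of the iris data; `two.cols` is a made-up name used only here.

```r
# Assumption: use the built-in iris data so no CSV file is needed
data(iris)

# Rows go before the comma, columns after it.
# An empty column position keeps every column.
iris.setosa <- iris[iris$Species == "setosa", ]
dim(iris.setosa)  # setosa rows only, all columns

# An empty row position keeps every row instead
two.cols <- iris[, c("Petal.Length", "Sepal.Length")]
dim(two.cols)     # all rows, two columns
```

Creating the subset once keeps the plot() call short, since both axes then come from the same smaller data frame.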