Use * to multiply numbers.
5*6
Use / to divide numbers.
600/2.53
Use ^ or two asterisks (**) to raise numbers to an exponent.
138**2
# Equally valid approach
138^2
14^0.5
(11+10)/3
After you define a variable, always print out the variable to make sure it was defined correctly! You can reveal the value of a variable simply by typing and running the variable’s name:
## Gives no output on its own, but does something *internally*.
y <- 8.5
## But if you run y by itself in the interactive Console, its value will appear!
y
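Here is a small sketch of the same idea with a few extra ways to reveal a value. The variable names z and w are made up for illustration; print() and the parentheses trick are standard R.

```r
# Assignment alone prints nothing
z <- 3 * 4

# Typing the name reveals the value in the Console
z

# print() does the same thing explicitly
print(z)

# Wrapping an assignment in parentheses assigns AND prints in one step
(w <- z + 1)
```

Whichever style you pick, the habit is the same: define, then look.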
Remember that the computer is this annoying person (AP):
You: “Can you calculate 5 times 2 for me?”
AP: “Sure I can do that!”
You: “…”
AP: “…”
You: “Ugh, can you TELL ME what 5 times 2 is?”
AP: “It’s 10”
You: “Well why didn’t you just tell me the first time?!”
AP: “You didn’t ask me to TELL you 5 times 2. You just asked me to CALCULATE 5 times 2.”
Define a variable called banana_price whose value is 0.55 (aka 55 cents).
Use <- to assign the value 0.55 to the variable name banana_price.
banana_price <- 0.55
Re-define the banana_price variable (since these questions don’t communicate with each other!!). Use R as a calculator to determine the cost of 17 bananas. Perform this operation using the banana_price variable itself.
banana_price <- 0.55
banana_price * 17
Define a new variable called num_bananas. Assign this variable the value 17, and recalculate your answer using only the variables banana_price and num_bananas.
banana_price <- 0.55
num_bananas <- 17
banana_price * num_bananas

banana_price <- 0.62
num_bananas <- 17
banana_price * num_bananas

persimmon_price <- 2.75
num_persimmons <- 10
persimmon_price * num_persimmons

Save the result in a variable called total_cost. Print out this variable to be sure it’s correct.
persimmon_price <- 2.75
num_persimmons <- 10
total_cost <- persimmon_price * num_persimmons
total_cost
Define a variable called my_favorite_fruit, and give it the value of…your favorite fruit! This is a string variable, so you must enclose the assigned value in quotes. Print your variable to make sure it is correct.
my_favorite_food :).
my_favorite_fruit <- "papaya"
my_favorite_fruit
Use the function sqrt() to calculate the square root of 206.
sqrt(206)
Use the function log() to calculate the natural log (this is actually the default behavior of this function) of 206.
log(206)
Use the function log() to calculate the log in base 10 of 206.
log() can optionally take a second argument specifying the base. For example, log(500, 6) takes the log of 500 in base 6.
log(206, 10)
The function class() will tell you what type of data a given variable or value is. Use class() to find the type of value 7.
class(7)
Use class() to determine the type of value "7".
class("7")
The nchar() function will tell you how many individual characters are in a string. The code below defines a character variable. Use the nchar() function to determine how many characters it contains.
banana_color <- "yellow"
nchar(banana_color)
Note: In these exercises, the terms “array” and “vector” refer to the same thing and are used interchangeably (strictly speaking, R calls what c() creates a vector), but a “list” is something else entirely! These exercises don’t consider lists at all.
Define a variable called array_numbers that contains the values 1, 2, 3, 4, and 5. Then, print the variable itself to make sure its contents look right.
Use the function c() (“combine”) to define your array.
Spaces after commas are optional: c(8,9,10) and c(8, 9, 10) are equivalent.
array_numbers <- c(1, 2, 3, 4, 5)
array_numbers
Define a variable called array_numbers2 that again contains the values 1, 2, 3, 4, and 5, but this time use a colon : instead of c(). Again, type the variable name itself to make sure its contents look right.
array_numbers2 <- 1:5
array_numbers2
Define a variable called even_numbers that contains the values 2, 4, 6, 8, and 10 using the c() function.
even_numbers <- c(2,4,6,8,10)
even_numbers
The function length() tells you how many items are in a given array. This function is analogous to the nchar() function we previously learned: Just like we use nchar() to ask how long a string is, we use length() to ask how long an array is. Define an array with 5 numbers of your choosing (be brave!!), and use the function length() to determine how many values it contains.
some_numbers <- c(10, 11, 2, 19, 6)
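length(some_numbers)
The code below multiplies every item in example_array by 10. Engage with and understand this code, and then modify the code to instead multiply every item in the array by 5.
Be careful not to delete the definition of example_array.
example_array <- c(5, 78, -3, 11, 6, 0)
example_array*5
Add 15 to every value in the array defined below, and save the result to a variable called example_array_plus15, which you should print to ensure it looks correct.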
example_array <- 4:14
example_array_plus15 <- example_array + 15
example_array_plus15
Use the sqrt() function to quickly calculate the square root of every value in the array defined below.
Use a single call to the sqrt() function to perform all the calculations at once.
example_array <- 4:14
sqrt(example_array)
Remember that the log() function takes a second argument to specify a base other than the default e.
You can do this in one line of code (giving the array directly to log() as the first argument), or in two lines of code where you first define an array variable and then give that variable to log() as the first argument.
Expect a warning: log() doesn’t work well on non-positive numbers.
log(-3:3, 10)
R has the functions mean(), median(), and sd() for calculating the mean (average), median, and standard deviation of some numbers. The code below defines an array aptly named calculate_me_please. Use the mean(), median(), and sd() functions to calculate these statistics.
Notice that the mean is strongly affected by outliers (see the value 0.0000004?), but the median is not!
# Define this array first by running in Console
calculate_me_please <- c(88.1, 89.5, 92.22, 85.17, 87.56, 0.0000004, 82.2, 85.1, 90.998, 97.25)
mean(calculate_me_please)
median(calculate_me_please)
sd(calculate_me_please)
56 > 7
Use == to ask whether two values are equal.
110 == 12
32 <= 88
100 == (-10)^2
Use != to ask whether two values are not equal. Generally speaking, exclamation marks mean “not.”
100 != -55
75 > 75
12^4 < 13^3
Each of the comparisons above evaluates to TRUE or FALSE. This time, write a compound logical statement to ask if: 100 is equal to 100 and 66 is equal to 66.
A compound statement is TRUE only if BOTH conditions are TRUE. Simply use an ampersand & to combine logical statements.
Experiment with statements that are TRUE and FALSE to develop a comfort with &.
# This example code asks: is 1 equal to 1 and is 2 equal to 5?
1 == 1 & 2 == 5
# This is equivalent, but the parentheses are your choice!
(1 == 1) & (2 == 5)
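To build that comfort with &, it can help to see every combination of TRUE and FALSE side by side. This is just an illustrative sketch; the variable names are made up.

```r
# & is TRUE only when BOTH sides are TRUE
both_true  <- TRUE & TRUE     # TRUE
one_false  <- TRUE & FALSE    # FALSE
both_false <- FALSE & FALSE   # FALSE

both_true
one_false
both_false

# & also works item by item on arrays of the same length
pairwise <- c(TRUE, TRUE, FALSE) & c(TRUE, FALSE, FALSE)
pairwise
```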
100 == 100 & 66 == 66
100 > 87 & 66 < 12
100 > 50 & 100 < 150
50 >= 35 & 60 == -20
You can combine more than two statements with &!
17 < 111 & 18 > 4 & 66 != 67
In the previous questions, you used & to evaluate whether multiple statements were all TRUE. We can also use an operator for “or”, the pipe operator | (located above your Enter/Return Key! Press: Shift and \), to evaluate whether at least one statement is TRUE.
Note: R code often uses a different pipe operator, %>%, which will be used more frequently in this class where we code in R. In the rest of the computing world, people think of | as the “default” pipe symbol.
Use | to ask: Is 100 greater than 87 or is 66 less than 12?
100 > 87 | 66 < 12
100 > 50 | 100 < 150
50 >= 35 | 60 == -20
(17 < 111 & 18 > 4) | 66 != 67
For this section, you will perform logical comparisons on variables rather than directly on values. The goal of this activity is to get you into the habit of working preferentially with variables that stand in for values. This strategy makes your code cleaner and easier to work with.
The first example is templated for you. Remember: Variables should always start with a letter and should only contain letters, numbers, and underscores.
# First we define variables
value1 <- 12
value2 <- -4
# Then we compare variables INSTEAD OF comparing values 12 and -4
value1 > value2
value1 <- 7
value2 <- 5.44
value1 == value2

comp1_val1 <- 18
comp1_val2 <- 5
comp2_val1 <- 11
comp2_val2 <- 100
comp1_val1 > comp1_val2 & comp2_val1 < comp2_val2
In this section, you will continue practicing logical operations (asking if something is TRUE or FALSE) incorporating functions into your code. The first question is templated for you.
sqrt(25) == 5
# Or with improved coding style:
value1 <- sqrt(25)
value2 <- 5
value1 == value2
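One caution when using == on the results of calculations: sqrt(25) is exactly 5, but many decimal results carry tiny rounding errors, so == can surprise you. The sketch below shows one common workaround using the base R functions all.equal() and isTRUE(); the variable names are made up for illustration.

```r
# Exact: sqrt(25) really is 5
sqrt(25) == 5

# Not exact: these look equal on paper, but rounding error makes them differ
squared_root <- sqrt(2)^2
squared_root == 2          # FALSE
0.1 + 0.2 == 0.3           # FALSE

# all.equal() compares with a small tolerance; isTRUE() turns the result
# into a plain TRUE or FALSE
close_enough  <- isTRUE(all.equal(squared_root, 2))
close_enough2 <- isTRUE(all.equal(0.1 + 0.2, 0.3))
close_enough
close_enough2
```

For whole numbers and simple comparisons like the ones in these exercises, == is fine; the tolerance-based check matters mostly for decimals that come out of a calculation.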
nchar() asks how many characters are in a string (how long is a string?). See this function in action below. Again, run and engage with this code below as practice!
nchar("hello")
hello_phrase <- "hi there!" # there are 9 characters in this string including space and !
# Note that the answer is 9 and NOT 12 (the number of characters in the variable NAME hello_phrase)
nchar(hello_phrase)
# Does the number of characters equal 10? Nope!
nchar(hello_phrase) == 10
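Since nchar() and length() are easy to mix up, here is a small side-by-side sketch. The variables word and colors are made up for illustration.

```r
word <- "hello"
nchar(word)    # 5: the number of characters in the string
length(word)   # 1: a single string is an array with one item

colors <- c("red", "orange", "yellow")
length(colors) # 3: the number of items in the array
nchar(colors)  # 3 6 6: the number of characters in EACH item
```

In short: nchar() looks inside each string, while length() counts how many items there are.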
Compare nchar("blazing") to nchar("saddles") using a logical operator that asks if two things are equal: ==.
The answer should be TRUE if done correctly (they both have 7 letters).
nchar("blazing") == nchar("saddles")
# Or with improved coding style:
nchar_blazing <- nchar("blazing")
nchar_saddles <- nchar("saddles")
nchar_blazing == nchar_saddles
# This approach also uses variables well:
blazing_str <- "blazing"
saddles_str <- "saddles"
nchar(blazing_str) == nchar(saddles_str)
Assign the value “data science” to a variable called string1, and assign the value “for biologists” to a variable called string2. Then, ask whether the number of characters in string1 is greater than or equal to the number of characters in string2.
string1 <- "data science"
string2 <- "for biologists"
nchar(string1) >= nchar(string2)
Remember that the log() function calculates the natural log by default.
Use & for multiple comparisons!
log(10) < log(100) & log(10) == 1.5
# Or, with variables:
value1 <- log(10)
value2 <- log(100)
value3 <- 1.5
value1 < value2 & value1 == value3
Use length() to determine the length of the array defined below, and then ask using a logical operator if the length equals 5. If done correctly, your answer should be FALSE.
multiples_of_11 <- c(11, 22, 33, 44, 55, 66)
length(multiples_of_11) == 5
Use length() to ask how long the array defined below is. Then, ask if the length of the array is equal to 4.
The answer should be TRUE if done correctly.
my_favorite_things <- c("raindrops on roses", "whiskers on kittens", "bright copper kettles", "warm_woolen_mittens")
length(my_favorite_things)
length(my_favorite_things) == 4
The code below asks whether the items in example_array are greater than 0. Engage with and understand this code, and then modify the code to instead ask if every item is equal to 0 (again, be careful not to delete the example_array definition when modifying!).
The output will be an array of TRUE or FALSE values.
example_array <- c(5, 78, -3, 1, 6, 0)
example_array > 0
# All are FALSE except the last entry
example_array == 0
%in%
One of the most POWERFUL logical operators is %in%: it’s literally the word in enclosed by %. You can use it to ask if a given value is present in an array.
Here is an example of this logical operator in action. Notice how helpful it is that I added the extra print statements so you know for sure what is being printed when!
# Define array
rainbow <- c("red", "orange", "yellow", "green", "blue", "indigo", "violet")
# Will return TRUE
print("Is 'red' in the rainbow array?")
"red" %in% rainbow
# Will return FALSE
print("Is 'black' in the rainbow array?")
"black" %in% rainbow
# This is _not_ The Way
# It asks: Is each value in rainbow equal to "red"?
# This code technically works, but is not at all the same as using %in%
print("Does 'red' equal the whole rainbow array?")
"red" == rainbow
Use the %in% operator on the array defined below to ask three separate logical questions:
Is “goat” in this array?
Is “marmoset” in this array?
Is “GOAT” in this array? What does the answer to this code tell you about the importance of letter case??
Hint: Case (uppercase and lowercase letters) matters. “goat” and “GOAT” are totally different.
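If you ever want a match that ignores case, one option is to convert the value to lowercase before asking. This is only a sketch: the array pets is made up for illustration, while tolower() and any() are base R functions.

```r
pets <- c("goat", "turtle", "axolotl")   # hypothetical array

# Case matters, so this is FALSE
exact_match <- "GOAT" %in% pets
exact_match

# Convert to lowercase first, and the match succeeds
lowered_match <- tolower("GOAT") %in% pets
lowered_match

# any() collapses an item-by-item == comparison into a single TRUE or FALSE,
# which gives the same answer as %in% for one value
any_match <- any(pets == "goat")
any_match
```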
animals <- c("goat", "turtle", "axolotl", "hyena", "coral")
"goat" %in% animals
"marmoset" %in% animals
"GOAT" %in% animals
Define an array called many_numbers containing all whole numbers from 50 to 500 using a colon :. Then, use the %in% operator to ask if the value -25 is in this array.
many_numbers <- 50:500
-25 %in% many_numbers
Ask whether the array defined below contains the letter W.
misc_letters <- c("F", "Z", "U", "t", "g", "b", "R", "O", "i", "P", "e", "o", "I", "q", "M", "N", "D", "m", "y", "s")
is_W_there <- "W" %in% misc_letters
is_W_there
This section will teach you how to use the ifelse() function. It allows you to define a variable based on a certain logical statement (thing that is TRUE or FALSE). Here’s the anatomy of the function, which takes three arguments:
ifelse(<logical statement>, <value if the statement is TRUE>, <value if the statement is FALSE>)
# Evaluates to "greater", since the statement is TRUE
print("example 1")
ifelse(10 > 5, "greater", "less than")
# Evaluates to "crocodile", since the statement is FALSE
print("example 2")
ifelse(10 == 12, "alligator", "crocodile")
# Save it to a variable!
print("example 3")
beast <- ifelse(10 == 12, "alligator", "crocodile")
beast
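The examples above test a single statement, but ifelse() also works item by item when the logical statement is about an array. Here is a minimal sketch with made-up values (sleep_hours and sleep_label are hypothetical names, not part of any dataset).

```r
sleep_hours <- c(4, 12.5, 8, 19)   # hypothetical values

# The comparison produces one TRUE/FALSE per item...
sleep_hours > 10

# ...and ifelse() picks a value for each item in turn
sleep_label <- ifelse(sleep_hours > 10, "sleepy", "alert")
sleep_label
```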
Practice using the ifelse() function to define several variables as follows…
Ask whether 5 is greater than 20. If the condition is TRUE, the variable value should be “yup”. If the condition is FALSE, the variable value should be “nope”.
var <- ifelse(5 > 20, "yup", "nope")
# Always *look at* your variable to make sure it worked!
var
Ask whether 11 is a number (or in R terms, its class is “numeric”). If it is a number, the variable value should be “number”. If it is not a number, the variable value should be “not a number”.
Use the function class()! Remember also that numeric is not a class, but "numeric" (with quotes!) is.
var <- ifelse(class(11) == "numeric", "number", "not a number")
var
Ask whether "11" (it has quotes!) is a number. If it is a number, the variable value should be “number”. If it is not a number, the variable value should be “not a number”.
var <- ifelse(class("11") == "numeric", "number", "not a number")
var
Ask whether the number of characters in the string defined below equals 200. If the condition is TRUE, the variable value should be 100. If the condition is FALSE, the variable value should be 10.
my_string <- "rumbledyhump"
var <- ifelse(nchar(my_string) == 200, 100, 10)
var
Ask whether 6 squared equals 36. If this is TRUE, the variable value should be 4. If this is FALSE, the variable value should be 25.
var <- ifelse(6**2 == 36, 4, 25)
var
This set of exercises will use a data frame (aka tibble; we will learn the distinction between these terms!) called msleep, which has been pre-loaded for you. This dataset contains information about some species of mammals as follows:
name: The common name
genus: The taxonomic genus
vore: Dietary strategy (carnivore, omnivore or herbivore)
order: The taxonomic order
conservation: The species’ conservation status
  cd -> conservation-dependent
  en -> endangered
  lc -> least concern
  nt -> near-threatened
  vu -> vulnerable
sleep_total: Total amount of sleep, in hours, the species tends to experience
sleep_rem: Total amount of REM sleep, in hours, the species tends to experience
sleep_cycle: Length of sleep cycle, in hours, the species tends to experience
awake: Amount of time spent awake, in hours, the species tends to experience
brainwt: Average brain weight, in kilograms, for this species
bodywt: Average body weight, in kilograms, for this species
Explore the dataset using the functions head(), nrow(), ncol(), and names().
# What are the first six rows of the msleep dataset?
head(msleep)
# How many rows are there in the msleep dataset?
nrow(msleep)
# How many columns are there in the msleep dataset?
ncol(msleep)
# What are the column names in the msleep dataset?
names(msleep)
We can extract a single column from a data frame with the dollar sign $. The code below extracts the name variable from the data frame. Run and understand this code, and then modify it to extract the awake column.
Don’t put quotes around the column name, as in msleep$"name". You can, technically, but it’s good habit not to.
msleep$name
msleep$awake
Use the functions mean(), median(), min(), max(), and sd() (standard deviation) on the awake column.
# calculate the mean
mean(msleep$awake)
# calculate the median
median(msleep$awake)
# calculate the minimum
min(msleep$awake)
# calculate the maximum
max(msleep$awake)
# calculate the standard deviation
sd(msleep$awake)
Now try to calculate the median amount of REM sleep (sleep_rem) these mammals get. The output will not look like you expect: instead of a number, you get the output “NA”. To start to understand why, view the full contents of this column (i.e. msleep$sleep_rem). You will see some values “NA”, meaning there is missing data. Missing data is very common: real life science (especially biology!) is tricky, and sometimes we can’t gather as much data as we want.
Whenever R encounters NA’s in summary statistics functions (mean(), median(), sd(), etc.), it gets scared and returns NA instead of a number. Run the code below to see how this column contains NAs, and then how the median() calculation doesn’t work.
# Look at data first - it has NAs!
msleep$sleep_rem
# Try to calculate median
median(msleep$sleep_rem)
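To see why a single NA derails the whole calculation, it can help to play with a tiny made-up array first. The values in rem_hours below are hypothetical, not taken from msleep; is.na() and sum() are base R functions.

```r
rem_hours <- c(1.5, NA, 2.2, 0.9, NA)   # hypothetical values with missing data

# Anything calculated with an unknown value is itself unknown
NA + 1
rem_median <- median(rem_hours)
rem_median

# is.na() asks, item by item, whether a value is missing
is.na(rem_hours)

# TRUE counts as 1 and FALSE as 0, so sum() counts the missing values
num_missing <- sum(is.na(rem_hours))
num_missing
```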
We can fix the median() function by including the second argument na.rm=TRUE. This argument tells median() to ignore (“rm” for remove) NA’s when performing calculations. See this code in action below, and then modify it to calculate the mean of this column instead.
na.rm is not a function!!! It is an ARGUMENT we can use with summary statistics functions.
median(msleep$sleep_rem, na.rm = TRUE)
mean(msleep$sleep_rem, na.rm = TRUE)
The column vore is a character column: it contains strings rather than numbers. Let’s examine this column using the function unique(), which returns all unique values in a given array. What are the unique vores in this dataset?
Use unique() on the msleep$vore column.
unique(msleep$vore)
We can use the function table() to get a count of all unique values in a categorical (or factor) variable. Use table() to determine how many of each vore category are in the dataset. This function takes one argument: the array to make a table of. Here, that array is the msleep$vore column.
By default, table() doesn’t even consider NA values. Very kind!
table(msleep$vore)
We can use the function summary() to summarize arrays, and even entire data frames. Run this function twice: first on the awake column and then on msleep itself (without indexing any columns), and behold the information!
summary(msleep$awake)
summary(msleep)
You’ll notice in the full summary() output for msleep, there isn’t much useful information for the character columns. You may have expected all the categories in these columns to be revealed. In fact, R does not treat character variables this way. To force R to consider a vector specifically as containing categories, you need to convert it into a new type of data called factor, which is a special type of variable in R for, you guessed it, working with categories.
The code below shows how the column vore would be summarized when coerced to a factor with the function as.factor(). Factors are extremely important, and also extremely irritating. We will learn more about dealing with them later!
# Adding spacing helps to see which function is which and where all the parentheses go
summary( as.factor(msleep$vore) )
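To see the character-versus-factor difference on something small, here is a sketch using a made-up array (diet is hypothetical and only loosely imitates the vore column). levels() and the useNA argument of table() are part of base R.

```r
diet <- c("herbi", "carni", "herbi", NA, "omni", "herbi")   # hypothetical values

# As a character array, summary() only reports length and type
summary(diet)

# As a factor, R knows the categories (called "levels")...
diet_factor <- as.factor(diet)
levels(diet_factor)

# ...so summary() counts each category, and reports the NA's too
summary(diet_factor)

# table() skips NA by default; useNA = "ifany" asks it to count them as well
diet_counts <- table(diet, useNA = "ifany")
diet_counts
```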
In this section we’ll use functions and logical operators to explore and answer questions about the msleep dataset.
Is the Arctic fox in the msleep dataset?
Look in the name column.
Use the %in% operator to ask if a value is in an array!
"Arctic fox" %in% msleep$name
Is the genus Ailuropoda in the msleep dataset?
Look in the genus column.
"Ailuropoda" %in% msleep$genus
First get the unique values of the order column with unique(), and then ask what the length of that array is. It is extremely important to do one step at a time and print as you go to make sure you’re on the right track!!
unique_orders <- unique(msleep$order)
length(unique_orders)

unique_orders <- unique(msleep$order)
length(unique_orders) > 20
Is the mean total sleep (sleep_total) greater than 12?
mean(msleep$sleep_total) > 12
Is the mean brain weight (brainwt) of all mammals equal to 7?
mean(msleep$brainwt, na.rm = TRUE) == 7
Use the awake and sleep_total columns.
# This is TRUE, so mammals are awake on average as much or more time than they are asleep
mean(msleep$awake) >= mean(msleep$sleep_total)

ratio <- msleep$awake / msleep$sleep_total
median(ratio)
Use the nrow() function in your code!
nrow(msleep) > 100
Is the value 5 in the bodywt column?
5 %in% msleep$bodywt
Is the standard deviation (sd()) of body weights greater than or equal to 300?
sd(msleep$bodywt) >= 300
Which is greater: the smallest value in sleep_rem or the smallest value in awake?
Use the function min(), and go one step at a time.
Watch out for NA's! The min() function needs another argument to deal with this.
min_sleep_rem <- min(msleep$sleep_rem, na.rm=TRUE)
min_awake <- min(msleep$awake)
# Answer is TRUE: smallest awake hours is greater than smallest REM hours
min_awake > min_sleep_rem
Ask whether "Cheetah" is in msleep (use the name column). If the condition is TRUE, the variable value should be “present”. If the condition is FALSE, the variable value should be “absent”.
Use ifelse() and the %in% operator!
var <- ifelse("Cheetah" %in% msleep$name, "present", "absent")
var
Ask whether the maximum body weight in msleep is greater than 2000. If the condition is TRUE, the variable value should be 1 (the number 1, no quotes!). If the condition is FALSE, the variable value should be 0.
var <- ifelse(max(msleep$bodywt) > 2000, 1, 0)
var
# Or more spaced out for clarity:
max_bodywt <- max(msleep$bodywt)
var <- ifelse(max_bodywt > 2000, 1, 0)
var
Ask whether the number of rows in msleep equals 80. If the condition is TRUE, the variable value should be 10. If the condition is FALSE, the variable value should be -10.
var <- ifelse(nrow(msleep) == 80, 10, -10)
var