R
Use * to multiply numbers:
5*6
Use / to divide numbers:
600/2.53
Use ^ or two asterisks (**) to raise numbers to an exponent:
138**2
# Equally valid approach
138^2
14^0.5
(11+10)/3
After you define a variable, always print out the variable to make sure it was defined correctly! You can reveal the value of a variable simply by typing and running the variable’s name:
## Gives no output on its own, but does something *internally*.
y <- 8.5
## But if you run y by itself in the interactive Console, its value will appear!
y
Remember that the computer is this annoying person (AP):
You: “Can you calculate 5 times 2 for me?”
AP: “Sure I can do that!”
You: “…”
AP: “…”
You: “Ugh, can you TELL ME what 5 times 2 is?”
AP: “It’s 10”
You: “Well why didn’t you just tell me the first time?!”
AP: “You didn’t ask me to TELL you 5 times 2. You just asked me to CALCULATE 5 times 2.”
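As a small illustration of "calculate" versus "tell me", here is a minimal sketch of two ways to ask R to reveal a value. The variable name y_example is made up for this example.

```r
# Wrapping an assignment in parentheses both defines the variable AND prints it
(y_example <- 8.5)

# Asking explicitly with print() also reveals the value
print(y_example)
```

Typing the variable name on its own line, as shown earlier, remains the simplest habit.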
Define a variable called banana_price whose value is 0.55 (aka 55 cents).
Use <- to assign the value 0.55 to the variable name banana_price.
banana_price <- 0.55
banana_price variable (since these questions don’t communicate with each other!!). Use R as a calculator to determine the cost of 17 bananas. Perform this operation using the banana_price variable itself.
banana_price <- 0.55
banana_price * 17
Define a variable called num_bananas. Assign this variable the value 17, and recalculate your answer using only the variables banana_price and num_bananas.
banana_price <- 0.55
num_bananas <- 17
banana_price * num_bananas
banana_price <- 0.62
num_bananas <- 17
banana_price * num_bananas
persimmon_price <- 2.75
num_persimmons <- 10
persimmon_price * num_persimmons
total_cost. Print out this variable to be sure it’s correct.
persimmon_price <- 2.75
num_persimmons <- 10
total_cost <- persimmon_price * num_persimmons
total_cost
Define a variable called my_favorite_fruit, and give it the value of…your favorite fruit! This is a string variable, so you must enclose the assigned value in quotes. Print your variable to make sure it is correct.
my_favorite_food :).
my_favorite_fruit <- "papaya"
my_favorite_fruit
Use sqrt() to calculate the square root of 206.
sqrt(206)
Use log() to calculate the natural log (this is actually the default behavior of this function) of 206.
log(206)
Use log() to calculate the log in base 10 of 206.
log() can optionally take a second argument specifying the base. For example, log(500, 6) takes the log of 500 in base 6.
log(206, 10)
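To make the optional base argument concrete, here is a minimal sketch comparing a few equivalent ways to ask for a base-10 log. log10() and the named argument base = are standard parts of base R; all.equal() is used because decimal results can differ by tiny rounding amounts.

```r
# Natural log is the default
log(206)

# Three ways to ask for the log in base 10
log(206, 10)
log(206, base = 10)
log10(206)

# all.equal() compares numbers while tolerating tiny rounding differences
all.equal(log(206, 10), log10(206))
```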
class() will tell you what type of data a given variable or value is. Use class() to find the type of value 7.
class(7)
Use class() to determine the type of value "7".
class("7")
The nchar() function will tell you how many individual characters are in a string. The code below defines a character variable. Use the nchar() function to determine how many characters it contains.
banana_color <- "yellow"
nchar(banana_color)
Note: The terms “array” and “vector” both refer to the same thing and are used interchangeably, but a “list” is something else entirely! These exercises don’t consider lists at all.
Define a variable called array_numbers that contains the values 1, 2, 3, 4, and 5. Then, print the variable itself to make sure its contents look right.
Use c() (“combine”) to define your array.
c(8,9,10) and c(8, 9, 10) are equivalent.
array_numbers <- c(1, 2, 3, 4, 5)
array_numbers
Define a variable called array_numbers2 that again contains the values 1, 2, 3, 4, and 5, but! this time use a colon : instead of c(). Again, type the variable name itself to make sure its contents look right.
array_numbers2 <- 1:5
array_numbers2
Define a variable called even_numbers that contains the values 2, 4, 6, 8, and 10 using the c() function.
even_numbers <- c(2, 4, 6, 8, 10)
even_numbers
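The c() and colon approaches give arrays with the same values, though R stores them as slightly different types. The sketch below shows this; the variable names with_c and with_colon are made up for this example.

```r
with_c <- c(1, 2, 3, 4, 5)
with_colon <- 1:5

# Item by item, the values match
with_c == with_colon

# But the underlying types differ: c() with plain numbers gives "numeric",
# while the colon gives "integer"
class(with_c)
class(with_colon)
```

For the calculations in these exercises the difference does not matter, since both behave as numbers.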
length() tells you how many items are in a given array. This function is analogous to the nchar() function we previously learned: Just like we use nchar() to ask how long a string is, we use length() to ask how long an array is. Define an array with 5 numbers of your choosing (be brave!!), and use the function length() to determine how many values it contains.
some_numbers <- c(10, 11, 2, 19, 6)
length(some_numbers)
The code below multiplies every item in example_array by 10. Engage with and understand this code, and then modify the code to instead multiply every item in the array by 5.
example_array <- c(5, 78, -3, 11, 6, 0)
example_array * 5
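Arithmetic on arrays works item by item. The following is a minimal sketch with made-up values (the names prices, amounts, and cost_per_fruit are hypothetical) showing one number applied to every item, and two arrays of equal length combined item by item.

```r
prices <- c(0.55, 2.75, 1.20)   # hypothetical prices for three fruits
amounts <- c(17, 10, 3)         # hypothetical amounts of each fruit

# One number is applied to every item in the array
prices * 2

# Two arrays of the same length are combined item by item
cost_per_fruit <- prices * amounts
cost_per_fruit

# sum() adds up all the items in an array
sum(cost_per_fruit)
```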
example_array_plus15, which you should print to ensure it looks correct.
example_array <- 4:14
example_array_plus15 <- example_array + 15
example_array_plus15
Use the sqrt() function to quickly calculate the square root of every value in the array defined below.
The sqrt() function can perform all the calculations at once.
example_array <- 4:14
sqrt(example_array)
The log() function takes a second argument to specify a base other than the default e.
You can do this in one line of code (giving the array directly to log() as the first argument), or in two lines of code where you first define an array variable and then give that variable to log() as the first argument.
log() doesn’t work well on non-positive numbers.
log(-3:3, 10)
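To see what "doesn't work well" looks like in practice, here is a minimal sketch. suppressWarnings() is used only to hide the warning R gives for negative inputs, and the variable name log_values is made up for this example.

```r
# suppressWarnings() hides the warning that log() gives for negative numbers
log_values <- suppressWarnings(log(-3:3, 10))
log_values

# Negative inputs give NaN ("not a number")
is.nan(log_values)

# Zero gives negative infinity (-Inf)
is.infinite(log_values)
```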
R provides the functions mean(), median(), and sd() for calculating the mean (average), median, and standard deviation of some numbers. The code below defines an array aptly named calculate_me_please. Use the mean(), median(), and sd() functions to calculate these statistics.
The mean is strongly affected by outliers (did you spot 0.0000004?), but the median is not!
# Define this array first by running in Console
calculate_me_please <- c(88.1, 89.5, 92.22, 85.17, 87.56, 0.0000004, 82.2, 85.1, 90.998, 97.25)
mean(calculate_me_please)
median(calculate_me_please)
sd(calculate_me_please)
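One way to see how much a single extreme value matters is to recalculate without it. This sketch assumes one technique not covered in these exercises: a negative index inside square brackets, which drops the item at that position. The name without_outlier is made up for this example.

```r
calculate_me_please <- c(88.1, 89.5, 92.22, 85.17, 87.56, 0.0000004, 82.2, 85.1, 90.998, 97.25)

# The sixth value is the tiny outlier; a negative index drops that item
without_outlier <- calculate_me_please[-6]

# The mean shifts a lot once the outlier is removed
mean(calculate_me_please)
mean(without_outlier)

# The median barely moves
median(calculate_me_please)
median(without_outlier)
```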
56 > 7
Use == to ask whether two values are equal.
110 == 12
32 <= 88
100 == (-10)^2
Use != to ask whether two values are not equal. Generally speaking, exclamation marks mean “not.”
100 != -55
75 > 75
12^4 < 13^3
Logical statements evaluate to TRUE or FALSE. This time, write a compound logical statement to ask if: 100 is equal to 100 and 66 is equal to 66.
An “and” statement is TRUE only if BOTH conditions are TRUE. Simply use an ampersand & to combine logical statements.
Try different combinations of TRUE and FALSE to develop a comfort with &.
# This example code asks: is 1 equal to 1 and is 2 equal to 5?
1 == 1 & 2 == 5
# This is equivalent, but the parentheses are your choice!
(1 == 1) & (2 == 5)
100 == 100 & 66 == 66
100 > 87 & 66 < 12
100 > 50 & 100 < 150
50 >= 35 & 60 == -20
You can combine more than two statements with &!
17 < 111 & 18 > 4 & 66 != 67
In the previous questions, you used & to evaluate whether multiple statements were all TRUE. We can also use an operator for “or”, the pipe operator | (located above your Enter/Return Key! Press: Shift and \), to evaluate whether at least one statement is TRUE.
There is a different pipe operator, %>%, which will be used more frequently in this class where we code in R. In the rest of the computing world, people think of | as the “default” pipe symbol.
Use | to ask: Is 100 greater than 87 or is 66 less than 12?
100 > 87 | 66 < 12
100 > 50 | 100 < 150
50 >= 35 | 60 == -20
(17 < 111 & 18 > 4) | 66 != 67
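The behavior of & and | can be summarized by trying every combination of TRUE and FALSE directly. The sketch below also shows why parentheses are worth using when mixing the two operators: R evaluates & before |.

```r
# "and" is TRUE only when both sides are TRUE
TRUE & TRUE
TRUE & FALSE
FALSE & FALSE

# "or" is TRUE when at least one side is TRUE
TRUE | TRUE
TRUE | FALSE
FALSE | FALSE

# & is evaluated before |, so these two lines ask different questions
TRUE | FALSE & FALSE      # same as TRUE | (FALSE & FALSE)
(TRUE | FALSE) & FALSE
```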
For this section, you will perform logical comparisons on variables rather than directly on values. The goal of this activity is to get you into the habit of working preferentially with variables that stand in for values. This strategy makes your code cleaner and easier to work with.
The first example is templated for you. Remember: Variable names should always start with a letter and should only contain letters, numbers, and underscores.
# First we define variables
value1 <- 12
value2 <- -4
# Then we compare variables INSTEAD OF comparing values 12 and -4
value1 > value2
value1 <- 7
value2 <- 5.44
value1 == value2
comp1_val1 <- 18
comp1_val2 <- 5
comp2_val1 <- 11
comp2_val2 <- 100
comp1_val1 > comp1_val2 & comp2_val1 < comp2_val2
In this section, you will continue practicing logical operations (asking if something is TRUE or FALSE) incorporating functions into your code. The first question is templated for you.
sqrt(25) == 5
# Or with improved coding style:
value1 <- sqrt(25)
value2 <- 5
value1 == value2
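A caution when using == on the results of calculations: computers store most decimal numbers approximately, so two results that are mathematically equal can differ by a tiny amount. The sketch below shows one well-known case and a more forgiving comparison using all.equal(), a base R function not otherwise used in these exercises.

```r
# Mathematically this is TRUE, but rounding inside the computer makes it FALSE
sqrt(2)^2 == 2

# all.equal() allows for tiny rounding differences;
# isTRUE() turns its answer into a plain TRUE or FALSE
isTRUE(all.equal(sqrt(2)^2, 2))
```

Comparisons like sqrt(25) == 5 work as expected because the result is a whole number that is stored exactly.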
nchar() asks how many characters are in a string (how long is a string?). See this function in action below. Again, run and engage with this code below as practice!
nchar("hello")
hello_phrase <- "hi there!" # there are 9 characters in this string including space and !
# Note that the answer is 9 and NOT 12 (the number of characters in the variable's name)
nchar(hello_phrase)
# Does the number of characters equal 10? Nope!
nchar(hello_phrase) == 10
Compare nchar("blazing") to nchar("saddles") using a logical operator that asks if two things are equal: ==.
The result should be TRUE if done correctly (they both have 7 letters).
nchar("blazing") == nchar("saddles")
# Or with improved coding style:
nchar_blazing <- nchar("blazing")
nchar_saddles <- nchar("saddles")
nchar_blazing == nchar_saddles
# This approach also uses variables well:
blazing_str <- "blazing"
saddles_str <- "saddles"
nchar(blazing_str) == nchar(saddles_str)
Assign the value “data science” to a variable called string1, and assign the value “for biologists” to a variable called string2. Then, ask whether the number of characters in string1 is greater than or equal to the number of characters in string2.
string1 <- "data science"
string2 <- "for biologists"
nchar(string1) >= nchar(string2)
The log() function calculates the natural log by default.
Use & for multiple comparisons!
log(10) < log(100) & log(10) == 1.5
# Or, with variables:
value1 <- log(10)
value2 <- log(100)
value3 <- 1.5
value1 < value2 & value1 == value3
Use length() to determine the length of the array, and then ask using a logical operator if the length equals 5. If done correctly, your answer should be FALSE.
multiples_of_11 <- c(11, 22, 33, 44, 55, 66)
length(multiples_of_11) == 5
Use length() to ask how long the array is. Then, ask if the length of the array is equal to 4.
The answer should be TRUE if done correctly.
my_favorite_things <- c("raindrops on roses", "whiskers on kittens", "bright copper kettles", "warm_woolen_mittens")
length(my_favorite_things)
length(my_favorite_things) == 4
The code below asks whether the items in example_array are greater than 0. Engage with and understand this code, and then modify the code to instead ask if every item is equal to 0 (again, be careful not to delete the example_array definition when modifying!).
The output is an array of TRUE or FALSE values.
example_array <- c(5, 78, -3, 1, 6, 0)
example_array > 0
example_array <- c(5, 78, -3, 1, 6, 0)
# All are FALSE except the last entry
example_array == 0
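When a comparison returns a whole array of TRUE and FALSE values, it can help to condense them into a single answer. The sketch below uses any(), all(), and sum(), which are base R functions not otherwise covered in these exercises.

```r
example_array <- c(5, 78, -3, 1, 6, 0)

# Is at least one item equal to 0?
any(example_array == 0)

# Are all items greater than 0?
all(example_array > 0)

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
sum(example_array > 0)
```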
%in%
One of the most POWERFUL logical operators is %in% - it’s literally the word in enclosed by %. You can use it to ask if a given value is present in an array.
Here is an example of this logical operator in action. Notice how helpful it is that I added the extra print statements so you know for sure what is being printed when!
# Define array
rainbow <- c("red", "orange", "yellow", "green", "blue", "indigo", "violet")
# Will return TRUE
print("Is 'red' in the rainbow array?")
"red" %in% rainbow
# Will return FALSE
print("Is 'black' in the rainbow array?")
"black" %in% rainbow
# This is _not_ The Way
# It asks: Is each value in rainbow equal to "red"?
# This code technically works, but is not at all the same as using %in%
print("Does 'red' equal the whole rainbow array?")
"red" == rainbow
Use the %in% operator to ask three separate logical questions:
Is “goat” in this array?
Is “marmoset” in this array?
Is “GOAT” in this array? What does the answer to this code tell you about the importance of letter case??
Hint: Case (uppercase and lowercase letters) matters. “goat” and “GOAT” are totally different.
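If you ever want a comparison that ignores case, one option is to convert the value to lowercase first. This is a minimal sketch using tolower(), a base R function not otherwise used in these exercises; it assumes every item in the array is already lowercase.

```r
animals <- c("goat", "turtle", "axolotl", "hyena", "coral")

# Case matters, so this is FALSE
"GOAT" %in% animals

# tolower() converts a string to lowercase before the comparison
tolower("GOAT") %in% animals
```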
animals <- c("goat", "turtle", "axolotl", "hyena", "coral")
"goat" %in% animals
"marmoset" %in% animals
"GOAT" %in% animals
Define a variable called many_numbers that contains the whole numbers from 50 to 500 using a colon :. Then, use the %in% operator to ask if the value -25 is in this array.
many_numbers <- 50:500
-25 %in% many_numbers
Ask whether the array below contains the letter W.
misc_letters <- c("F", "Z", "U", "t", "g", "b", "R", "O", "i", "P", "e", "o", "I", "q", "M", "N", "D", "m", "y", "s")
is_W_there <- "W" %in% misc_letters
is_W_there
This section will teach you how to use the ifelse() function. It allows you to define a variable based on a certain logical statement (thing that is TRUE or FALSE). Here’s the anatomy of the function, which takes three arguments:
ifelse(<logical statement>, <value if the statement is TRUE>, <value if the statement is FALSE>)
# Evaluates to "greater", since the statement is TRUE
print("example 1")
ifelse(10 > 5, "greater", "less than")
# Evaluates to "crocodile", since the statement is FALSE
print("example 2")
ifelse(10 == 12, "alligator", "crocodile")
# Save it to a variable!
print("example 3")
beast <- ifelse(10 == 12, "alligator", "crocodile")
beast
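ifelse() also works on whole arrays: the logical statement is checked for each item, and a result is returned for each one. A minimal sketch, with made-up values and labels (sleep_hours, "sleepy", and "alert" are hypothetical):

```r
sleep_hours <- c(3.5, 12.1, 8.0, 19.7)   # hypothetical values

# The logical statement is checked for each item separately
sleep_labels <- ifelse(sleep_hours > 10, "sleepy", "alert")
sleep_labels
```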
Practice using the ifelse() function to define several variables as follows…
Ask whether 5 is greater than 20. If the condition is TRUE, the variable value should be “yup”. If the condition is FALSE, the variable value should be “nope”.
var <- ifelse(5 > 20, "yup", "nope")
# Always *look at* your variable to make sure it worked!
var
Ask whether 11 is a number (or in R terms, its class is “numeric”). If it is a number, the variable value should be “number”. If it is not a number, the variable value should be “not a number”.
Use class()! Remember also that numeric is not a class, but "numeric" (with quotes!) is.
var <- ifelse(class(11) == "numeric", "number", "not a number")
var
Ask whether "11" (it has quotes!) is a number. If it is a number, the variable value should be “number”. If it is not a number, the variable value should be “not a number”.
var <- ifelse(class("11") == "numeric", "number", "not a number")
var
Ask whether the number of characters in my_string is equal to 200. If the condition is TRUE, the variable value should be 100. If the condition is FALSE, the variable value should be 10.
my_string <- "rumbledyhump"
var <- ifelse(nchar(my_string) == 200, 100, 10)
var
Ask whether 6 squared is equal to 36. If this is TRUE, the variable value should be 4. If this is FALSE, the variable value should be 25.
var <- ifelse(6**2 == 36, 4, 25)
var
This set of exercises will use a data frame (aka tibble; we will learn the distinction between these terms!) called msleep, which has been pre-loaded for you. This dataset contains information about some species of mammals as follows:
name: The common name
genus: The taxonomic genus
vore: Dietary strategy (carnivore, omnivore or herbivore)
order: The taxonomic order
conservation: The species’ conservation status
  cd --> conservation-dependent
  en --> endangered
  lc --> least concern
  nt --> near-threatened
  vu --> vulnerable
sleep_total: Total amount of sleep, in hours, the species tends to experience
sleep_rem: Total amount of REM sleep, in hours, the species tends to experience
sleep_cycle: Length of sleep cycle, in hours, the species tends to experience
awake: Amount of time spent awake, in hours, the species tends to experience
brainwt: Average brain weight, in kilograms, for this species
bodywt: Average body weight, in kilograms, for this species
Explore the dataset using the functions head(), nrow(), ncol(), and names().
# What are the first six rows of the msleep dataset?
head(msleep)
# How many rows are there in the msleep dataset?
nrow(msleep)
# How many columns are there in the msleep dataset?
ncol(msleep)
# What are the column names in the msleep dataset?
names(msleep)
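You can extract a single column from a data frame with the dollar sign $. The code below extracts the name variable from the data frame. Run and understand this code, and then modify it to extract the awake column.
You do not need quotes around the column name, as in msleep$"name". You can, technically, but it’s good habit not to.
msleep$name
msleep$awake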
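To experiment with $ outside of msleep, here is a minimal sketch using a tiny made-up data frame. The name toy_sleep and its values are invented for illustration and are NOT taken from the real dataset.

```r
# A tiny made-up data frame, NOT the real msleep data
toy_sleep <- data.frame(
  name = c("Cat", "Dog", "Cow"),
  awake = c(11.5, 13.9, 20.0)
)

# $ pulls out one column as an array
toy_sleep$awake

# Double square brackets with a quoted name do the same thing
toy_sleep[["awake"]]

# Because a column is an array, array functions work on it
length(toy_sleep$awake)
```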
Use the functions mean(), median(), min(), max(), and sd() (standard deviation) on the awake column.
# calculate the mean
mean(msleep$awake)
# calculate the median
median(msleep$awake)
# calculate the minimum
min(msleep$awake)
# calculate the maximum
max(msleep$awake)
# calculate the standard deviation
sd(msleep$awake)
Calculate the median amount of REM sleep (sleep_rem) these mammals get. The output will not look like you expect - instead of a number, you get the output “NA”. To start to understand why, view the full contents of this column (i.e. msleep$sleep_rem). You will see some values “NA”, meaning there is missing data. Missing data is very common - real life science (especially biology!) is tricky, and sometimes we can’t gather as much data as we want.
Whenever R encounters NAs in summary statistics functions (mean(), median(), sd(), etc.), it gets scared and returns NA instead of a number. Run the code below to see how this column contains NAs, and then how the median() calculation doesn’t work.
# Look at data first - it has NAs!
msleep$sleep_rem
# Try to calculate median
median(msleep$sleep_rem)
We can fix the median() function by including the second argument na.rm=TRUE. This argument tells median() to ignore (“rm” for remove) NAs when performing calculations. See this code in action below, and then modify it to calculate the mean of this column instead.
na.rm is not a function!!! It is an ARGUMENT we can use with summary statistics functions.
median(msleep$sleep_rem, na.rm = TRUE)
mean(msleep$sleep_rem, na.rm = TRUE)
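The effect of na.rm is easy to check on a small array where you can work out the answer by hand. This sketch uses made-up values (rem_hours is a hypothetical name) and is.na(), a base R function that flags missing values.

```r
rem_hours <- c(1.5, NA, 2.3, NA, 0.7)   # hypothetical values with missing data

# is.na() flags each missing value, and sum() counts the flags
sum(is.na(rem_hours))

# Without na.rm = TRUE the answer is NA
mean(rem_hours)

# With na.rm = TRUE the NAs are dropped before calculating
mean(rem_hours, na.rm = TRUE)
```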
vore is a character column - it contains strings rather than numbers. Let’s examine this column using the function unique(), which returns all unique values in a given array - What are the unique vores in this dataset?
Use unique() on the msleep$vore column.
unique(msleep$vore)
Use table() to get a count of all unique values in a categorical (or factor) variable. Use table() to determine how many of each vore category is in the dataset. This function takes one argument: the array to make a table of. Here, that array is the msleep$vore column.
By default, table() doesn’t even consider NA values. Very kind!
table(msleep$vore)
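If you do want the missing values counted, table() accepts a useNA argument. A minimal sketch with made-up values (diets is a hypothetical name, not the real vore column):

```r
diets <- c("carni", "herbi", NA, "herbi", "omni", NA)   # hypothetical values

# Default behavior: NA values are left out of the counts
table(diets)

# useNA = "ifany" adds a count for NA when any are present
table(diets, useNA = "ifany")
```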
Use summary() to summarize arrays, and even entire data frames. Run this function twice: first on the awake column and then on msleep itself (without indexing any columns), and behold the information!
summary(msleep$awake)
summary(msleep)
You’ll notice in the full summary() output for msleep, there isn’t much useful information for the character columns. You may have expected all the categories in these columns to be revealed. In fact, R does not treat character variables this way. To force R to consider a vector specifically as containing categories, you need to convert it into a new type of data called factor, which is a special type of variable in R for, you guessed it, working with categories.
The code below shows how the column vore would be summarized when coerced to a factor with the function as.factor(). Factors are extremely important, and also extremely irritating. We will learn more about dealing with them later!
# Adding spacing helps to see which function is which and where all the parentheses go
summary( as.factor(msleep$vore) )
In this section we’ll use functions and logical operators to explore and answer questions about the msleep dataset.
Is “Arctic fox” in the msleep dataset?
Look in the name column.
Use the %in% operator to ask if a value is in an array!
"Arctic fox" %in% msleep$name
Is the genus “Ailuropoda” in the msleep dataset?
Look in the genus column.
"Ailuropoda" %in% msleep$genus
Find the unique values in the order column with unique(), and then ask what the length of that array is. It is extremely important to do one step at a time and print as you go to make sure you’re on the right track!!
unique_orders <- unique(msleep$order)
length(unique_orders)
unique_orders <- unique(msleep$order)
length(unique_orders) > 20
Is the mean total amount of sleep (sleep_total) greater than 12?
mean(msleep$sleep_total) > 12
Is the mean brain weight (brainwt) of all mammals equal to 7?
mean(msleep$brainwt, na.rm = TRUE) == 7
awake and sleep_total columns.
# This is TRUE, so on average mammals spend at least as much time awake as asleep
mean(msleep$awake) >= mean(msleep$sleep_total)
ratio <- msleep$awake / msleep$sleep_total
median(ratio)
Use the nrow() function in your code!
nrow(msleep) > 100
Is the value 5 in the bodywt column?
5 %in% msleep$bodywt
Is the standard deviation (sd()) of body weights greater than or equal to 300?
sd(msleep$bodywt) >= 300
Which is greater: the smallest value in sleep_rem or the smallest value in awake?
Use min(), and go one step at a time.
Watch out for NAs! The min() function needs another argument to deal with this.
min_sleep_rem <- min(msleep$sleep_rem, na.rm = TRUE)
min_awake <- min(msleep$awake)
# Answer is TRUE: smallest awake hours is greater than smallest REM hours
min_awake > min_sleep_rem
Ask whether “Cheetah” is in msleep (use the name column). If the condition is TRUE, the variable value should be “present”. If the condition is FALSE, the variable value should be “absent”.
Use ifelse() and the %in% operator!
var <- ifelse("Cheetah" %in% msleep$name, "present", "absent")
var
Ask whether the maximum body weight in msleep is greater than 2000. If the condition is TRUE, the variable value should be 1 (the number 1, no quotes!). If the condition is FALSE, the variable value should be 0.
var <- ifelse(max(msleep$bodywt) > 2000, 1, 0)
var
# Or more spaced out for clarity:
max_bodywt <- max(msleep$bodywt)
var <- ifelse(max_bodywt > 2000, 1, 0)
var
Ask whether the number of rows in msleep equals 80. If the condition is TRUE, the variable value should be 10. If the condition is FALSE, the variable value should be -10.
var <- ifelse(nrow(msleep) == 80, 10, -10)
var