For this homework, you will be writing your very first RMarkdown file and practicing your base R skills.
Obtain the homework from your RStudio Cloud class project by running the following code in the R Console:
library(ds4b.materials) # Load the class library
launch_homework(2) # Launch Homework 2
Answer each question with appropriate code and comments in the given question’s R chunk. Chunks are named as q1 for question 1, q2 for question 2, etc.
You must set an RMarkdown theme and code syntax highlighting scheme of your choosing in the YAML front matter. These links will help you:
Define variables a and b as values 8 and 64, respectively. Use a logical operator and the function sqrt() to ask if a is equal to the square root of b, and print the output. The code should print TRUE if they are equal and FALSE if not.
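As a rough illustration of comparing a value against a square root, here is a minimal sketch. The variables x and y are hypothetical and deliberately use different values from the question.

```r
# Hypothetical values, not the ones the question asks for
x <- 5
y <- 25

# == asks "are these equal?" and returns a single TRUE or FALSE
x == sqrt(y)
```

Running the comparison on its own line, without assigning it, is what prints the result.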
For help, run get_help("logical") and get_help("sqrt") in the R Console after loading the introverse library.
Define a variable success_status whose value depends on a logical comparison, using variables a and b from the previous question. If a squared equals b, success_status should be defined with the value “eureka”. Otherwise, if a squared does not equal b, success_status should be defined with the value “wompwomp”. Print the value of success_status after it is defined.
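One way to define a variable whose value depends on a logical comparison is ifelse(). The sketch below uses hypothetical names (temperature, weather_label) and a made-up threshold purely for illustration.

```r
# Hypothetical example: pick a label based on a logical comparison
temperature <- 30

# ifelse(test, value_if_true, value_if_false)
weather_label <- ifelse(temperature > 25, "warm", "cool")

# Reveal the value
weather_label
```

The first argument is the comparison, the second is the value used when it is TRUE, and the third is the value used when it is FALSE.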
For help, run get_help("ifelse") in the R Console after loading the introverse library.
Do not redefine the a and b variables. You defined them in question #2, and you are still coding in the same RMarkdown file. We always want to avoid repeating code since it often leads to unexpected bugs.
Define a variable mammals_vector to contain these strings: “camel”, “civet”, “mink”, “bat”, “ferret”, and “pangolin” (these are all mammals known to be vectors for either SARS-CoV-2 or one of its close relatives SARS and MERS). Use the logical operator %in% to determine if the string “elephant” is in the array. If “elephant” is in the array, your code should print TRUE, and FALSE otherwise.
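To show how %in% behaves, here is a small sketch with a hypothetical array called fruits. It is not the array from the question.

```r
# Hypothetical array of strings
fruits <- c("apple", "pear", "plum")

# %in% asks whether the value on the left appears in the array on the right
"kiwi" %in% fruits
"pear" %in% fruits
```

The first comparison prints FALSE because "kiwi" is not in fruits, and the second prints TRUE.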
For help, run get_help("logical") in the R Console after loading the introverse library.
Define a variable dinosaurs to contain the three strings: “eagle”, “ostrich”, and “sparrow” (wait, those are all birds?! That’s right, birds are dinosaurs!). Then, define two other variables called mammals_length and dinosaurs_length. The variable mammals_length should be defined as the result of running length() on the mammals_vector array you defined in the previous question. The variable dinosaurs_length should similarly be defined as the result of running length() on the dinosaurs array. Finally, use a logical operator to compare these variables. If mammals_length is larger than dinosaurs_length (in other words, the mammals array is longer than the dinosaurs array), your code should print TRUE. Otherwise, your code should print FALSE.
For help, run get_help("length") in the R Console after loading the introverse library.
Make sure your code uses length()! Scroll to the bottom of these instructions to see what I mean by making sure code produces the length variables!
Use the nchar() function to determine how many characters are in each string in the dinosaurs array, and save the result of this operation to a variable nchar_dinosaurs. Then, use a logical operator to ask if each value is equal to 7. Your code should print a new array containing TRUE or FALSE values representing whether this condition was met for each value.
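Functions like nchar() and operators like == work on every value in an array at once. The following sketch uses a hypothetical array called colors and a made-up comparison value of 4.

```r
# Hypothetical array of strings
colors <- c("red", "green", "blue")

# nchar() counts the characters in each string: 3, 5, and 4
nchar_colors <- nchar(colors)

# Comparing an array to a single number checks each value in turn
nchar_colors == 4
```

The result is an array with one TRUE or FALSE per string, here FALSE FALSE TRUE.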
For help, run get_help("nchar") in the R Console after loading the introverse library.
Use the colon operator (:) to create a variable called q6 as an array of numbers in order -12 through 17. From this variable q6, calculate the mean, median, standard deviation, and sum (with functions mean(), median(), sd(), and sum(), respectively) of this array. Each calculation should be saved to a variable called, respectively: q6_mean, q6_median, q6_sd, and q6_sum. Finally, print these four variables. You do not need to print q6.
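As a sketch of the same pattern on a much shorter range, the example below builds a hypothetical array called countdown with the colon operator and saves each summary to its own variable. The variable names are assumptions made for illustration.

```r
# The colon operator creates a sequence of whole numbers: -2 -1 0 1 2 3
countdown <- -2:3

# Save each summary statistic to its own variable
countdown_mean <- mean(countdown)
countdown_median <- median(countdown)
countdown_sd <- sd(countdown)
countdown_sum <- sum(countdown)

# Reveal the values
countdown_mean
countdown_median
countdown_sd
countdown_sum
```

Each variable is printed on its own line so that all four results appear in the output.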
For help, run get_help("mean") in the R Console after loading the introverse library.
This question uses the variable q6. Use a logical operator to ask if all values in this array are less than or equal to 0 (aka, is q6 less than or equal to 0?). The result of this code will be an array itself containing TRUE and FALSE values corresponding to whether the \(\leq 0\) condition was met for each value in q6. Save this resulting array to a variable called q6_leq0 (“leq” = “less than or equal to”), and print the value of q6_leq0.
For help, run get_help("logical") in the R Console after loading the introverse library.
Learn about the function all() using the introverse help docs, by running get_help("all") in the R Console, after you’ve loaded the introverse library. Then, use this function to determine whether all values of q6 are less than or equal to 0. Your code should print a single TRUE if all values are less than or equal to 0, and a single FALSE otherwise.
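The difference between an element-by-element comparison and all() can be seen in a small sketch. The array scores below is hypothetical.

```r
# Hypothetical array of numbers
scores <- c(-3, 0, 2)

# Element-by-element comparison: one TRUE or FALSE per value
scores_leq0 <- scores <= 0
scores_leq0

# all() collapses those values into a single TRUE or FALSE
all(scores_leq0)
```

Here the comparison gives TRUE TRUE FALSE, so all() prints a single FALSE because at least one value fails the condition.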
This question again uses the variable q6. Find the sum of the absolute values of this array. To achieve this, you will need to first get the absolute value of the array q6 (using the abs() function), and then you will need to sum those values (using the sum() function). Your code chunk should reveal the final sum.
For help, run get_help("sum") or get_help("abs") in the R Console after loading the introverse library.
Use abs() and then sum() on q6 in that order.
There is a super useful function called paste() which allows you to combine strings (characters) together to create a new string. See how this function works below:
# combine the three strings "string 1", "and", and "string 2" into ONE string
combined_string <- paste("string 1", "and", "string 2")
# reveal value
combined_string
## [1] "string 1 and string 2"
# you can also include variables:
a_string <- "hello there"
combined_string2 <- paste("string 1", a_string, "string 2")
# reveal value
combined_string2
## [1] "string 1 hello there string 2"
# Let's use paste() to directly write a sentence (HINT!!)
# (don't worry about having a period at the end of sentence!)
variable_value <- 10
paste("The value of the variable is", variable_value)
## [1] "The value of the variable is 10"
Use the paste() function and the variable q6_mean you previously defined to print a sentence that says: “The mean of q6 is ….” where …. is the q6_mean value. Your code should NOT include a number directly (don’t copy/paste the value of the mean). You may ONLY refer to this number value with the q6_mean variable.
For help, run get_help("paste") in the R Console after loading the introverse library.
For the remaining questions, you will consider the built-in R dataset iris, a famous dataset with 150 rows and 5 columns that provides physical measurements for 150 iris specimens from three different species.
For this question, write code to determine the following information about iris. You need to use a single function for each task (there will be four lines of code in the end), and all output should be printed from the chunk.
The number of rows (nrow())
The number of columns (ncol())
The column names (names())
A summary of each column (summary())
cov_sepal_length and cov_petal_length, respectively. Then, use a logical operator to ask whether sepal lengths have more (>) variation than petal lengths. Your code should print TRUE if sepal lengths have more spread, and FALSE otherwise.
Calculate the mean petal width in the iris dataset and save it to a variable called mean_petal_width. Then, use paste() to print a sentence that reads: "The mean petal width in the iris dataset is ...", where ... is replaced with the value in mean_petal_width. Again, make sure your code is only using variables and not values directly.
The iris variable Species is a factor variable, meaning R recognizes it as containing categories. R refers to factor categories with the term levels. By contrast, the variable Sepal.Width is numeric, not a factor, so R doesn’t recognize that it has categories (which is reasonable! it’s numeric!). The R function levels() can be used to quickly see the categories (levels!) in a factor variable. For this question, practice using levels() by running it twice: on the Species column and on the Sepal.Width column. Your code should simply print the result of running the levels() function on those two columns.
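To see how levels() treats factors and numeric arrays differently, here is a minimal sketch using two hypothetical variables, sizes and heights, rather than the iris columns.

```r
# Hypothetical factor with three categories
sizes <- factor(c("small", "large", "small", "medium"))

# levels() lists each category once (sorted alphabetically by default)
levels(sizes)

# Hypothetical numeric array: it has no levels, so the result is NULL
heights <- c(1.5, 1.7, 1.5)
levels(heights)
```

A factor reports its categories, while a plain numeric array reports NULL because R does not treat its values as categories.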
To learn a little more about factors, let’s coerce some columns into and away from factors. To do this, we’ll create a new dataset called iris2 containing some coerced columns. Copy and paste the following code into the R chunk for this question (this code should be the first code in your answer!!). This code creates a new data frame iris2 that contains the same information as iris, but Species is no longer a factor (it is now character), and Sepal.Width is no longer numeric (it is now a factor).
# Create a copy of iris in iris2
iris2 <- iris
# Coerce Species into a character. Previously, it was a factor.
iris2$Species <- as.character(iris2$Species)
# Coerce Sepal.Width into a factor. Previously, it was numeric.
iris2$Sepal.Width <- as.factor(iris2$Sepal.Width)
Again write code with levels() to see the categories in the iris2 columns Species and Sepal.Width, and notice (on your own - no need to write anything) how the output differs. Finally, write two more lines of code in this chunk to calculate the mean of the Sepal.Width column in iris AND in iris2, which should also be printed out. In ~1 sentence of written text below the code chunk, explain IN YOUR OWN WORDS why this calculation didn’t work as well as you might have hoped (hint: the output from levels() is informative!).
The following code chunks provide examples of how NOT TO and how TO answer a hypothetical homework question using code. Notice how there are many ways to correctly achieve the right result! Welcome to coding :)
Hypothetical homework question: What is the length of an array with values 4, 6, and 10?
NO! This code is not acceptable, largely because it is not code - it is just the number 3. There is no code used at all to answer the question. Even if it is “easy” to do something in your head, you must use code to answer it. It also will not produce any output from a script since it is not formally printed.
3
## [1] 3
NO! This code is not acceptable, because even though code was used at some point, the code itself does not directly produce the output. This example represents a situation where you might run length(c(4,6,10)) in the R Console, or run length(c(4,6,10)) in the chunk but later delete or comment it out. You might look at the output and say, “the code calculated 3, so the answer is 3.” But, this is not ok, since the code itself must reveal the number 3.
#length(c(4,6,10))
3
## [1] 3
NO! This is getting much closer, since we now have the full code needed to generate the output. However, nothing is actually printed, so the output is not revealed. It is critical to always reveal the output!
answer <- length(c(4,6,10))
YES! This “fixes” the previous “NO!” example by revealing the output.
answer <- length(c(4,6,10))
answer
## [1] 3
YES! Only code is used to produce the output.
# YES!
length(c(4,6,10))
## [1] 3
YES! Only code is used to produce the output. This strategy uses improved coding style by defining a variable for the array, and then using length() to calculate the length of the variable.
# YES!
my_array <- c(4,6,10)
length(my_array)
## [1] 3
YES! Only code is used to produce the output. This strategy uses improved coding style by defining a variable for the array, and then defining another variable to contain the output from the length() calculation.
# YES!
my_array <- c(4,6,10)
the_length <- length(my_array)
the_length
## [1] 3
Unless the instructions explicitly ask you to define certain variables, all “YES!” code versions here are equivalent and entirely acceptable!