Instructions

For this assignment you will use ggplot2 to recreate the plots shown in these instructions. The plots were made from the sparrows dataset, which we saw during in-class exercises. The sparrows dataset is part of the ds4b.materials package, so loading that library also loads the dataset.

Obtaining and setting up the homework

  • Obtain the homework from your RStudio Cloud class project by running the following code in the R Console:

    library(ds4b.materials) # Load the class library
    launch_homework(4)      # Launch Homework 4


  • Answer each question with appropriate code and comments in the given question’s R chunk. Chunks are named as plot1 for plot 1, plot2 for plot 2, etc.

  • You must set an RMarkdown theme and code syntax highlighting scheme of your choosing in the YAML front matter. These links will help you:

    • Choose your favorite theme among the pre-packaged themes (ignore everything below “Even More Themes”) shown at this link
    • Choose your favorite syntax highlighting among these options at this link
  • Make sure your Rmd knits without errors before submitting. If it does not produce an HTML output, this means it does not knit. DO NOT SKIP THIS STEP! Ensuring code runs without errors is MORE IMPORTANT than writing code in the first place.

    • If there are errors in your code, you should comment out the code so that it does not actually run. This is BETTER than keeping the buggy code in there without commenting out - it shows me you attempted the code, but understood that it didn’t work properly. Partial credit will come to you! But, if you leave buggy code in, then the Rmd will not knit and there will be deductions.
  • As always, you are encouraged to work together and use the class Slack to help each other out, but you must submit YOUR OWN CODE.

Recreating the plots

Explore the dataset interactively in the Console before you begin plotting, and even then, keep looking at it!! You cannot work with data that you aren’t looking at, and nobody ever expects you to. You might want to run View(sparrows) in Console (not in the RMarkdown!!) to keep a full view of the dataset open at all times.


  1. For each question, you should write code to recreate the plot. Save the plot to a variable and THEN reveal the plot by typing out the variable. The purpose of this instruction is to make you practice saving plots to variables! For example (plots are small, for demo only):

    ## YES!!
    # Make and save plot
    example_plot <- ggplot(iris) +
      aes(x = Sepal.Length) +
      geom_histogram()
    
    example_plot # reveal plot

    ## YES!!
    # OR, assign *forwards* if you prefer that style:
    # Make and save plot
    ggplot(iris) +
      aes(x = Sepal.Length) +
      geom_histogram() -> example_plot
    
    example_plot # reveal plot

    ## NO!! THIS DOESN'T SAVE TO A VARIABLE!! NO!!
    ggplot(iris) +
      aes(x = Sepal.Length) +
      geom_histogram()


  1. Your code must be spaced out onto separate lines as we have learned in class. There will be deductions for plotting code that is all on one line. The purpose of this instruction is to force you to build this organizational habit as early as possible. For example…

    ## YES!!
    ggplot(iris) +
      aes(x = Sepal.Length) +
      geom_histogram() -> example_plot
    
    ## NO!!
    ggplot(iris) + aes(x = Sepal.Length) + geom_histogram() -> example_plot


  1. When recreating the plots, you DO NOT NEED to specifically recreate:

    • Exact colors and fills that appear in the plots. If a plot uses “just a color” or “just a fill”, you can use literally any color/fill you want, as long as you are specifying the same type of feature. Have fun!
    • Exact point or line size. For example, if a plot technically contains points that are size=2 but yours do not, that’s OK! No deductions! You can’t really eyeball this stuff.


  1. When recreating the plots, you DO NEED to make sure you recreate:

    • Matching axis labels and plot titles (hint: there are titles!)
    • Matching the geom used and its styling
    • The specific aesthetic mappings
    • The plot theme. For all plots, ONLY ONE of the following four themes is used. You can see what the themes look like here and/or here
      • theme_classic()
      • theme_minimal()
      • theme_bw()
      • theme_gray() (This is the default theme. If the plot uses this theme, you do NOT need to code it - it’s the default!)
      • (Seriously, NONE OF THE PLOTS USE: theme_linedraw() or theme_light(). Those do NOT APPEAR HERE!)

  2. Ensure your code contains comments!! Specifically, any line of code that you do not understand immediately, as if by muscle memory, should have a comment. This helps you develop the skill of commenting and understand what your own code does.

    • Please note: Many beginners feel “stupid” writing comments. Sometimes they think, “If I were good at coding, I wouldn’t need comments. Comments make me look like I’m not smart enough to code.” This is a fundamental (but completely normal!) misunderstanding! The best coders are the ones who make their code as easy as possible to read and re-use. Using comments is a sign that you are a STRONG PROGRAMMER who knows how to write clear code. Not using comments is a sign that you enjoy pain and you love to forget what your code does 30 seconds after you wrote it.

  3. Don’t make this harder than it has to be. Every type of plot you have to make on this homework was introduced in lecture and/or the ggplot2 interactive exercises. Unless the question explicitly teaches you new code, you have seen the code you need to use before.

  4. You can place aes() wherever you want, as long as the plot works! You can include aesthetics on their own, within the ggplot() call, or within the relevant geom function. There is lots of flexibility for how you code aesthetic mappings, so use this opportunity to explore your coding style preference.

  5. The setup R chunk in the RMarkdown template pre-specifies a default figure size of 6 inches wide and 4 inches tall. Don’t interfere with this setting!! The plots you need to recreate were also made to be 6 inches wide and 4 inches tall.
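
The point above about placing aes() can be sketched as follows. This is only a demo on the built-in iris dataset (not sparrows); the variable names, labels, and theme are arbitrary choices made for illustration, not taken from any homework plot.

```r
library(ggplot2)

# Style A: aes() added as its own component
plot_own_aes <- ggplot(iris) +
  aes(x = Sepal.Length, y = Petal.Length) +
  geom_point() +
  labs(
    x = "Sepal length (cm)",          # demo labels only
    y = "Petal length (cm)",
    title = "Demo: iris measurements"
  ) +
  theme_minimal()

# Style B: aes() inside the ggplot() call
plot_ggplot_aes <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point()

# Style C: aes() inside the geom that uses it
plot_geom_aes <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Petal.Length))

plot_own_aes # reveal plot
```

All three styles should draw the same points; they differ only in where the mapping is written, so pick whichever reads best to you.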



Plots to recreate



Plot 1


Hints:

  • See how “Plot 1” is written twice? The first big/bold one is the question header for the homework, and the second one is part of the plot. All plots in this homework are structured like this.
  • Indeed that is a trendline you see! For this trendline only, uniquely on the homework, make sure you match its specific color. You don’t have to match other colors exactly. (The color is "black").



Plot 2


Hint: You should use the argument bins with your geom to exactly match this plot. Count the number of bins and use that number!
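
As a rough sketch of the bins argument, here is a histogram of the built-in iris dataset. The value 12 is an arbitrary demo number, not the answer for this plot; count the bins in the plot you are recreating and use that number instead.

```r
library(ggplot2)

# Ask for a specific number of bins instead of the default of 30
binned_plot <- ggplot(iris) +
  aes(x = Sepal.Length) +
  geom_histogram(bins = 12) # 12 is an arbitrary demo value

binned_plot # reveal plot
```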



Plot 3


Hints:

  • Notice that both distributions are visible even though they are overlapping. Make sure yours also features transparency!
  • This plot places the legend at the bottom of the plot. This is actually a modification to the plot’s theme. Because ggplot plots are just added components on top of one another, you will need to change the legend position AFTER you set its theme (think about why this MUST BE THE CASE, and experiment with it!). The legend can be re-positioned by adding on the following code: theme(legend.position = "bottom"). To learn more (but not to submit for this question!), see what happens if you use “top”, “left”, or “right” instead of “bottom”.
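
The two hints above can be tried out on the built-in iris dataset. This is a sketch under assumptions: geom_density(), the alpha value, and theme_bw() are demo choices only and may not match the geom or theme in this plot.

```r
library(ggplot2)

# Legend tweak added AFTER the theme
overlap_plot <- ggplot(iris) +
  aes(x = Sepal.Length, fill = Species) +
  geom_density(alpha = 0.5) +        # alpha below 1 makes the fills see-through
  theme_bw() +                       # set the overall theme first...
  theme(legend.position = "bottom")  # ...then adjust one piece of it

# Same components, but the tweak comes BEFORE the theme
reversed_plot <- ggplot(iris) +
  aes(x = Sepal.Length, fill = Species) +
  geom_density(alpha = 0.5) +
  theme(legend.position = "bottom") +
  theme_bw()                         # a complete theme replaces the earlier tweak

overlap_plot  # legend at the bottom
reversed_plot # compare where the legend ends up
```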



Plot 4


  • Make sure your order of variables agrees with this plot!! You will need to use the function fct_relevel(), which was introduced in the ggplot2 exercises. Learn more by asking the introverse for help: get_help("fct_relevel")!!
  • This plot features both a fill and color!
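
Here is a minimal sketch of fct_relevel() from the forcats package, again using the built-in iris dataset as a stand-in. The boxplot geom, the level order, and the color names are all assumptions made for the demo and are not taken from this plot.

```r
library(ggplot2)
library(forcats)

# Default order of the Species factor is alphabetical:
levels(iris$Species)

# fct_relevel() moves the named levels to the front, in the order given
reordered_plot <- ggplot(iris) +
  aes(
    x = fct_relevel(Species, "virginica", "setosa", "versicolor"),
    y = Sepal.Length
  ) +
  geom_boxplot(fill = "lightblue", color = "navy") + # both a fill and a color
  labs(x = "Species") # tidy up the axis label after releveling

reordered_plot # reveal plot
```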



Plot 5




Plot 6


Hints:

  • See the black point with small lines in each distribution? This represents the mean and standard error (SE) of each distribution. In this case, the standard errors are very small, so the lines are a bit tricky to see, but they are there! These points are conveniently, amazingly, and automatically added with the plot component, stat_summary(). This is a special ggplot2 function which can easily add a summary statistic onto a plot. No arguments are necessary because by default, it plots mean and SE.
  • It will reveal a warning message: “No summary function supplied, defaulting to mean_se(),” which means you used it CORRECTLY!!!!!
  • You will see the black point is larger than the other points. You should also make sure your plot has these relative sizes, but the exact numbers can differ from mine.
  • You will want to use the argument width with your geom. Your value does not need to exactly match that used to make the plot, but it should definitely differ from the default.
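
The hints above can be sketched on the built-in iris dataset. This assumes a jittered point geom, which may or may not be the geom in this plot; the width and size values are arbitrary demo numbers.

```r
library(ggplot2)

summary_plot <- ggplot(iris) +
  aes(x = Species, y = Sepal.Length) +
  geom_jitter(width = 0.1, size = 1) + # narrower horizontal spread than the default
  stat_summary(size = 1)               # no summary function given: mean and SE

summary_plot # reveal plot (expect the mean_se() message here)
```

Passing size to stat_summary() only changes how large the summary point is drawn; the summary itself still comes from the default.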



Plot 7


Hint: These points have both a color and fill!
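
One way to get points with both a color and a fill is to use one of the point shapes numbered 21 to 25, which have a separate outline (color) and interior (fill). The sketch below uses the built-in iris dataset; the shape number, colors, and size are demo choices and are not confirmed to match this plot.

```r
library(ggplot2)

outlined_points <- ggplot(iris) +
  aes(x = Sepal.Length, y = Petal.Length) +
  geom_point(
    shape = 21,      # a circle with an outline and an interior
    color = "black", # outline
    fill = "gold",   # interior
    size = 2
  )

outlined_points # reveal plot
```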



Plot 8


Hint: Carefully consider the aesthetic mappings (aes()), and you will figure this out. It’s all about the mappings!! Think…

What is on the X? What is on the Y? Is there a color or fill (hint: there’s a fill!)? Is that color/fill “just a fill” or mapped to a variable (hint: it’s mapped!)? Amazingly, this information alone is enough to get you almost all the way to the finish line, other than theme and labels. The grammar of graphics is amazing!!
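
To make the difference between “just a fill” and a mapped fill concrete, here is a small sketch on the built-in iris dataset. The boxplot geom and the color name are assumptions for the demo only, not a statement about the geom in this plot.

```r
library(ggplot2)

# Fill MAPPED to a variable: it goes inside aes(), and a legend appears
mapped_fill <- ggplot(iris) +
  aes(x = Species, y = Sepal.Length, fill = Species) +
  geom_boxplot()

# "Just a fill": a single value given to the geom, outside aes()
constant_fill <- ggplot(iris) +
  aes(x = Species, y = Sepal.Length) +
  geom_boxplot(fill = "skyblue")

mapped_fill   # reveal plot
constant_fill # reveal plot
```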



Plot 9


Hint: Compare this plot to the last one, and don’t overthink!! The code is almost exactly identical.



Plot 10


Hint: Count the bins!