Homework 4: Probability distributions (100 pts)

BIO5312

Due 9/26/17 by 5:30 pm

Enter your name here


Preamble

To complete this homework, you must first install the purrr and readr packages. Please do so before proceeding to the assignment.


Part One: Binomial distribution

The questions in this section concern the following scenario:

In a species of garden spider, the female attempts to eat the male after mating. Research has shown that the female successfully eats the male 41% of the time. Further assume that every female of this species mates exactly 10 times in her life, no more, no less.

Further note that the phrase “R distribution functions” refers to the family of functions dxxx(), pxxx(), rxxx(), and qxxx().
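As a quick orientation to that naming scheme, here is a minimal sketch using the binomial members of the family. The parameters (5 trials, success probability 0.5) and all variable names are made up for illustration and are not the spider scenario.

```r
# Hypothetical example: 5 trials, success probability 0.5 (not the homework values)
n_trials <- 5
p_success <- 0.5

# d: probability mass at a single value, P(X = 2)
p_exactly_2 <- dbinom(2, size = n_trials, prob = p_success)

# p: cumulative probability, P(X <= 2)
p_at_most_2 <- pbinom(2, size = n_trials, prob = p_success)

# q: quantile, the smallest k such that P(X <= k) >= 0.9
k_90 <- qbinom(0.9, size = n_trials, prob = p_success)

# r: random draws from the distribution (seed set only for reproducibility)
set.seed(1)
draws <- rbinom(10, size = n_trials, prob = p_success)

print(c(p_exactly_2, p_at_most_2, k_90))
```

The same four prefixes apply to other distributions, for example dnorm(), pnorm(), qnorm(), and rnorm() for the normal distribution.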


1. (5 pts) Plot the probability mass function (PMF) of this binomial distribution. To aid in this task, the code provided below constructs the probability data frame that you should plot.

## Similar code seen in 9/19 class
library(tibble)  # provides tibble()
library(purrr)   # provides map_dbl()
data.pmf <- tibble(k = 0:10, prob = map_dbl(0:10, dbinom, 10, 0.41))
### Code to plot PMF goes here.


2. (5 pts) Plot the cumulative distribution function (CDF) of this binomial distribution. To make this plot, generate 1000 random numbers from this binomial distribution.

### Code to plot CDF goes here.
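One way to see why random draws can stand in for a CDF is to compare an empirical CDF with the exact one. The sketch below uses base R's ecdf() and made-up parameters (5 trials, success probability 0.5); the plotting approach used in class may differ.

```r
# Hypothetical parameters, not the spider scenario
set.seed(42)
draws <- rbinom(1000, size = 5, prob = 0.5)

# ecdf() returns a step function F, where F(x) is the fraction of draws <= x
emp_cdf <- ecdf(draws)

# Compare the empirical CDF with the exact CDF at each possible outcome
k <- 0:5
comparison <- data.frame(
  k = k,
  empirical = emp_cdf(k),
  exact = pbinom(k, size = 5, prob = 0.5)
)
print(comparison)

# plot(emp_cdf) would draw the step function with base R graphics
```

With 1000 draws the two columns should agree closely, which is why a CDF plotted from simulated values approximates the true CDF.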


3. (10 pts) What is the probability that a randomly-chosen female eats exactly 7 males over her lifetime? Answer this question using two approaches: a direct calculation and an R distribution function.

### Enter code for direct calculations here
### Enter code for R distribution function here
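Assuming "direct calculation" means evaluating the binomial formula P(X = k) = choose(n, k) * p^k * (1 - p)^(n - k) by hand, the sketch below shows that the formula and dbinom() agree. The values n = 5, k = 2, p = 0.5 are hypothetical and chosen only for illustration.

```r
# Hypothetical values, not the homework values
n <- 5
k <- 2
p <- 0.5

# Direct calculation from the binomial formula
direct <- choose(n, k) * p^k * (1 - p)^(n - k)

# Same probability from the R distribution function
via_dbinom <- dbinom(k, size = n, prob = p)

print(all.equal(direct, via_dbinom))
```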


4. (5 pts) What is the probability that a randomly-chosen female eats at least 4 males over her lifetime? Solve this question using R distribution function(s).

### Enter code here
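A common pitfall with "at least" questions on a discrete distribution is the off-by-one in the complement: P(X >= k) equals 1 - P(X <= k - 1), not 1 - P(X <= k). The sketch below checks this three ways using hypothetical values (5 trials, success probability 0.5, threshold 3).

```r
# Hypothetical values, not the homework values
n <- 5
p <- 0.5

# P(X >= 3) as the complement of P(X <= 2)
p_at_least_3 <- 1 - pbinom(2, size = n, prob = p)

# Same quantity using the upper tail directly: P(X > 2)
p_at_least_3_alt <- pbinom(2, size = n, prob = p, lower.tail = FALSE)

# Sanity check by summing the PMF over 3, 4, 5
p_check <- sum(dbinom(3:n, size = n, prob = p))

print(c(p_at_least_3, p_at_least_3_alt, p_check))
```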


5. (5 pts) What is the probability that a randomly-chosen female does not eat exactly 3 males? Solve this question using R distribution function(s).

### Enter code here



Part Two: Normal distribution

The questions in this section concern the following scenario:

There are two closely related species of crab, aptly named A and B. The mass (in grams) of crab species A is normally distributed as N(35.4, 7), and the mass of crab species B is normally distributed as N(42, 10.2). In addition, a very interesting factoid about these crabs is that species B crabs are much more common than species A crabs. Specifically, if you choose a random A or B crab, there is a 72% chance that it belongs to species B.


1. (5 pts) Plot the probability density function (PDF) for the crab species A mass distribution.

### Code to plot PDF goes here.


2. (5 pts) Plot the cumulative distribution function (CDF) for the crab species B mass distribution. To make this plot, generate 5000 random numbers from this normal distribution.

### Code to plot CDF goes here.


3. (5 pts) Congratulations, you have a new species A pet crab that weighs 32 g! What is the z-score for this crab?

### Enter code here
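As a reminder, a z-score expresses how many standard deviations a value lies from the mean: z = (x - mean) / sd. The sketch below uses hypothetical numbers (mean 100, standard deviation 15, observation 130), not the crab values.

```r
# Hypothetical values, not the crab scenario
x <- 130      # observed value
mu <- 100     # population mean
sigma <- 15   # population standard deviation

# Number of standard deviations that x lies above the mean
z <- (x - mu) / sigma
print(z)  # 2
```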


4. (5 pts) What is the probability that a randomly-chosen crab from species B weighs more than 46 g?

### Enter code here


5. (5 pts) What is the probability that a randomly-chosen crab from species B weighs between 40 and 50 grams?

### Enter code here
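For a continuous distribution, interval and tail probabilities come from differences of the CDF: P(a < X < b) = F(b) - F(a) and P(X > a) = 1 - F(a). A minimal sketch with pnorm() on a standard normal (mean 0, standard deviation 1, chosen for illustration rather than taken from the crab scenario):

```r
# Hypothetical distribution: standard normal, not the crab values
mu <- 0
sigma <- 1

# P(-1 < X < 1) as a difference of two CDF values
p_between <- pnorm(1, mean = mu, sd = sigma) - pnorm(-1, mean = mu, sd = sigma)

# P(X > 1) using the upper tail
p_above <- pnorm(1, mean = mu, sd = sigma, lower.tail = FALSE)

print(c(p_between, p_above))
```

Probabilities for two non-overlapping intervals can be added, since the events are mutually exclusive.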


6. (5 pts) What is the probability that a randomly-chosen crab from species B weighs either between 35 and 38 grams or between 44 and 48 grams?

### Enter code here


7. (5 pts) How light does a species A crab have to be to be in the bottom 20% of species A weights?

### Enter code here
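Questions that start from a percentile and ask for a value run the CDF in reverse, which is what the q functions do. The sketch below uses a standard normal and the 97.5th percentile, both chosen only for illustration.

```r
# Hypothetical example on a standard normal, not the crab values
# qnorm() returns the value below which the given fraction of the distribution lies
q <- qnorm(0.975, mean = 0, sd = 1)

# pnorm() undoes qnorm(): feeding the quantile back in recovers the probability
p_back <- pnorm(q, mean = 0, sd = 1)

print(c(q, p_back))
```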


8. (10 pts) Congratulations again! You have yet another new pet crab! Unfortunately, your excitement is tempered because you don’t know what species the crab is. All you know is that this crab weighs less than 30 grams. Calculate the probability that the crab is species A as well as the probability that the crab is species B. Based on these probabilities, what species do you think your crab is (provide your answer below the code)?

### Enter code here

My crab belongs to species A/B.
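Problems of this kind are typically approached with Bayes' rule: P(group | observation) = P(observation | group) * P(group), divided by the sum of that product over all groups. The sketch below is one possible setup with two entirely hypothetical groups, G1 and G2; the priors, means, standard deviations, and cutoff are invented and are not the crab values.

```r
# Hypothetical two-group setup, not the crab scenario
prior <- c(G1 = 0.3, G2 = 0.7)   # P(group)
mu    <- c(G1 = 0,   G2 = 1)     # group means
sigma <- c(G1 = 1,   G2 = 1)     # group standard deviations
cutoff <- 0                      # observation: X < cutoff

# P(X < cutoff | group), computed for both groups at once
likelihood <- pnorm(cutoff, mean = mu, sd = sigma)

# Bayes' rule: posterior is proportional to prior times likelihood
joint <- prior * likelihood
posterior <- joint / sum(joint)

print(posterior)
```

Note that the posterior probabilities sum to 1, and that a rarer group can still be the more likely one when the observation is much more typical of it.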



Part Three: Sampling distribution

The platypus genome sequence was completed in 2008 and has plenty of strange features. For this section, you will use data for the GC-content (percent of a gene that is nucleotide G or C, as opposed to A or T) across all platypus genes. Specifically, you will visualize and compare sampling distributions of the mean for platypus gene GC-content. Note that the GC-content of platypus genes has a mean of 51.16% and a standard deviation of 8.27%.

Four sampling distributions of the mean have been prepared for you in the following datasets (you can download these from the course website). Please read them into R using the readr package.


Question (25 pts): Visualize each of these four sampling distributions as a histogram, and compute the mean and standard error for each sampling distribution. Provide the computed values in the table below (be sure to also show all code in the appropriate chunk!). When you are done, describe the patterns you observe, both in the histograms and in the computed values, as N increases across the four sampling distributions. Are the patterns what you expect? Why or why not? Be sure to explain why the patterns you expect are present and why any patterns you did not expect are present.


### Use readr to read in all datasets here
### All code for N=25 sampling distribution goes here.
### All code for N=75 sampling distribution goes here.
### All code for N=500 sampling distribution goes here.
### All code for N=2000 sampling distribution goes here.
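For background, the standard error of the mean is the standard deviation of the sampling distribution of the mean, and theory predicts it equals the population standard deviation divided by the square root of the sample size. The sketch below simulates this with an invented normal population (mean 10, standard deviation 2) and samples of size 16; it does not use the platypus data, and it assumes the sample-size interpretation of the size parameter.

```r
# Hypothetical population and sample size, not the platypus data
set.seed(123)
pop_mean <- 10
pop_sd <- 2
n <- 16

# Build a sampling distribution: the means of 5000 independent samples of size n
sample_means <- replicate(5000, mean(rnorm(n, mean = pop_mean, sd = pop_sd)))

# Standard error observed in the simulation vs. the theoretical value
observed_se <- sd(sample_means)
theoretical_se <- pop_sd / sqrt(n)

print(c(mean = mean(sample_means),
        observed_se = observed_se,
        theoretical_se = theoretical_se))

# hist(sample_means) would show the shape of the sampling distribution
```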

Table of means and SE

N    | mean | standard error
25   |      |
75   |      |
500  |      |
2000 |      |

Describe your results here in 3-5 sentences.