Enter your name here
1. (6 points) Consider the following scenario and answer the questions below:
Two researchers independently carry out separate clinical trials to test the same null hypothesis, that COX-2 selective inhibitors (which are used to treat arthritis) have no effect on the risk of cardiac arrest. They use the same population for their study, but one experimenter uses a sample size of 60 subjects whereas the other uses a sample size of 100. Assume that all other aspects of the studies, including significance levels (i.e. \(\alpha\)), are the same between the two studies.
Answer goes here.
Answer goes here.
Answer goes here.
2. (6 points) Consider the following scenario: You perform a one-sample t-test with a sample size of \(n=100\) and a significance level \(\alpha=0.1\). Answer the following questions in this context:
Answer goes here.
Answer goes here.
Answer goes here.
3. (2 points) Assume, for a given test, that the null hypothesis is actually true. Which one of the following statements is true?
Answer goes here.
4. (2 points) Assume, for a given test, that the null hypothesis is actually false. Which one of the following statements is true?
Answer goes here.
5. (4 points) In which of the following cases is a paired analysis (as opposed to an independent analysis) more appropriate? Note: there may be more than one.
Answer goes here.
Questions in this section concern the dataset mammalian_life_history.csv, which can be downloaded from the course website. This dataset contains information about various mammalian species, whose Order, Family, and Genus groupings are all recorded. Life history information for each mammalian species includes:
mass_g, the average adult mass, in grams
newborn_g, the average newborn mass, in grams
wean_mass_g, the average mass at the time of weaning, in grams
AFR_mo, the age of first reproduction for females, in months
gestation_mo, the duration of gestation, in months
weaning_mo, the duration of weaning, in months
max_life_mo, the age of the oldest individual ever recorded, in months
litter_size, the average litter size
litters_per_year, the average number of litters per year
This dataset contains missing values (coded as NA) that you will need to remove before performing any hypothesis testing. There are several ways to remove missing data. The recommended way for this assignment is the function na.omit(). This function will remove all rows in the data that contain NA. It is best to remove NAs after subsetting the data, to avoid removing excessive rows (i.e. rows where a variable “we don’t care about” was NA, but our variable of interest had a real value).
Assume \(\alpha=0.05\) for all tests. Unless otherwise stated, use the function t.test() to perform hypothesis testing. For each hypothesis test, you will need to do the following:
Before you begin, read the dataset into R:
### read in csv
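The recommended order described above (subset first, then call na.omit()) can be sketched on a toy data frame. The data frame toy and its values are made up for illustration and are not taken from mammalian_life_history.csv; only the column names mirror the dataset.

```r
# Toy data frame standing in for the real dataset (values are made up).
toy <- data.frame(
  Genus     = c("A", "A", "A", "B"),
  mass_g    = c(250, NA, 300, 400),
  newborn_g = c(NA, 12, 15, 20)
)

# Subset first: keep only the rows and the column of interest.
genus_a_mass <- toy[toy$Genus == "A", "mass_g", drop = FALSE]

# Then drop rows with NA. Only the row with a missing mass_g is removed.
genus_a_clean <- na.omit(genus_a_mass)
nrow(genus_a_clean)  # 2

# Calling na.omit() before selecting the column also discards the first
# row, whose mass_g is known but whose newborn_g is NA.
nrow(na.omit(toy[toy$Genus == "A", ]))  # 1
```

In this sketch, removing NAs after subsetting keeps two usable mass values, whereas removing them from all columns keeps only one.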
1. (20 pts) Perform a two-tailed one-sample t-test to address the question: Do adult squirrels weigh, on average, 275 g? Note that squirrels belong to the genus Spermophilus. Perform the hypothesis testing in two ways (note that you only need to state hypotheses, check assumptions, and report results/conclusions once): using R as a calculator, and using the function t.test().
### Code to check assumptions goes here.
### Code to perform t-test and calculate CI using R as a calculator goes here.
### Code to perform t-test with the function t.test() goes here.
H0: State the null hypothesis here
HA: State the alternative hypothesis here
Write your results and conclusions here, including a brief statement about results from checking assumptions.
2. (5 pts) Make a figure depicting the probability density function of the null distribution used in the hypothesis test in question 1 above, overlaid on the Standard Normal distribution (similar to slides in class 5 showing different t distributions).
Helpful hints:
Plot each distribution with stat_function() (as in class4 and HW4).
Add one stat_function() call per distribution to the plot call, as in: ggplot(data, aes(...)) + stat_function(first function) + stat_function(second function)
dt() may be used to calculate densities for the Student’s t distribution and takes one argument besides the x values: the degrees of freedom.
Use the x range c(-4,4).
Draw the t distribution line in red (color="red" in the relevant stat_function()), and keep the normal distribution line in black. Do not fill the distributions.
After you make the figure, explain any differences you see between the t distribution plotted and the standard normal.
#### Code to plot null distribution goes here.
Describe differences here in 2-3 sentences.
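As a rough sketch of the layering described in the hints, the following overlays a t density on the standard normal density with two stat_function() layers. It assumes the ggplot2 package is installed, and example_df is a hypothetical value chosen only for illustration, not the degrees of freedom for question 1.

```r
library(ggplot2)

# Hypothetical degrees of freedom, chosen only for illustration.
example_df <- 5

p <- ggplot(data.frame(x = c(-4, 4)), aes(x = x)) +
  # Standard normal density, drawn in black.
  stat_function(fun = dnorm, color = "black") +
  # Student's t density; df is passed on to dt() through args.
  stat_function(fun = dt, args = list(df = example_df), color = "red") +
  labs(x = "Value", y = "Density")

p
```

The data frame passed to ggplot() only supplies the x range; stat_function() evaluates each density over that range.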
3. (10 pts) Compute a 90% confidence interval for the test from question 1, using R as a calculator below. (Hint: to compute a 95% CI, we use \(t_{0.025}\), with the appropriate degrees of freedom, to determine the limits. Think about which \(t\) to use instead for a 90% CI). Report your CI in the form \(a \pm b\) below.
#### R code goes here
Report your CI here
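The arithmetic behind a confidence interval for a mean can be sketched as follows. The vector x is a made-up sample used only to illustrate the steps, and the variable names are illustrative assumptions; the last line cross-checks the hand calculation against t.test() with a matching conf.level.

```r
# Made-up sample, used only to illustrate the arithmetic.
x <- c(4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9)

n     <- length(x)
x_bar <- mean(x)
se    <- sd(x) / sqrt(n)  # standard error of the mean

# A 90% CI leaves 5% in each tail, so the critical value is t_{0.05}.
t_crit <- qt(1 - 0.10 / 2, df = n - 1)

margin <- t_crit * se
c(lower = x_bar - margin, upper = x_bar + margin)

# Cross-check against t.test() with a matching confidence level.
t.test(x, conf.level = 0.90)$conf.int
```

Here x_bar plays the role of a and margin the role of b in the form \(a \pm b\).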
4. (15 pts) Perform a two-sided two-sample (independent) t-test to address the question: Do even-toed ungulates (Order Artiodactyla) and whales (Order Cetacea) have, on average, different litter sizes? Carry out this test using the function t.test().
### All R code goes here.
H0: State the null hypothesis here
HA: State the alternative hypothesis here
Write your results and conclusions here, including a brief statement about results from checking assumptions.
5. (5 pts) Based on the output from t.test(), answer the following questions. Do not perform any additional t-tests, but minor calculations are fine.
#### R code goes here.
The P-value would have been: Answer goes here.
The P-value would have been: Answer goes here.
6. (15 pts) Perform a two-sided paired t-test to address the question: Do species in family Bovidae (cows and similar) have, on average, different gestation times and weaning times? Carry out this test using the function t.test().
### All R code goes here.
H0: State the null hypothesis here
HA: State the alternative hypothesis here
Write your results and conclusions here, including a brief statement about results from checking assumptions.
7. (10 pts) Perform the same hypothesis test as in question 6 (only the call to t.test()), but this time run it as an independent two-sample t-test (still two-sided). Compare the test output between this approach and the paired approach taken in question 6, focusing on differences between the resulting P-values and confidence intervals. Explain why these values differ between approaches and what this difference tells you about the relative power of using either approach to compare paired data.
### All R code goes here.
Compare test results here, in 3-5 sentences.
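One way to see how the two approaches can disagree is to run both on simulated paired data. The sketch below uses made-up measurements (before and after are hypothetical names, unrelated to the Bovidae data) in which subjects differ a lot from one another but each subject shifts by a small, consistent amount.

```r
set.seed(1)

# Simulated paired measurements: large variation between subjects,
# small and consistent shift within each subject.
before <- rnorm(20, mean = 50, sd = 10)
after  <- before + rnorm(20, mean = 2, sd = 1)

paired_test      <- t.test(after, before, paired = TRUE)
independent_test <- t.test(after, before, paired = FALSE)

# Compare P-values and confidence intervals from the two approaches.
c(paired = paired_test$p.value, independent = independent_test$p.value)
rbind(paired = paired_test$conf.int, independent = independent_test$conf.int)
```

The paired test works on the within-subject differences, so the between-subject variation does not enter its standard error; the independent test treats the two columns as unrelated samples.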