\clearpage
# PREPARATION
```{r oppts, include=FALSE}
# set global chunk options...
# this changes the defaults so you don't have to repeat yourself
knitr::opts_chunk$set(comment = NA,
cache = TRUE,
echo = TRUE,
warning = FALSE,
message = FALSE,
fig.align = "center", # center all figures
fig.width = 6, # set default figure width to 6 inches
fig.height = 4) # set default figure height to 4 inches
```
## Load Packages
* Make sure the packages are **installed** *(Package tab)*
```{r libraries}
library(tidyverse) # Loads several very helpful 'tidy' packages
library(rio) # Read in datasets
library(furniture) # Nice tables (by our own Tyson Barrett)
library(educ6600) # install with remotes::install_github("tysonstanley/educ6600")
```
## Ihno's Dataset for Section C's
Import Data, Define Factors, and Compute New Variables
* Make sure the **dataset** is saved in the same *folder* as this file
* Make sure that *folder* is the **working directory**
> NOTE: I added the second line to convert all the variable names to lower case. I still kept the `F` as a capital letter at the end of the five factor variables.
```{r ihno}
data_clean <- import("Ihno_dataset.xls") %>%
dplyr::rename_all(tolower) %>%
dplyr::mutate(genderF = factor(gender,
levels = c(1, 2),
labels = c("Female",
"Male"))) %>%
dplyr::mutate(majorF = factor(major,
levels = c(1, 2, 3, 4, 5),
labels = c("Psychology",
"Premed",
"Biology",
"Sociology",
"Economics"))) %>%
dplyr::mutate(reasonF = factor(reason,
levels = c(1, 2, 3),
labels = c("Program requirement",
"Personal interest",
"Advisor recommendation"))) %>%
dplyr::mutate(exp_condF = factor(exp_cond,
levels = c(1, 2, 3, 4),
labels = c("Easy",
"Moderate",
"Difficult",
"Impossible"))) %>%
dplyr::mutate(coffeeF = factor(coffee,
levels = c(0, 1),
labels = c("Not a regular coffee drinker",
"Regularly drinks coffee"))) %>%
dplyr::mutate(hr_base_bps = hr_base / 60) %>%
dplyr::mutate(anx_plus = rowsums(anx_base, anx_pre, anx_post)) %>%
dplyr::mutate(hr_avg = rowmeans(hr_base, hr_pre, hr_post)) %>%
dplyr::mutate(statDiff = statquiz - exp_sqz)
```
## Other Datasets for Section B's
```{r data}
## Load data from educ6600 package
data("schizo")
data("GRE")
data("test_scores")
data("child_vars")
data("memory")
data("data_wait")
data("data_food")
data("data_undergrad")
data("data_memory")
```
\clearpage
# Chapter 7. Independent Samples *t*-Test for Means
## Section C
### 7C-1. Independent Samples *t*-Test for Mean `hr_base` by `genderF`
**TEXTBOOK QUESTION:** *Perform a two-sample t test to determine whether there is a statistically significant difference in **baseline heart rate** between the **men and the women** of Ihno's class. Do you have **homogeneity of variance**? Report your results as they might appear in a journal article. Include the 95% CI for this gender difference.*
#### Assumption Check: Homogeneity of Variance
**DIRECTIONS:** Before performing the test, check to see if the assumption of homogeneity of variance is met using **Levene's Test**. For an independent samples *t*-test for means, the men and women need to have the same amount of spread (SD) in their baseline heart rates.
> **NOTE:** Use the `car::leveneTest()` function to do this. Inside the function you need to specify at least three things (separated by commas):
> * the formula: `continuous_var ~ grouping_var` (replace with your variable names)
> * the dataset: `data = .` to pipe it from above
> * the center: `center = "mean"` since we are comparing means
```{r}
```
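For comparison, here is a minimal sketch of the call pattern described in the note above, run on R's built-in `ToothGrowth` data rather than Ihno's dataset. The variables `len` and `supp` and the object name `lev_sketch` are stand-ins chosen for illustration only; substitute your own variables when you fill in the chunk above.

```r
library(magrittr)  # provides the %>% pipe (also loaded by tidyverse)

# Levene's test: is the spread of tooth length (len) similar in both supplement groups (supp)?
lev_sketch <- ToothGrowth %>%
  car::leveneTest(len ~ supp,        # formula: continuous_var ~ grouping_var
                  data = .,          # dataset piped from above
                  center = "mean")   # center each group at its mean

# Display the test; a small p-value would be evidence against homogeneity of variance
lev_sketch
```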
\clearpage
#### Perform the *t*-Test for Means in 2 Indep Groups
**DIRECTIONS:** Test if men and women have different baseline heart rates using the `t.test()` function.
> Use the same `t.test()` function we have used in the prior chapters. This time you need to specify a few more options:
> * the formula: `continuous_var ~ grouping_var` (replace with your variable names)
> * the dataset: `data = .` to pipe it from above
> * independent vs. paired: `paired = FALSE` (this is the default)
> * is homogeneity satisfied: `var.equal = TRUE` (NOT the default)
> * confidence level: `conf.level = #` (defaults to .95)
```{r}
# indep groups t-test for means: hr_base by genderF
```
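As an illustration of the options listed above, here is a small sketch on the built-in `ToothGrowth` data (the variables `len` and `supp` and the name `t_sketch` are assumptions made for this example, not part of the homework). The `paired` argument is left out on purpose: independent groups are the default for the formula interface, and some recent versions of R appear to reject an explicit `paired` argument when a formula is used.

```r
library(magrittr)  # provides the %>% pipe (also loaded by tidyverse)

# Independent groups t-test for means: len by supp, assuming equal variances
t_sketch <- ToothGrowth %>%
  t.test(len ~ supp,          # formula: continuous_var ~ grouping_var
         data = .,            # dataset piped from above
         var.equal = TRUE,    # pooled-variance test (homogeneity assumed)
         conf.level = 0.95)   # change to 0.99 for a 99% confidence interval

# Display t, df, p-value, the confidence interval, and the two group means
t_sketch
```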
**ANSWER:** Interpret the results of the t-test: Does the test suggest men and women have different baseline heart rates?
\clearpage
### 7C-5. Independent Samples *t*-Test for Mean `hr_post` by `coffeeF`
**TEXTBOOK QUESTIONS:** *Perform a two-sample t test to determine whether **coffee drinkers** exhibited significantly higher **postquiz heart rates** than nondrinkers at the .05 level. Is this t test significant at the .01 level? Find the **99%** confidence interval for the difference of the two population means and explain its connection to your decision regarding the null hypothesis at the **.01 level**.*
#### Assumption Check: Homogeneity of Variance
**DIRECTIONS:** Just like the last question, run **Levene's test** first.
```{r}
```
#### Perform the *t*-Test for Means in 2 Indep Groups
**DIRECTIONS:** Make sure to change the confidence level to **99%**.
```{r}
# indep groups t-test for means: hr_post by coffeeF
```
# Chapter 8
### No R for Chapter 8.
\clearpage
# Chapter 12. ANOVA
## Section B
### 12B-4 Introduce One-Way ANOVA
**TEXTBOOK QUESTION:** *A social psychologist wants to know how long people will wait before responding to cries for help from an unknown person and whether the gender or age of the person in need of help makes any difference. One at a time, subjects sit in a room waiting to be called for an experiment. After a few minutes they hear cries for help from the next room, which are actually on a tape recording. The cries are in either an adult male's, an adult female's, or a child's voice; seven subjects are randomly assigned to each condition. The dependent variable is the number of seconds from the time the cries begin until the subject gets up to investigate or help. (a) Calculate the F ratio. (b) Find the critical F ($\alpha = .05$). (c) What is your statistical conclusion? (d) Present the results of the ANOVA in a summary table. (e) Calculate $\eta^2$ using Formula 12.10.*
```{r Q12b4_data}
# Display the raw dataset: wide format
data_wait
```
First, the data must be restructured from **wide** to **long** format, so that each observation is on its own line. All categorical variables must be declared as factors. We must also add a distinct identifier variable (`id`).
```{r Q12b4_restructure}
# convert the dataset: wide --> long
data_wait_long <- data_wait %>%
tidyr::gather(key = caller_type, # new var name = groups
value = delay_time, # new var name = measurements
child, woman, man) %>% # all old variable names
dplyr::mutate(id = row_number()) %>% # create a sequential id variable
dplyr::select(id, caller_type, delay_time) %>% # reorder the variables
dplyr::mutate_at(vars(id, caller_type), factor) # declare factors
data_wait_long %>% head(n = 10) # display the top 10 rows only
```
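The chunk above reshapes with `tidyr::gather()`, which still works but has been superseded in `tidyr` by `pivot_longer()`. As a hedged alternative, the sketch below does the same wide-to-long step on a tiny made-up data frame; `toy_wide`, `toy_long`, and all of the numbers in it are hypothetical and are not the textbook data.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one column per group, three subjects per group
toy_wide <- tibble(child = c(10, 12, 15),
                   woman = c(17, 13, 18),
                   man   = c(20, 25, 14))

# wide --> long: one row per observation
toy_long <- toy_wide %>%
  pivot_longer(cols = c(child, woman, man),   # all old variable names
               names_to = "caller_type",      # new var name = groups
               values_to = "delay_time") %>%  # new var name = measurements
  mutate(id = row_number()) %>%               # create a sequential id variable
  select(id, caller_type, delay_time) %>%     # reorder the variables
  mutate(across(c(id, caller_type), factor))  # declare factors

# Rows come out ordered by original row, not stacked by column as with gather()
toy_long
```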
\clearpage
Second, check the summary statistics for each group.
```{r Q12b4_summary, results="asis"}
# Raw data: summary table
data_wait_long %>%
dplyr::group_by(caller_type) %>% # divide into groups
furniture::table1(delay_time, # gives M(SD)
output = "markdown") # add chunk option: results="asis"
```
Third, plot the data to eyeball the potential effect. Remember the center line in each box represents the median, not the mean.
```{r Q12b4_boxplot, fig.width=4, fig.height=2}
# Raw data: boxplots
data_wait_long %>%
ggplot(aes(x = caller_type,
y = delay_time)) +
geom_boxplot() +
geom_point()
```
```{r Q12b4_plot, fig.width=4, fig.height=2}
# Raw data: plot M +/- SE (stat_summary() defaults to mean_se())
data_wait_long %>%
ggplot(aes(x = caller_type,
y = delay_time)) +
stat_summary()
```
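Called with no arguments, `stat_summary()` falls back to `mean_se()`, so each point-and-line shows the mean plus or minus one standard error. If you would rather show the mean plus or minus one standard deviation, one possible approach is to supply the three summary functions yourself. The sketch below uses the built-in `PlantGrowth` data; the object name `p_sd` is made up for this example.

```r
library(ggplot2)

# Mean +/- 1 SD for each group, drawn with the default pointrange geom
p_sd <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  stat_summary(fun = mean,                              # point = group mean
               fun.min = function(x) mean(x) - sd(x),   # lower end = M - SD
               fun.max = function(x) mean(x) + sd(x))   # upper end = M + SD

p_sd
```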
\clearpage
### Tutorial - Fitting One-way ANOVA Models with `afex::aov_4()`
The `aov_4()` function from the `afex` package fits ANOVA models (one-way, two-way, repeated measures, and mixed design). It needs at least two arguments:
1. formula: `continuous_var ~ group_var + (1|id_var)` *one observation per subject and `id_var` is distinct for each subject*
2. dataset: `data = .` *we use the period to signify that the dataset is being piped from above*
Here is an outline of what your syntax should look like when you **fit and save a one-way ANOVA**. Of course you will replace the dataset name and the variable names, as well as the name you are saving it as.
> **NOTE:** The `aov_4()` function works on data in LONG format only. Each observation needs to be on its own line or row with separate variables for the group membership (categorical factor or `fct`) and the continuous measurement (numeric or `dbl`).
```{r EX_aov_4, eval=FALSE}
# One-way ANOVA: fit and save
aov_name <- data_name %>%
afex::aov_4(continuous_var ~ group_var + (1|id_var),
data = .)
```
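To see the outline above in action, here is a runnable sketch on R's built-in `PlantGrowth` data (30 plants in 3 groups). That dataset has no id column, so one is created first; the names `plant_long` and `aov_plant` are made up for this illustration and are not part of the homework.

```r
library(dplyr)

# PlantGrowth is already in long format; add one distinct id per observation
plant_long <- PlantGrowth %>%
  mutate(id = factor(row_number()))

# One-way ANOVA: fit and save
aov_plant <- plant_long %>%
  afex::aov_4(weight ~ group + (1 | id),
              data = .)

# Display basic ANOVA results (includes effect size)
aov_plant
```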
------------------------------
By running the name you saved your model under, you will get a brief set of output, including a measure of **Effect Size**.
> **NOTE:** The `ges` is the *generalized eta squared*. In a one-way ANOVA, generalized eta squared ($\eta^2_g$) and partial eta squared ($\eta^2_p$) are the same value, i.e. both equal the ordinary $\eta^2$.
```{r EX_aov_4_brief, eval=FALSE}
# Display basic ANOVA results (includes effect size)
aov_name
```
------------------------------
To fill out a standard ANOVA table and compute other effect sizes, you will need a more complete set of output, including the **Sum of Squares** components. To get it, add `$Anova` at the end of the model name before running it.
> **NOTE:** IGNORE the first line that starts with `(Intercept)`! Also, the 'mean sum of squares' are not included in this table, nor is the **Total** line at the bottom of the standard ANOVA table. You will need to manually compute these values and add them on the homework page. Remember that `Sum of Squares (SS)` and `degrees of freedom (df)` add up, but `Mean Sum of Squares (MS)` do not add up. Also: `MS = SS/df` for each term.
```{r EX_aov_4_fuller, eval=FALSE}
# Display fuller ANOVA results (includes sum of squares)
aov_name$Anova
```
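The hand calculations described in the note above (mean squares, the Total line, the F ratio, eta squared, and the critical F) can be sketched in a few lines. To keep the example free of extra packages it uses base R's `aov()` on the built-in `PlantGrowth` data instead of `afex`; for a one-way design the SS and df should agree with the group and Residuals rows of `$Anova`. All object names here are made up for illustration.

```r
# Fit a one-way ANOVA with base R and pull out the SS and df columns
fit <- aov(weight ~ group, data = PlantGrowth)
tab <- summary(fit)[[1]]

ss <- tab[["Sum Sq"]]   # SS between groups, SS within groups (residual)
df <- tab[["Df"]]       # df between groups, df within groups (residual)

ms       <- ss / df                 # MS = SS / df for each term
f_ratio  <- ms[1] / ms[2]           # F = MS_between / MS_within
ss_total <- sum(ss)                 # SS add up to the Total line
df_total <- sum(df)                 # df add up to the Total line
eta_sq   <- ss[1] / ss_total        # eta squared = SS_between / SS_total
f_crit   <- qf(0.95, df1 = df[1], df2 = df[2])  # critical F for alpha = .05

# Collect the pieces needed for a standard summary table
data.frame(source = c("Between", "Within", "Total"),
           SS = c(ss, ss_total),
           df = c(df, df_total),
           MS = c(ms, NA))
```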
\clearpage
**DIRECTIONS:** Fit a one-way ANOVA model for the differences in mean `delay_time` for each of the three independent `caller_type` groups with the `afex::aov_4()` function and save the results under the name `aov_wait_time`.
```{r Q12b4_model}
# One-way ANOVA: fit and save
```
Remember, since you are saving your model to a name *(`aov_wait_time`)*, there will not be any output, except a message about setting contrasts to `contr.sum`.
-------------------------
**DIRECTIONS:** Request the omnibus $F$ value by typing the name you saved your fitted model as above (`aov_wait_time`). This time you need only remove the pound symbol at the start of the line in the code chunk below.
```{r Q12b4_results}
# Display basic ANOVA results (includes effect size)
#aov_wait_time
```
------------------------------------------
**DIRECTIONS:** Request the more complete summary table by adding `$Anova` at the end of the name you saved your fitted model as above. This time you need only remove the pound symbol at the start of the line in the code chunk below.
```{r Q12b4c_table}
# Display fuller ANOVA results (includes sum of squares)
#aov_wait_time$Anova
```
\clearpage
### 12B-5 Another One-Way ANOVA
**TEXTBOOK QUESTION:** *A psychologist is interested in the relationship between color of food and appetite. To explore this relationship, the researcher bakes small cookies with icing of one of three different colors (green, red, or blue). The researcher offers cookies to subjects while they are performing a boring task. Each subject is run individually under the same conditions, except for the color of the icing on the cookies that are available. Six subjects are randomly assigned to each color. The number of cookies consumed by each subject during the 30-minute session is shown in the following table. (a) Calculate the F ratio. (b) Find the critical F ($\alpha = .01$). (c) What is your statistical decision with respect to the null hypothesis? (d) Present your results in the form of a summary table.*
```{r Q12b5_data}
# Display the raw dataset: wide format
data_food
```
First, the data must be restructured from **wide** to **long** format, so that each observation is on its own line. All categorical variables must be declared as factors. We must also add a distinct identifier variable (`id`).
```{r Q12b5_restructure}
# convert the dataset: wide --> long
data_food_long <- data_food %>%
tidyr::gather(key = icing_color, # new var name = groups
value = cookies_ate, # new var name = measurements
green, red, blue) %>% # all old variable names
dplyr::mutate(id = row_number()) %>% # create a sequential id variable
dplyr::select(id, icing_color, cookies_ate) %>% # reorder the variables
dplyr::mutate_at(vars(id, icing_color), factor) # declare factors
data_food_long %>% head(n = 10)
```
\clearpage
**DIRECTIONS:** Request the summary statistics for each group using the `table1()` function from the `furniture` package, after piping a `dplyr::group_by(group_var)` step.
```{r Q12b5_summary, results="asis"}
```
------------------------------------------
**DIRECTIONS:** Plot the raw data for each group using the `stat_summary()` layer in `ggplot(aes(x = group_var, y = contin_var))`.
```{r Q12b5_plot}
# Raw data: plot M(SD)'s
```
\clearpage
**DIRECTIONS:** Fit a one-way ANOVA model for the difference in mean `cookies_ate` for each of the three independent `icing_color` groups with the `afex::aov_4()` function and save the results under the name `aov_food_time`.
```{r Q12b5_aov}
# One-way ANOVA: fit and save
```
------------------------------------------
**DIRECTIONS:** Request the $F$ value by typing the name you saved your fitted model as above.
```{r Q12b5_aov_basic}
# Display basic ANOVA results (includes effect size)
```
------------------------------------------
**DIRECTIONS:** Request the more complete summary table by adding `$Anova` at the end of the name you saved your fitted model as above.
```{r Q12b5_aov_fuller}
# Display fuller ANOVA results (includes sum of squares)
```
\clearpage
### 12B-6 The Effect of Larger Mean Values
**TEXTBOOK QUESTION:** *Suppose that the data in Exercise 5 had turned out differently. In particular, suppose that the number of cookies eaten by subjects in the green condition remains the same, but each subject in the red condition ate 10 more cookies than in the previous data set, and each subject in the blue condition ate 20 more. (a) Calculate the F ratio. Is the new F ratio significant at the .01 level? (b) Which part of the F ratio has changed from the previous exercise and which part has remained the same? (c) Put your results in a summary table to facilitate comparison with the results of Exercise 5. (d) Calculate estimated $\omega^2$ with Formula 12.12 and adjusted $\eta^2$ with Formula 12.14. Are they the same? Explain.*
BEFORE you restructure from **wide** to **long** format, add 10 to the red counts and add 20 to the blue counts.
```{r Q12b6_restructure}
# Revised wide dataset
data_food_long2 <- data_food %>%
dplyr::mutate(red = 10 + red) %>% # NEW VALUES = 10 + OLD !!!
dplyr::mutate(blue = 20 + blue) %>% # NEW VALUES = 20 + OLD !!!
tidyr::gather(key = icing_color, # new var name = groups
value = cookies_ate, # new var name = measurements
green, red, blue) %>% # all old variable names
dplyr::mutate(id = row_number()) %>% # create a sequential id variable
dplyr::select(id, icing_color, cookies_ate) %>% # reorder the variables
dplyr::mutate_at(vars(id, icing_color), factor) # declare factors
data_food_long2 %>% head(n = 10)
```
\clearpage
**DIRECTIONS:** Request the summary statistics for each group using the `table1()` function from the `furniture` package, after piping a `dplyr::group_by(group_var)` step.
```{r Q12b6_summary, results="asis"}
# Raw data: summary table
```
------------------------------------------
**DIRECTIONS:** Plot the raw data for each group using the `stat_summary()` layers in `ggplot(aes(x = group_var, y = contin_var))`.
```{r Q12b6_plot}
# Raw data: plot M(SD)
```
\clearpage
**DIRECTIONS:** Fit a one-way ANOVA model for the difference in mean `cookies_ate` for each of the three independent `icing_color` groups with the `afex::aov_4()` function and save the results under the name `aov_food_time2`.
```{r Q12b6_model}
# One-way ANOVA: fit and save
```
------------------------------------------
**DIRECTIONS:** Request the $F$ value by typing the name you saved your fitted model as above.
```{r Q12b6_results}
# Display basic ANOVA results (includes effect size)
```
------------------------------------------
**DIRECTIONS:** Request the more complete summary table by adding `$Anova` at the end of the name you saved your fitted model as above.
```{r Q12b6_table}
# Display fuller ANOVA results (includes sum of squares)
```
\clearpage
## Section C
### 12C-1 Does Post-Quiz Heart Rate Differ by Difficulty Level?
**TEXTBOOK QUESTION:** *Perform a one-way ANOVA to test whether the different experimental conditions had a significant effect on postquiz heart rate. Request descriptive statistics and an HOV test. Calculate eta squared from your ANOVA output, and present your results in APA style.*
**DIRECTIONS:** Request the summary statistics for each group using the `table1()` function from the `furniture` package, after piping a `dplyr::group_by(group_var)` step.
```{r Q12c1_summary, results="asis"}
# Raw Data: summary table
```
------------------------------------------
**DIRECTIONS:** Plot the raw data for each group using the `stat_summary()` layer in `ggplot(aes(x = group_var, y = contin_var))`.
```{r Q12c1_plot, fig.height=2.75, fig.width=5}
# Raw data: plot M(SD)
```
\clearpage
**DIRECTIONS:** Use the `leveneTest()` function from the `car` package to test if the data give any evidence of a violation of *Homogeneity of Variance (HOV)*.
> **NOTE:** We learned how to do this in [chapter 7](https://sarbearschwartz.github.io/Quant_I/t-test-for-the-difference-in-2-means-independent-samples.html#assumtion-check-homogeneity-of-variance)
```{r Q12c1_hov}
# Levene's Test of HOV
```
------------------------------------------
**DIRECTIONS:** Fit a one-way ANOVA model for the difference in mean `hr_post` for each of the four independent `exp_condF` groups *(make sure to use the factor version)* with the `afex::aov_4()` function and save the results under the name `aov_hr_post` for future use.
> **NOTE:** The identification variable is called `sub_num` in this dataset, not `id`.
```{r Q12c1_aov}
# One-way ANOVA: fit and save
```
------------------
**DIRECTIONS:** Request the $F$ value by typing the name you saved your fitted model as above.
```{r Q12c1_aov_basic}
# Display basic ANOVA results (includes effect size)
```
\clearpage
### 12C-2a Do the Math and Stat Quiz Scores Differ by College Major?
**TEXTBOOK QUESTION:** *Using college major as the independent variable, perform a one-way ANOVA to test for significant differences in both mathquiz and statquiz . Request descriptive statistics and an HOV test. Based on the HOV test, for which DV should you consider performing an alternative ANOVA test? For whichever DV yields a p value between .05 and .1, report its results as a trend. For whichever DV yields a p value less than .05, calculate the corresponding value of eta squared, and report the ANOVA results, along with the means for the groups, in APA style.*
```{r Q12c2_summary, results="asis"}
# Raw Data: summary table
data_clean %>%
dplyr::group_by(majorF) %>%
furniture::table1(mathquiz, statquiz, # gives M(SD)
output = "markdown") # add chunk option: results="asis"
```
\clearpage
**DIRECTIONS:** Plot the raw data for each group using the `stat_summary()` layer in `ggplot(aes(x = group_var, y = contin_var))`. Do this TWICE, once with `y = mathquiz` and then again with `y = statquiz`.
```{r Q12c2_math_plot, fig.height=2.75, fig.width=5}
# Raw data: plot M(SD) - Math Quiz
```
```{r Q12c2_stat_plot, fig.height=2.75, fig.width=5}
# Raw data: plot M(SD) - Stat Quiz
```
\clearpage
#### Math Quiz - All Five Majors
**DIRECTIONS:** Use the `car::leveneTest()` to test for violations of *HOV*.
```{r Q12c2_mathQ_hov}
# Levene's Test of HOV
```
----------------------------
**DIRECTIONS:** Fit a one-way ANOVA model using `afex::aov_4()`.
> **NOTE:** Because some of the students are missing the `mathquiz` variable, you will need to precede the `aov_4()` step with `dplyr::filter(complete.cases(mathquiz, majorF))` in the pipeline.
```{r Q12c2_mathQ_aov}
# One-way ANOVA: fit and save
```
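The complete-cases filter from the note above can be tried out on the built-in `airquality` data, where `Ozone` has missing values. This is only a sketch of the filtering step on stand-in variables (`Ozone` and `Month` play the roles of `mathquiz` and `majorF`); the name `air_complete` is made up for the example.

```r
library(dplyr)

# Keep only the rows where both variables are observed
air_complete <- airquality %>%
  filter(complete.cases(Ozone, Month))

# Compare the number of rows before and after dropping incomplete cases
nrow(airquality)
nrow(air_complete)
```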
---------------------
**DIRECTIONS:** Request the $F$ value by typing the name you saved your fitted model as above.
```{r Q12c2_mathQ_aov_basic}
# Display basic ANOVA results (includes effect size)
```
\clearpage
#### Stat Quiz - All Five Majors
**DIRECTIONS:** Use the `car::leveneTest()` to test for violations of *HOV*.
```{r Q12c2_statQ_hov}
# Levene's Test of HOV
```
---------------------
**DIRECTIONS:** Fit a one-way ANOVA model using `afex::aov_4()`.
```{r Q12c2_statQ_aov}
# One-way ANOVA: fit and save
```
---------------------
**DIRECTIONS:** Request the $F$ value by typing the name you saved your fitted model as above.
```{r Q12c2_statQ_aov_basic}
# Display basic ANOVA results (includes effect size)
```
\clearpage
### 12C-3 Remove Two Majors and Repeat
#### Math Quiz - Only Three Majors
**TEXTBOOK QUESTION:** *Repeat Exercise 2 after using Select Cases to eliminate all of the psychology and premed students.*
> **NOTE:** You will need to precede Levene's Test with `dplyr::filter(majorF %in% c("Biology", "Sociology", "Economics"))` in the pipeline in order to subset the data.
**DIRECTIONS:** Use the `car::leveneTest()` to test for violations of *HOV*.
```{r Q12c3_math_hov}
# Levene's Test of HOV
```
--------------------------
**DIRECTIONS:** Fit a one-way ANOVA model using `afex::aov_4()`.
> **NOTE:** Here you will need both the filter step for subsetting majors and the filter step to restrict to complete cases. The order of the two `dplyr::filter()` steps does not matter.
```{r Q12c3_math_aov}
# One-way ANOVA: fit and display
```
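The claim that the two `dplyr::filter()` steps can go in either order is easy to check on a stand-in dataset. The sketch below uses the built-in `airquality` data, with `Month` playing the role of the grouping factor and `Ozone` the role of the quiz score; the names `subset_first` and `complete_first` are hypothetical.

```r
library(dplyr)

# Order 1: subset the groups, then drop incomplete cases
subset_first <- airquality %>%
  filter(Month %in% c(5, 6, 7)) %>%
  filter(complete.cases(Ozone, Month))

# Order 2: drop incomplete cases, then subset the groups
complete_first <- airquality %>%
  filter(complete.cases(Ozone, Month)) %>%
  filter(Month %in% c(5, 6, 7))

# Check whether the same observations survive either way
identical(subset_first$Ozone, complete_first$Ozone)
```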
\clearpage
#### Stat Quiz - Only Three Majors
**DIRECTIONS:** Use the `car::leveneTest()` to test for violations of *HOV*.
> **NOTE:** You will need to precede Levene's Test and the ANOVA with `dplyr::filter(majorF %in% c("Biology", "Sociology", "Economics"))` in the pipeline in order to subset the data.
```{r Q12c3_stat_hov}
# Levene's Test of HOV
```
--------------------------
**DIRECTIONS:** Fit a one-way ANOVA model using `afex::aov_4()`.
```{r Q12c3_stat_aov}
# One-way ANOVA: fit and display
```
\clearpage
### 12C-5 Phobia Group vs. Difference (Pre-Post) Heart Rate
**TEXTBOOK QUESTION:** *Use Recode to create a grouping variable from phobia , such that Group 1 contains those with phobia ratings of 0, 1, or 2; Group 2 = 3 or 4; and Group 3 = 5 or more (you might call the new variable Phob_group ). Then use Transform to create another new variable, hr_diff , that equals hr_pre minus hr_base . Perform a one-way ANOVA on hr_diff using Phob_group as the factor. Request descriptive statistics. Report the results in APA style, including the means of the three groups. Explain what this ANOVA demonstrates, in terms of the variables involved.*
```{r Q12c5_data}
data_new <- data_clean %>%
dplyr::mutate(phob_group = case_when(phobia < 3 ~ 1,
phobia %in% c(3, 4) ~ 2,
phobia >= 5 ~ 3)) %>%
dplyr::mutate(phob_group = factor(phob_group,
levels = c(1, 2, 3),
labels = c("Low", "Moderate", "High"))) %>%
dplyr::mutate(hr_diff = hr_pre - hr_base)
```
\clearpage
**DIRECTIONS:** Request the summary statistics for each group using the `table1()` function from the `furniture` package, after piping a `dplyr::group_by(group_var)` step.
```{r Q12c5_summary, results='asis'}
# Raw data: summary table
```
-------------------------
**DIRECTIONS:** Use the `car::leveneTest()` to test for violations of *HOV*.
```{r Q12c5_hov}
# Levene's Test of HOV
```
-------------------------
**DIRECTIONS:** Fit and save a one-way ANOVA model using `afex::aov_4()`.
```{r Q12c5_aov}
# One-way ANOVA: fit and save
```
-------------------------
**DIRECTIONS:** Request the $F$ value by typing the name you saved your fitted model as above.
```{r Q12c5_aov_basic}
# Display basic ANOVA results (includes effect size)
```
# Chapter 13. Multiple Comparisons
## Section C
### 13C-1a One-Way ANOVA: LSD and Tukey as Post Hoc tests
**TEXTBOOK QUESTION:** *(A) Redo the one-way ANOVA requested in exercise #1 in Section C of the previous chapter, selecting both LSD and Tukey as Post Hoc tests. For postquiz heart rate, which pairs of experimental conditions differ significantly from each other, according to each test? Can you justify using the results of the LSD test?*
**DIRECTIONS:** Using the ANOVA model saved as `aov_hr_post` previously, request all pairwise post hoc comparisons by first piping `emmeans::emmeans(~ group_var)` followed by `pairs(adjust = "none")`, which gives Fisher's LSD comparisons (no adjustment for multiple comparisons).
```{r Q13c1a_fisherlsd}
# Pairwise post hoc: Fisher's LSD adjustment for multiple comparisons
```
--------------------------------
**DIRECTIONS:** Repeat the above, but use `pairs(adjust = "tukey")` to utilize Tukey's HSD correction for multiple comparisons.
```{r Q13c1a_tukeyhsd}
# Pairwise post hoc: Tukey's HSD adjustment for multiple comparisons
```
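The two-step pattern in the directions above (estimated marginal means, then `pairs()`) can be sketched on the built-in `PlantGrowth` data. The model is refit here so the example stands on its own; `aov_plant`, `emm_plant`, `lsd_pairs`, and `tukey_pairs` are names invented for this illustration.

```r
library(dplyr)

# One-way ANOVA on a stand-in dataset: fit and save
aov_plant <- PlantGrowth %>%
  mutate(id = factor(row_number())) %>%
  afex::aov_4(weight ~ group + (1 | id),
              data = .)

# Estimated marginal means for each group
emm_plant <- aov_plant %>%
  emmeans::emmeans(~ group)

# Pairwise post hoc: Fisher's LSD (no adjustment)
lsd_pairs <- emm_plant %>%
  pairs(adjust = "none")

# Pairwise post hoc: Tukey's HSD adjustment
tukey_pairs <- emm_plant %>%
  pairs(adjust = "tukey")

lsd_pairs
tukey_pairs
```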
\clearpage
### 13C-1c Contrast: Impossible vs. Others
**TEXTBOOK QUESTION:** *(C) Perform a contrast to compare the "impossible" condition with the other three for postquiz heart rate. How does the significance of this contrast compare to the one-way ANOVA? Explain. Looking at the means for the four conditions, design a contrast that you think would capture a large proportion of between-group variance.*
#### Tutorial - Contrast Statements for One-way ANOVA models
There are two steps to conduct a contrast comparison:
1. `emmeans(~ group_var)` - Calculate the *Estimated Marginal Means*
2. `contrast()` - Test whether each specified contrast is *significantly different* from zero
Inside the contrast statement, list the named sets of linear contrast weights. We will only be doing one-at-a-time, but we must still use a nested `list`.
> **NOTE:** You must provide one weight ($c_i$) for each of the $k$ groups. If you wish to ignore a group, that group's weight is $c_i = 0$. The sum total of all the weights must be zero ($\sum c_i = 0$), so use positive and negative numbers.
```{r EX_contrast, eval=FALSE}
# Contrast statement : Impossible vs. Rest
aov_name %>%
emmeans::emmeans(~ group_var) %>%
emmeans::contrast(list("your contrast name" = c(c_1, c_2, ... , c_k)))
```
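Here is a runnable version of the recipe above on the built-in `PlantGrowth` data, comparing the control group with the average of the two treatment groups. The weights `c(1, -0.5, -0.5)` follow the order of the factor levels (`ctrl`, `trt1`, `trt2`) and sum to zero; the object names and the contrast label are made up for this sketch.

```r
library(dplyr)

# One-way ANOVA on a stand-in dataset: fit and save
aov_plant <- PlantGrowth %>%
  mutate(id = factor(row_number())) %>%
  afex::aov_4(weight ~ group + (1 | id),
              data = .)

# Check the level order before writing the weights
levels(PlantGrowth$group)

# Contrast statement: control vs. the average of the two treatments
con_plant <- aov_plant %>%
  emmeans::emmeans(~ group) %>%
  emmeans::contrast(list("ctrl vs. treatments" = c(1, -0.5, -0.5)))

con_plant
```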
--------------------
**DIRECTIONS:** Using the sample recipe code chunk above as an outline, perform a contrast to compare the "impossible" condition with the other three for postquiz heart rate.
```{r Q13c1c_contrast}
# Contrast statement : Impossible vs. Rest
```
\clearpage
### 13C-2a Post Hoc Pairwise: Tukey and Bonferroni
**TEXTBOOK QUESTION:** *Redo the one-way ANOVA requested in Section C, exercise 2 of the previous chapter just for the mathquiz variable, selecting both Tukey and Bonferroni as Post Hoc tests in each case. Why is it problematic to use HSD with major as the factor in this dataset? Given the results of the post hoc tests, does the Tukey or Bonferroni test seem to have greater power when testing all possible pairs of means?*
**DIRECTIONS:** Fit a one-way ANOVA model for the difference in mean `mathquiz` for each `majorF` and save the results under the name `aov_math_major`.
```{r Q13c2a_aov}
# One-way ANOVA: fit and save
```
------------------------
**DIRECTIONS:** Request all pairwise post hoc comparisons TWICE, once via Tukey's HSD with the `adjust = "tukey"` option and a second time with `adjust = "bon"` within the `pairs()` function, applied after piping an `emmeans(~ group_var)` step to the ANOVA model.
```{r Q13c2a_tukey}
# Pairwise post hoc: Tukey's HSD adjustment for multiple comparisons
```
```{r Q13c2a_bon}
# Pairwise post hoc: Bonferroni adjustment for multiple comparisons
```
### 13C-2b Contrast: (Biology and Sociology) vs. other three majors
**TEXTBOOK QUESTION:** *Redo the one-way ANOVA requested in Section C, exercise 2 of the previous chapter just for the statquiz variable and request a contrast that compares the average of the Biology and Sociology majors to the average of the other three majors. Would this contrast be significant if it had been planned? Would this contrast be significant according to Scheffe's test?*
**DIRECTIONS:** Fit a one-way ANOVA model using `afex::aov_4()` and add via pipes both `emmeans::emmeans(~ group_var)` and `contrast()` with appropriate weights.
```{r Q13c2b_contrast}
# Contrast statement: Bio and Soc vs. rest
```
\clearpage
### 13C-4a One-Way ANOVA: prequiz anxiety by Phobia Group - LSD and Bonferroni
**TEXTBOOK QUESTION:** *Perform a one-way ANOVA on the prequiz anxiety measurement ( anx_pre ) using the grouping variable you created in Section C, exercise 5 of the previous chapter (based on phobia ratings). Select both LSD and Bonferroni as your post hoc tests. Which pairs differ significantly for each test?*
**DIRECTIONS:** Fit a one-way ANOVA model for the difference in mean `anx_pre` for each `phob_group` and save the results under the name `aov_anx_phob`.
```{r Q13c4a_aov}
# One-way ANOVA: fit and save
```
--------------------------
**DIRECTIONS:** Request all pairwise post hoc comparisons TWICE, once via Fisher's LSD with the `adjust = "none"` option and a second time with `adjust = "bon"` within the `pairs()` function, applied after piping an `emmeans(~ group_var)` step to the ANOVA model.
```{r Q13c4a_emmeans}
# Pairwise post hoc: Fisher's LSD adjustment for multiple comparisons
```
```{r Q13c4a_emmeans_bon}
# Pairwise post hoc: Bonferroni adjustment for multiple comparisons
```
\clearpage
### 13C-4b Contrast: Students (low or moderate) phobia vs. high
**TEXTBOOK QUESTION:** *Perform a contrast that compares students who had reported low or moderate phobia with those reporting high phobia. Calculate the effect size for this contrast. Is it small, medium, or large?*
**DIRECTIONS:** Starting with the previously fitted `aov_anx_phob` ANOVA model, add via pipes both `emmeans::emmeans(~ group_var)` and `contrast()` with appropriate weights.
```{r Q13c4b_contrast}
# Contrast statement: high vs. rest
```
# Chapter 14. Two-Way ANOVA
## Section B
### 14B-7a 3x4 Two-Way ANOVA
**TEXTBOOK QUESTION:** *A college is conducting a study of its students' expectations of employment upon graduation. Students are sampled by class and major area of study and are given a score from 0 to 35 according to their responses to a questionnaire concerning their job preparedness, goal orientation, and so forth. The data appear in the following table. (a) Perform a two-way ANOVA and create a summary table.*
```{r Q14b7a_restructure}
# convert the dataset: wide --> long
data_undergrad_long <- data_undergrad %>%
dplyr::mutate(class = factor(class,
levels = c(1, 2, 3, 4),
labels = c("Freshmen",
"Sophomores",
"Juniors",
"Seniors"))) %>%
tidyr::gather(key = major,
value = expect_employ,
humanities, science, business) %>%
dplyr::mutate(id = row_number()) %>%
dplyr::select(id, class, major, expect_employ)
data_undergrad_long %>% head(n = 12)
```
\clearpage
#### Tutorial - Fitting Two-way ANOVA Models with `afex::aov_4()`
The only difference between a one-way and two-way ANOVA's syntax is the inclusion of a second grouping variable in the formula.
> **NOTE:** The asterisk (`*`) is used to designate the interaction and main effects between two factors. `group_var1*group_var2` is short for `group_var1 + group_var2 + group_var1:group_var2`. The colon (`:`) designates an interaction.
```{r EX_aov_4_2way, eval=FALSE}
# Two-way ANOVA: fit and save
aov_name <- data_name %>%
afex::aov_4(continuous_var ~ group_var1*group_var2 + (1|id_var),
data = .)
```
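As a concrete instance of the two-way outline above, the sketch below fits a 2x3 design to the built-in `ToothGrowth` data. `dose` is stored as a number there, so it is declared as a factor first, and an id column is added because the dataset has none; `tooth_long` and `aov_tooth` are names invented for this example.

```r
library(dplyr)

# Prepare a stand-in dataset: distinct id per observation, both groupings as factors
tooth_long <- ToothGrowth %>%
  mutate(id = factor(row_number()),
         dose = factor(dose))

# Two-way ANOVA: fit and save (two main effects plus their interaction)
aov_tooth <- tooth_long %>%
  afex::aov_4(len ~ supp * dose + (1 | id),
              data = .)

# Display basic ANOVA results: one line each for supp, dose, and supp:dose
aov_tooth
```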
-----------------------
**DIRECTIONS:** Fit a two-way ANOVA model for the difference in mean `expect_employ` for each of the combinations between the four-level `class` factor and three-level `major` factor with the `afex::aov_4()` function and save the results under the name `aov_employ`.
```{r Q14b7a_aov}
# Two-way ANOVA: fit and save
```
------------------------------------------
**DIRECTIONS:** Request the more complete summary table by adding `$Anova` at the end of the name you saved your fitted model as above.
```{r Q14b7a_aov_fuller}
# Display fuller ANOVA results (includes sum of squares)
```
\clearpage
### 14B-7b Plot Cell Means
**TEXTBOOK QUESTION:** *(B) Draw a graph of the cell means. Does the interaction obscure the interpretation of the main effects?*
#### Tutorial - Cell Means: Displaying in a Grid and Plotting
For a two-way ANOVA, we often would like to see a grid of the means for all combinations of the two grouping factors. This may be achieved by the following steps:
1. `group_by` - group observations by both of the grouping variables
2. `summarise` - compute the mean of each combination subgroup
3. `spread` - spread the means into a grid pattern
```{r EX_cell_means_grid, eval = FALSE}
# Raw data: 2-way table of means (i.e. cell means)
data_long %>%
dplyr::group_by(group_var1, group_var2) %>%
dplyr::summarise(mean = mean(continuous_var)) %>%
tidyr::spread(key = group_var1,
value = mean)
```
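The same three steps can also be tried on a dataset that ships with R. This is only a sketch under stated assumptions: it uses the built-in `warpbreaks` data (not a course dataset), it assumes `dplyr` 1.0.0 or later for the `.groups` argument, and it swaps in `tidyr::pivot_wider()`, the newer replacement for `spread()`, to show the equivalent call.

```{r EX_cell_means_pivot}
# Sketch: cell means grid on the built-in warpbreaks data (2 wool types x 3 tensions)
# ASSUMPTION: pivot_wider() is used here in place of spread(); both build the grid
library(dplyr)   # group_by(), summarise(), and the %>% pipe
library(tidyr)   # pivot_wider()

cell_means_demo <- warpbreaks %>%
  dplyr::group_by(wool, tension) %>%                    # 1. group by both factors
  dplyr::summarise(mean = mean(breaks),                 # 2. mean of each cell
                   .groups = "drop") %>%
  tidyr::pivot_wider(names_from = tension,              # 3. spread into a grid
                     values_from = mean)

cell_means_demo
```

The factor named in `names_from` (or `key` when using `spread()`) ends up across the columns, and the other factor defines the rows.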
--------------------------------
Here is the 3x4 grid of cell means, giving the average for each of the 12 combinations of `class` and `major`.
```{r Q14b7b_means}
# Raw data: 2-way table of means (i.e. cell means)
data_undergrad_long %>%
dplyr::group_by(class, major) %>%
dplyr::summarise(mean = mean(expect_employ)) %>%
tidyr::spread(key = class,
value = mean)
```
\clearpage
To incorporate a second grouping variable into the plot, we can use `shape` and/or `color`. I prefer to use both so that the distinction between groups is not lost when photocopying or if a reader is color blind.
> **NOTE:** Including `position = position_dodge(width = 0.25)` within the `stat_summary()` function offsets the points slightly so that they are not drawn on top of each other.
```{r Q14b7b_plot}
# Raw data: plot M(SD)
data_undergrad_long %>%
ggplot(aes(x = class,
y = expect_employ,
shape = major,
color = major)) +
stat_summary(position=position_dodge(width=0.25))
```
\clearpage
### 14B-7c Pairwise Post Hoc with Tukey's HSD
**TEXTBOOK QUESTION:** *(C) Use Tukey's HSD to determine which pairs of class years differ significantly.*
**DIRECTIONS:** Request the summary statistics for `expect_employ` within each `class` using the `table1()` function from the `furniture` package, after piping a `dplyr::group_by(group_var)` step.
```{r Q14b7c_means, results="asis"}
```
---------------------
**DIRECTIONS:** Plot the raw data for each `class` using the `stat_summary()` layer in `ggplot(aes(x = group_var, y = contin_var))`.
```{r Q14b7c_plot, fig.width=4, fig.height=2.5}
# Raw data: plot M(SD)
```
\clearpage
#### Tutorial - Pairwise Post Hoc comparisons for a One-way ANOVA
There are two steps to conduct all possible pairwise comparisons:
1. `emmeans(~ group_var)` - Calculate the *Estimated Marginal Means*
2. `pairs()` - Determine if each pair is *significantly different*
Within the `pairs()` function there are several options for controlling for multiple comparisons, including:
* `adjust = "none"` - Fisher's LSD
* `adjust = "tukey"` - Tukey's HSD
* `adjust = "bon"` - Bonferroni
```{r EX_tukey, eval=FALSE}
# Pairwise post hoc: Tukey's HSD adjustment for multiple comparisons
aov_name %>%
emmeans::emmeans(~ group_var) %>% # Calculate Estimated Marginal Means
pairs(adjust = "tukey") # Is each pair signif different?
```
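To see how the choice of `adjust` changes the results, the three options listed above can be run side by side. This is a rough sketch rather than part of the assignment: it uses the built-in `warpbreaks` data and, to stay self-contained, fits the model with base R's `aov()` instead of `afex::aov_4()`. The object names ending in `_demo` are made up for illustration.

```{r EX_adjust_compare}
# Sketch: compare multiple-comparison adjustments on the built-in warpbreaks data
# ASSUMPTION: stats::aov() stands in for afex::aov_4() to keep the example short
library(magrittr)   # provides the %>% pipe

emm_pairs_demo <- aov(breaks ~ tension, data = warpbreaks) %>%
  emmeans::emmeans(~ tension)          # Estimated Marginal Means for 3 groups

pairs(emm_pairs_demo, adjust = "none")        # Fisher's LSD (no adjustment)
pairs(emm_pairs_demo, adjust = "tukey")       # Tukey's HSD
pairs(emm_pairs_demo, adjust = "bonferroni")  # Bonferroni
```

The estimates and standard errors are identical across the three tables; only the p-values differ, with the unadjusted values being the smallest.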
---------------------------
**DIRECTIONS:** Request all pairwise post hoc comparisons via Tukey's HSD with the `adjust = "tukey"` option in the `pairs()` function, applied after piping an `emmeans(~ group_var)` step to the ANOVA model.
```{r Q14b7c_tukey}
# Pairwise post hoc: Tukey's HSD adjustment for multiple comparisons
```
\clearpage
### 14B-7d 2x2 Contrast Statements to Test Extremes
**TEXTBOOK QUESTION:** *For just the freshmen and seniors, calculate the three possible interaction contrasts. Which, if any, would be significant according to Scheffe's test?*
The following code chunk will display the means for all combinations of the two grouping variables so that you can figure out which order to set up the contrast weights ($c_i$'s).
```{r Q14b7d_emmeans}
# Request all emmeans: see ORDER for contrast weights to be entered below
#aov_employ %>%
# emmeans::emmeans(~ class*major)
```
```{r Q14b7d_contrast1}
# 2x2 Contrast statement (Freshmen vs. Seniors): Humanities vs. Science
# aov_employ %>%
# emmeans::emmeans(~ class*major) %>%
# emmeans::contrast(list("fr-sr X Hum-Sc" = c( 0, 0, 0, 0,
# 1, 0, 0, -1,
# -1, 0, 0, 1)))
```
```{r Q14b7d_contrast2}
# 2x2 Contrast statement (Freshmen vs. Seniors): Humanities vs. Business
# aov_employ %>%
# emmeans::emmeans(~ class*major) %>%
# emmeans::contrast(list("fr-sr X Hum-bus" = c(1, 0, 0, -1,
# -1, 0, 0, 1,
# 0, 0, 0, 0)))
```
```{r Q14b7d_contrast3}
# 2x2 Contrast statement (Freshmen vs. Seniors): Science vs. Business
# aov_employ %>%
# emmeans::emmeans(~ class*major) %>%
# emmeans::contrast(list("fr-sr X Sc-bus" = c(1, 0, 0, -1,
# 0, 0, 0, 0,
# -1, 0, 0, 1)))
```
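The way the weights in the contrast chunks above line up with the rows of the `emmeans` output can be practiced on a smaller design. The sketch below is an illustration only: it uses the built-in `warpbreaks` data (a 2x3 design), fits the model with base R's `aov()` rather than `afex::aov_4()`, and the object names and the contrast label are hypothetical.

```{r EX_interaction_contrast}
# Sketch: a 2x2 interaction contrast on the built-in warpbreaks data
# ASSUMPTION: stats::aov() stands in for afex::aov_4() to keep the example short
library(magrittr)   # provides the %>% pipe

emm_contrast_demo <- aov(breaks ~ wool * tension, data = warpbreaks) %>%
  emmeans::emmeans(~ wool * tension)

# Step 1: print the grid to see the ORDER of the six cells
# (the first factor in the formula, wool, changes fastest)
emm_contrast_demo

# Step 2: one weight per row, in that same order:
#         A-L, B-L, A-M, B-M, A-H, B-H
con_demo <- emm_contrast_demo %>%
  emmeans::contrast(list("A-B X L-H" = c( 1, -1,    # tension L
                                          0,  0,    # tension M (left out)
                                         -1,  1)))  # tension H
con_demo
```

The estimate is the difference between two differences: (A minus B at low tension) minus (A minus B at high tension). The weights sum to zero, and the cells left out of the comparison receive a weight of zero.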
\clearpage
### 14B-8a 2x2 Two-Way ANOVA
**TEXTBOOK QUESTION:** *The data from Exercise 12B8 for a four-group experiment on attitudes and memory are reproduced below. Considering the relationships among the four experimental conditions, it should be obvious that it makes sense to analyze these data with a two-way ANOVA. (A) Perform a two-way ANOVA and create a summary table of your results. (Note: You can use the summary table from Exercise 12B8 as the basis for a new table.)*
```{r Q14b8a_restructure}
# convert the dataset: wide --> long
data_memory_long <- data_memory %>%
tidyr::gather(key = warning_attitude,
value = recall) %>%
tidyr::separate(warning_attitude,
into = c("warning", "attitude"),
remove = FALSE) %>%
dplyr::mutate(id = row_number()) %>%
dplyr::mutate_at(vars(id, warning_attitude, warning, attitude), factor) %>%
dplyr::select(id, warning_attitude, warning, attitude, recall)
data_memory_long
```
\clearpage
**DIRECTIONS:** Fit a two-way ANOVA model for the difference in mean `recall` for each of the combinations between the two-level `warning` factor and two-level `attitude` factor with the `afex::aov_4()` function and save the results under the name `aov_memory_2way`.
```{r Q14b8a_aov}
# Two-way ANOVA: fit and save
```
-----------------
**DIRECTIONS:** Request the more complete summary table by adding `$Anova` at the end of the name you saved your fitted model as above.
```{r Q14b8a_aov_fuller}
# Display fuller ANOVA results (includes sum of squares)
```
\clearpage
### 14B-8b One-Way ANOVA: one 4 level factor
**TEXTBOOK QUESTION:** *(B) Compare your summary table to the one you produced for Exercise 12B8.*
> **NOTE:** We did not do Exercise 12B8, but we can do it here.
**DIRECTIONS:** Fit a one-way ANOVA model for the difference in mean `recall` for each of the four independent `warning_attitude` groups with the `afex::aov_4()` function and save the results under the name `aov_memory_1way`.
```{r Q14b8b_aov}
# One-way ANOVA: fit and save
```
-----------------------
**DIRECTIONS:** Request the more complete summary table by adding `$Anova` at the end of the name you saved your fitted model as above.
```{r Q14b8b_aov_fuller}
# Display fuller ANOVA results (includes sum of squares)
```
-----------------------
### 14B-8c Plot Means to Aid Interpretation
**TEXTBOOK QUESTION:** *(C) What conclusions can you draw from the two-way ANOVA?*
```{r Q14b8c_means, results="asis"}
data_memory_long %>%
dplyr::group_by(warning_attitude) %>%
furniture::table1(recall, # gives M(SD)
output = "markdown") # add chunk option: results="asis"
```
\clearpage
## Section C
### 14C-1a 5x2 ANOVA: Major and Gender on Math Quiz
**TEXTBOOK QUESTION:** *Using college major and gender as your independent variables, perform a two-way ANOVA on mathquiz. Request descriptive statistics and an HOV test. Calculate the ordinary eta squared for each factor, and report your results in APA style.*
**DIRECTIONS:** Fit a two-way ANOVA model for the difference in mean `mathquiz` for each of the combinations between the five-level `majorF` factor and two-level `genderF` factor with the `afex::aov_4()` function and save the results under the name `aov_math_2way`.
```{r Q14c1a_aov}
# Two-way ANOVA: fit and save
```
-----------------------------
**DIRECTIONS:** Request the more complete summary table by adding `$Anova` at the end of the name you saved your fitted model as above.
```{r Q14c1a_aov_fuller}
# Display fuller ANOVA results (includes sum of squares)
```
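The textbook question asks for the ordinary eta squared, which is each effect's sum of squares divided by the total sum of squares. The sketch below shows that arithmetic on the built-in `warpbreaks` data using base R's `aov()`; this is an assumption made to keep the example self-contained, and the layout of the `$Anova` table from an `afex` model is different (it also includes an intercept row). Computing the total directly from the raw outcome, as done here, avoids relying on the rows of any one table adding up.

```{r EX_eta_squared}
# Sketch: ordinary eta squared = SS_effect / SS_total (built-in warpbreaks data)
# ASSUMPTION: stats::aov() stands in for afex::aov_4(); object names are made up
fit_eta_demo <- aov(breaks ~ wool * tension, data = warpbreaks)

# Pull the sum of squares column out of the ANOVA table
ss_table   <- summary(fit_eta_demo)[[1]]
ss_effects <- ss_table[["Sum Sq"]]
names(ss_effects) <- trimws(rownames(ss_table))   # remove padding from row names

# Total sum of squares computed from the raw outcome variable
ss_total <- sum((warpbreaks$breaks - mean(warpbreaks$breaks))^2)

# Eta squared for each effect (drop the residual row)
eta_sq_demo <- ss_effects[names(ss_effects) != "Residuals"] / ss_total
round(eta_sq_demo, 3)
```

For your own model, read the sums of squares for each factor from the fuller summary table and divide each by the total in the same way.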
\clearpage
### 14C-1b Follow-up Comparisons: by major only
**TEXTBOOK QUESTION:** *Given the ANOVA results, perform an appropriate follow-up test. Explain your results in terms of the descriptive statistics.*
**DIRECTIONS:** Request the summary statistics for each group using the `table1()` function from the `furniture` package, after piping a `dplyr::group_by(group_var)` step.
```{r Q14c1b_summary, results='asis'}
# Raw data: summary table
```
-----------------------------------
**DIRECTIONS:** Fit a one-way ANOVA model using `afex::aov_4()`. Add on via pipes both `emmeans::emmeans(~ group_var)` and `pairs()`. Make sure to indicate `adjust = "tukey"` within the `pairs()` command.
```{r Q14c1b_tukey}
# One-way ANOVA: fit and pairwise with Tukey's HSD
```
\clearpage
### 14C-4a 2x3 ANOVA: Phobia Group and Gender on Math Quiz
**TEXTBOOK QUESTION:** *Using the phobia grouping variable you created for computer exercise 5 in Chapter 12 and gender as your IVs, perform a two-way ANOVA on mathquiz. Request the appropriate post hoc test and a plot of the cell means, and report the results in APA style.*
**DIRECTIONS:** Plot the raw data for each group using the `stat_summary()` layer in `ggplot(aes(x = group_var1, y = contin_var))`. Utilize the `shape` and `color` options for `group_var2`. Also consider dodging the position of the groups to avoid overplotting.
```{r Q14c4a_plot}
# Raw data: plot M(SD)
```
------------------------------------------
Here is the 2x3 grid of cell means, giving the average for each of the 6 combinations of `genderF` and `phob_group`.
```{r Q14c4a_means}
# Raw data: 2-way table of means (i.e. cell means)
data_new %>%
dplyr::group_by(phob_group, genderF) %>%
dplyr::summarise(mean = mean(mathquiz, na.rm = TRUE)) %>%
tidyr::spread(key = phob_group,
value = mean)
```
\clearpage
**DIRECTIONS:** Fit a two-way ANOVA model for the difference in mean `mathquiz` for each of the combinations between the three-level `phob_group` factor and two-level `genderF` factor with the `afex::aov_4()` function and save the results under the name `aov_math_phob_gender`.
```{r Q14c4a_aov}
# Two-way ANOVA: fit and save
```
-----------------
**DIRECTIONS:** Request the $F$ value by typing the name you saved your fitted model as above.
```{r Q14c4a_aov_basic}
# Display basic ANOVA results (includes effect size)
```
---------------------
**DIRECTIONS:** Request all pairwise post hoc comparisons via Fisher's LSD with the `adjust = "none"` option in the `pairs()` function, applied after piping an `emmeans(~ group_var)` step to the ANOVA model above.
```{r Q14c4a_fisherlsd}
# Pairwise post hoc: Fisher's LSD adjustment for multiple comparisons
```
\clearpage
### 14C-4b Repeat without the Moderate Group
**TEXTBOOK QUESTION:** *Repeat part a (except for the post hoc test) after deleting the moderate phobia group from the analysis. What type of interaction do you see in the plot? Test the simple main effect of phobia for each gender. Do you need to follow up any of the simple main effects with pairwise comparisons? Explain.*
**DIRECTIONS:** Repeat the previous ANOVA model, but precede it with a `dplyr::filter(phob_group != "Moderate")` step in the pipeline and save the results under the name `aov_math_phob2_gender`.
```{r Q14c4b_aov}
# Two-way ANOVA: fit and save
```
--------------------------
**DIRECTIONS:** Request the $F$ value by typing the name you saved your fitted model as above.
```{r Q14c4b_aov_basic}
# Display basic ANOVA results (includes effect size)
```
\clearpage
### 14C-5a 2x3 ANOVA: Coffee Drinking and Phobia Group on Post Quiz Heart Rate
**TEXTBOOK QUESTION:** *Using the phobia grouping variable you created for computer exercise #5 in Chapter 12 (do not drop any phobia groups for this exercise) and coffee (regular coffee drinker or not) as your IVs, perform a two-way ANOVA on the postquiz heart rate. Request an HOV test, observed power, and a plot of the cell means. (A) Does the HOV test give you cause for concern? Explain the ANOVA results in terms of the plot you created.*
Here is the 2x3 grid of cell means, giving the average for each of the 6 combinations of `coffeeF` and `phob_group`.
```{r Q14c5a_means}
# Raw data: 2-way table of means (i.e. cell means)
data_new %>%
dplyr::group_by(coffeeF, phob_group) %>%
dplyr::summarise(mean = mean(hr_post)) %>%
tidyr::spread(key = phob_group,
value = mean)
```
------------------------
**DIRECTIONS:** Plot the raw data for each group using the `stat_summary()` layer in `ggplot(aes(x = group_var1, y = contin_var))`. Utilize the `shape` and `color` options for `group_var2`. Also consider dodging the position of the groups to avoid overplotting.
```{r Q14c5a_boxplots}
# Raw data: plot M(SD)'s
```
\clearpage
**DIRECTIONS:** Use `car::leveneTest()` to test for violations of *HOV*. Since this is a two-way ANOVA situation, be sure to include the correct formula: `contin_var ~ group_var1*group_var2`.
```{r Q14c5a_hov}
# Levene's Test of HOV
```
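As a reference for the formula layout, here is a minimal sketch of Levene's test for a two-factor design. It uses the built-in `warpbreaks` data rather than a course dataset and assumes the `car` package is installed; `hov_demo` is a made-up object name.

```{r EX_levene_2way}
# Sketch: Levene's test of HOV across all cells of a two-factor design
# NOTE: the outcome goes on the left of `~`, the crossed factors on the right
hov_demo <- car::leveneTest(breaks ~ wool * tension,
                            data = warpbreaks)   # default centers on the median
hov_demo
```

The test compares the spread across all six cells (2 wool types x 3 tensions) at once, which is why the crossed formula is needed. A small p-value would suggest that the cell variances are not all equal.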
---------------------
**DIRECTIONS:** Fit a two-way ANOVA model for the difference in mean `hr_post` for each of the combinations between the three-level `phob_group` factor and two-level `coffeeF` factor with the `afex::aov_4()` function and save the results under the name `aov_hrpost_phob_coffee`.
```{r Q14c5a_aov}
# Two-way ANOVA: fit and save
```
--------------------------
**DIRECTIONS:** Request the $F$ value by typing the name you saved your fitted model as above.
```{r Q14c5a_aov_basic}
# Display basic ANOVA results (includes effect size)
```
\clearpage
### 14C-5b Follow-up Comparisons
**TEXTBOOK QUESTION:** *Request an appropriate post hoc test to follow-up your ANOVA results, and report the results. Calculate the ordinary eta squared for each main effect; how large is each effect? ~~Does the observed power make sense in each case?~~*
**DIRECTIONS:** Request the more complete summary table by adding `$Anova` at the end of the name you saved your fitted model as above.
```{r Q14c5b_aov_fuller}
# Display fuller ANOVA results (includes sum of squares)
```
-----------------------
**DIRECTIONS:** Request all pairwise post hoc comparisons with the `pairs()` function, applied after piping an `emmeans(~ group_var)` step to the ANOVA model. Only do this for significant main effects with at least three factor levels.
```{r}
```
**Do NOT worry about observed power!**