Mediation Analysis

class: center, middle, inverse, title-slide

# Mediation Analysis
## 🤗
### Tyson S. Barrett <br> StatStudio <br> <br> Fall 2017

---

# What is Real?

.huge[
[https://www.youtube.com/watch?v=ym6NEuUYHuE](https://www.youtube.com/watch?v=ym6NEuUYHuE)
]

---
class: inverse

# What is mediation analysis?
# How do we interpret it?
# Current Issues (and Solutions)
# How do we use it? (in R)

---
background-image: url(BaseMediation.png)
background-size: contain
background-repeat: no-repeat

# What is Mediation Analysis?

.footnote[
X = predictor, independent variable, exogenous variable &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; a = path from X to M    <br> 
M = mediator, intermediate variable, endogenous variable &nbsp; &nbsp;&nbsp;&nbsp; b = path from M to Y (controlling for X) <br> 
Y = outcome, endogenous variable &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; c' = path from X to Y (controlling for M)
]
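
---
# The Paths as Equations

.large[
As a sketch, assuming a continuous mediator and outcome with linear relationships, the diagram corresponds to two regression equations. The intercepts `\(i_M\)`, `\(i_Y\)` and residuals `\(e_M\)`, `\(e_Y\)` are notation added here for illustration only:

- `\(M = i_M + aX + e_M\)`
- `\(Y = i_Y + c'X + bM + e_Y\)`

Regressing Y on X alone gives the total effect, commonly labeled `\(c\)`.
]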

---
# Different from Moderation (Interactions)

.pull-left[
## Mediation Analysis Model

The effect of X is transmitted through M.
![](BaseMediation.png)
]

.pull-right[
## Moderation (Interaction)

The effect of X depends on the level of the moderator.
![](fig_moderation.jpg)
]
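
---
# Sketch: Mediation vs. Moderation in R

A rough sketch of how the two models differ in syntax, using simulated data. The variable names (`x`, `m`, `w`, `y`) and the effect sizes are made up for illustration and are not from the workshop data.

```r
set.seed(1)
n <- 200
x <- rnorm(n)                      # predictor
w <- rnorm(n)                      # hypothetical moderator
m <- 0.5 * x + rnorm(n)            # mediator depends on x
y <- 0.4 * m + 0.2 * x + 0.3 * x * w + rnorm(n)
dat <- data.frame(x, w, m, y)

# Mediation: two models; the effect of x is carried through m
path_a  <- lm(m ~ x, data = dat)
path_bc <- lm(y ~ m + x, data = dat)

# Moderation: one model with a product (interaction) term
mod_fit <- lm(y ~ x * w, data = dat)

names(coef(path_bc))  # "(Intercept)" "m" "x"
names(coef(mod_fit))  # "(Intercept)" "x" "w" "x:w"
```

Mediation adds a second equation; moderation adds a product term to a single equation.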

---
# Mediation Analysis?

.pull-left[
.large[.large[
- Built on .dcoral[theory, prior literature, and other observations]
- Has similar .nicegreen[assumptions to regression]
- Built up of .bluer[2+ regression models]
- Can combine with moderation
]]]

.pull-right[
.large[.large[
- Helps explain .dcoral[how] X affects Y
- Provides additional .bluer[targets of intervention]
- Helps explain strange results (conflicting results)
- Provides a more .nicegreen[holistic view] of the relationships
]]]

---
# Built on Theory

.huge[Why even mention it?]

.large[.large[
Mediation analysis has similar .dcoral[assumptions] to regression, but they are somewhat more pronounced in mediation
]]

.pull-left[
.large[
- Well-behaved residuals (normality and homoskedasticity)
- .bluer[*No omitted influences*]
]]

.pull-right[
.large[
- .nicegreen[*No measurement error*]
- Correct functional form 
]]

.large[.large[
These are (often) difficult to assess and correct
]]

---
# No Omitted Influences

.huge[
.dcoral[Two Main Principles:]
]

.large[.large[
1. If something is .nicegreen[related to X (or M) and Y], it needs to be in the .nicegreen[path b and c' model]
2. If something is .bluer[related to X and M], it needs to be in the .bluer[path a model]
]]

.large[
Example: If we are assessing religiosity (X) and heavy drinking behavior (Y), what are some variables that should be included?
]
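
---
# Sketch: An Omitted Influence

A small simulation, under made-up values, of what the first principle guards against. Here `z` is a hypothetical variable related to both M and Y; the true paths (a = 0.5, b = 0.5, c' = 0.3) are assumptions chosen for the example.

```r
set.seed(2)
n <- 5000
z <- rnorm(n)                                 # hypothetical confounder of M and Y
x <- rnorm(n)
m <- 0.5 * x + 1.0 * z + rnorm(n)             # true a = 0.5
y <- 0.5 * m + 0.3 * x + 1.0 * z + rnorm(n)   # true b = 0.5, c' = 0.3

# Omitting z from the b and c' model
b_naive <- coef(lm(y ~ m + x))["m"]

# Including z, as the first principle requires
b_adj <- coef(lm(y ~ m + x + z))["m"]

round(c(naive = unname(b_naive), adjusted = unname(b_adj)), 2)
```

In this setup the naive estimate of b lands well above the true 0.5, while the adjusted estimate recovers it closely.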

---
# Slight Caveat

.huge[
.dcoral[*If* X is randomized] (e.g., treatment or control), then, in expectation, no other variable is systematically related to X.
]

.huge[
But .bluer[we cannot randomize M] (at least in a single study), so even if we can estimate the causal effects of X on M and of X on Y, we cannot obtain a causal estimate of the M to Y path.
]

---
# No Measurement Error

.huge[
Measurement error is always a problem (unless we use latent variable methods):
]
.large[.large[
- But can be .dcoral[more pronounced] in mediation analysis

- If M has measurement error, it biases not only M's estimate (path b) but also X's estimate (path c')

- Difficult to know in many situations how extensive measurement error is
]]

???
The other assumptions are much like those of regression.
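
---
# Sketch: Measurement Error in M

A simulated illustration, with made-up values, of how noise in the mediator can distort both the b and c' estimates. The true paths and the amount of noise added to `m_obs` are assumptions for the example.

```r
set.seed(3)
n <- 5000
x <- rnorm(n)
m_true <- 0.5 * x + rnorm(n)               # true a = 0.5
y <- 0.5 * m_true + 0.3 * x + rnorm(n)     # true b = 0.5, c' = 0.3
m_obs <- m_true + rnorm(n)                 # mediator measured with error

fit_true <- lm(y ~ m_true + x)
fit_obs  <- lm(y ~ m_obs + x)

# Collect b and c' from each fit side by side
est <- rbind(true_m  = unname(coef(fit_true)[-1]),
             noisy_m = unname(coef(fit_obs)[-1]))
colnames(est) <- c("b", "c_prime")
round(est, 2)
```

With the noisy mediator, b shrinks toward zero and c' absorbs part of the effect that actually runs through M.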

---
# Quick Review of Linear Regression

---
# Mediation is 2+ Regression Models

.huge[Mediation uses a .dcoral[series of regressions] and .nicegreen[combines results] to draw conclusions about the overall model]

![](BaseMediation.png)
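
---
# Sketch: Combining the Regressions

A minimal sketch of the "series of regressions" idea on simulated data. The variable names and true effects are invented for illustration.

```r
set.seed(4)
n <- 300
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.3 * x + rnorm(n)
dat <- data.frame(x, m, y)

path_a  <- lm(m ~ x, data = dat)        # a path
path_bc <- lm(y ~ m + x, data = dat)    # b and c' paths
total   <- lm(y ~ x, data = dat)        # total effect c

a       <- unname(coef(path_a)["x"])
b       <- unname(coef(path_bc)["m"])
c_prime <- unname(coef(path_bc)["x"])
c_total <- unname(coef(total)["x"])

indirect <- a * b
c(indirect = indirect, direct = c_prime, total = indirect + c_prime)

# With linear models fit on the same rows, a*b + c' reproduces c
all.equal(indirect + c_prime, c_total)  # TRUE
```

The equality holds for ordinary least squares; with GLMs it generally does not hold on the coefficient scale.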

---
class: inverse, center, middle

# Break Time
### Take a short break but be thinking about mediation models you have seen in your field

---
# Mediation Frameworks

.pull-left[
.huge[
1. Ordinary Least Squares (OLS) and Generalized Linear Models (GLM) Regression

2. Structural Equation Modeling (SEM)
]
]

.pull-right[
![](fig_perspectives.jpg)
]

.footnote[These are closely related, but distinct, approaches]

---
# Two Frameworks

.pull-left[
## .dcoral[OLS/GLM Regression]

.large[
- Multiple regressions, fit separately and then combined
- Provides great flexibility (assumptions are lighter)
- Provides model fit for each sub-model but not the entire model
- Continuous, binary, categorical, count, proportion, and other variable types
]]

.pull-right[
## .nicegreen[SEM]

.large[
- Multiple regressions fit simultaneously
- More restrictive assumptions
- Provides more information regarding overall model fit
- Mostly continuous variables (can handle binary, ordinal in some cases)
]]

???
Here are two examples of OLS/GLM fitted mediation models

---
background-image: url(fig_application_model.jpg)
background-size: contain
background-repeat: no-repeat

???
There are many, many others using SEM or Regression frameworks.

---
class: inverse, middle, center
# Interpretation of Mediation

---
# Interpretation of Mediation

.huge[Mediation models provide a lot of information:]

.large[.large[
1. Individual path estimates
2. Indirect Effect estimates
3. Direct Effect estimates
4. Total Effect estimates
]]

---
# Interpretation of Mediation

.huge[Mediation models provide a lot of information:]

<div id="htmlwidget-667fe88c15af8698569f" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-667fe88c15af8698569f">{"x":{"filter":"none","data":[["1","2","3","4"],["Individual paths","Indirect Effect","Direct Effect","Total Effect"],["a, b, and c' paths","a path estimates * b path estimates","c' estimate","a * b + c'"]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>Estimate<\/th>\n      <th>What<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"dom":"d","order":[],"autoWidth":false,"orderClasses":false,"columnDefs":[{"orderable":false,"targets":0}]}},"evals":[],"jsHooks":[]}</script>

---
# Complete or Partial Mediation?

.huge[
Many resources suggest ways of looking at this
]
.large[.large[
- I recommend not focusing on this but feel free to check
- It's based on whether `c'` is significant or not (while `a * b` is significant)

I think it paints an .dcoral[incomplete picture] of the model because:
]]
.large[
1. It only focuses on significance, not effect size
2. To really make this conclusion, we need really large sample sizes
3. It is almost always "partial" mediation
]

???
A better approach is looking at the effect sizes -- how big is the indirect effect size compared to the direct or total effect sizes?

---
# Continuous Mediators and Outcomes

.huge[
When the .nicegreen[mediator and outcome are both continuous] (and roughly normal), interpretation is straightforward
]

.large[
1. Paths are in terms of the corresponding endogenous variable's units
2. Indirect effects are in the outcome's units
3. Direct effects are in the outcome's units
4. Total effect is in the outcome's units
]

--
.large[
Example: If X is continuous, the indirect effect is 2.5, and the outcome is in quality of life rating, what is the interpretation?
]

.large[.large[
.dcoral[But what if mediator(s) and/or outcome(s) are categorical?]
]]

---
class: inverse, center, middle

# Mediation Analysis with Categorical Variables

---
# .coral[Generalized] Linear Models

.large[.large[These **generalize** the regression framework to more data situations.]]

.large[.large[
To do so:

1. Use a different **distribution** 📊

2. Use a **link** function ⛓
]]

.large[Examples: Logistic Regression, Poisson Regression]
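
---
# Sketch: Distribution and Link in `glm()`

A short sketch of the two ingredients using base R's `glm()` on simulated data. The outcomes and coefficients are invented for illustration.

```r
set.seed(5)
n <- 500
x <- rnorm(n)
y_bin   <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * x))  # binary outcome
y_count <- rpois(n, lambda = exp(0.2 + 0.4 * x))               # count outcome

# Logistic regression: binomial distribution, logit link
fit_logit <- glm(y_bin ~ x, family = binomial(link = "logit"))

# Poisson regression: Poisson distribution, log link
fit_pois <- glm(y_count ~ x, family = poisson(link = "log"))

# Coefficients are on the link scale (log-odds and log-counts)
coef(fit_logit)
coef(fit_pois)
```

Because the coefficients live on the link scale, they are not directly in the outcome's units.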

---
background-image: url(fig_binarymediator.jpg)
background-size: contain
background-repeat: no-repeat

# Use GLMs with Mediation Analysis

???
### But...
This presents a new challenge in interpreting the results

---
# Interpretation with Categorical Mediator/Outcome

.large[.large[
A few options:

1. Interpret the .nicegreen[individual pathways and note the percent of mediation]. This approach is commonly used in the literature.

2. New: .dcoral[Marginal Mediation Analysis]. It is currently in development and shows serious promise for these situations.
]]

---
### Interpret Individual Pathways

.large[Three Steps:
1. Fit individual GLM regressions for all pathways (`a`, `b`, `c'` and `c`)
2. Discuss basic effect size information for each pathway
3. Evaluate the change from `c` to `c'` as a proportion of `c` -- `\(\frac{c-c'}{c}\)`. This is a representation of how much of the total effect is mediated.
]
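
The three steps above can be sketched as follows on simulated data with a binary outcome. The variable names and true effects are assumptions for the example, and because the `c` and `c'` log-odds come from different logistic models, the resulting proportion is best read as a rough summary.

```r
set.seed(6)
n <- 5000
x <- rbinom(n, 1, 0.5)                                # binary predictor
m <- 1.0 * x + rnorm(n)                               # continuous mediator
y <- rbinom(n, 1, plogis(-0.5 + 0.5 * x + 1.0 * m))   # binary outcome

# Step 1: fit a GLM for each pathway
fit_a  <- glm(m ~ x)                                  # path a (gaussian)
fit_bc <- glm(y ~ m + x, family = binomial)           # paths b and c'
fit_c  <- glm(y ~ x, family = binomial)               # path c (total)

# Step 2: inspect each pathway's estimate (log-odds for the logistic fits)
c_total <- unname(coef(fit_c)["x"])
c_prime <- unname(coef(fit_bc)["x"])

# Step 3: change from c to c' as a proportion of c
prop_mediated <- (c_total - c_prime) / c_total
round(prop_mediated, 2)
```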

### Marginal Mediation Analysis

.large[
- Uses Average Marginal Effects
- Interpretation and steps for use are exactly like mediation with continuous mediators/outcomes (can interpret individual paths, indirect and direct effect sizes)
- Uses bootstrapping to get confidence intervals (recommended in most situations)
]
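
---
# Sketch: An Average Marginal Effect

A hand-rolled sketch of an average marginal effect for a binary predictor in a logistic model, using base R only. This illustrates the idea and is not the MarginalMediation package's implementation; the data, the covariate `z`, and the coefficients are made up.

```r
set.seed(7)
n <- 2000
x <- rbinom(n, 1, 0.5)
z <- rnorm(n)                                   # hypothetical covariate
y <- rbinom(n, 1, plogis(-1 + 0.8 * x + 0.5 * z))
dat <- data.frame(x, z, y)

fit <- glm(y ~ x + z, family = binomial, data = dat)

# Predicted probability for every row with x set to 1, then to 0
p1 <- predict(fit, newdata = transform(dat, x = 1), type = "response")
p0 <- predict(fit, newdata = transform(dat, x = 0), type = "response")

# Average marginal effect of x: mean change in P(y = 1), in probability units
ame <- mean(p1 - p0)
c(log_odds = unname(coef(fit)["x"]), ame = ame)
```

The log-odds coefficient and the average marginal effect describe the same path, but only the latter is in the outcome's (probability) units.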

---
class: inverse, middle, center

# How to use it?

---
# Before Talking About Syntax

.huge[
I recommend two books to get more information about mediation topics

1. .dcoral[Statistical Mediation Analysis] by MacKinnon

2. .dcoral[Introduction to Mediation, Moderation, and Conditional Process Analysis] by Hayes
]

---
class: inverse, middle, center

## Break Time
### If you do not care about learning how to do these analyses in R then feel free to take off (thanks for attending 😄)

---
# Mediation Analysis in `R`

.large[
.large[If you are not an `R` user, you can ignore the syntax .nicegreen[but pay attention to the logic of it]]

.large[We'll use a fake data set about two popular TV shows--The Office and Parks and Recreation.]
]

<br>

.coral[.large[Note: We'll be ignoring some assumptions (like the fact the data are nested).]]

---
# Dataset

<div id="htmlwidget-108e0224b087a9435555" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-108e0224b087a9435555">{"x":{"filter":"none","data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33"],["Michael","Pam","Jim","Dwight","Stanley","Phyllis","Creed","Meredith","Oscar","Angela","Kevin","Kelley","Ryan","Toby","Andy","Jan","April","Andy","Leslie","Ron","Tom","Donna","Ben","Chris","Gary (Larry, Jerry)","Jean Ralphio","Mona Lisa","Ann","Kyle","Shauna Molwaytweep","Ethel","Councilman Howser","Tammy II"],[2,3,3,5,4,4,1,3,5,4,2,3,2,4,3,4,1,1,5,3,2,2,5,4,3,1,1,5,3,4,2,5,5],[3,8,8,6,7,8,2,5,7,5,6,5,2,1,5,6,6,2,8,8,5,7,8,6,5,1,1,8,5,6,5,6,5],[8,7,8,8,4,4,4,4,7,7,2,5,5,6,7,6,4,2,7,7,5,6,5,8,3,2,1,8,2,5,2,6,3],[0,1,1,0,1,1,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,1,0,0,0,1,0,0,null,0],[0,1,0,0,0,1,0,1,0,1,0,1,0,0,0,1,1,0,1,0,0,1,0,0,0,0,1,1,1,0,0,1,0],["White","White","White","White","Black","White","White","White","Mexican American","White","White","Indian","White","White","White","White","Mexican American","White","White","White","Indian","Black","White","White","White","White","White","White","White","White","White","Black","White"],[55,35,70,70,70,70,45,40,50,50,45,40,40,60,60,80,25,15,45,55,35,70,65,70,40,10,10,40,35,45,40,60,40],[0,2,2,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,null,0,0,0,0],[1,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0],[1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,1,1,1,1,1,1],[1,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>nam<\/th>\n      <th>prod1<\/th>\n      <th>ment1<\/th>\n      <th>phys<\/th>\n      <th>marr<\/th>\n      <th>gend<\/th>\n      <th>race<\/th>\n      <th>inco<\/th>\n      <th>chil<\/th>\n      <th>subs<\/th>\n      <th>alco<\/th>\n      <th>spor<\/th>\n    <\/tr>\n  
<\/thead>\n<\/table>","options":{"pageLength":5,"columnDefs":[{"className":"dt-right","targets":[2,3,4,5,6,8,9,10,11,12]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[5,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>

---
# Start with .coral[Cross-Tabulations]

.large[.large[
Check for small cells, understand missingness
]]

```
                      
                      ─────────────────────────────────────────────────────
                                                 SubsUse 
                                           No          Yes         P-Value
                                           n = 25      n = 7              
                       ------------------- ----------- ----------- -------
                       Income              51.8 (16.0) 32.1 (17.5) 0.008  
                       Productivity        3.5 (1.2)   1.6 (0.8)   <.001  
                       Physical_Health     5.4 (2.1)   4.0 (2.2)   0.145  
                       Married: Yes        8 (32%)     1 (14.3%)   0.656  
                       Race                                        0.558  
                          White            20 (80%)    6 (85.7%)          
                          Black            2 (8%)      0 (0%)             
                          Mexican American 1 (4%)      1 (14.3%)          
                          Indian           2 (8%)      0 (0%)             
                      ─────────────────────────────────────────────────────
```

---
# And with .coral[Correlations]

.large[.large[
Check for high correlations (can cause multi-collinearity problems)
]]

```
                      
                      ──────────────────────────────────────────────────────
                                          [1]           [2]           [3]  
                       [1]Income          1.00                             
                       [2]Productivity    0.573 (<.001) 1.00               
                       [3]Physical_Health 0.609 (<.001) 0.516 (0.002) 1.00 
                      ──────────────────────────────────────────────────────
```

---
# SEM Framework

.large[

```r
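library(lavaan)

# Labels (a, b, c1) name the paths so they can be combined below
model = "
# a path: subs -> prod1
prod1 ~ a*subs
# b and c' paths: prod1 and subs -> inco
inco ~ b*prod1 + c1*subs

# Defined parameters: indirect, direct, and total effects
ind := a * b
dir := c1
tot := a * b + c1"

# df is the data frame from the Dataset slide (loaded elsewhere)
fit_sem = sem(model, data = df)
parameterEstimates(fit_sem)  # path estimates and defined effects
fitMeasures(fit_sem)         # overall model fit indices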
```
]

---

```
        Parameter Estimates
```

```
            lhs op    rhs label     est     se      z pvalue ci.lower ci.upper
        1 prod1  ~   subs     a  -2.005  0.457 -4.386  0.000   -2.902   -1.109
        2  inco  ~  prod1     b   6.035  2.304  2.620  0.009    1.520   10.550
        3  inco  ~   subs    c1  -7.869  7.614 -1.034  0.301  -22.791    7.053
        4 prod1 ~~  prod1         1.153  0.284  4.062  0.000    0.597    1.710
        5  inco ~~   inco       201.978 49.723  4.062  0.000  104.522  299.434
        6  subs ~~   subs         0.167  0.000     NA     NA    0.167    0.167
        7   ind :=    a*b   ind -12.103  5.382 -2.249  0.025  -22.651   -1.556
        8   dir :=     c1   dir  -7.869  7.614 -1.034  0.301  -22.791    7.053
        9   tot := a*b+c1   tot -19.973  6.651 -3.003  0.003  -33.009   -6.936
```

```
        Fit Statistics
```

```
                       npar                fmin               chisq                  df 
                      5.000               0.000               0.000               0.000 
                     pvalue      baseline.chisq         baseline.df     baseline.pvalue 
                         NA              29.361               3.000               0.000 
                        cfi                 tli                nnfi                 rfi 
                      1.000               1.000               1.000               1.000 
                        nfi                pnfi                 ifi                 rni 
                      1.000               0.000               1.000               1.000 
                       logl   unrestricted.logl                 aic                 bic 
                   -183.589            -183.589             377.177             384.660 
                     ntotal                bic2               rmsea      rmsea.ci.lower 
                     33.000             369.064               0.000               0.000 
             rmsea.ci.upper        rmsea.pvalue                 rmr          rmr_nomean 
                      0.000                  NA               0.000               0.000 
                       srmr        srmr_bentler srmr_bentler_nomean                crmr 
                      0.000               0.000               0.000               0.000 
                crmr_nomean          srmr_mplus   srmr_mplus_nomean               cn_05 
                      0.000               0.000               0.000               1.000 
                      cn_01                 gfi                agfi                pgfi 
                      1.000               1.000               1.000               0.000 
                        mfi                ecvi 
                      1.000               0.303
```
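
---
# Sketch: Checking the Effect Arithmetic

A quick check that the defined parameters follow from the path estimates. The three inputs are copied from the parameter table above, rounded to three decimals, so the results match the reported `ind` and `tot` only up to rounding.

```r
# Estimates copied from the lavaan parameter table (rounded)
a  <- -2.005   # prod1 ~ subs
b  <-  6.035   # inco ~ prod1
c1 <- -7.869   # inco ~ subs

ind <- a * b        # indirect effect
tot <- ind + c1     # total effect

round(c(ind = ind, dir = c1, tot = tot), 2)
#    ind    dir    tot 
# -12.10  -7.87 -19.97 

# Share of the total effect that runs through prod1
round(ind / tot, 2)  # 0.61
```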

---
# OLS Framework (Using Marginal Mediation Analysis)

.large[

```r
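library(MarginalMediation)

# glm() defaults to family = gaussian, so these are ordinary linear models
patha  = glm(prod1 ~ subs, data = df)          # a path
pathbc = glm(inco ~ prod1 + subs, data = df)   # b and c' paths

# Indirect effects are named "predictor-mediator"; boot sets the
# number of bootstrap samples used for the confidence intervals
mma(pathbc, patha,
    ind_effects = c("subs-prod1"),
    boot = 500)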
```
]

---

```
      
      calculating a paths... b and c paths... Done.
                                                                                 
```

```
      ┌───────────────────────────────┐
      │  Marginal Mediation Analysis  │
      └───────────────────────────────┘
      A marginal mediation model with:
         1 mediators
         1 indirect effects
         1 direct effects
         500 bootstrapped samples
         95% confidence interval
         n = 33 
      
      Formulas:
         ◌ inco ~ prod1 + subs
         ◌ prod1 ~ subs 
      
      Regression Models: 
      
           inco ~ 
                               Est      SE   Est/SE P-Value
              (Intercept) 30.52837 9.12314  3.34626 0.00221
              prod1        6.03508 2.41608  2.49788 0.01821
              subs        -7.86921 7.98516 -0.98548 0.33227
      
           prod1 ~ 
                               Est      SE   Est/SE P-Value
              (Intercept)  3.57692 0.21730 16.46039 0.00000
              subs        -2.00549 0.47182 -4.25054 0.00018
      
      Unstandardized Mediated Effects: 
      
         Indirect Effects: 
      
           inco ~ 
                             Indirect     Lower    Upper
              subs => prod1 -12.10332 -24.32765 -1.15705
      
         Direct Effects: 
      
           inco ~ 
                     Direct     Lower  Upper
              subs -7.86921 -26.49848 6.8156
      
      
      Standardized Mediated Effects: 
      
         Indirect Effects: 
      
           inco ~ 
                            Indirect    Lower    Upper
              subs => prod1 -0.67622 -1.35919 -0.06464
      
         Direct Effects: 
      
           inco ~ 
                     Direct    Lower   Upper
              subs -0.43965 -1.48048 0.38079
```
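
---
# Sketch: Bootstrapping an Indirect Effect

A base R sketch of a percentile bootstrap for `a * b`, to show the logic behind bootstrapped intervals like those above. It runs on simulated data with made-up effects and is not necessarily what `mma()` does internally.

```r
set.seed(8)
n <- 300
x <- rnorm(n)
m <- 1.0 * x + rnorm(n)
y <- 1.0 * m + 0.5 * x + rnorm(n)
dat <- data.frame(x, m, y)

# Indirect effect a*b for one data set
indirect <- function(d) {
  a <- coef(lm(m ~ x, data = d))["x"]
  b <- coef(lm(y ~ m + x, data = d))["m"]
  unname(a * b)
}

# Resample rows with replacement and re-estimate a*b each time
boot_est <- replicate(500, indirect(dat[sample(nrow(dat), replace = TRUE), ]))

est <- indirect(dat)
ci  <- quantile(boot_est, probs = c(0.025, 0.975))  # 95% percentile interval
c(estimate = est, lower = unname(ci[1]), upper = unname(ci[2]))
```

An interval that excludes zero is the usual evidence for an indirect effect; no normality assumption about `a * b` is needed.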

---
class: inverse, middle, center

# Some Final Considerations

---
# Diagnostics

.huge[These depend on the type of model used, but the basics are:]

.large[.large[
- Model fit (BIC, Chi-Square, R-Squared)
- Multi-collinearity
- Prediction Accuracy
]]
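
---
# Sketch: Basic Diagnostics in R

A minimal sketch of the three checks for one sub-model, using base R on simulated data. The variance inflation factor is computed by hand here; the data and effects are invented for illustration.

```r
set.seed(9)
n <- 300
x <- rnorm(n)
m <- 0.6 * x + rnorm(n)
y <- 0.4 * m + 0.3 * x + rnorm(n)
dat <- data.frame(x, m, y)

path_bc <- lm(y ~ m + x, data = dat)

# Model fit for this sub-model
r2  <- summary(path_bc)$r.squared
bic <- BIC(path_bc)

# Multicollinearity: variance inflation factor for m, computed as
# 1 / (1 - R^2) from regressing m on the other predictor
vif_m <- 1 / (1 - summary(lm(m ~ x, data = dat))$r.squared)

# Prediction accuracy: in-sample root mean squared error
rmse <- sqrt(mean(residuals(path_bc)^2))

round(c(r2 = r2, bic = bic, vif_m = vif_m, rmse = rmse), 2)
```

The same checks would be repeated for the path a model; in-sample accuracy is optimistic, so a held-out sample is preferable when the data allow it.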

---
class: inverse, middle, center

# Questions?