class: center, middle, inverse, title-slide

# Mediation Analysis
## 🤗
### Tyson S. Barrett
StatStudio
Fall 2017

---

# What is Real?

.huge[
[https://www.youtube.com/watch?v=ym6NEuUYHuE](https://www.youtube.com/watch?v=ym6NEuUYHuE)
]

---
class: inverse

# What is mediation analysis?

# How do we interpret it?

# Current Issues (and Solutions)

# How do we use it? (in R)

---
background-image: url(BaseMediation.png)
background-size: contain
background-repeat: no-repeat

# What is Mediation Analysis?

.footnote[
X = predictor, independent variable, exogenous variable; a = path from X to M <br>
M = mediator, intermediate variable, endogenous variable; b = path from M to Y (controlling for X) <br>
Y = outcome, endogenous variable; c' = path from X to Y (controlling for M)
]

---

# Different than Moderation (Interactions)

.pull-left[
## Mediation Analysis Model

The effect of X is transmitted through M.

![](BaseMediation.png)
]

--

.pull-right[
## Moderation (Interaction)

The effect of X depends on the level of the moderator.

![](fig_moderation.jpg)
]

---

# Mediation Analysis?

.pull-left[
.large[.large[
- Built on .dcoral[theory, prior literature, and other observations]
- Has similar .nicegreen[assumptions to regression]
- Made up of .bluer[2+ regression models]
- Can combine with moderation
]]]

--

.pull-right[
.large[.large[
- Helps explain .dcoral[how] X affects Y
- Provides additional .bluer[targets of intervention]
- Helps explain strange results (conflicting results)
- Provides a more .nicegreen[holistic view] of the relationships
]]]

---

# Built on Theory

.huge[Why even mention it?]

.large[.large[
Mediation analysis has similar .dcoral[assumptions] to regression, but they are somewhat more pronounced in mediation
]]

--

.pull-left[
.large[
- Well-behaved residuals (normality and homoskedasticity)
- .bluer[*No omitted influences*]
]]

.pull-right[
.large[
- .nicegreen[*No measurement error*]
- Correct functional form
]]

--

.large[.large[
These are (often) difficult to assess and correct
]]

---

# No Omitted Influences

.huge[
.dcoral[Two Main Principles:]
]

.large[.large[
1. If something is .nicegreen[related to X (or M) and Y], it needs to be in the .nicegreen[path b and c' model]
2. If something is .bluer[related to X and M], it needs to be in the .bluer[path a model]
]]

--

.large[
Example: If we are assessing religiosity (X) and heavy drinking behavior (Y), what are some variables that should be included?
]

---

# Slight Caveat

.huge[
.dcoral[*If* X is randomized] (e.g., treatment or control), then statistical theory says no other variables are systematically related to X.
]

.huge[
But .bluer[we cannot randomize M] (at least in a single study), so even if we can get a causal relationship from X to M and from X to Y, we cannot obtain a causal relationship from M to Y.
]

---

# No Measurement Error

.huge[
Measurement error is always a problem (unless we use latent variable methods):
]

.large[.large[
- But it can be .dcoral[more pronounced] in mediation analysis
- If M has measurement error, it biases not only M's estimate but also X's estimate
- In many situations it is difficult to know how extensive measurement error is
]]

???

The other assumptions are all more like those of regression

---

# Quick Review of Linear Regression

<img src="MediationAnalysis_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />

---

# Mediation is 2+ Regression Models

.huge[Mediation uses a .dcoral[series of regressions] and .nicegreen[combines results] to draw conclusions about the overall model]

![](BaseMediation.png)

---
class: inverse, center, middle

# Break Time

### Take a short break but be thinking about mediation models you have seen in your field

---

# Mediation Frameworks

.pull-left[
.huge[
1. Ordinary Least Squares (OLS) and Generalized Linear Models (GLM) Regression
2. Structural Equation Modeling (SEM)
]
]

.pull-right[
![](fig_perspectives.jpg)
]

.footnote[These are closely related, but distinct, approaches]

---

# Two Frameworks

.pull-left[
## .dcoral[OLS/GLM Regression]

.large[
- Multiple regressions, fit separately and then combined
- Provides great flexibility (assumptions are lighter)
- Provides model fit for each sub-model but not the entire model
- Continuous, binary, categorical, count, proportion, and other variable types
]]

--

.pull-right[
## .nicegreen[SEM]

.large[
- Multiple regressions fit simultaneously
- More restrictive assumptions
- Provides more information regarding overall model fit
- Mostly continuous variables (can handle binary, ordinal in some cases)
]]

???

Here are two examples of OLS/GLM fitted mediation models

---
background-image: url(fig_application_model.jpg)
background-size: contain
background-repeat: no-repeat

???

There are many, many others using SEM or Regression frameworks.

---
class: inverse, middle, center

# Interpretation of Mediation

---

# Interpretation of Mediation

.huge[Mediation models provide lots of information:]

.large[.large[
1. Individual path estimates
2. Indirect Effect estimates
3. Direct Effect estimates
4. Total Effect estimates
]]

---

# Interpretation of Mediation

.huge[Mediation models provide lots of information:]
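
.large[
As a compact reference, here is how the four pieces of information are usually combined. This is a sketch that assumes the single-mediator model and the path labels `a`, `b`, and `c'` from the diagram, writes `c` for the path from X to Y without M in the model, and assumes M and Y are both modeled with linear regression:
]

$$\begin{aligned}
\text{Indirect effect} &= a \times b \\
\text{Direct effect} &= c' \\
\text{Total effect} &= c = a \times b + c'
\end{aligned}$$

.large[
The last equality is exact for linear models fit to the same sample; with nonlinear link functions it generally does not hold exactly.
]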
---

# Complete or Partial Mediation?

.huge[
Many resources suggest ways of looking at this
]

.large[.large[
- I recommend not focusing on this but feel free to check
- It's based on whether `c'` is significant or not (while `a * b` is significant)

I think it paints an .dcoral[incomplete picture] of the model because:
]]

.large[
1. It only focuses on significance, not effect size
2. To really make this conclusion, we need really large sample sizes
3. It is almost always "partial" mediation
]

???

A better approach is to look at the effect sizes: how big is the indirect effect compared to the direct or total effect?

---

# Continuous Mediators and Outcomes

.huge[
When the .nicegreen[mediator and outcome are both continuous] (and roughly normal), interpretation is straightforward
]

.large[
1. Paths are in terms of the corresponding endogenous variable's units
2. Indirect effects are in the outcome's units
3. Direct effects are in the outcome's units
4. Total effect is in the outcome's units
]

--

.large[
Example: If X is continuous, the indirect effect is 2.5, and the outcome is a quality-of-life rating, what is the interpretation?
]

--

.large[.large[
.dcoral[But what if mediator(s) and/or outcome(s) are categorical?]
]]

---
class: inverse, center, middle

# Mediation Analysis with Categorical Variables

---

# .coral[Generalized] Linear Models

.large[.large[These **generalize** the regression framework to more data situations.]]

--

.large[.large[
To do so:

1. Can use a different **distribution** 📊
2. Uses a **link** function ⛓
]]

.large[Examples: Logistic Regression, Poisson Regression]

---
background-image: url(fig_binarymediator.jpg)
background-size: contain
background-repeat: no-repeat

# Use GLMs with Mediation Analysis

???

### But...

This presents a new challenge in interpreting the results

---

# Interpretation with Categorical Mediator/Outcome

.large[.large[
A few options:

1. Interpret the .nicegreen[individual pathways and note the percent of mediation]. This approach is commonly used in the literature.
2. New: .dcoral[Marginal Mediation Analysis]. It is being prepared right now (shows serious promise for these situations).
]]

---

### Interpret Individual Pathways

.large[Three Steps:

1. Fit individual GLM regressions for all pathways (`a`, `b`, `c'` and `c`)
2. Discuss basic effect size information for each pathway
3. Evaluate the change from `c` to `c'` as a proportion of `c`: `\(\frac{c-c'}{c}\)`. This is a representation of how much of the total effect is mediated.
]

--

### Marginal Mediation Analysis

.large[
- Uses Average Marginal Effects
- Interpretation and steps for use are exactly like mediation with continuous mediators/outcomes (can interpret individual paths, indirect and direct effect sizes)
- Uses bootstrapping to get confidence intervals (recommended in most situations)
]

---
class: inverse, middle, center

# How to use it?

---

# Before Talking About Syntax

.huge[
I recommend two books to get more information about mediation topics

1. .dcoral[Introduction to Statistical Mediation Analysis] by MacKinnon
2. .dcoral[Introduction to Mediation, Moderation, and Conditional Process Analysis] by Hayes
]

---
class: inverse, middle, center

## Break Time

### If you do not care about learning how to do these analyses in R then feel free to take off (thanks for attending 😄)

---

# Mediation Analysis in `R`

.large[
.large[If you are not an `R` user you can ignore the syntax .nicegreen[but pay attention to the logic of it]]

.large[We'll use a fake data set about two popular TV shows: The Office and Parks and Recreation.]
]

<br>

.coral[.large[Note: We'll be ignoring some assumptions (like the fact the data are nested).]]

---

# Dataset
---

# Start with .coral[Cross-Tabulations]

.large[.large[
Check for small cells, understand missingness
]]

```
─────────────────────────────────────────────────────
                     SubsUse
                     No          Yes         P-Value
                     n = 25      n = 7
 ------------------- ----------- ----------- -------
 Income              51.8 (16.0) 32.1 (17.5) 0.008
 Productivity        3.5 (1.2)   1.6 (0.8)   <.001
 Physical_Health     5.4 (2.1)   4.0 (2.2)   0.145
 Married: Yes        8 (32%)     1 (14.3%)   0.656
 Race                                        0.558
    White            20 (80%)    6 (85.7%)
    Black            2 (8%)      0 (0%)
    Mexican American 1 (4%)      1 (14.3%)
    Indian           2 (8%)      0 (0%)
─────────────────────────────────────────────────────
```

---

# And with .coral[Correlations]

.large[.large[
Check for high correlations (can cause multi-collinearity problems)
]]

```
──────────────────────────────────────────────────────
                    [1]           [2]           [3]
 [1]Income          1.00
 [2]Productivity    0.573 (<.001) 1.00
 [3]Physical_Health 0.609 (<.001) 0.516 (0.002) 1.00
──────────────────────────────────────────────────────
```

---

# SEM Framework

.large[
```r
library(lavaan)

# One regression per line; labels (a, b, c1) name the paths
model = "
  prod1 ~ a*subs
  inco ~ b*prod1 + c1*subs

  ind := a * b
  dir := c1
  tot := a * b + c1
"

# Fit all paths simultaneously, then print estimates and fit measures
fit_sem = sem(model, data = df)
parameterEstimates(fit_sem)
fitMeasures(fit_sem)
```
]

---

```
Parameter Estimates
```

```
    lhs op    rhs label     est     se      z pvalue ci.lower ci.upper
1 prod1  ~   subs     a  -2.005  0.457 -4.386  0.000   -2.902   -1.109
2  inco  ~  prod1     b   6.035  2.304  2.620  0.009    1.520   10.550
3  inco  ~   subs    c1  -7.869  7.614 -1.034  0.301  -22.791    7.053
4 prod1 ~~  prod1         1.153  0.284  4.062  0.000    0.597    1.710
5  inco ~~   inco       201.978 49.723  4.062  0.000  104.522  299.434
6  subs ~~   subs         0.167  0.000     NA     NA    0.167    0.167
7   ind :=    a*b   ind -12.103  5.382 -2.249  0.025  -22.651   -1.556
8   dir :=     c1   dir  -7.869  7.614 -1.034  0.301  -22.791    7.053
9   tot := a*b+c1   tot -19.973  6.651 -3.003  0.003  -33.009   -6.936
```

```
Fit Statistics
```

```
               npar                fmin               chisq                  df
              5.000               0.000               0.000               0.000
             pvalue      baseline.chisq         baseline.df     baseline.pvalue
                 NA              29.361               3.000               0.000
                cfi                 tli                nnfi                 rfi
              1.000               1.000               1.000               1.000
                nfi                pnfi                 ifi                 rni
              1.000               0.000               1.000               1.000
               logl   unrestricted.logl                 aic                 bic
           -183.589            -183.589             377.177             384.660
             ntotal                bic2               rmsea      rmsea.ci.lower
             33.000             369.064               0.000               0.000
     rmsea.ci.upper        rmsea.pvalue                 rmr          rmr_nomean
              0.000                  NA               0.000               0.000
               srmr        srmr_bentler srmr_bentler_nomean                crmr
              0.000               0.000               0.000               0.000
        crmr_nomean          srmr_mplus   srmr_mplus_nomean               cn_05
              0.000               0.000               0.000               1.000
              cn_01                 gfi                agfi                pgfi
              1.000               1.000               1.000               0.000
                mfi                ecvi
              1.000               0.303
```

---

# OLS Framework (Using Marginal Mediation Analysis)

.large[
```r
library(MarginalMediation)

# Path a model (mediator) and path b/c' model (outcome), fit separately
patha = glm(prod1 ~ subs, data = df)
pathbc = glm(inco ~ prod1 + subs, data = df)

# Combine the two models; bootstrap the confidence intervals
mma(pathbc, patha,
    ind_effects = c("subs-prod1"),
    boot = 500)
```
]

---

```
calculating a paths... b and c paths... Done.
```

```
┌───────────────────────────────┐
│  Marginal Mediation Analysis  │
└───────────────────────────────┘
A marginal mediation model with:
   1 mediators
   1 indirect effects
   1 direct effects
   500 bootstrapped samples
   95% confidence interval
   n = 33

Formulas:
   ◌ inco ~ prod1 + subs
   ◌ prod1 ~ subs

Regression Models:

     inco ~
                      Est      SE   Est/SE P-Value
     (Intercept) 30.52837 9.12314  3.34626 0.00221
     prod1        6.03508 2.41608  2.49788 0.01821
     subs        -7.86921 7.98516 -0.98548 0.33227

     prod1 ~
                      Est      SE   Est/SE P-Value
     (Intercept)  3.57692 0.21730 16.46039 0.00000
     subs        -2.00549 0.47182 -4.25054 0.00018

Unstandardized Mediated Effects:

   Indirect Effects:

     inco ~
                      Indirect     Lower    Upper
     subs => prod1   -12.10332 -24.32765 -1.15705

   Direct Effects:

     inco ~
            Direct     Lower  Upper
     subs -7.86921 -26.49848 6.8156

Standardized Mediated Effects:

   Indirect Effects:

     inco ~
                     Indirect    Lower    Upper
     subs => prod1   -0.67622 -1.35919 -0.06464

   Direct Effects:

     inco ~
            Direct    Lower   Upper
     subs -0.43965 -1.48048 0.38079
```

---
class: inverse, middle, center

# Some Final Considerations

---

# Diagnostics

.huge[These depend on the type of model used, but the basics are:]

.large[.large[
- Model fit (BIC, Chi-Square, R-Squared)
- Multi-collinearity
- Prediction Accuracy
]]

---
class: inverse, middle, center

# Questions?
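
---

# Appendix: Individual Pathways (Sketch)

.large[
The three steps for interpreting individual pathways can be sketched in R as below. This is a minimal illustration, not the analysis shown in these slides: the data are simulated, and the names `sim`, `x`, `m`, `y`, `c_total`, `c_prime`, and `prop_mediated` are hypothetical. It uses linear models, where the change from `c` to `c'` equals `a * b`; with other GLM families the two are generally only approximately equal.
]

```r
# Hypothetical stand-in data (NOT the data set used in these slides)
set.seed(2017)
n <- 200
x <- rbinom(n, 1, 0.3)                        # binary predictor (X)
m <- 3.5 - 2 * x + rnorm(n)                   # continuous mediator (M)
y <- 30 + 6 * m - 8 * x + rnorm(n, sd = 14)   # continuous outcome (Y)
sim <- data.frame(x, m, y)

# Step 1: fit a regression for each pathway (gaussian family by default)
path_a  <- glm(m ~ x,     data = sim)   # a
path_bc <- glm(y ~ m + x, data = sim)   # b and c'
path_c  <- glm(y ~ x,     data = sim)   # c (total effect)

# Step 2: pull out the estimate for each pathway
a       <- coef(path_a)[["x"]]
b       <- coef(path_bc)[["m"]]
c_prime <- coef(path_bc)[["x"]]
c_total <- coef(path_c)[["x"]]

# Step 3: change from c to c' as a proportion of c
prop_mediated <- (c_total - c_prime) / c_total
prop_mediated
```

.large[
Because the estimate is a ratio, it can be unstable when `c` is close to zero, so it is best read alongside the individual path estimates.
]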