class: center, middle, inverse, title-slide --- class: inverse, middle, center # Any fool can know. The point is to understand. ### -- Albert Einstein -- --- class: inverse, middle, center # The problem with quotes found on the internet is that they are often not true. ### -- Abraham Lincoln -- --- background-image: url(fig_hds.jpg) background-position: 50% 80% background-size: 500px # Health Data Science --- # Health Data Science .pull-left[ .huge[ - .nicegreen[Pragmatic solutions] to research problems - Involves both .dcoral[methods and software] ]] .pull-right[ .huge[ - Focus is on .bluer[reproducibility and interpretability] - Uses interdisciplinary approaches to .nicegreen[extract actionable information] ]] --- class: inverse, middle, center # What I've Done So Far <br><br> in Health Data Science --- # What I've Done So Far <br> .pull-left[.huge[ Provided .nicegreen[methods and tools] for prevention researchers ]] -- .pull-right[.huge[ Applied these tools to .dcoral[adolescent health research] ]] --- class: inverse, center, middle # Providing Methods and Tools --- # Providing Methods and Tools .huge[Two major aspects:] .dcoral[ .pull-left[.center[ ### Developing New Methods/Approaches .large[Methods to make research more reproducible and interpretable] ]]] -- .nicegreen[ .pull-right[.center[ ### Creating Software in `R` .pull-left[.large[ Provides the ability to use new methods ]] .pull-right[.large[ Makes understanding data easier and less error-prone ]]]]] --- # Developing New Methods/Approaches -- .Huge[.dcoral[**Marginal Mediation Analysis**]] <br> -- .large[.large[ This is a synthesis of two powerful approaches: 1. .bluer[Mediation Analysis] 2. .nicegreen[Average Marginal Effects] ]] --- # What is Mediation Analysis .pull-left[ .huge[ A series of regressions:] $$ Y = c_0 + b_1 M + c'_1 X + e_1 $$ $$ M = a_0 + a_1 X + e_2 $$ .center[.huge[.huge[👉]]] ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" /> ] -- $$ \text{Indirect Effect} = a \times b $$ -- $$ \text{Direct Effect} = c' $$ -- $$ \text{Total Effect} = a \times b + c' = c $$ --- # But... .large[.large[ Mediator or outcome is **categorical/non-normal**: - Lacks intuitive interpretation (e.g., total effect doesn't equal total effect?) - No defined effect sizes (indirect or total) or meaningful confidence intervals ]] --- count: false # But... .large[.large[ Mediator or outcome is **categorical/non-normal**:]]  ??? - Mediation struggles with categorical/non-normal mediators/outcomes - For example, - if X is continuous, M is binary, and Y is continuous the diagram shows how we would often model each path - So, how do you create a meaningful effect size when trying to combine logistic and linear regression? - Ultimately, to have any meaningful interpretation, **need additive** So its not good. But to reiterate, consider the next slide. --- # Why the big fuss? .pull-left[ #### Linear Model $$ Y = \beta_0 + \sum_j^p \beta_j X_j + e_i $$ .large[The marginal effect of, say,] `\(X_1\)` .large[is:] $$ \frac{\delta Y}{\delta X_1} = \beta_1 $$ ] -- .pull-right[ #### Logistic Regression $$ logit(Y) = \beta_0 + \sum_j^p \beta_j X_j + e_i $$ $$ Prob(Y = 1) = \frac{e^{\beta_0 + \sum_j^p \beta_j X_j + e_i}}{1 + e^{\beta_0 + \sum_j^p \beta_j X_j + e_i}} $$ .large[The marginal effect of, say,] `\(X_1\)` .large[is:] $$ \frac{\delta Y}{\delta X_1} = \frac{e^{\beta_0 + \sum_j^p \beta_j X_j + e_i}}{(1 + e^{\beta_0 + \sum_j^p \beta_j X_j + e_i})^2} $$ ] ??? - Consider the *marginal effect* -- the thing you want in regression -- in two situations (**feel free to ignore the math**) - Linear - Logistic - In logistic the marginal effect depends on the values of all the predictors - and it is just plain ugly compared to linear regression The takeaway --> Marginal Effect depends on all covariates Ultimately, mediation doesn't work well with categorical mediators/outcomes. But people are using those, right? So what are the common approaches? --- # Current Approaches
--- count: false # Current Approaches
--- count: false # Current Approaches
--- count: false # Current Approaches
--- # A Better Approach ## Use Average Marginal Effects .huge[.dcoral[Background:]] $$ \text{Marginal Effect (Logistic)} = \frac{e^{\beta_0 + \sum_j^p \beta_j X_j + e_i}}{(1 + e^{\beta_0 + \sum_j^p \beta_j X_j + e_i})^2} $$ -- .large[ 1. Each individual has a "marginal effect" 2. Take the *average* of the marginal effects = AME ] -- .large[.nicegreen[Interpret GLM's just like linear regression]] ??? ie, a one unit increase in the predictor is associated with an X unit increase in the outcome Benefits: 1. Represents the observed effects in the sample 2. Is additive (can be combined with other additive measures) 3. Is used in other two-part models (hurdle models) --- # Average Marginal Effects #### Definition: Continuous Variable $$ AME_k = \frac{f(\beta_k [x_k + h] + \sum_j^p \beta_j x_j) - f(\beta_k x_k + \sum_j^p \beta_j x_j)}{h} $$ <br> -- #### Definition: Dummy Coded Variable $$ AME_{k} = \frac{1}{n} \sum_i^n [ F(\beta X | x_k = 1) - F(\beta X | x_i = 0) ] $$ ??? - Simply, they are the average of the marginal effects - with the observed values in place (no crazy or impossible values) - without diving into the math here, both are taking the average effect of a one unit increase in the `\(kth\)` variable - In other words, take the predicted value at some level of the predictor, then add one to the predictor and take the prediction again. The difference is the marginal effect. - In a paper I'm working on right now, it shows the AME is a consistent estimator of the underlying latent effect (probability, count, etc.) --- # Marginal Mediation Analysis .huge[*The synthesis of Mediation and Average Marginal Effects*] -- .large[.large[ - .bluer[Average Marginal Effects for each regression that make up the complete mediation model]]] -- .large[.large[ - Uses .dcoral[bootstrapping] to understand uncertainty ]] -- .large[.large[ - Created the .nicegreen[software] to perform Marginal Mediation Analysis (discussed later)]] --- # Interpretation of Estimates .footnote[Given the units of the estimates, meta-analytic comparisons across studies can be simplified as well.] .pull-left[.large[ **The individual paths: the corresponding endogenous variable's original metric.** **The indirect effect: the outcome's original metric.** **Both the direct and total effects: the outcome's original metric.** ]] -- .pull-right[ <img src="index_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" /> ] ??? - One of the main advantages - However, given the number of combination of variable types, it is best to consider some basic interpretation principles - Review principles and show meaning in figure 1. So with M being binary, the AME is a Marginal Probability or Risk - A one unit increase in X is associated with a .23 increase in the risk of M. 2. In other words, it is the effect of X on Y through M (it should be in the outcomes units) - A one unit increase in X is associated with a `0.23 * 1.5 = 0.345` increase in the count of Y. 3. No big surprises here since these are essentially the regression. --- # Assumptions of the Approach .left-column[ </br></br> #### The same assumptions as linear models or generalized linear models hold. </br></br></br> .coral[#### Only additional assumption with AME:] ] -- .right-column[ 1. Correct distribution (normal in linear models) 1. Proper variance (homoskedastic in linear models) 1. Linear in parameters 1. Random sample 1. No measurement error 1. No omitted influences ### Marginal effect can be described *additively* (after accounting for all the covariates) ] ??? - all same assumptions - Only additional one -- additively To assess its power, accuracy --> performed Monte Carlo Simulation --- class: inverse, middle, center # Monte Carlo Simulation # of Marginal Mediation Analysis --- # General Model  -- .large[Note: All] `\(e_i\)`.large['s are normally distributed with] `\(\mu = 0\)` .large[and] `\(\sigma = 1\)` --- # Monte Carlo Simulation ## .dcoral[The "a" Path Population Model (.bluer[Binary] Mediator)] `\(Prob(M^b = 1)\)` .large[is a latent continuous variable where] $$ \frac{Prob(M^b = 1)_i}{1 - Prob(M^b = 1)_i} = a_0 + a_1 x_i + e_i $$ .large[The observed variable,] `\(M_i\)`.large[, is defined as follows:] $$ M_i = 0 \text{ if } Prob(M = 1) < .5 $$ $$ M_i = 1 \text{ otherwise} $$ --- # Monte Carlo Simulation ## .dcoral[The "a" Path Population Model (.bluer[Count] Mediator)] `\(M^c\)` .large[is a latent count variable] $$ M^c_i = a_0 + a_1 x_i + e_i $$ .large[The observed variable,] `\(M_i\)`.large[, is defined as follows:] $$ M_i = Po(\lambda = M^c_i) $$ --- # Monte Carlo Simulation ## .dcoral[The "b" and "c'" Path Population Model (for both the binary and count mediators)] $$ Y_i = b_0 + b_1 M_i + c'_1 x_c + e_i $$ ??? - The known population model consist of a binary M and continuous (approx normal) outcome - A path has a latent probability variable that maps onto a discrete mediator - B and C path is essentially a multiple linear regression --- # Monte Carlo Simulation .huge[.dcoral[Conditions] Varied for the simulation:] .large[.large[ - Varying sample size (50 - 1000) - Varying effect size of each path (small, medium, large) - Varying Mediator's distribution (binomial, poisson) ]] -- .large[.large[Each condition will have 500 replications]] ??? - The conditions tested will be broad for basic understanding of the method's behavior - Sample size: basic for what is needed for logistic - Effect sizes range from small to big - Next few (proportion and distribution) are because logistic is involved - Bootstrap size (hopefully 500 is sufficient) Outcomes will be: 1. bias (i.e., is the mean of the estimates at the population mean?), 2. power (i.e., how often does the null properly get rejected?), 3. confidence interval coverage (i.e., does the confidence interval cover the proper interval?), and 4. how closely `\(a \times b + c'\)` is to `\(c\)` (i.e., does the indirect plus the direct effect equal the total effect?) --- # Simulation Results <br><br> .center[ ## Indirect + Direct = Total .large[(within rounding error, even at low sample sizes)] <br> .huge[*Decomposing the effect doesn't change it*] ] --- background-image: url(sim_fig_acc.png) background-position: 85% 90% background-size: 800px # Estimation <br> Consistency --- background-image: url(sim_fig_power.png) background-position: 50% 85% background-size: 900px # Statistical Power ??? Definitions: Logistic (OR) 1. Small = 1.58 2. Medium = 3.44 3. Large = 6.73 Count 1. Small = 1.34 2. Medium = 1.82 3. Large = 3.01 --- background-image: url(sim_fig_ci.png) background-position: 85% 90% background-size: 750px # CI Coverage --- # Creating Software ### In `R` .pull-left[.large[ Provides the ability to .dcoral[use new methods] ]] .pull-right[.large[ Makes .nicegreen[understanding data] easier and less error-prone ]] -- <br> .center[ ### .dcoral[Two of my software packages:] .large[ The `MarginalMediation` R package The `furniture` R package ]] --- # `MarginalMediation` .large[.large[ New package to use Marginal Mediation Analysis ]] ```r library(MarginalMediation) pathbc = glm(marijuana ~ home_meals + gender + age + asthma, data = nhanes_2010, family = "binomial") patha1 = glm(home_meals ~ gender + age + asthma, data = nhanes_2010, family = "poisson") patha2 = glm(age ~ gender + asthma, data = nhanes_2010, family = "poisson") fit = mma(pathbc, patha1, patha2, ind_effects = c("genderFemale-home_meals", "age-home_meals", "asthmaNo-home_meals", "genderFemale-age", "asthmaNo-age"), boot = 500) ``` --- ``` ┌───────────────────────────────┐ │ Marginal Mediation Analysis │ └───────────────────────────────┘ A marginal mediation model with: 2 mediators 5 indirect effects 3 direct effects 500 bootstrapped samples 95% confidence interval n = 1417 ── Unstandardized Effects ── ── Indirect Effects ── A-path B-path Indirect Lower Upper genderFemale-home_meals -1.34842 -0.00973 0.01312 0.00374 0.02428 age-home_meals -0.05713 -0.00973 0.00056 -0.00009 0.00142 asthmaNo-home_meals -0.00410 -0.00973 0.00004 -0.00715 0.00605 genderFemale-age -0.21555 0.00066 -0.00014 -0.00284 0.00197 asthmaNo-age 0.35030 0.00066 0.00023 -0.00300 0.00428 ── Direct Effects ── Direct Lower Upper genderFemale 0.10430 0.04259 0.15461 age 0.00066 -0.00628 0.00748 asthmaNo -0.00172 -0.07786 0.07155 ----- ``` --- # `furniture` .large[ Table 1 (The usual "Table 1" in publications)] .pull-left[ ```r library(furniture) table1(nhanes, Age, GeneralHealth, Sex, Cancer, Asthma, splitby = ~Overweight, output = "latex2", test = TRUE) ``` ] .pull-right[  ] --- # `furniture` .large[ Table C (for Correlation)] ```r library(furniture) tableC(nhanes, Age, ModeActivity, VigActivity, Meals, output = "latex2", type = "pearson") ```  --- background-image: url(RJournal_snapshot.png) background-position: 50% 75% background-size: 950px # `furniture` .large[R Journal paper introducing the software and discusses the benefits to using a .dcoral[reproducible approach to making tables]] --- class: inverse, center, middle count: false # Childhood and Adolescent Health --- # Childhood and Adolescent Health .huge[.dcoral[ One of my recent projects in this area:] - Adolescent Religiosity and Substance Use ] --- # Adolescent Religiosity and Substance Use .large[.large[ Replication of Ford and Hill (2012) that used categorical mediators and outcomes 1. Demonstrates the use of Marginal Mediation Analysis 2. Demonstrates the interpretability increase in using MMA over the (good) approach of Ford and Hill 3. Highlights important (potentially) causal pathways leading to substance use ]] ??? Ford and Hill in *Substance Use and Misuse* --- background-image: url(fig_application_model.jpg) background-position: 50% 85% background-size: 800px # Adolescent Religiosity and Substance Use --- # Results ### Proportion of Effect that is Mediated
--- background-image: url(fig_relig_sa.jpg) background-position: 50% 85% background-size: 800px --- class: inverse, middle, center # Future Directions --- # Future Directions .pull-left[.large[.large[ ### Continue: - My work with Marginal Mediation Analysis - Software development that helps researchers ]]] -- .pull-right[.large[.large[ ### Increase: - My methodological work regarding Mixture Modeling - My research line regarding childhood substance use ]]] --- # Conclusions .huge[ Ultimately, my work is based on .dcoral[**increasing understanding and interpretability**] through: 1. Method and tool development 2. Prevention research ] --- class: inverse, center, middle # Thank you. <!-- Changing math font size --> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ "HTML-CSS": { scale: 300, linebreaks: { automatic: true } }, SVG: { linebreaks: { automatic:true } }, CommonHTML: { scale: 150, linebreaks: { automatic: true } }, displayAlign: "center" }); </script>