class: center, middle, inverse, title-slide

# Center and Spread
## Cohen Chapter 3
.small[EDUC/PSY 6600]

---
class: center, middle

## "You can, for example, never foretell what any one man will do, but you can say with precision what an average number will be up to. *Individuals vary*, but percentages remain constant. So says the statistician."

### -- Sherlock Holmes, *The Sign of Four*

---
background-image: url(figures/fig_dist_examples.png)
background-position: 50% 90%
background-size: 750px

# Distribution Examples

---
background-image: url(figures/fig_3centers.png)
background-position: 50% 70%
background-size: 1000px

# Three Measures of Center

---
background-image: url(figures/fulcrum.png)
background-position: 50% 80%
background-size: 850px

# Mean vs. Median

.large[.large[
.nicegreen[Median]: the center point, half of the values are on each side, not affected by skew, the "typical value"

.dcoral[Mean]: the "balance" point, pulled toward the side of the skew, not necessarily typical

<br><br><br>
]]

--

.large[If the distribution is symmetric: mean = median]

---
background-image: url(figures/fig_dist_income_2010.png)
background-position: 50% 70%
background-size: 1000px

---

# Distributions and Numbers

.pull-left[
.large[
- The MEDIAN is **resistant** & doesn't change much
- The MEAN is **influenced** & changes more!
- Average does NOT mean typical
- Average moves when we remove the high point
]]

--

.pull-right[
<img src="u01_Ch3_CenterSpread_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />
]

---

# Distributions and Numbers

.pull-left[
.large[
- The MEDIAN is **resistant** & doesn't change much
- The MEAN is **influenced** & changes more!
- Average does NOT mean typical
- Average moves when we remove the high point
- Median doesn't move when we remove the high point
]]

--

.pull-right[
<img src="u01_Ch3_CenterSpread_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />
]

---
background-image: url(figures/fig_three_spreads.jpg)
background-position: 50% 70%
background-size: 1100px

# Three Measures of Spread

---

# Best Summary of the Data?

.huge[
"... the perfect estimator does not exist." -- Rand Wilcox, 2001
]

--

.pull-left[
.large[
## .bluer[Median and SIQR]
Skewed data or outliers
]]

.pull-right[
.large[
## .nicegreen[Mean and SD]
Symmetrical and no outliers
]]

--

<br>

.large[.large[
A .dcoral[graph gives the best overall picture of a distribution]
]]

---
background-image: url(figures/fig_sd_properties.jpg)
background-position: 50% 70%
background-size: 1100px

# Properties of the Mean and SD

---

# Skewness

.pull-left[
.large[
- Degree of .dcoral[symmetry] in distribution
- Can detect **visually** (histogram, boxplot)
- Skewness statistic
  - Based on cubed deviations from the mean
  - Divide the statistic by the SE of skewness
  - A ratio beyond `\(\pm 2\)` is a sign of skewed data
]]

--

.pull-right[
.large[
$$
Skewness = \frac{N}{N - 2}\frac{\sum_{i=1}^N (X_i - \bar{X})^3}{(N - 1)s^3}
$$

- Interpreting skewness statistic
  - positive value = positive (right) skew
  - negative value = negative (left) skew
  - zero value = no skew
]]

--

<img src="u01_Ch3_CenterSpread_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---

# Kurtosis

$$
Kurtosis = \frac{N(N+1)}{(N - 2)(N - 3)}\frac{\sum_{i=1}^N (X_i - \bar{X})^4}{(N - 1)s^4} - 3 \frac{(N - 1)(N - 1)}{(N - 2)(N - 3)}
$$

.pull-left[
.large[
- Degree of .dcoral[flatness] in distribution
- Harder to detect visually
- Kurtosis statistic
  - Based on deviations from the mean raised to the 4th power
  - Divide the statistic by the SE of kurtosis
  - A ratio beyond `\(\pm 2\)` is a sign of problems with kurtosis
]]

--

.pull-right[
.large[
- Interpreting kurtosis statistic
  - positive value = leptokurtic (peaked)
  - negative value = platykurtic (flat)
  - zero value = mesokurtic (normal)
]]

---
background-image: url(figures/fig_kurtosis.png)
background-position: 50% 70%
background-size: 1000px

# Kurtosis

---
background-image: url(figures/fig_5sum_2.png)
background-position: 50% 50%

# Five-Number Summary

---
background-image: url(figures/fig_5sum_3.png)
background-position: 50% 50%

# Five-Number Summary - Median

---
background-image: url(figures/fig_5sum_4.png)
background-position: 50% 50%

# Five-Number Summary - Quartiles

---
background-image: url(figures/fig_5sum_5.png)
background-position: 50% 50%

# Boxplots (Modified) - Lines

---
background-image: url(figures/fig_5sum_6.png)
background-position: 50% 50%

# Boxplots (Modified) - IQR and SIQR

---
background-image: url(figures/fig_boxplot_hist.png)
background-position: 50% 70%
background-size: 1000px

# Boxplot vs. Histogram

---

# Boxplots by Group

<img src="u01_Ch3_CenterSpread_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---

# Density Plots

<img src="u01_Ch3_CenterSpread_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

---

# Quantile-Quantile (Q-Q) Plot

<img src="u01_Ch3_CenterSpread_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---
class: inverse, center, middle

# Let's Apply This to the Cancer Dataset

<br>

(on Canvas)

---

# Read in the Data

```r
library(tidyverse)  # Loads several very helpful 'tidy' packages
library(rio)        # Read in SPSS datasets
library(furniture)  # Nice tables (by our own Tyson Barrett)
library(psych)      # Lots of nice tidbits
```

```r
cancer_raw <- rio::import("cancer.sav")
```

--

### And Clean It

```r
cancer_clean <- cancer_raw %>%
  dplyr::rename_all(tolower) %>%
  dplyr::mutate(id = factor(id)) %>%
  dplyr::mutate(trt = factor(trt, labels = c("Placebo", "Aloe Juice"))) %>%
  dplyr::mutate(stage = factor(stage))
```

---

## Frequency Tables with `furniture::tableF()`

.pull-left[
```r
cancer_clean %>%
  furniture::tableF(age, n = 8)
```

```
──────────────────────────────────
 age Freq CumFreq Percent CumPerc
 27  1    1       4.00%   4.00%
 42  1    2       4.00%   8.00%
 44  1    3       4.00%   12.00%
 46  2    5       8.00%   20.00%
 ... ...  ...     ...     ...
 68  1    20      4.00%   80.00%
 69  1    21      4.00%   84.00%
 73  1    22      4.00%   88.00%
 77  2    24      8.00%   96.00%
 86  1    25      4.00%   100.00%
──────────────────────────────────
```
]

.pull-right[
```r
cancer_clean %>%
  furniture::tableF(trt)
```

```
─────────────────────────────────────────
 trt        Freq CumFreq Percent CumPerc
 Placebo    14   14      56.00%  56.00%
 Aloe Juice 11   25      44.00%  100.00%
─────────────────────────────────────────
```
]

---

## Extensive Descriptive Stats with `psych::describe()`

```r
cancer_clean %>%
  dplyr::select(age, weighin, totalcin, totalcw2, totalcw4, totalcw6) %>%
  psych::describe()
```

```
         vars  n   mean    sd median trimmed   mad min   max range  skew
age         1 25  59.64 12.93   60.0   59.95 11.86  27  86.0  59.0 -0.31
weighin     2 25 178.28 31.98  172.8  176.57 21.05 124 261.4 137.4  0.73
totalcin    3 25   6.52  1.53    6.0    6.33  0.00   4  12.0   8.0  1.80
totalcw2    4 25   8.28  2.54    8.0    8.10  2.97   4  16.0  12.0  1.01
totalcw4    5 25  10.36  3.47   10.0   10.19  2.97   6  17.0  11.0  0.49
totalcw6    6 23   9.48  3.49    9.0    9.21  2.97   3  19.0  16.0  0.77
         kurtosis   se
age         -0.01 2.59
weighin      0.07 6.40
totalcin     4.30 0.31
totalcw2     1.14 0.51
totalcw4    -1.00 0.69
totalcw6     0.53 0.73
```

---

## Smaller Set with `furniture::table1()`

.pull-left[
For the Entire Sample

```r
cancer_clean %>%
  furniture::table1(trt, age, weighin)
```

```
─────────────────────────────────
               Mean/Count (SD/%)
               n = 25
 trt
    Placebo    14 (56%)
    Aloe Juice 11 (44%)
 age           59.6 (12.9)
 weighin       178.3 (32.0)
─────────────────────────────────
```
]

.pull-right[
Breaking the Sample by a Factor

```r
cancer_clean %>%
  dplyr::group_by(trt) %>%
  furniture::table1(age, weighin)
```

```
───────────────────────────────────
          trt
          Placebo      Aloe Juice
          n = 14       n = 11
 age      59.8 (9.0)   59.5 (17.2)
 weighin  167.5 (23.0) 192.0 (37.4)
───────────────────────────────────
```
]

---

## Boxplot, one group: `geom_boxplot()`

```r
cancer_clean %>%
  ggplot(aes(x = "Full Sample",  # x = "quoted text"
             y = age)) +         # y = contin_var (no quotes)
  geom_boxplot()
```

<img src="u01_Ch3_CenterSpread_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" />

---

## Boxplots, by groups - (1) fill color

```r
cancer_clean %>%
  ggplot(aes(x = "Full Sample",  # x = "quoted text"
             y = age,            # y = contin_var (no quotes)
             fill = trt)) +      # fill = group_var (no quotes)
  geom_boxplot()
```

<img src="u01_Ch3_CenterSpread_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />

---

## Boxplots, by groups - (2) x-axis breaks

```r
cancer_clean %>%
  ggplot(aes(x = trt,     # x = group_var (no quotes)
             y = age)) +  # y = contin_var (no quotes)
  geom_boxplot()
```

<img src="u01_Ch3_CenterSpread_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" />

---

## Boxplots, by groups - (3) separate panels

```r
cancer_clean %>%
  ggplot(aes(x = "Full Sample",  # x = "quoted text"
             y = age)) +         # y = contin_var (no quotes)
  geom_boxplot() +
  facet_grid(. ~ trt)            # . ~ group_var (no quotes)
```

<img src="u01_Ch3_CenterSpread_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" />

---

## Boxplot for a Subset - 1 requirement

```r
cancer_clean %>%                   # Less than 172 pounds at baseline
  dplyr::filter(weighin < 172) %>%
  ggplot(aes(x = "Weight at Baseline < 172",
             y = age)) +
  geom_boxplot()
```

<img src="u01_Ch3_CenterSpread_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" />

---

## Boxplot for a Subset - 2 requirements

```r
cancer_clean %>%   # At least 150 pounds AND not in Aloe group
  dplyr::filter(weighin >= 150 & trt == "Placebo") %>%
  ggplot(aes(x = "Placebo and at least 150 Pounds",
             y = age)) +
  geom_boxplot()
```

<img src="u01_Ch3_CenterSpread_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" />

---

## Boxplot for a Subset - 2 requirements (`%in%`)

```r
cancer_clean %>%   # In Aloe group, but only stages 2-4
  dplyr::filter(trt == "Aloe Juice" & stage %in% c(2, 3, 4)) %>%
  ggplot(aes(x = "On Aloe Juice and Stage 2-4",
             y = weighin)) +
  geom_boxplot()
```

<img src="u01_Ch3_CenterSpread_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" />

---

## Boxplot for Repeated Measures

```r
cancer_clean %>%
  tidyr::gather(key = "time",     # stack the repeated measures
                value = "value",
                totalcin, totalcw2, totalcw4, totalcw6) %>%
  ggplot(aes(x = time,
             y = value)) +
  geom_boxplot()
```

<img src="u01_Ch3_CenterSpread_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" />

---

## Boxplot: COMPLICATED!

```r
cancer_clean %>%
  dplyr::filter(weighin > 130 & stage %in% c(2, 4)) %>%
  tidyr::gather(key = "time",
                value = "value",
                totalcin, totalcw2, totalcw4, totalcw6) %>%
  ggplot(aes(x = time,
             y = value,
             fill = stage)) +
  geom_boxplot() +
  facet_grid(. ~ trt)
```

<img src="u01_Ch3_CenterSpread_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" />

---
class: inverse, center, middle

# Questions?
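---

# Appendix: Skewness and Kurtosis by Hand

A minimal base-R sketch of the skewness and kurtosis formulas from the slides, written term by term, so you can see where the cubed and 4th-power deviations enter. The function names and the example scores are hypothetical. The standard-error formulas are an assumption (the common versions SPSS reports), used only to form the "statistic / SE beyond `\(\pm 2\)`" check. `psych::describe()` uses a different default estimator (see its `type` argument), so its `skew` and `kurtosis` columns can differ slightly from these values.

```r
# Hypothetical helper functions (names are illustrative, not from a package)

# Sample skewness, written as on the Skewness slide
skewness_stat <- function(x) {
  x <- x[!is.na(x)]            # drop missing values
  N <- length(x)
  s <- sd(x)                   # sample SD (N - 1 in the denominator)
  dev <- x - mean(x)           # deviations from the mean
  (N / (N - 2)) * sum(dev^3) / ((N - 1) * s^3)
}

# Sample kurtosis, written as on the Kurtosis slide (0 = mesokurtic)
kurtosis_stat <- function(x) {
  x <- x[!is.na(x)]
  N <- length(x)
  s <- sd(x)
  dev <- x - mean(x)
  (N * (N + 1)) / ((N - 2) * (N - 3)) * sum(dev^4) / ((N - 1) * s^4) -
    3 * (N - 1) * (N - 1) / ((N - 2) * (N - 3))
}

# Assumed SPSS-style standard errors; they depend only on N
se_skewness <- function(N) sqrt(6 * N * (N - 1) / ((N - 2) * (N + 1) * (N + 3)))
se_kurtosis <- function(N) 2 * se_skewness(N) * sqrt((N^2 - 1) / ((N - 3) * (N + 5)))

# Made-up scores with a long right tail
scores <- c(2, 3, 3, 4, 4, 4, 5, 5, 9, 15)
N <- length(scores)

round(c(skew   = skewness_stat(scores),
        z_skew = skewness_stat(scores) / se_skewness(N),
        kurt   = kurtosis_stat(scores),
        z_kurt = kurtosis_stat(scores) / se_kurtosis(N)), 2)
```

Because the long tail is on the right, the skewness comes out positive; swapping in the cancer variables (for example `cancer_clean$totalcin`) works the same way.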
---
class: inverse, center, middle

# Next Topic

### Standard and Normal
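---

# Appendix: Five-Number Summary and Resistance

A minimal base-R sketch, using made-up scores rather than the cancer data, of the five-number summary, IQR, and SIQR behind the boxplot slides, and of the "Distributions and Numbers" claim that the median resists a high point while the mean moves. It assumes `quantile()`'s default quartile definition; textbooks and software differ slightly here (`fivenum()`, for example, returns Tukey's hinges), so hand-computed quartiles may not match exactly.

```r
# Made-up scores with one high point (15)
scores <- c(2, 3, 3, 4, 4, 4, 5, 5, 9, 15)

# Five-number summary: min, Q1, median, Q3, max
five_num <- quantile(scores, probs = c(0, 0.25, 0.5, 0.75, 1))

# Spread of the middle half of the data
iqr  <- IQR(scores)   # Q3 - Q1, same quartile definition as quantile()
siqr <- iqr / 2       # semi-interquartile range

# Resistance: remove the highest score and compare the two centers
without_high <- scores[-which.max(scores)]
centers <- rbind(
  all_scores   = c(mean = mean(scores),       median = median(scores)),
  without_high = c(mean = mean(without_high), median = median(without_high))
)

five_num
c(IQR = iqr, SIQR = siqr)
centers
```

With these scores the median stays at 4 when the 15 is removed, while the mean drops from 5.4 to about 4.33.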