Table1.Rmd
library(furniture)
This vignette demonstrates the main function of the furniture package, table1(). This vignette is current as of furniture 1.9.14.
The main arguments of table1() are below:
table1(.data, ..., splitby, row_wise, test, type, output, format_number, na.rm)
It contains several useful features for summarizing your data:
- By using the medians option, you can obtain the median and the first and third quartiles.
- The table can be printed in several formats (see knitr::kable).
- The table has multiple formatting options through output, format_output, simple and condense.
- The table can be exported with export = "file_name".

To illustrate, we'll walk through the main arguments with an example on some fictitious data.
set.seed(84332)
## Create Fictitious Data containing several types of variables
df <- data.frame(a = sample(1:10000, 10000, replace = TRUE),
b = runif(10000) + rnorm(10000),
c = factor(sample(c(1,2,3,4,NA), 10000, replace=TRUE)),
d = factor(sample(c(0,1,NA), 10000, replace=TRUE)),
e = trunc(rnorm(10000, 20, 5)),
f = factor(sample(c(0,1,NA), 10000, replace=TRUE)))
We will use df to show these main features of table1.
For table1, the ellipsis (the ...) holds the variables to be summarized that are found in your data. Here, we have a through e in df.
table1(df,
a, b, c, d, e)
##
## ────────────────────────
## Mean/Count (SD/%)
## n = 5362
## a
## 4938.4 (2858.0)
## b
## 0.5 (1.0)
## c
## 1 1306 (24.4%)
## 2 1352 (25.2%)
## 3 1377 (25.7%)
## 4 1327 (24.7%)
## d
## 0 2747 (51.2%)
## 1 2615 (48.8%)
## e
## 19.5 (5.0)
## ────────────────────────
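To make the cells of this table concrete, the sketch below recomputes the same kinds of summaries by hand in base R. It assumes, judging from n = 5362 being well below the 10,000 rows of df, that rows with a missing value in any listed variable are dropped before summarizing. The toy data frame and its columns x and g are hypothetical and are not part of the vignette's data.

```r
# Hypothetical toy data: one numeric and one factor column, both with NAs
toy <- data.frame(
  x = c(2, 4, 6, 8, NA, 10),
  g = factor(c("a", "b", "a", NA, "b", "a"))
)

# Assumption: keep only rows that are complete on every summarized variable
cc <- toy[complete.cases(toy), ]

# Numeric variable: mean and standard deviation
num_summary <- c(mean = mean(cc$x), sd = sd(cc$x))

# Factor variable: counts and percentages of the retained rows
counts <- table(cc$g)
percents <- round(100 * prop.table(counts), 1)

nrow(cc)
num_summary
data.frame(level = names(counts),
           count = as.vector(counts),
           percent = as.vector(percents))
```

The n reported at the top of the table corresponds to nrow(cc) here, not to the number of rows in the original data.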
To get means/counts and SDs/percentages by a stratifying variable, simply use the splitby argument. The splitby value can be a quoted variable name (e.g., "d") or a one-sided formula as shown below (e.g., ~d).
table1(df,
a, b, c,
splitby = ~d)
##
## ──────────────────────────────────────
## d
## 0 1
## n = 2747 n = 2615
## a
## 4941.4 (2863.8) 4935.2 (2852.5)
## b
## 0.5 (1.1) 0.5 (1.0)
## c
## 1 648 (23.6%) 658 (25.2%)
## 2 694 (25.3%) 658 (25.2%)
## 3 739 (26.9%) 638 (24.4%)
## 4 666 (24.2%) 661 (25.3%)
## ──────────────────────────────────────
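As a rough illustration of what a split table reports for a numeric variable, the following base R sketch computes a mean and SD within each level of a grouping factor and formats them as "mean (SD)". The toy data and the names score and group are hypothetical; this is not how the package is implemented, only a hand computation of the same kind of cell.

```r
# Hypothetical toy data: a numeric score and a two-level grouping factor
toy <- data.frame(
  score = c(1, 2, 3, 10, 20, 30),
  group = factor(c(0, 0, 0, 1, 1, 1))
)

# Mean and SD of the score within each group
group_mean <- tapply(toy$score, toy$group, mean)
group_sd <- tapply(toy$score, toy$group, sd)

# One "mean (SD)" cell per group, in the order of the factor levels
cells <- sprintf("%.1f (%.1f)", as.vector(group_mean), as.vector(group_sd))
names(cells) <- names(group_mean)
cells
```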
You can get percentages by rows instead of by columns (i.e., groups) by using the row_wise = TRUE option.
table1(df,
a, b, c,
splitby = ~d,
row_wise = TRUE)
##
## ──────────────────────────────────────
## d
## 0 1
## n = 2747 n = 2615
## a
## 4941.4 (2863.8) 4935.2 (2852.5)
## b
## 0.5 (1.1) 0.5 (1.0)
## c
## 1 648 (49.6%) 658 (50.4%)
## 2 694 (51.3%) 658 (48.7%)
## 3 739 (53.7%) 638 (46.3%)
## 4 666 (50.2%) 661 (49.8%)
## ──────────────────────────────────────
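The difference between the two kinds of percentages can be checked by hand. In the row-wise table above, each level of c sums to 100% across the two groups (for example, 49.6% + 50.4%), whereas in the default table each group's column sums to 100%. A minimal base R sketch with hypothetical counts:

```r
# Hypothetical counts of a factor (rows) within two groups (columns)
tab <- matrix(c(30, 10,
                20, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(level = c("1", "2"), group = c("0", "1")))

# Column-wise percentages: each group's column sums to 100
col_pct <- 100 * prop.table(tab, margin = 2)

# Row-wise percentages: each level's row sums to 100
row_pct <- 100 * prop.table(tab, margin = 1)

round(col_pct, 1)
round(row_pct, 1)
```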
It is easy to test for bivariate relationships, as is common in many Table 1s, using test = TRUE.
table1(df,
a, b, c,
splitby = ~d,
test = TRUE)
##
## ──────────────────────────────────────────────
## d
## 0 1 P-Value
## n = 2747 n = 2615
## a 0.937
## 4941.4 (2863.8) 4935.2 (2852.5)
## b 0.241
## 0.5 (1.1) 0.5 (1.0)
## c 0.157
## 1 648 (23.6%) 658 (25.2%)
## 2 694 (25.3%) 658 (25.2%)
## 3 739 (26.9%) 638 (24.4%)
## 4 666 (24.2%) 661 (25.3%)
## ──────────────────────────────────────────────
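The text does not say which tests produce these p-values. As a rough sketch, assuming a two-sample t-test for numeric variables and a chi-square test of independence for factors (an assumption about the package, not something stated here), comparable p-values could be obtained in base R as follows. The toy data and column names are hypothetical.

```r
set.seed(1)

# Hypothetical toy data: a two-level group, a numeric score, a three-level factor
toy <- data.frame(
  group = factor(rep(c(0, 1), each = 50)),
  score = rnorm(100),
  level = factor(sample(1:3, 100, replace = TRUE))
)

# Numeric variable vs. group: t.test() defaults to the Welch variant
p_numeric <- t.test(score ~ group, data = toy)$p.value

# Factor vs. group: chi-square test on the cross-tabulation
p_factor <- chisq.test(table(toy$level, toy$group))$p.value

round(c(score = p_numeric, level = p_factor), 3)
```

If the package uses a different variant (for example, an equal-variance t-test), the p-values would differ slightly from this hand computation.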
By default, only the p-values are shown, but other options exist, such as stars or the test statistics alongside the p-values, using the format_output argument.
The table can be simplified by showing only percentages for categorical variables. Further, it can be condensed so that binary variables show only a reference group's percentage and the means and SDs appear on the same line as the variable name.
table1(df,
f, a, b, c,
splitby = ~d,
test = TRUE,
type = c("simple", "condensed"))
##
## ──────────────────────────────────────────────
## d
## 0 1 P-Value
## n = 1801 n = 1720
## f: 1 50% 49.7% 0.903
## a 4938.7 (2874.3) 4890.0 (2839.7) 0.613
## b 0.5 (1.1) 0.5 (1.0) 0.308
## c 0.016
## 1 22.5% 25.4%
## 2 25% 25.4%
## 3 27.9% 23.5%
## 4 24.5% 25.7%
## ──────────────────────────────────────────────
If the medians and interquartile ranges are desired instead of means and SDs, simply name the relevant variables in the second argument:
table1(df,
f, a, b, c,
splitby = ~d,
test = TRUE,
type = c("simple", "condensed"),
second = c("a", "b"))
##
## ──────────────────────────────────────────────
## d
## 0 1 P-Value
## n = 1801 n = 1720
## f: 1 50% 49.7% 0.903
## a 4930.0 [5106.0] 4906.0 [4931.0] 0.613
## b 0.5 [1.4] 0.5 [1.4] 0.308
## c 0.016
## 1 22.5% 25.4%
## 2 25% 25.4%
## 3 27.9% 23.5%
## 4 24.5% 25.7%
## ──────────────────────────────────────────────
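The bracketed cells can be read as "median [IQR]". A small base R sketch of that computation, using a hypothetical vector with one extreme value to show why the median and interquartile range are sometimes preferred over the mean and SD:

```r
# Hypothetical skewed data: one extreme value
x <- c(1, 3, 5, 7, 100)

med <- median(x)  # middle value
iqr <- IQR(x)     # third quartile minus first quartile

# Format as in the table: median [IQR]
label <- sprintf("%.1f [%.1f]", med, iqr)
label

# For comparison, the mean is pulled toward the extreme value
mean(x)
```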
Several output types exist for the table (all of the knitr::kable options), including html as shown below.
table1(df,
a, b, c,
splitby = ~d,
test = TRUE,
output = "html")
|   | 0 | 1 | P-Value |
|---|---|---|---|
|   | n = 2747 | n = 2615 |   |
| a |   |   | 0.937 |
|   | 4941.4 (2863.8) | 4935.2 (2852.5) |   |
| b |   |   | 0.241 |
|   | 0.5 (1.1) | 0.5 (1.0) |   |
| c |   |   | 0.157 |
| 1 | 648 (23.6%) | 658 (25.2%) |   |
| 2 | 694 (25.3%) | 658 (25.2%) |   |
| 3 | 739 (26.9%) | 638 (24.4%) |   |
| 4 | 666 (24.2%) | 661 (25.3%) |   |
For some papers you may want to format large numbers with a comma as the thousands separator (e.g., 30,000 vs. 30000). You can do this by using format_number = TRUE.
table1(df,
a, b, c,
splitby = ~d,
test = TRUE,
format_number = TRUE)
##
## ──────────────────────────────────────────────────
## d
## 0 1 P-Value
## n = 2747 n = 2615
## a 0.937
## 4,941.4 (2,863.8) 4,935.2 (2,852.5)
## b 0.241
## 0.5 (1.1) 0.5 (1.0)
## c 0.157
## 1 648 (23.6%) 658 (25.2%)
## 2 694 (25.3%) 658 (25.2%)
## 3 739 (26.9%) 638 (24.4%)
## 4 666 (24.2%) 661 (25.3%)
## ──────────────────────────────────────────────────
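For numbers formatted outside of table1(), base R can add the same kind of thousands separator. This is only a sketch of the formatting idea using formatC(); it is not a claim about how the package formats its cells.

```r
# Values similar to those in the table, plus one large number
x <- c(4941.4, 2863.8, 30000)

# One decimal place, with a comma every three digits
formatted <- formatC(x, format = "f", digits = 1, big.mark = ",")
formatted
```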
na.rm
To explore the missingness in the factor variables, use na.rm = FALSE, which reports the counts and percentages of the missing values as well.
table1(df,
a, b, c,
splitby = ~d,
test = TRUE,
na.rm = FALSE)
##
## ───────────────────────────────────────────────
## d
## 0 1 P-Value
## n = 3430 n = 3269
## a 0.479
## 4918.2 (2861.5) 4967.6 (2863.9)
## b 0.374
## 0.5 (1.1) 0.5 (1.0)
## c 0.157
## 1 648 (18.9%) 658 (20.1%)
## 2 694 (20.2%) 658 (20.1%)
## 3 739 (21.5%) 638 (19.5%)
## 4 666 (19.4%) 661 (20.2%)
## NA 683 (19.9%) 654 (20%)
## ───────────────────────────────────────────────
Here, about 20% of the values of c are missing in each group, and they are reported in their own NA row. Note that the group sizes and the percentages of the other levels change accordingly.
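The effect of keeping missing values on the percentages can be reproduced in base R with the useNA argument of table(). The factor below is hypothetical; the point is only that the NA category takes its own share, so the percentages of the observed levels shrink.

```r
# Hypothetical factor with two missing values out of eight
lvl <- factor(c(1, 2, NA, 1, NA, 2, 2, 1))

# Default: missing values are dropped before counting
without_na <- table(lvl)

# useNA = "ifany": missing values get their own category
with_na <- table(lvl, useNA = "ifany")

round(100 * prop.table(without_na), 1)
round(100 * prop.table(with_na), 1)
```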
Finally, and very importantly, to make table1() easier to use with the tidyverse packages, a piping option is available. This option accepts a grouped_df object produced by dplyr::group_by() and uses the groups indicated there, as shown below.
library(dplyr)
df %>%
filter(f == 1) %>%
group_by(d) %>%
table1(a, b, c,
test = TRUE,
type = c("simple", "condensed"))
##
## ──────────────────────────────────────────────
## d
## 0 1 P-Value
## n = 900 n = 855
## a 4971.3 (2861.1) 4820.0 (2849.9) 0.267
## b 0.6 (1.0) 0.5 (1.0) 0.528
## c 0.149
## 1 22.6% 24.9%
## 2 25.3% 24.8%
## 3 27.2% 22.9%
## 4 24.9% 27.4%
## ──────────────────────────────────────────────
This includes the ability to use multiple grouping variables. In each column label, the first value is the level of the first grouping variable, followed by a separator (a hyphen in the output below) and then the level of the second grouping variable.
df %>%
group_by(d, f) %>%
table1(a, b, c,
test = TRUE,
type = c("simple", "condensed"))
##
## ──────────────────────────────────────────────────────────────────────────────
## d, f
## 0-0 1-0 0-1 1-1 P-Value
## n = 901 n = 865 n = 900 n = 855
## a 4906.2 (2888.6) 4959.2 (2829.6) 4971.3 (2861.1) 4820.0 (2849.9) 0.68
## b 0.5 (1.1) 0.5 (1.0) 0.6 (1.0) 0.5 (1.0) 0.629
## c 0.145
## 1 22.4% 25.9% 22.6% 24.9%
## 2 24.8% 26% 25.3% 24.8%
## 3 28.6% 24% 27.2% 22.9%
## 4 24.2% 24% 24.9% 27.4%
## ──────────────────────────────────────────────────────────────────────────────
You can also adjust the variable names from within the function like so:
table1(df,
"Avar" = a, "Bvar" = b, "Cvar" = c,
splitby = ~d,
test = TRUE)
##
## ──────────────────────────────────────────────
## d
## 0 1 P-Value
## n = 2747 n = 2615
## Avar 0.937
## 4941.4 (2863.8) 4935.2 (2852.5)
## Bvar 0.241
## 0.5 (1.1) 0.5 (1.0)
## Cvar 0.157
## 1 648 (23.6%) 658 (25.2%)
## 2 694 (25.3%) 658 (25.2%)
## 3 739 (26.9%) 638 (24.4%)
## 4 666 (24.2%) 661 (25.3%)
## ──────────────────────────────────────────────
This is particularly useful when you adjust a variable within the function:
##
## ────────────────────────────────────────
## d
## 0 1 P-Value
## n = 2747 n = 2615
## A 0.507
## 0 130 (4.7%) 135 (5.2%)
## 1 2617 (95.3%) 2480 (94.8%)
## b 0.241
## 0.5 (1.1) 0.5 (1.0)
## c 0.157
## 1 648 (23.6%) 658 (25.2%)
## 2 694 (25.3%) 658 (25.2%)
## 3 739 (26.9%) 638 (24.4%)
## 4 666 (24.2%) 661 (25.3%)
## ────────────────────────────────────────
Here we changed a to a factor within the function. In order for the name to look better, we can assign a new name; otherwise it would be named something like factor.ifelse.a....
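To see where a name like factor.ifelse.a... can come from, the sketch below passes the text of an expression through base R's make.names(), which replaces every character that is not valid in a syntactic name with a dot. The expression and its cutoff of 500 are hypothetical, and whether table1() builds its default names exactly this way is an assumption.

```r
# Hypothetical expression text; the cutoff of 500 is only for illustration
expr_text <- "factor(ifelse(a > 500, 1, 0))"

# Parentheses, spaces, commas and ">" are each replaced by "."
auto_name <- make.names(expr_text)
auto_name  # begins with "factor.ifelse.a..."
```

Supplying an explicit name, as in the "A" example above, avoids this kind of generated label.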
As a final note, the "table1" object can be coerced to a data.frame very easily:
tab1 <- table1(df,
a, b, c,
splitby = ~d,
test = TRUE)
as.data.frame(tab1)
## . 0 1 P.Value
## 1 n = 2747 n = 2615
## 2 a 0.937
## 3 4941.4 (2863.8) 4935.2 (2852.5)
## 4 b 0.241
## 5 0.5 (1.1) 0.5 (1.0)
## 6 c 0.157
## 7 1 648 (23.6%) 658 (25.2%)
## 8 2 694 (25.3%) 658 (25.2%)
## 9 3 739 (26.9%) 638 (24.4%)
## 10 4 666 (24.2%) 661 (25.3%)