---
title: "The freqlist function"
author: "Tina Gunderson and Ethan Heinzen"
output:
  rmarkdown::html_vignette:
    toc: yes
    toc_depth: 3
vignette: |
  %\VignetteIndexEntry{The freqlist function}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, tidy.opts=list(width.cutoff=80), tidy=TRUE, comment=NA)
options(width=80, max.print=1000)
```

# Overview

`freqlist()` is a function meant to produce output similar to SAS's `PROC FREQ` procedure when using the `/list` option of the `TABLE` statement.
`freqlist()` provides options for handling missing or sparse data and can provide cumulative counts and percentages based on subgroups.
It depends on the `knitr` package for printing.

```{r message = FALSE}
require(arsenal)
```

## Sample dataset

For our examples, we'll load the `mockstudy` data included with this package and use it to create a basic table.
For brevity, we'll build the example table from the variables `arm`, `sex`, and `mdquality.s`, which have few levels.
We'll retain NAs when creating the table.
See the appendix for notes regarding default NA handling and other useful information regarding tables in R.

```{r loading.data}
# load the data
data(mockstudy)

# retain NAs when creating the table using the useNA argument
tab.ex <- table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA="ifany")
```

# The `freqlist` object

The `freqlist()` function is an S3 generic (with methods for tables and formulas) which returns an object of class `"freqlist"`.

```{r console.output}
example1 <- freqlist(tab.ex)

str(example1)

# view the first rows of the data frame portion of freqlist output
head(as.data.frame(example1)) ## or use as.data.frame(example1) to see every row
```

# Basic output using `summary()`

The `summary` method for `freqlist()` relies on the `kable()` function (in the `knitr` package) for printing.
`knitr::kable()` converts the output to markdown, which can be printed in the console or easily rendered in Word, PDF, or HTML documents.
Note that you must supply `results="asis"` to properly format the markdown output.

```{r, results = 'asis'}
summary(example1)
```

You can print a title for the table using the `title=` argument.

```{r, results = 'asis'}
summary(example1, title="Basic freqlist output")
```

You can also easily pull out the `freqlist` data frame for more complicated formatting or manipulation
(e.g. with another function such as `xtable()` or `pander()`) using `as.data.frame(summary())`:

```{r}
head(as.data.frame(summary(example1)))
```

# Using a formula with `freqlist`

Instead of passing a pre-computed table to `freqlist()`, you can pass a formula, which will in turn be passed to the `xtabs()` function.
Additional `freqlist()` arguments are passed through the `...` to the `freqlist()` table method.

Note that `freqlist()` sets the `addNA=TRUE` argument by default:

```{r results='asis'}
summary(freqlist(~ arm + sex + mdquality.s, data = mockstudy))
```

One can also set NAs to an explicit value using `includeNA()`.

```{r results='asis'}
summary(freqlist(~ arm + sex + includeNA(mdquality.s, "Missing"), data = mockstudy))
```

In fact, since `xtabs()` allows for left-hand-side weights, so does `freqlist()`!

```{r results='asis'}
mockstudy$weights <- c(10000, rep(1, nrow(mockstudy) - 1))
summary(freqlist(weights ~ arm + sex + addNA(mdquality.s), data = mockstudy))
```

You can also specify multiple weights:

```{r results='asis'}
mockstudy$weights2 <- c(rep(1, nrow(mockstudy) - 1), 10000)
summary(freqlist(list(weights, weights2) ~ arm + sex + addNA(mdquality.s), data = mockstudy))
```

# Rounding percentage digits or changing variable names for printing

The `digits.pct=` argument takes a single numeric value and controls the number of digits of percentages in the output.
The `digits.count=` argument takes a similar value and controls the number of digits of the count columns.
The `labelTranslations=` argument is a named character vector or list.
All of these options are applied in the following example.

```{r labelTranslations, results = 'asis'}
example2 <- freqlist(tab.ex, labelTranslations = c(arm = "Treatment Arm", sex = "Gender", mdquality.s = "LASA QOL"),
                     digits.pct = 1, digits.count = 1)
summary(example2)
```

# Additional examples

## Including combinations with frequencies of zero

The `sparse=` argument takes a single logical value as input. The default option is `FALSE`.
If set to `TRUE`, the sparse option will include combinations with frequencies of zero in the list of results.
As our initial table did not have any such combinations, we create a second table to use in our example.

```{r sparse, results = 'asis'}
summary(freqlist(~ race + sex + arm, data = mockstudy, sparse = TRUE, digits.pct=1))
```

## Options for NA handling

The `na.options=` argument allows you to include or exclude data with missing values for one or more factor levels in the counts and percentages,
or to show the missing data but exclude it from the cumulative counts and percentages.
The default option is to include all combinations with missing values.

```{r na.options, results = 'asis'}
summary(freqlist(tab.ex, na.options="include"))
summary(freqlist(tab.ex, na.options="showexclude"))
summary(freqlist(tab.ex, na.options="remove"))
```

## Frequency counts and percentages subset by factor levels

The `strata=` argument internally subsets the data by the specified factor(s) prior to calculating cumulative counts and percentages.
By default, each subset prints in a separate table.
Using the `single = TRUE` option when printing will collapse the subsetted results into a single table.
```{r freq.counts, results='asis'}
example3 <- freqlist(tab.ex, strata = c("arm","sex"))
summary(example3)

# using the single = TRUE argument will collapse results into a single table for printing
summary(example3, single = TRUE)
```

## Show only the "n" most common combinations in each table (`head()` and `sort()`)

You can sort `freqlist()` objects and, by taking the `head()` of the summary, output only the most common combinations.
This looks best with `dupLabels=TRUE`.

```{r}
head(summary(sort(example1, decreasing = TRUE), dupLabels = TRUE))
```

## Change labels on the fly

```{r changelabs, results = 'asis'}
labs <- c(arm = "Arm", sex = "Sex", mdquality.s = "QOL", freqPercent = "%")
labels(example1) <- labs
summary(example1)
```

You can also supply `labelTranslations=` to `summary()`.

```{r, results = 'asis'}
summary(example1, labelTranslations = labs)
```

## Using `xtable()` to format and print `freqlist()` results

Fair warning: `xtable()` has a fairly steep learning curve.
These examples are given without explanation, for more advanced users.

```{r results='asis'}
require(xtable)

# set up custom function for xtable text: wrap column names in HTML italic tags
italic <- function(x) paste0('<i>', x, '</i>')

xftbl <- xtable(as.data.frame(summary(example1)),
  caption = "xtable formatted output of freqlist data frame", align="|r|r|r|r|c|c|c|r|")

# change the column names
names(xftbl)[1:3] <- c("Arm", "Gender", "LASA QOL")

print(xftbl, sanitize.colnames.function = italic, include.rownames = FALSE, type = "html", comment = FALSE)
```

## Use `freqlist` in bookdown

Since the backbone of `freqlist()` is `knitr::kable()`, tables still render well in bookdown.
However, `print.summary.freqlist()` doesn't use the `caption=` argument of `kable()`, so some tables may not have a properly numbered caption.
To fix this, use the method described [on the bookdown site](https://bookdown.org/yihui/bookdown/tables.html) to give the table a tag/ID.
```{r eval=FALSE}
summary(freqlist(~ sex + age, data = mockstudy), title="(\\#tab:mytableby) Caption here")
```

# Appendix: Notes regarding table options in R

## NAs

There are several widely used options for basic tables in R.
The `table()` function in base R is probably the most common; by default it excludes NA values.
You can change NA handling in `base::table()` using the `useNA=` or `exclude=` arguments.

```{r}
# base::table() removes NAs by default; useNA="ifany" retains them
tab.d1 <- base::table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA="ifany")
tab.d1
```

`xtabs()` is similar to `table()`, but uses a formula-based syntax.
However, NAs must be explicitly added to each factor using the `addNA()` function or using the argument `addNA = TRUE`.

```{r}
# without specifying addNA
tab.d2 <- xtabs(formula = ~ arm + sex + mdquality.s, data = mockstudy)
tab.d2

# now with addNA
tab.d3 <- xtabs(~ arm + sex + addNA(mdquality.s), data = mockstudy)
tab.d3
```

Since the formula method of `freqlist()` uses `xtabs()`, NAs should be treated in the same way.
`includeNA()` can also be helpful here for setting explicit NA values.

## Table dimname names (dnn)

Supplying a data.frame to the `table()` function without giving columns individually will create a contingency table using all variables in the data.frame.

However, if the columns of a data.frame or matrix are supplied separately (i.e., as vectors), column names will not be preserved.

```{r}
# providing variables separately (as vectors) drops column names
table(mockstudy$arm, mockstudy$sex, mockstudy$mdquality.s)
```

If desired, you can use the `dnn=` argument to pass variable names.
```{r}
# add the column name labels back using dnn option in base::table
table(mockstudy$arm, mockstudy$sex, mockstudy$mdquality.s, dnn=c("Arm", "Sex", "QOL"))
```

You can also name the arguments to `table()`:

```{r}
table(Arm = mockstudy$arm, Sex = mockstudy$sex, QOL = mockstudy$mdquality.s)
```

If using `freqlist()`, you can provide the labels directly to `freqlist()` or to `summary()` using `labelTranslations=`.
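
As a minimal sketch of that last point, the labels can be attached either when the `freqlist` object is created or when it is summarized. The object names (`tab.lab`, `labs.ex`, `fl.lab`, `fl.nolab`) and the label text are chosen purely for illustration, and the two-variable table is an assumption made to keep the output short.

```r
library(arsenal)
data(mockstudy)

# name the table() arguments so the table's variables are called "arm" and "sex"
tab.lab <- table(arm = mockstudy$arm, sex = mockstudy$sex)

# illustrative labels (hypothetical wording); the names refer to the table's variable names
labs.ex <- c(arm = "Treatment Arm", sex = "Gender")

# option 1: attach the labels when creating the freqlist object
fl.lab <- freqlist(tab.lab, labelTranslations = labs.ex)
summary(fl.lab)

# option 2: create the object without labels and supply them to summary()
fl.nolab <- freqlist(tab.lab)
summary(fl.nolab, labelTranslations = labs.ex)
```

Both calls should print the same table. Attaching labels at creation keeps them with the object for later use, while passing them to `summary()` leaves the stored object unchanged.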