Constructing tables in reports can be challenging. Doing it the manual way, you would summarize your data and the enter it in a word table. To make this operation reproducible (without any copy-paste operations) we need to dive into the world of table packages.
In R Markdown we can use the knitr package. This package, together with the extensions provided with kableExtra makes making tables in HTML and PDF-formats quite easy.
If you want to make tables directly in word-format, flextable is a good alternative.
Other packages can be used to get basically the same results, with variation in the way you specify the table. Some examples:
An overview of other packages can be found here.
Here I will introduce knitr kableExtra and flextable.
The kable
function is described in a newly developed book, available online called the R Markdown Cookbook. The package, kableExtra
comes with excellent vignettes for both html and pdf outputs.
When using R-Markdown, we can create reports in three formats: pdf, html and word. Unfortunately, kable
/kableExtra
is not possible to use with word-files (see below for flextable).
kable
/kableExtra
?dplyr
/tidyr
pipes.kable
offers a quick way to produce simple tables, kableExtra
can help creating advanced tables.In this example we will produce a “participant characteristics table”. This is a common “Table 1” in many reports.
Load packages with the following commands:
library(tidyverse) # for data wrangling and file download
library(knitr) # for the kable function
library(kableExtra) # for extended capabilities
Download the example file with the following command:
download.file("https://www.dropbox.com/s/g2t97j8edqvvktn/tr003_dxa.csv?raw=1",
destfile = "dxa_data.csv")
dxa_data <- read_csv("dxa_data.csv")
Before we can start to format the table, we need to decide what data to put in it. The goal here is to create a summary table of participant characteristics. We can group the participants based on inclusion/exclusion from the study and sex.
Above, we stored the data in an object called dxa_data
. Use the below code to see what it contains.
dxa_data %>%
print()
This is data from iDXA measurements that also contains weight, height, age, sex and if participants were included in the data analysis or not (include == "incl"
or include == "excl"
). First we want to use lean body mass as a percentage of the whole body mass as a measure to describe the participants. The below code does the calculation. We also select columns of interest for further data preparations using the select()
function. Store the data in a new object, so that the raw data can be used later.
dxa <- dxa_data %>%
rowwise() %>%
mutate(LBM = (lean.whole / (fat.whole + BMD.whole + lean.whole)) * 100) %>%
select(subject, age, height, weight, LBM, sex, include) %>%
print()
Using dplyr::group_by()
and dplyr::summarise()
we can now create a summary table.
When writing
dplyr::group_by()
it means that we specify the functiongroup_by()
from the packagedplyr
. The double colon is used to connect the package to the function, we use it here to specify what package we are using.
dxa %>%
ungroup() %>%
group_by(sex, include) %>%
summarise(m.age = mean(age),
s.age = sd(age),
m.height = mean(height),
s.height = sd(height),
m.weight = mean(weight),
s.weight = sd(weight),
m.lbm = mean(LBM),
s.lbm =sd(LBM)) %>%
print()
This was somewhat inefficient. We can add a line and calculate per variable.
summary_table <- dxa %>%
ungroup() %>%
pivot_longer(cols = age:LBM, names_to = "variable", values_to = "value") %>%
group_by(sex, include, variable) %>%
summarise(m = mean(value),
s = sd(value)) %>%
print()
This was more efficient! But it’s not formatted to be a table just yet. Of course there are many ways to create this table. But first, we can format the numbers to make nice presentations. For simplicity, lets say that all numbers will be printed with a single digit. We could use round(x, 1)
but this will in some cases produce a number without trailing zero. Instead we will use sprintf("%.1f", x)
where x is the number of interest. We will also combine mean and standard deviation to a single cell using the paste0()
function. paste0()
combines numbers and text into a character string, this means they will lose some information but that is OK.
summary_table <- dxa %>%
ungroup() %>%
pivot_longer(cols = age:LBM, names_to = "variable", values_to = "value") %>%
group_by(sex, include, variable) %>%
summarise(m = mean(value),
s = sd(value)) %>%
ungroup() %>%
mutate(summary = paste0(sprintf("%.1f", m),
" (",
sprintf("%.1f", s),
")")) %>%
select(sex, include, variable, summary) %>%
pivot_wider(id_cols = variable, names_from = c(sex, include), values_from = summary ) %>%
print()
The summary table is ready for making a table.
In R Markdown chunk settings, we need to set the output format to results = "asis"
. This will print the table correctly. To create the table, all we need to do is to is to pipe it to kable()
. In the kable
function we can set caption
to the represent the title of the table. col.names
lets you specify column names without changing variable names, this is handy when you want the same name to repeat many times.
```{r my_table, results="asis"}
summary_table %>%
kable(format = "html", col.names = c("Variable",
"Female excluded",
"Female included",
"Male excluded",
"Male included"),
caption = "Participant characteristics")
```
We might want to extend the table and make it a bit nicer, this is where kableExtra
comes in. Lets say that we want to include a row above Female/Male and only print this information once. After the kable()
command, we use commands from the kableExtra
package.
summary_table %>%
kable(format = "html", col.names = c(" ",
"Excluded",
"Included",
"Excluded",
"Included"),
caption = "Participant characteristics") %>%
add_header_above(c(" " = 1, "Female" = 2, "Male" = 2))
There is still some work to do here, variable names should be with a capital letter, this is most easily done in data preparation stages by changing the variables factor.
dxa %>%
ungroup() %>%
pivot_longer(cols = age:LBM, names_to = "variable", values_to = "value") %>%
group_by(sex, include, variable) %>%
summarise(m = mean(value),
s = sd(value)) %>%
ungroup() %>%
mutate(summary = paste0(sprintf("%.1f", m),
" (",
sprintf("%.1f", s),
")")) %>%
select(sex, include, variable, summary) %>%
pivot_wider(id_cols = variable, names_from = c(sex, include), values_from = summary ) %>%
# sort the rows -- create a factor level
mutate(variable = factor(variable, levels = c("age", "height", "weight", "LBM"))) %>%
mutate(Variable = c("Age (yrs)", "Stature (cm)", "Body mass (kg)", "Lean body mass (%)")) %>%
arrange(variable) %>%
select(Variable, female_excl:male_incl) %>%
kable(format = "html", col.names = c(" ",
"Excluded",
"Included",
"Excluded",
"Included"),
caption = "Table 1. Participant characteristics") %>%
add_header_above(c(" " = 1, "Female" = 2, "Male" = 2))
kableExtra lets us add footnotes using the footnote
function. An example of the whole pipe could be:
dxa %>%
ungroup() %>%
pivot_longer(cols = age:LBM, names_to = "variable", values_to = "value") %>%
group_by(sex, include, variable) %>%
summarise(m = mean(value),
s = sd(value)) %>%
ungroup() %>%
mutate(summary = paste0(sprintf("%.1f", m),
" (",
sprintf("%.1f", s),
")")) %>%
select(sex, include, variable, summary) %>%
pivot_wider(id_cols = variable, names_from = c(sex, include), values_from = summary ) %>%
# sort the rows -- create a factor level
mutate(variable = factor(variable, levels = c("age", "height", "weight", "LBM"))) %>%
arrange(variable) %>%
mutate(Variable = c("Age (yrs)", "Stature (cm)", "Body mass (kg)", "Lean body mass (%)")) %>%
select(Variable, female_excl:male_incl) %>%
kable(format = "html", col.names = c(" ",
"Excluded",
"Included",
"Excluded",
"Included"),
caption = "Table 1. Participant characteristics") %>%
add_header_above(c(" " = 1, "Female" = 2, "Male" = 2)) %>%
footnote(general = "Values are Mean (SD)")
knitr::kable
and kableExtra
The vignette describining the kableExtra package is highly recommended! If you want to extend to pdf-tables (which are really nice!). Use the \(\LaTeX\) vignette found here. When creating pdf/\(\LaTeX\) tables/reports some basic understandning of \(\LaTeX\) is good.
Flextable comes with the option of specifying tables directly in word outputs. This does not need any further formatting compared to HTML formatting.
Lets use the summary_table
to specify the flextable. Flextable also works as part of a pipe. The basic function is flextable
.
library(flextable)
summary_table %>%
flextable()
We can now add changes to the underlying table and add new header labels to produce a table similar to the one above.
library(flextable)
summary_table %>%
mutate(variable = factor(variable, levels = c("age", "height", "weight", "LBM"),
labels = c("Age (yrs)",
"Stature (cm)",
"Body mass (kg)",
"Lean body mass (%)"))) %>%
flextable() %>%
set_header_labels(variable = "",
female_excl = "Exclude",
female_incl = "Include",
male_excl = "Exclude",
male_incl = "Include")
Note that flextable does not need output = "asis"
to work in markdown files. Similar to kable, flextable can give a preview by just running the code, meaning you do not have to knit the whole document to see what changes you have done.
Lets finnish up by adding a header and a footnote.
library(flextable)
summary_table %>%
mutate(variable = factor(variable, levels = c("age", "height", "weight", "LBM"),
labels = c("Age (yrs)",
"Stature (cm)",
"Body mass (kg)",
"Lean body mass (%)"))) %>%
flextable() %>%
set_header_labels(variable = "",
female_excl = "Exclude",
female_incl = "Include",
male_excl = "Exclude",
male_incl = "Include") %>%
# Adds a header specified for all columns of the table
add_header_row(values = c("", "Female", "Female", "Male", "Male")) %>%
# Merge the new cells with the same data
# part = "header" meands that we want to add a row in the "header" part of the table.
# i = 1 means row 1
# j = 2:3 means column 2 to 3
merge_at(part = "header", i = 1, j = 2:3) %>%
merge_at(part = "header", i = 1, j = 4:5) %>%
# Add footnote
add_footer_row(values = "Values are mean and (SD)", colwidths = 5) %>%
# Make the columns widths match the content
autofit()
Nice! This table should print if the code chunk is included in a R Markdown file and the output is choosen to be “Knit to word”. See flextable documentation for more customizations.