Reproducible research

Replication can be thought of as the gold standard when the aim is to confirm scientific results. When replicating a scientific study, we collect new data, independent of the study we wish to replicate. However, scientific interpretations of data are also affected by the way we analyze them. This has received attention recently, and reproducible data analysis, as argued by e.g. Peng et al. (Peng, Dominici, and Zeger 2006), should be a minimum standard in scientific reporting. This standard requires that a scientific report (e.g. a journal article or master’s thesis) is built directly upon the analyses. For many, this is a big change from a workflow where analyses are done separately and the results are copied into the report. Today there are many software solutions available for creating reproducible reports, answering the call for reproducible data analyses where data, code and software are available for scrutiny (Peng 2011).

In this course you will learn how to work with software designed to make analyses reproducible. We will use R Markdown to create reproducible reports.

Resources

R Markdown is a markup language for text editing. Text written with R Markdown can easily be converted to different file formats. The nice thing about this is that you specify the formatting in plain text as you write (no more word processing difficulties).

R Markdown is a dialect of Markdown with the addition that R Markdown files (.Rmd) can contain R code that is evaluated when the report is created.

To get started with R Markdown, do this short tutorial. After doing it you will be able to format text in R Markdown files.

It is handy to have a cheat sheet close by when writing; here is an example [1].

If you need more details, the book R Markdown: The Definitive Guide is available online.

How it works

An R Markdown report is basically a text document containing plain text and code. When you compile your report, the code is evaluated, and figures, calculations and so on are produced per your specifications. The resulting file will be an HTML, DOCX or PDF file. You can choose whether or not to display your code in the report, but the code is always available in your source document. R Markdown is very versatile: you can make Word documents, blog posts, websites and PDF documents [2].

When in RStudio, you can start a new document using File > New File > R Markdown…. This will open a file in your script window looking something like this:

---
title: "Untitled"
author: "Daniel Hammarström"
date: "2020 05 09"
output: html_document
---



## R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

The new document is not empty: it contains pre-written instructions that you have to remove before writing your own report. These instructions are quite handy though.

Basically, you write R code in code chunks. This code is evaluated, and the output is displayed in the file you create. Between code chunks you can write Markdown text, which is displayed as ordinary text in your created document. The plain text sections can also contain code.

A code chunk is created using

```{r, eval=TRUE}
1 + 1
```

This code chunk calculates 1+1. When you compile the document, the result of this calculation will be shown below the code chunk. The same computation can be made “inline”. An inline code chunk is created using `r 1+1`; here only the result of the computation will be shown in your text.
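To see what the evaluation of a code chunk and of inline code amounts to, here is a minimal sketch that can be run in the R console. It assumes that the knitr package is installed, and the object names (`rmd_lines`, `knitted`) are only illustrative.

```r
# A small R Markdown fragment stored as a character vector:
# one code chunk followed by a sentence with inline code.
rmd_lines <- c(
  "```{r, eval=TRUE}",
  "1 + 1",
  "```",
  "",
  "The result is `r 1 + 1`."
)

if (requireNamespace("knitr", quietly = TRUE)) {
  # knit() evaluates the chunk and the inline code and returns
  # the resulting Markdown as a single character string.
  knitted <- knitr::knit(text = rmd_lines, quiet = TRUE)
  cat(knitted)
} else {
  knitted <- NA_character_
  message("knitr is not installed; run install.packages('knitr') first.")
}
```

In the printed Markdown, the chunk is followed by its output, while the inline code has been replaced by its value.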

Compiling the document is called “knitting”. R uses a package called knitr, made to compile R Markdown files. In the upper part of the source window there is a button called Knit. When you press it, RStudio will ask you to save the Rmd file, and an output file will be created.
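Knitting can also be started from the R console, which is roughly what the Knit button does for you. The sketch below assumes that the rmarkdown package and pandoc are available (RStudio ships with pandoc). The file name `report.Rmd` and the temporary folder are made up for the illustration.

```r
# Write a minimal R Markdown file to a temporary folder.
# The file name "report.Rmd" is only an example.
rmd_file <- file.path(tempdir(), "report.Rmd")
writeLines(c(
  "---",
  "title: \"A title\"",
  "output: html_document",
  "---",
  "",
  "The result is `r 1 + 1`."
), rmd_file)

# render() knits the file and converts it to the format given in the
# YAML header. It needs the rmarkdown package and pandoc.
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  out_file <- rmarkdown::render(rmd_file, quiet = TRUE)
  print(out_file)  # path to the HTML file that was created
} else {
  message("rmarkdown or pandoc is missing; skipping the rendering step.")
}
```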

Microsoft Word integration

Sometimes it is useful to “knit” to a Word file, for example when you want to share a report with fellow students who are not familiar with R. R Markdown can be used as a source for Word documents (.docx).

To open and edit the resulting Word document you need Microsoft Word or a compatible word processor. Settings for the output are specified in the YAML metadata field of the Rmd file. This is the first section of an Rmd file, and when you want it to create a Word file you specify it like this:

---
title: "A title"
author: Daniel Hammarström
date: 2020-09-05
output: word_document
---

The line output: word_document tells R to create a Word file. If you are not happy with the style of the Word document (e.g. size and font of text), you can tell R to use a template file. Save a Word file that you have knitted as reference.docx and specify in the YAML field that you will use this file as a reference.

---
title: "A title"
author: Daniel Hammarström
date: 2020-09-05
output: 
        word_document:
                reference_docx: reference.docx
---

Edit the styles (Stiler in Norwegian) used in the reference file (right click on the style and edit). For example, editing the “Title” style (Tittel in Norwegian) will change the main title of the document. After you have edited the document, save it.

When you knit the document again, your updated styles will be used in your Word document.
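The two knitting steps can also be run from the R console. The following is only a sketch under some assumptions: the rmarkdown package and pandoc are installed, the folder and file names are invented for the example, and the styles are edited by hand in Word between the two steps.

```r
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

# A small example report in a temporary folder (names are made up).
work_dir <- file.path(tempdir(), "word_example")
dir.create(work_dir, showWarnings = FALSE)
rmd_file <- file.path(work_dir, "report.Rmd")
writeLines(c(
  "---",
  "title: \"A title\"",
  "output: word_document",
  "---",
  "",
  "Some text."
), rmd_file)

if (can_render) {
  # Step 1: knit once and keep a copy of the result as the template.
  first_docx <- rmarkdown::render(rmd_file, quiet = TRUE)
  ref_docx <- file.path(work_dir, "reference.docx")
  file.copy(first_docx, ref_docx, overwrite = TRUE)

  # (Open reference.docx in Word, edit the styles and save it.)

  # Step 2: knit again, now pointing to the template. This has the
  # same effect as setting reference_docx in the YAML field.
  styled_docx <- rmarkdown::render(
    rmd_file,
    output_format = rmarkdown::word_document(reference_docx = ref_docx),
    quiet = TRUE
  )
} else {
  message("rmarkdown or pandoc is missing; skipping the rendering steps.")
}
```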

Here you can read more about using R Markdown together with Word. If you do not have Word installed, you can also use OpenOffice. Read more about it here.

Adding references

References/citations can be added to the report using the bibliography option in the YAML field. Citations need to be listed in a file, and multiple formats are available. A convenient format is BibTeX. When using this format, create a text file with the ending .bib, for example bibliography.bib.

The bibliography.bib file needs to be activated in the YAML field. Do it by adding this information:

---
title: "A title"
author: Daniel Hammarström
date: 2020-09-05
output: 
        word_document:
                reference_docx: reference.docx
bibliography: bibliography.bib
---

Add citations to the file in BibTeX format. Here is an example:

@Article{refID1,
   Author="Ellefsen, S.  and Hammarstrom, D.  and Strand, T. A.  and Zacharoff, E.  and Whist, J. E.  and Rauk, I.  and Nygaard, H.  and Vegge, G.  and Hanestadhaugen, M.  and Wernbom, M.  and Cumming, K. T.  and Rønning, R.  and Raastad, T.  and Rønnestad, B. R. ",
   Title="{Blood flow-restricted strength training displays high functional and biological efficacy in women: a within-subject comparison with high-load strength training}",
   Journal="Am. J. Physiol. Regul. Integr. Comp. Physiol.",
   Year="2015",
   Volume="309",
   Number="7",
   Pages="R767--779",
   Month="Oct"}

The part that says refID1 can be edited to something appropriate. This is the reference identification, which you use to get the citation into the text. When citing, you do it in this form:

Blood flow-restricted training leads to similar adaptations as traditional training [@refID1].

This will appear in text as:

Blood flow-restricted training leads to similar adaptations as traditional training (Ellefsen et al. 2015).

The reference will end up at the end of the document (as on this webpage).

You can gather references in BibTeX format from Oria (use the BIBTEX icon) and from PubMed using TeXMed. You can also export references in BibTeX format from citation software like EndNote or Zotero. Make sure you check all references when entering them; TeXMed in particular can give problems with “Scandinavian” letters (å æ ä ø ö).
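R can also produce a BibTeX entry for itself, which is handy if you want to cite the software you used. This is a small sketch using the base R functions `citation()` and `toBibtex()`. The reference identification `Rcore` and the temporary file location are assumptions made for the example.

```r
# citation() describes how to cite R; toBibtex() converts it to BibTeX.
r_bib <- toBibtex(citation())

# The generated entry usually has no reference identification
# (it starts with "@Manual{,"), so one is added here.
# "Rcore" is a made-up identification; pick any name you like.
r_bib[1] <- sub("{,", "{Rcore,", r_bib[1], fixed = TRUE)

# Append the entry to a .bib file (a temporary file in this sketch).
bib_file <- file.path(tempdir(), "bibliography.bib")
cat(r_bib, file = bib_file, sep = "\n", append = TRUE)

# Look at what was written.
writeLines(readLines(bib_file))
```

With the entry in your own bibliography.bib, R could then be cited in the text as [@Rcore].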

A note on folder structures

When you build an analysis in an R Markdown file, R will use the folder that the file is in as the root directory. This directory (or folder) is the top directory of your analysis. This means that R will look for your BibTeX file bibliography.bib in the same directory as the Rmd file.

Think of this folder as ./ (confusing, I know! But bear with me!)

Any subfolders to the root directory can be called things like

./data/ (a folder where you keep data files),

./figures/ (a folder where you output figures from analyses).

The R Markdown file, being in the root directory, will have the “address” ./my_rmarkdown_file.Rmd.

This has several advantages, as long as you stick to one rule: when doing an analysis, always use relative paths (“addresses” to files and folders). Never reference a folder or file by its absolute path. The absolute path for the file I’m writing in now is C:/Users/706194/Dropbox/Undervisning%20och%20kurser%20HIL/IDR3002%20Course%20notes/IDR3002/markdown.Rmd. The relative path is ./markdown.Rmd.

If you want to share your analysis, all you need to do is share the folder with all its content with your friend. If you use relative paths, everything will work on your friend’s computer. If you use absolute paths, nothing will work unless your friend’s computer uses exactly the same folder structure (highly unlikely).

If you decide to move your analysis somewhere else on your computer, your scripts will only work if you used relative paths.
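The difference between relative and absolute paths can be tried out in the R console. This is a minimal sketch in base R; the project folder `my_analysis` and the file `example.csv` are made up for the illustration.

```r
# Build a small example project in a temporary location.
# The folder and file names are made up for this illustration.
project <- file.path(tempdir(), "my_analysis")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(id = 1:3, value = c(2.5, 3.1, 4.0)),
          file.path(project, "data", "example.csv"), row.names = FALSE)

# When knitting, R works from the folder of the Rmd file.
# setwd() imitates that here; you would not need it inside a report.
old_wd <- setwd(project)

# Relative path: resolved from the project folder, wherever it is stored.
dat <- read.csv("./data/example.csv")
print(dat)

# The absolute path differs between computers, so avoid it in scripts.
print(normalizePath("./data/example.csv"))

setwd(old_wd)  # restore the previous working directory
```

The call to `read.csv()` keeps working if the whole `my_analysis` folder is moved or shared, whereas the path printed by `normalizePath()` changes with the location of the folder.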

Read more about this in Structure of an analysis.

References

Ellefsen, S., D. Hammarstrom, T. A. Strand, E. Zacharoff, J. E. Whist, I. Rauk, H. Nygaard, et al. 2015. “Blood flow-restricted strength training displays high functional and biological efficacy in women: a within-subject comparison with high-load strength training.” Am. J. Physiol. Regul. Integr. Comp. Physiol. 309 (7): R767–779.
Peng, R. D. 2011. “Reproducible Research in Computational Science.” Science 334 (6060): 1226–27. https://doi.org/10.1126/science.1213847.
Peng, R. D., F. Dominici, and S. L. Zeger. 2006. “Reproducible Epidemiologic Research.” Am J Epidemiol 163 (9): 783–89. https://doi.org/10.1093/aje/kwj093.

  1. Cheat sheets are available in RStudio: Help > Cheatsheets.

  2. Make sure to look through the installation instructions to get the PDF options working.