10  Data

In this tutorial, our primary purpose for creating the package is to distribute data efficiently. This is not the main purpose of most R packages, which mainly distribute R functions. Both functions and data can be documented in an R package, which makes the data and a description of its variables available to the user of the package. In this section we will add a data set to the package and document it.

10.1 Where is data stored?

When we have prepared data, it will be stored as an .rda (R data) file in a folder called data/. The data stored in data/ is the ready-for-use data, possibly created from raw data. We will keep the raw data, together with the scripts used to generate the clean data from it, in data-raw/. This folder should not be included when building the package, so we add it to the .Rbuildignore file:

.Rbuildignore
^datatemplate\.Rproj$
^\.Rproj\.user$
^LICENSE\.md$

^data-raw$
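
Each line in .Rbuildignore is a regular expression that is matched, case-insensitively, against paths relative to the package root. The sketch below mimics that matching in base R to check which top-level entries would be left out of the build, using the anchored form `^data-raw$` for the data-raw/ folder. The list of paths is hypothetical, and the helper `is_ignored()` is written for this illustration only; it is not part of R's build tools.

```r
# Patterns as they would appear in .Rbuildignore (one regex per line)
patterns <- c(
  "^datatemplate\\.Rproj$",
  "^\\.Rproj\\.user$",
  "^LICENSE\\.md$",
  "^data-raw$"
)

# Hypothetical top-level entries in the package source
paths <- c("DESCRIPTION", "R", "data", "data-raw", "LICENSE.md", "datatemplate.Rproj")

# Illustrative helper: a path is ignored if any pattern matches it
is_ignored <- function(path, patterns) {
  any(vapply(
    patterns,
    function(p) grepl(p, path, perl = TRUE, ignore.case = TRUE),
    logical(1)
  ))
}

ignored <- vapply(paths, is_ignored, logical(1), patterns = patterns)

# Entries that would be excluded from the built package
paths[ignored]
```

Note that data/ is not matched by `^data-raw$`, so the clean data still ships with the package.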

10.2 Cleaning data and adding it to the package

For the purpose of this tutorial we will create a data set called mytemplatedata. An R script with the same name, containing all steps needed to clean up the data set, will be added to the data-raw/ folder.

The data in this example consists of three files. This is, in my experience, a common scenario: a machine has produced three similar files from three experiments, and we want to combine them into a common data set. This might be an ongoing series of experiments, so we expect more data. Saving the script will make it easy to update the clean data set.

data-raw/mytemplatedata.R
# Purpose: import and clean data from a series of experiments.

# Load packages
library(readxl)

# A for loop for importing files ###########

# All files from experiments
files <- list.files("data-raw/experiments/")

# A list to store data 
combined_data <- list()

# For-loop
for (i in seq_along(files)) {

  combined_data[[i]] <- read_excel(
    paste0("data-raw/experiments/", files[i]) # Read file in iteration i
  ) |>
    dplyr::mutate(id = paste0(i, ":", id)) # Add file info to id

}

# Combine files
mytemplatedata <- dplyr::bind_rows(combined_data)


# Save data 
usethis::use_data(mytemplatedata, overwrite = TRUE)

The function usethis::use_data() stores the data as an .rda file in the data/ folder. It also prints a reminder that the next step is to document the data.
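
To try the same workflow without the real experiment files, the following is a minimal base R sketch under several assumptions: three small CSV files with made-up values stand in for the Excel files, `read.csv()` and `rbind()` replace readxl and dplyr, and `save()` into a temporary data/ folder stands in for usethis::use_data(). It only illustrates the steps; it is not what the script above runs.

```r
# Stand-in for data-raw/experiments/: three small files with made-up values
exp_dir <- tempfile("experiments")
dir.create(exp_dir)
for (i in 1:3) {
  write.csv(
    data.frame(id = 1:4, group = c("a", "b", "c", "a"), value = i * 10 + 1:4),
    file = file.path(exp_dir, paste0("experiment", i, ".csv")),
    row.names = FALSE
  )
}

# Import every file and prefix id with the position of the file
files <- list.files(exp_dir, full.names = TRUE)
combined_data <- lapply(seq_along(files), function(i) {
  dat <- read.csv(files[i])
  dat$id <- paste0(i, ":", dat$id) # e.g. "2:3" = file 2, unit 3
  dat
})

# Combine the files into one data frame
mytemplatedata <- do.call(rbind, combined_data)

# Stand-in for usethis::use_data(): save the object as .rda in a data/ folder
data_dir <- file.path(tempfile("pkg"), "data")
dir.create(data_dir, recursive = TRUE)
rda_path <- file.path(data_dir, "mytemplatedata.rda")
save(mytemplatedata, file = rda_path)

# Loading the file restores the object under its original name
restored <- new.env()
loaded_names <- load(rda_path, envir = restored)
```

An .rda file stores the object together with its name, which is why the file name and the object name are kept identical.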

10.3 Documenting the data

Confusingly, data sets are not documented in the data/ folder but in an R file, conventionally called data.R, located in the R/ folder. For our data set mytemplatedata we can add the following:

R/data.R
#' My Template Data for the Template Data Package
#'
#' Data from a series of experiments.
#'
#' @format ## `mytemplatedata`
#' A data frame with 12 rows and 3 columns:
#' \describe{
#'   \item{id}{Identification of the experimental unit, experiment:unit}
#'   \item{group}{Experimental group: a, b or c}
#'   \item{value}{The value from the experimental readout}
#' }
#'  
"mytemplatedata"
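
Documentation of this kind can drift out of date when columns are added or renamed. As a rough consistency check, the sketch below extracts the `\item{}` names from a few roxygen lines and compares them with the column names of a data frame. Both the roxygen lines and the data frame are stand-ins typed in for illustration; in a real package you might read the lines from the documentation file instead.

```r
# Stand-in roxygen lines (shortened descriptions, for illustration only)
roxygen_lines <- c(
  "#'   \\item{id}{Identification of the experimental unit}",
  "#'   \\item{group}{Experimental group}",
  "#'   \\item{value}{Experimental readout}"
)

# Stand-in data frame with made-up values
mytemplatedata <- data.frame(
  id = c("1:1", "1:2"),
  group = c("a", "b"),
  value = c(1.5, 2.5)
)

# Extract the name inside the first pair of braces after \item
matches <- regmatches(
  roxygen_lines,
  regexec("\\\\item\\{([^}]+)\\}", roxygen_lines)
)
documented <- vapply(matches, function(m) m[2], character(1))

# Columns missing from the documentation, and documented names with no column
undocumented <- setdiff(names(mytemplatedata), documented)
stale <- setdiff(documented, names(mytemplatedata))
```

If both `undocumented` and `stale` are empty, every column has an entry and every entry has a column.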

That’s it! We have successfully included data in our package and documented it. Remember to update any documentation before