In this tutorial, our primary purpose for creating the package is to distribute data efficiently. This is not the main purpose of most R packages, which mainly distribute R functions. Both functions and data can be conveniently documented in an R package, which makes the data and a description of its variables available to the user of the package. In this section we will add a data set to the package and document it.
When we have prepared the data, it will be stored as an .rda (R data) file in a folder called data/. The data stored in data/ is the ready-for-use data, possibly created from raw data. We will keep the raw data, together with the scripts used to generate the clean data from it, in data-raw/. This folder should not be included when building the package, so we add it to the .Rbuildignore file:
.Rbuildignore
^datatemplate\.Rproj$
^\.Rproj\.user$
^LICENSE\.md$
data-raw/
For the purpose of this tutorial we will create a data set called mytemplatedata. An R script with the same name, containing all the steps needed to clean up the data set, will be added to the data-raw/ folder.
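The entries in .Rbuildignore are regular expressions that are matched, ignoring case, against paths relative to the package root. As a rough check of what the entries above exclude, the following base R sketch tests a few paths against them. The example paths are assumptions about the package layout, and the matching only approximates what R CMD build does.

```r
# Patterns copied from the .Rbuildignore file above
patterns <- c(
  "^datatemplate\\.Rproj$",
  "^\\.Rproj\\.user$",
  "^LICENSE\\.md$",
  "data-raw/"
)

# Hypothetical paths, relative to the package root
paths <- c(
  "datatemplate.Rproj",
  "LICENSE.md",
  "data-raw/mytemplatedata.R",
  "data/mytemplatedata.rda",
  "R/data.R"
)

# A path is treated as ignored if any pattern matches it
ignored <- vapply(
  paths,
  function(p) {
    any(vapply(patterns, grepl, logical(1), x = p, perl = TRUE, ignore.case = TRUE))
  },
  logical(1)
)

print(ignored)
```

In this sketch the script in data-raw/ is flagged as ignored, while the files in data/ and R/ are not. The unanchored data-raw/ entry matches the files inside the folder; usethis::use_data_raw() would instead add the anchored pattern ^data-raw$, which matches the folder itself.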
The data in this example consists of three files. This is, in my experience, a common scenario: a machine has produced three similar files from three experiments, and we want to combine them into a single data set. This might be an ongoing series of experiments, so we expect more data. Saving the script will make it easy to update the clean data set.
data-raw/mytemplatedata.R
# Purpose: import and clean data from a series of experiments.
# Load packages
library(readxl)
# A for loop for importing files ###########
# All files from experiments
files <- list.files("data-raw/experiments/")
# A list to store data
combined_data <- list()
# For-loop
for (i in seq_along(files)) {
  combined_data[[i]] <- read_excel(
    paste0("data-raw/experiments/", files[i]) # Read file in iteration i
  ) |>
    dplyr::mutate(id = paste0(i, ":", id)) # Add file info to id
}
# Combine files
mytemplatedata <- dplyr::bind_rows(combined_data)
# Save data
usethis::use_data(mytemplatedata, overwrite = TRUE)
The function usethis::use_data() stores the data as an .rda file in the data/ folder. It also gives us a hint that we next need to document the data.
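The import loop can be tried out without the Excel files or a package project. The following is a minimal sketch in base R, assuming made-up data: three temporary CSV files stand in for the experiment files, and save() and load() stand in for usethis::use_data(). The file names, values and group labels are all hypothetical.

```r
# Hypothetical stand-in data: three experiment files with four units each
exp_dir <- tempfile("experiments")
dir.create(exp_dir)
set.seed(1)
for (k in 1:3) {
  write.csv(
    data.frame(
      id = 1:4,
      group = c("a", "b", "c", "a"),
      value = round(rnorm(4), 2)
    ),
    file.path(exp_dir, sprintf("experiment%d.csv", k)),
    row.names = FALSE
  )
}

# Same pattern as the script: list the files, read each one, tag the id
files <- list.files(exp_dir, pattern = "\\.csv$", full.names = TRUE)
combined_data <- vector("list", length(files))
for (i in seq_along(files)) {
  dat <- read.csv(files[i])
  dat$id <- paste0(i, ":", dat$id) # Add file info to id
  combined_data[[i]] <- dat
}
mytemplatedata <- do.call(rbind, combined_data)
print(dim(mytemplatedata))

# Stand-in for usethis::use_data(): write an .rda file and read it back
rda_path <- file.path(tempdir(), "mytemplatedata.rda")
save(mytemplatedata, file = rda_path)
loaded <- new.env()
load(rda_path, envir = loaded)
round_trip_ok <- identical(loaded$mytemplatedata, mytemplatedata)
```

Prefixing id with the file index keeps units from different experiments distinct after the files are combined. Because the loop runs over whatever list.files() returns, adding a fourth file and rerunning the script is enough to update the data set.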
Confusingly, data sets are not documented in the data/ folder. Instead, they are documented in an R file, conventionally called data.R, located in the R/ folder. For our data set mytemplatedata we can add the following:
R/data.R
#' My Template Data for the Template Data Package
#'
#' Data from a series of experiments.
#'
#' @format ## `mytemplatedata`
#' A data frame with 12 rows and 3 columns:
#' \describe{
#' \item{id}{Identification of the experimental unit, experiment:unit}
#' \item{group}{Experimental group: a, b or c}
#' \item{value}{The value from the experimental read out}
#' }
#'
"mytemplatedata"
That’s it! We have successfully included data in our package and documented it. Remember to update any documentation before