<- function(a, b) {
sum_two_values
<- a + b
result
return(result)
}
sum_two_values(3, 6)
[1] 9
To reduce the number of repetitive tasks performed manually or through copy-and-paste coding, R provides functions and frameworks for iterative operations. The programmer (you), could benefit greatly from building your own functions as these will reduce the risk of making mistakes, make the code more readable and easier to change (Wickham and Grolemund 2017, chap. 19).
A function in R is a defined set of operations, put together and often stored as an object to perform a specific task. In R, the user may define his/her own functions. Defining a function usually involves storing the function as an object in your environment. When you have gathered multiple functions relating to common tasks these can be organized as a package.
Let’s say we want to create a function that returns the sum of two values. The function sum_two_values
that is defined below, contains the basics of a typical function. In the function()
call we define what arguments the function will accept. In our case, a
and b
represents the two numbers we want to add together. The body of the function is where we define the operations that we want the function to perform. We add a
and b
together and assign their sum to an object called result
. Finally, we pass the result of the function to return(result)
. Using return makes it explicit what part of the results you want to “get from the function”.
<- function(a, b) {
sum_two_values
<- a + b
result
return(result)
}
sum_two_values(3, 6)
[1] 9
The name of a function is defined by saving the function in an object. When the function is defined in a script or RMarkdown file it is available in the R environment. The difference from packages is that functions defined as part of packages are available when you load the package using library()
.
Lactate threshold values can be calculated from the exercise-intensity and blood lactate value relationship gathered from a graded exercise test. In the simple case, the exercise intensity at a fixed blood lactate concentration can be used to evaluate training progressions in e.g. athletes. To find the exercise intensity at a fixed blood lactate concentration, a polynomial regression model can be used to predict lactate values at different intensities and the find the exercise intensity value closest to our desiered lactate value.
The above description can be broken down into these steps:
Using the cyclingstudy
data we can perform these steps. We will do so using the pre
time-point in participant 10.
library(tidyverse); library(exscidata); data("cyclingstudy")
# Save a data set of lactate values from participant 10, time-point pre
<- cyclingstudy %>%
dat select(subject, group, timepoint, lac.125:lac.375) %>%
pivot_longer(names_to = "watt",
values_to = "lactate",
names_prefix = "lac.",
names_transform = list(watt = as.numeric),
cols = lac.125:lac.375) %>%
filter(subject == 10,
== "pre",
timepoint !is.na(lactate)) %>% # Remove NA values
print()
# Fit the model
<- lm(lactate ~ watt + I(watt^2) + I(watt^3), data = dat)
model
# Predict lactate values over all observed watt values
# calculate the smallest distance from the fixed lactate value
<- data.frame(watt = seq(from = min(dat$watt), to = max(dat$watt), by = 0.1))
new_data
$dist <- abs(predict(model, newdata = new_data) - 4)
new_data
# Find the smallest value of predicted - fixed lacate value
%>%
new_data filter(dist == min(dist)) # Where the dist value equals the minimum dist value
If we were to do this operation for more participants and across more time-points, we should formalize the operation into a function. In the simplest form, the prospective function would only need an argument defining the input data. We can then settle for a single fixed lactate value. We will define the function and call it lt
for lactate threshold. (This might not be the best name! (Wickham and Grolemund 2017)).
If the function returns a data frame, we can use it more efficiently later som we will make sure the result of the function is a data frame.
<- function(data) {
lt
# Fit a 3 degree polynomial model
<- lm(lactate ~ watt + I(watt^2) + I(watt^3), data = data)
m
# Store a data frame with exercise intensities
<- data.frame(watt = seq(from = min(data$watt), to = max(data$watt), by = 0.01))
new_data
# Predict using the new data, predicting lactate values at each
$pred <- predict(m, newdata = new_data)
new_data
# calculate deviation from the lactate value of interest
$watt.4mmol <- abs(new_data$pred - 4)
new_data
# Create a results data frame
<- data.frame(watt.4mmol = new_data[which.min(new_data$watt.4mmol),1])
results
# Return the data frame
return(results)
}
The body of the function contains all operations needed to get the results we are after. It finally returns a data frame containing a single column named watt.4mmol
. We can now test the function on some data.
library(tidyverse); library(exscidata); data("cyclingstudy")
Warning: package 'tidyverse' was built under R version 4.4.1
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
<- cyclingstudy %>%
data select(subject, group, timepoint, lac.125:lac.375) %>%
pivot_longer(names_to = "watt",
values_to = "lactate",
names_prefix = "lac.",
names_transform = list(watt = as.numeric),
cols = lac.125:lac.375) %>%
filter(!is.na(lactate),
== 10,
subject == "pre") %>%
timepoint print()
# A tibble: 8 × 5
subject group timepoint watt lactate
<dbl> <chr> <chr> <dbl> <dbl>
1 10 INCR pre 125 0.93
2 10 INCR pre 175 0.87
3 10 INCR pre 225 0.86
4 10 INCR pre 250 0.92
5 10 INCR pre 275 1.2
6 10 INCR pre 300 1.69
7 10 INCR pre 325 2.6
8 10 INCR pre 350 4.69
lt(data)
watt.4mmol
1 342.96
Nice, since we defined the function with higher resolution, we get a exercise intensity value with two decimal points. This value represent the watt value where the predicted lactate value is the closest to 4 mmol L-1.
We still need to incorporate the function into code that dose the calculation for each individual, and time-point without the need of manually filtering values for each new data set. This can be accomplished by using the function in a pipe together with group_by
and group_modify
.
Since we are using a third degree polynomial we need to make sure we have enough data to fit such a model to each participant/time-point. We will add a variable counting the number of observations and then filter tests with less then 4 observations.
The complete pipe could look something like this.
# Extract lactate values
%>%
cyclingstudy select(subject, group, timepoint, lac.125:lac.375) %>%
pivot_longer(names_to = "watt",
values_to = "lactate",
names_prefix = "lac.",
names_transform = list(watt = as.numeric),
cols = lac.125:lac.375) %>%
# Filter missing lactate values
filter(!is.na(lactate)) %>%
# Group the data set, keeping group to have this information in the final results
group_by(timepoint, subject, group) %>%
# create a new variable counting the number of observations and filter away tests with less than 4 obs.
mutate(n = n()) %>%
filter(n >= 4) %>%
# Use grouup modify to apply the function to all participants per time-point (and group)
group_modify(~ lt(.)) %>%
print()
# A tibble: 74 × 4
# Groups: timepoint, subject, group [74]
timepoint subject group watt.4mmol
<chr> <dbl> <chr> <dbl>
1 meso1 1 INCR 293.
2 meso1 2 DECR 286.
3 meso1 3 INCR 335.
4 meso1 4 DECR 244.
5 meso1 5 DECR 332.
6 meso1 6 INCR 252.
7 meso1 7 MIX 268.
8 meso1 8 MIX 294.
9 meso1 9 MIX 300.
10 meso1 10 INCR 340.
# ℹ 64 more rows
Great sucess! We have now calculated lactate threshold values for each participant, belonging to specific time-points and groups. The group_modify
function does the iterative work of applying the function for all cases, you have save yourself a lot of code, and time!
lt
function to return two fixed lactate values.lt
function to include an extra argument specifying the level of the lactate value you are interested in.