3 Packages are where functions live
The R ecosystem consists of packages. These are collections of functions organized in a systematic manner. Functions are created to perform a specialized task, and a package often contains many functions used for, e.g., analyses of a specific kind of data, or for more general tasks such as making figures or handling data.
Later in this course we will use many different packages, for example dplyr, tidyr and ggplot2. dplyr and tidyr are packages used to transform and clean data. ggplot2 is used for making figures.
3.1 Where do packages live?
To install a package, you use the install.packages() function. You only need to do this once on your computer (unless you re-install R). You can write the following code in your console to install dplyr.
install.packages("dplyr")
Alternatively, in RStudio, click “Packages” and “Install” and search for the package you want to install. To use a package, you have to load it into your environment. Use the library() function to load a package.
library("dplyr")
The package dplyr is now loaded into your environment. This means that all the functions that are part of the dplyr package are available for use in R.
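The install-once, load-every-session pattern described above can be sketched as follows. The helper name install_if_missing is hypothetical (it is not part of base R or of this course), and the example assumes only base R functions; it is run on "stats", which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install a package only when it is not already available.
install_if_missing <- function(pkg) {
  # requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package is available afterwards
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call does not trigger a download
install_if_missing("stats")

# For dplyr, the same pattern would be (not run here):
# install_if_missing("dplyr")
# library("dplyr")
```

Installing is a one-time step, whereas library() has to be called in every new R session where you want to use the package.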
But where do packages come from? When you use install.packages(), R looks for a package with the name you supply in a repository that is defined in your options.¹ A repository is, in this case, a database from which it is possible to download and install R packages. The repository is likely a mirror of a CRAN repository.
¹ The behavior of R is in many cases dictated by options. One such option is the preferred repository for downloading packages (the repos option); see ?options for details.
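To make the role of options concrete, the sketch below inspects and then sets the repository option for the current session. The URL used, https://cloud.r-project.org, is the CRAN cloud mirror; it is chosen here as an example and is not something the text above prescribes.

```r
# Inspect the repositories that install.packages() will search
getOption("repos")

# Set the CRAN repository explicitly (affects the current session only)
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Confirm the new setting
getOption("repos")[["CRAN"]]
```

A setting made this way is forgotten when R is restarted. A common way to make it permanent is to place the options() call in an .Rprofile file.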
3.2 Alternatives to CRAN
3.2.1 Bioconductor
CRAN (Comprehensive R Archive Network) is the go-to database network for R packages, but there are alternatives. Bioconductor collects R packages used in bioinformatics and related fields, and similarly to CRAN we can download packages from Bioconductor using convenient functions. Confusingly, a package named BiocManager is available at CRAN. Using BiocManager::install()² we can install packages from Bioconductor.
² Notice the double colon here (::). This means that we are telling R to look for a function inside a package, specifically, look for the function called install in the BiocManager package. When using :: we do not need to load the package to access the function.
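The :: operator can be tried without installing anything. The sketch below uses the tools package, which ships with R but is not attached by default in a standard session. The commented lines show how the same pattern would apply to Bioconductor; "limma" is only an example of a Bioconductor package and is not taken from the text above.

```r
# Call file_ext() from the tools package without running library(tools)
tools::file_ext("analysis.R")

# The same pattern for Bioconductor (not run: needs an internet connection)
# install.packages("BiocManager")    # BiocManager itself comes from CRAN
# BiocManager::install("limma")      # "limma" is an example package name
```

Writing the package name in front of the function also makes it clear to a reader where the function comes from.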
3.2.2 GitHub
GitHub hosts many R packages, some of which will never be submitted to CRAN for various reasons. Packages hosted on GitHub can be downloaded and installed using a function provided by the remotes package.
First we need to install remotes:
install.packages("remotes")
Next we can use remotes to install a package we will use later, the exscidata package, which contains datasets.³
³ See the GitHub repository dhammarstrom/exscidata, where the package lives.
We load the remotes package and can then download and install the package directly.
library(remotes)
install_github("dhammarstrom/exscidata")
3.2.3 Other alternatives
In addition to CRAN, Bioconductor and GitHub, R packages are hosted on rOpenSci, which is a platform for hosting peer-reviewed R packages. These can also be hosted on CRAN or Bioconductor, and at the same time be hosted on GitHub with a development version. Similarly to GitHub, GitLab and Bitbucket provide possibilities for hosting packages, which can be installed using functions in the remotes package.
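As a rough guide to which remotes function goes with which hosting service, the sketch below maps a service name to the corresponding installer. The helper installer_for is hypothetical and exists only for illustration, and "user/repo" is a placeholder rather than a real repository.

```r
# Hypothetical helper: name of the remotes function for a given hosting service
installer_for <- function(host) {
  switch(host,
    github    = "install_github",
    gitlab    = "install_gitlab",
    bitbucket = "install_bitbucket",
    stop("Unknown hosting service: ", host)
  )
}

installer_for("gitlab")

# Not run: requires the remotes package and an internet connection
# remotes::install_gitlab("user/repo")     # "user/repo" is a placeholder
# remotes::install_bitbucket("user/repo")  # "user/repo" is a placeholder
```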