Writing scientific documents with R, RStudio and GitHub

Collaborating on writing a paper or a report is hard. You send a file, get another one in return. Some files are on dropbox, some are lost. What if we had a system for collaboration that made it easy to follow the progress of of a project. Connecting R projects to git and GitHub makes this possible.

Why version control

Github is a platform for collaborative coding. As we have noted before, collaboration concerns both others and your future you! This means that having a formal system for keeping track of your projects is a good thing.

Github also provides version control. Version control can help you track changes in your entire analysis project. This is helpful when multiple files make up a complex project, including e.g. scripts, data and manuscript files. It is also helpful when multiple collaborators work together (e.g. writing a report). You will, by using version control, avoid overwriting other peoples work. With multiple changes made to the project, merging will create the latest up-to-date version. When you change a file in your analysis you will be required to describe the changes you have made, Git in turn, creates a record of your change. This also means that we have “backups” of previous versions.

A simple/complicated start

I use version control when working in R by setting up repositories on www.github.com and then cloning to my local computer. A repository is a folder containing all the files in a specific project. Using projects in RStudio makes it easy to sync local projects with the online version control.

When cloning a project, you download all files to your personal computer, you are then free to work on the project without interference from others. When you have created a new file you add the file to version control and commits the changes. This means that your change has got a unique identity in the history of your project. You may now push changes to the online repository which is the online version of your work.

When collaborating with others, pull requests makes it easy for others to make changes to your repository that you have to accept or decline. This is somewhat equivalent to suggesting a change in a word document with track changes activated.

Step by step instructions on setting up a version controlled R project.

Step 1. Create a free github account:

Go to www.github.com, press sign up and follow the instructions.

Step 2. Download git.

Git is the software responsible for actually keeping track of all your files. We need to have this installed locally to make this work. Go to https://git-scm.com/downloads and download the version compatible with your system (windows or mac). Install git by following the instructions.

Step 3. Connect git to RStudio

In RStudio, click Tools -> Global options -> Git/SVN. Click to “Enable version control interface …”. Now you have to point RStudio to your git executable. On my computer after following the standard installation of git, the executable is found in: C:/Program Files/Git/bin/git.exe. Browse to find your copy. RStudio has to be restarted to make the changes work.

Step 4. Create a new online repository

Go to www.github.com and sign in using your user name and password created in step 1. Click “New” to create a new repository. Give the repository a name. If you want you can add a description of the repository, this can be something like “statistics report 5 in IDR4000”. You can decide if you want the repository to be public or private. To share it with others without giving special permissions, it needs to be public.

Under “Initialize this repository with” you can select if you want to start your repository with a README file. This is good if you want to describe what the repository contains.

After selecting what you want, press Create repository. Copy the link on the resulting page under quick setup.

Step 5. Create a new project in RStudio

We will now connect the online repository to a RStudio project. Click the projects button in the top right corner and select new project, select a version controlled project, select Git and paste the link from github (step 4) into the field for repository URL.

Step 6. Make changes to your project

You now have an “empty” project. You can add for example a R Markdown file by writing one and saving it in your local repository. In the tab containing the Environment, History etc. you should have a tab that says Git. Here you can commit changes and push them.

Let us say that you have created a file called report.Rmd. In this file you have written an analysis. You now want to add these changes to version control. You can do this by clicking the file under Staged and then commit. You will need to write a commit message, a short description of what you have done and the press commit. The changes are now saved and version controlled. To upload these changes to your online repository, press push.

When you have a lot of changes RStudio is a bit slow if you use the interface. An alternative is to use the terminal.

A simple setup is to use the git bash terminal. Under Tools, go to Terminal and terminal options. Select Git Bash in the list “New terminals open with”. Where you have the console, there is also a tab called Terminal, if not, start a new one with Tools -> Terminal -> New Terminal.

The simplest commands are as follow:

You have made changes and want to upload them, in the terminal, write:

git add -A

This adds all changes to your next commit, next, write:

git commit -m "A commit message"

The “A commit message” is the description of the changes you have made. And finally write:

git push

This will push your changes to your repository.

Let’s say that someone else have made changes, or you have made changes online to your repository. You then want to start by downloading the latest changes to your computer. If you are using the RStudio interface, under Git press pull. If you are using the terminal, write:

git pull

When is this good to know

When writing your master thesis, it will be extremely easy to share your code with your supervisor or other students whom you collaborate with. You can just invite someone to make changes in your repository and the download them.

Resources

There are of course more functions, here are some resources for deeper understanding.