8 Version control and collaboration
In the previous chapter we underlined the importance of the project as a way of keeping data, code (and text) in an organized manner. The project concept in RStudio can easily be extended to include version control. Version control also makes collaboration easier. Most often, collaborating on writing a report, assignment or paper is hard. You send a file, get another one in return. Some files are on dropbox, some are lost. What if we had a system for collaboration that made it easy to follow the progress of a project. Connecting RStudio projects to git and GitHub makes this possible.
Before you go any further you might want to spend about 25 minutes on a couple of videos explaining the concepts of version control and git. Find the videos here
8.1 Why version control?
Github is a platform for collaborative coding. As we have noted before, collaboration concerns both others and you, in the future! This means that having a formal system for keeping track of your projects is a good thing.
Github also provides version control. Version control can help you track changes in your entire analysis or writing project. This is helpful when multiple files make up a complex project, including e.g. scripts, data and manuscript files. It is also helpful when multiple collaborators work together (e.g. writing a report). You will, by using version control, avoid overwriting other peoples work. With multiple changes made to the project, merging will create the latest up-to-date version. When you change a file in your analysis you will be required to describe the changes you have made. Git creates a record of your changes. This also means that we have “backups” of previous versions.
8.2 Three ways of hooking up to GitHub
8.2.1 Create a new repository on GitHub and clone it
Access your personal GitHub account and click New under repositories. This is equivalent to going to www.github.com/new. GitHub will ask for a repository name, a description and whether you want the repository to be public or not. You can also chose to add a Readme-file.
Names and descriptions are important, a better name and description makes it easier for you and others to find and make use of your repository. Even when making repositories for school assignments, a good name will likely make it more re-usable in the future. The same is true for the readme file. So, name the repository with a descriptive name, write a short description with the purpose of the repository and add a readme-file to the repository.
A public repository is open for everyone, private repositories have restricted access.
Once the repository is created you can clone it. This means that you will copy the content to your local machine (PC/Mac). In RStudio this is most conveniently done by starting a new RStudio project and selecting Version Control in the project menu. You will be asked to copy the address shown under “Code” on GitHub.
8.2.2 Create an online repository from a local folder
Let’s say that we have a local folder that is a RStudio Project, without version control and we want to create a online repository together with version control. We can use GitHub desktop to accomplish this or GitHub CLI.
Using the terminal and GitHub desktop:
- The first step is to make the local folder a git repository, in RStudio with the project running go to a terminal and type
git init
. The terminal will let you know that you have initialized a git repository. - Start up GitHub desktop, under File choose Add local repository and find the folder on your computer where you have your RStudio project. Once open in GitHub desktop you will see all changes and additions of new files.
- Commit your changes by writing a first commit message, and possibly a longer description of the commit.
- Click “Publish repository”, you will be asked to edit the name and description of the repository and choose whether to have the repository private or not (see above for recommendations).
- Go to GitHub.com and check if the repository is published.
Using the terminal and GitHub terminal client (CLI):
- Be sure to be in your RStudio project and use the terminal in RStudio to initiate a git repository, type
git init
in the terminal. - Also in the terminal type
gh repo create
, this will guide you through the same process as with GitHub desktop but all selections are done in the terminal.
8.2.3 Create an online repository from a local git repository
If you have already initialized a RStudio project as a git repository you can follow the steps above without the git init
command. Using git init
on an already initialized git repository will reinitialize it. This will not remove git history of the repository (see here for documentation).
8.3 Git commands and workflows
8.3.1 Add, commit and push
The day to day workflow when working on a git project involves making changes to your files and saving those changes locally, and in the version control system. By the end of the day you might also want to make sure all changes are synchronized with the online repository.
This workflow includes the git commands add
, commit
and push
.
Using the terminal git add <filename>
or git add -A
adds a specific file or all changes to a list of changes to be “committed” into version history. The equivalent operation in GitHub desktop is checking all boxes under changes. This is done automatically and you have to uncheck files or changes that you do not want to commit to history.
In the terminal we can commit changes to the git history using the command git commit -m "a commit description message"
the additional part -m "a commit...
is the required commit message. It is good to be informative if you need to find a specific change to a file. In GitHub desktop this is easily done by writing a commit message under summary in the bottom left corner once you have changes in your repository.
The last step, git push
, means that you are uploading all changes to the online repository. This will update the repository on GitHub, your version history is now up to date in your online repository. This also means that you have an online backup of your work.
In GitHub desktop we can examine the commit history of a project by looking in the History tab. Using the web interface at www.github.com we can get an overview of all commits by clicking Activity or commits in the repository view. Using the command line we can look at the the commit history by using git log
. This will list all commits and you can scroll trough them by pressing enter. To exit this list press q
.
Further descriptions of a certain state of the repository can be added using tags. Using a tag we can make a note of a certain state of the repository, for example when an assignment is ready to exam or when a journal article is ready for submission. In GitHub desktop we might want to put a tag on a specific commit. We can do this by right-clicking on a commit in the history followed by Create tag. Using the command line, tags are added with git tag
, see the git documentation for details.
8.3.2 Collaboration, pull
, clone
and fork
Collaboration is most often done with yourself in the future. The git pull
command (using the terminal) downloads all changes to your working directory. You want to do this when you have changes in the online repository that is not synchronized with the local repository. This might be the case if you have made changes to your repository on GitHub, like added a readme file. Or if you are collaborating with someone who have made changes to the repository. I work on multiple computers, sometimes on the same repository, the online repository is where a keep the most up to date version of my project.
Using GitHub desktop, we can click Fetch origin to get the latest changes from the online repository. GitHub desktop will suggest to pull these changes to the working directory after you have done this operation.
We have already covered git clone
, this essentially means downloading an online repository to your local machine. This is most easily done while initializing a new RStudio project.
A fork is a copy of someones online repository that is created as a new repository under your user. You now have access to this repository and can make changes. The repository can have its own life or be used to create changes that later are suggested as changes to the “parent repository”.
In this course you can fork a template for the portfolio exam. This is an example where your fork will have its own life.
If a fork is used to suggest changes this is done through a pull request. Using the web client (GitHub), we can click create pull request when inside a forked repository. This will take you a few steps where you are expected to describe changes to the repository. The original author will get a notification to review the pull request and can chose to incorporate the changes into the parent repository.
8.3.3 Branches
Much like a fork, we can create copies of our own repository. These are called branches. A branch might contain changes that we want to try out before we make it the “official” version of our repository. These changes can include experiments that might mess up things or break code.
Using GitHub desktop we can create a new branch by clicking Current branch in the upper left and then Create branch. In GitHub desktop it is easy to switch between branches.
Using the command line we can create a branch by typing git branch <new-branch-name>
where <new-branch-name>
is a name of the branch that you choose. Using the command git checkout <new-branch-name>
we switch to the new branch from the current branch, which is often called master
. We can use git checkout master
to get back to the original branch. If you switch between branches while working in both your work can be saved using a commit.
8.3.4 Conflicts
A conflict emerges when two versions of a files cannot be merged into one. The web client will check if, for example, two branches will be possible to merge. If you try to merge two versions of a file that have different changes made to the same line you will get a message from GitHub desktop or on the command line tool. A conflict needs to be resolved manually.
Find the file that is affected by the conflict. This is possible to do by using git status
on the command line. Next, find the lines that are affected by the conflict and edit them to what should be correct content. A conflict can be seen in the file as
<<<<<<< HEAD:filename.file
content
=======
other content
>>>>> branch:filename.file
In the above example, “content” and “other content” is the content of the two conflicting versions of the file (filename.file
). You have to pick one or write something else instead. You also need to remove markers (<<<
, ====
and >>>
). When you are satisfied with the changes commit the changes. This will resolve the conflict.
8.4 Additional great things about GitHub
GitHub has great capabilities for managing projects. You can for example:
- Post issues that are suggestions or questions regarding a repository. Issues can be categorized with labels and assigned.
- You can create to-do lists in the Projects tab (in the web interface). This could be a nice way of sharing and tracking the progress of a project.
- You can build a wiki. This is simply a collection of pages that can be used to document the repository or a project (in a wider sense) that you are working on.
- All of the above can be private and public. You can choose whom have access to your repository. This makes it easy to work on a project even if you need to keep things a secret.
8.5 When will this knowledge be handy?
When writing your master thesis, it will be extremely easy to share your code with your supervisor or other students, whit whom you collaborate. You can just invite someone to make changes in your repository and then download them. As noted several times before, your most frequent collaborator is you. Using git makes it easy to keep track of changes in your project and it keeps your most frequent collaborator from messing up your work.
Version control workflows are part of almost all technology companies, and will most certainly be part of many more types of businesses, institutions and workplaces in the future as we need to collaborate on large, complex projects. Knowing about these systems is in that sense quite handy!
8.6 Resources
There are of course more functions in git, here are some resources for deeper understanding.