Lecture 2: Version control and collaboration with Git!

Daniel Hammarström

Why version control?

  • Reproducibility and transparency → Better science
  • Collaboration and robustness → Better science
  • Formal structures and workflows → Better science

Introduction to git

  • git is version control software that is installed locally
  • It tracks changes to files in a specific repository (folder)
  • The version history is stored in a hidden folder, .git
  • git is really good at tracking plain text files, but can also track other files…
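
The points above can be tried out in a throwaway folder. This is a minimal sketch, assuming git is installed; the user name, e-mail address and commit messages are placeholders, not part of the lecture.

```shell
# Work in a temporary directory so nothing else on your machine is touched.
demo=$(mktemp -d)
cd "$demo"

# Turn the folder into a repository; this creates the hidden .git folder.
git init -q
# Identity used for commits in this repository only (placeholder values).
git config user.name "Example Student"
git config user.email "student@example.com"

# Create a plain text file, stage it, and record it in the history.
echo "## This is an example" > File-1.txt
git add File-1.txt
git commit -q -m "Add File-1.txt"

# Change the file and record a second version (-a stages tracked files).
echo "It has some content that needs to be version controlled" >> File-1.txt
git commit -q -am "Describe the purpose of File-1.txt"

# The hidden folder is there, and the history holds two versions.
ls -d .git
git log --oneline
```

Each line printed by `git log --oneline` is one recorded version of the repository, newest first.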

Introduction to GitHub

  • GitHub is a collaborative platform that allows you to host version-controlled repositories online
  • GitHub makes it possible to share code, collaborate on developing code, host websites, and more

A list of tools for version control


  • GitHub CLI: command-line interface to GitHub
  • GitHub Desktop: graphical user interface to GitHub/git

Contributing to a central repository by pull requests

  • A pull request is “all or nothing” → smaller changes are easier to pull into the central repository
  • The owner of the central repository can incorporate and work on a large pull request in a separate “branch”

Contributing to a repository by branching

  • A branch can contain edits to the project that we want to make without risking breaking the main branch.
  • Changes in a branch are merged into the main branch using pull requests.
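
A minimal sketch of the branching idea, run entirely on your own machine. It assumes git is installed; the branch name add-details and the user identity are placeholders. A local `git merge` stands in here for the pull request that you would normally open on GitHub.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"

# First commit on the default branch, which we name "main".
echo "## This is an example" > File-1.txt
git add File-1.txt
git commit -q -m "Add File-1.txt"
git branch -M main

# Create a branch and make the edits there.
git checkout -q -b add-details
echo "It has some content that needs to be version controlled" >> File-1.txt
git commit -q -am "Add details to File-1.txt"

# The main branch is untouched: it still contains only the heading.
git checkout -q main
cat File-1.txt

# Bring the edits from the branch into main.
git merge -q add-details
cat File-1.txt
```

Until the merge, anything that goes wrong on add-details stays on that branch and main keeps working.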

Contributing to a repository directly by “pull” and “push”

  • You could collaborate on a repository by pulling from and pushing to the main branch directly…
  • This may be risky, as parallel changes to the same files create merge conflicts
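
The pull/push cycle can be sketched without GitHub by letting a local "bare" repository play the role of the central repository. This assumes git is installed; the folder names central.git, alice and bob, and the user identities, are made up for the illustration.

```shell
work=$(mktemp -d)
cd "$work"

# A bare repository (history only, no working files) acts as the central copy.
git init -q --bare central.git

# First collaborator: clone, commit, push.
git clone -q central.git alice
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "## This is an example" > File-1.txt
git add File-1.txt
git commit -q -m "Add File-1.txt"
git push -q -u origin HEAD   # -u remembers where this branch is pushed
cd ..

# Second collaborator: clone, edit, push.
git clone -q central.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
echo "Added by Bob" >> File-1.txt
git commit -q -am "Add a line from Bob"
git push -q
cd ..

# First collaborator pulls and receives the change from the second.
cd alice
git pull -q --no-rebase
cat File-1.txt
```

This works smoothly only because the two collaborators took turns. If both had edited the same line before pulling, the pull would stop with a merge conflict.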

Merge and conflicts

Merge conflicts

Local repository:
File-1.txt

## This is an example

It has some content that needs to be version controlled

Remote repository:
File-1.txt

## This is an example

It has some content that needs to be version controlled. We are adding some information in the remote repository

Local repository:
File-1.txt

## This is an example

It has some content that needs to be version controlled. Adding local changes.

Pull from remote:
File-1.txt

## This is an example

<<<<<<< HEAD
It has some content that needs to be version controlled. Adding local changes.
=======
It has some content that needs to be version controlled. We are adding some information in the remote repository.
>>>>>>> aac1016966305b6d8dd91aea5f8194fdfb929171

Merge conflicts

Local repository:
File-1.txt

## This is an example

<<<<<<< HEAD
It has some content that needs to be version controlled. Adding local changes.
=======
It has some content that needs to be version controlled. We are adding some information in the remote repository.
>>>>>>> aac1016966305b6d8dd91aea5f8194fdfb929171

Pull from remote:
File-1.txt



<<<<<<< HEAD
This is the state of the file in your copy
=======
This is what you get from the remote
>>>>>>> aac1016966305b6d8dd91aea5f8194fdfb929171
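
The situation above can be reproduced and resolved with the sketch below. It assumes git is installed; a local bare repository (remote.git) stands in for GitHub, and the folder names local and other, the user identities, and the final combined sentence are choices made for this illustration.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git

# Your copy: create the file and push it.
git clone -q remote.git local
cd local
git config user.name "Local User"
git config user.email "local@example.com"
printf '## This is an example\n\nIt has some content that needs to be version controlled\n' > File-1.txt
git add File-1.txt
git commit -q -m "Add File-1.txt"
git push -q -u origin HEAD
cd ..

# Someone else changes the same line and pushes to the remote.
git clone -q remote.git other
cd other
git config user.name "Other User"
git config user.email "other@example.com"
printf '## This is an example\n\nIt has some content that needs to be version controlled. We are adding some information in the remote repository.\n' > File-1.txt
git commit -q -am "Add information in the remote repository"
git push -q
cd ..

# Meanwhile you change the same line locally.
cd local
printf '## This is an example\n\nIt has some content that needs to be version controlled. Adding local changes.\n' > File-1.txt
git commit -q -am "Add local changes"

# The pull cannot combine the two edits automatically and reports a conflict.
git pull -q --no-rebase || echo "Pull stopped: merge conflict"
cat File-1.txt   # now contains the <<<<<<<, ======= and >>>>>>> markers

# Resolve: write the text you want to keep, then stage, commit and push.
printf '## This is an example\n\nIt has some content that needs to be version controlled. Adding local changes. We are adding some information in the remote repository.\n' > File-1.txt
git add File-1.txt
git commit -q -m "Merge remote changes and keep both additions"
git push -q
```

Resolving a conflict always means editing the file by hand (or in an editor with merge support) until the markers are gone and the content is what you want, then committing the result.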

Keep a list of files that you do not want to track with .gitignore


# History files
.Rhistory
.Rapp.history

# Session Data files
.RData

# User-specific files
.Ruserdata

# produced output can be rebuilt from source
*.html
*.pdf

  • The .gitignore file lets you decide which files to track in your history.
  • By adding e.g. *.pdf and *.html to .gitignore, we avoid merge conflicts in files that can be rebuilt from source.
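
To check that the ignore rules do what you expect, you can ask git directly. A minimal sketch, assuming git is installed; the file names report.Rmd, report.html and report.pdf are hypothetical and the files are empty placeholders.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q

# The same idea as the .gitignore above, reduced to two patterns.
printf '*.html\n*.pdf\n' > .gitignore

# One source file and two outputs that would be built from it.
touch report.Rmd report.html report.pdf

# Lists .gitignore and report.Rmd as untracked, but not the two outputs.
git status --short

# Show which rule in which file causes report.pdf to be ignored.
git check-ignore -v report.pdf
```

`git check-ignore` is handy when a file unexpectedly shows up in, or is missing from, `git status`.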

Git and GitHub: Best practices

  • Do not push directly to the main/master branch; use pull requests.
  • Do not store sensitive information on GitHub.
  • Use .gitignore to avoid pushing/pulling files that do not need version control.
  • Do not push temporary files or files that are built from source. Add e.g. PDF files to your .gitignore file.
  • Write good, descriptive commit messages.
  • Commit often and work in small increments.
  • Do not use GitHub to store large files.
  • Always “test” before committing.
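
One practical detail related to the .gitignore advice: git keeps tracking a file that has already been committed, even after a matching pattern is added to .gitignore. A minimal sketch of how to stop tracking such a file, assuming git is installed; report.pdf is a hypothetical, empty placeholder file and the user identity is made up.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"

# A built file was committed by mistake.
touch report.pdf
git add report.pdf
git commit -q -m "Add report"

# Add the ignore rule, then remove the file from version control
# while keeping it in the working folder (--cached).
echo "*.pdf" > .gitignore
git rm -q --cached report.pdf
git add .gitignore
git commit -q -m "Stop tracking built PDF files"

git ls-files    # only .gitignore is tracked now
ls report.pdf   # the file itself is still on disk
```

The file is still present in earlier commits, so this is not a way to remove sensitive information from the history.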