Introduction to Data Science with R

Installing software

Before we go any further, you should have working installations of the following software:

Later in the course we will use git and GitHub for version control and collaborative work.

You may also want to install a interface to git such as GitHub desktop

How to organize your files - RStudio projects

  • A big issue in doing science using computers is how to organize your files.
  • We will start by using RStudio projects. A project will help you keep track of files in one place.
  • What is a project? - A single report (source files, data, figures, etc) - A book or collection of reports (source files, figures, data, etc.) - A website/blog/course notes (…)
Create a project
  • Find the project menu in RStudio (upper right corner)
  • Select New Project, New Directory, New Project
  • Select a suitable name!

Naming projects

  • A project should be contain everything you need for a certain task (like writing a book, project report etc.)
  • The name of the project should reflect this. Files inside the project can have more general names (like figure-1.R, report.qmd).
  • “The hardest thing in Data Science is naming things” (is not the exact quote, but close enough)

Basic R in a script

  • Start up a new R script (File>New File>R Script)

  • A R script is a text file with a specific extension .R

  • A R script can be written as a program for R to evaluate.

  • We will use the R script to talk about basic R

  • Objects and assignments

  • Data types and vectors

  • Data frames and lists

  • Functions and packages

Combine code and text in quarto files

Markdown

  • Markdown is the basic syntax for editing text in quarto (qmd) files.

  • The syntax let’s you format the text in a plain text editor.

      - Headings
      - Links & Images
      - Lists
      - Footnotes
      - Tables
      - Equations
      - Page Breaks
      - Callout Blocks

Code chunks

    - `echo`
    - `warning`
    - `message`