June 6th, 2024

Modern R for Data Science

R started in 1993:

2024 - 1993 = 31 years ago

A lot has happened!

  • base vs tidyverse
  • R GUI vs RStudio

Data Science

Data Science is an emerging field at the intersection of statistics, computer science, and data analysis

Course objectives

  • be able to import and transform data in R (%>% & dplyr)
  • be able to choose and implement suitable, good-looking data visualizations (ggplot2)
  • be able to set up a reproducible workflow through dynamic reporting
  • understand the differences and commonalities between:
    • software development
    • data analysis

Course organization

  1. (brief) recap on basics
  2. Dynamic reproducible reporting with R Markdown
  3. Data manipulation with dplyr
  4. Data visualization with ggplot2

Each time:

  • some key theoretical concepts
  • practical exercises to develop your skills and your autonomy

General advice

  • Google (or any other web search engine) & ChatGPT are your friends!

  • DRY vs WET:

    • Don’t Repeat Yourself!
    • or Write Everything Twice (if you have time to spare)

=> use function()
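
As a minimal sketch of the DRY idea, the snippet below first writes the same rescaling logic twice (WET), then moves it into a single function. The vectors and the helper name `rescale01` are illustrative assumptions, not part of the course material.

```r
# Toy data (made up for illustration)
heights <- c(150, 160, 170)
weights <- c(50, 60, 70)

# WET: the same formula is copy-pasted for each vector
heights_wet <- (heights - min(heights)) / (max(heights) - min(heights))
weights_wet <- (weights - min(weights)) / (max(weights) - min(weights))

# DRY: the formula lives in one place (hypothetical helper name)
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)      # c(min, max), ignoring missing values
  (x - rng[1]) / (rng[2] - rng[1])   # map the values onto [0, 1]
}

heights_dry <- rescale01(heights)
weights_dry <- rescale01(weights)
```

If the formula ever needs to change, the DRY version requires editing a single line instead of one line per vector.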

R brush-up

  • RStudio: use up-to-date modern tools

  • use RStudio projects

=> live demo

  • loops and functions (DRY)

Brush-up practical
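
A small sketch of the same task written once as a loop and once by applying a function, assuming a made-up data frame `df`. It is meant only as a refresher, not as the content of the live demo.

```r
# Toy data frame (made up for illustration)
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# Loop: preallocate the result, then fill it column by column
col_means_loop <- numeric(ncol(df))
names(col_means_loop) <- names(df)
for (j in seq_along(df)) {
  col_means_loop[j] <- mean(df[[j]])
}

# Function applied to every column: same result, no manual bookkeeping
col_means_apply <- vapply(df, mean, numeric(1))
```

Both versions avoid writing `mean(df$a)`, `mean(df$b)`, `mean(df$c)` by hand; `vapply()` additionally checks that each result is a single number.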

open SAVOR_practical1.html and follow along…

tidyverse

Tidy + Universe

tidyverse: a collection of tidy R packages

tidy data

  1. each column represents a different variable
  2. each row represents one observation
  3. different observation types are stored in different tables (i.e. separate data.frames)

=> tidyverse: a collection of packages for working with/analyzing tidy data
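
To make the tidy data rules concrete, here is a minimal sketch with made-up numbers: the same information stored first in a wide, untidy layout and then in a tidy one. The table and the column names (`country`, `year`, `cases`) are assumptions for illustration, and base R is used only to keep the example free of dependencies.

```r
# Untidy (wide): the variable "year" is hidden in the column names
wide <- data.frame(
  country = c("A", "B"),
  `2022`  = c(10, 20),
  `2023`  = c(11, 21),
  check.names = FALSE   # keep "2022"/"2023" as column names
)

# Tidy (long): one column per variable, one row per observation
tidy <- data.frame(
  country = rep(wide$country, times = 2),
  year    = rep(c(2022, 2023), each = nrow(wide)),
  cases   = c(wide[["2022"]], wide[["2023"]])
)
```

In the tidyverse, this kind of reshaping is typically done with `tidyr::pivot_longer()` rather than by hand.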

Other resources: