June 17th, 2019

Modern R for Data Science

R started in 1993:

`2019 - 1993` = 26 years ago

A lot has happened!

  • base R vs “tidyverse”
  • R GUI vs RStudio

Data Science

Data Science is an emerging field at the intersection of statistics, computer science, and data analysis

source: R for Data Science

Course objectives

  • be able to successfully import and transform data in R (%>% & dplyr)
  • be able to choose and implement suitable and beautiful data visualizations (ggplot2)
  • be able to have a reproducible workflow through dynamic reporting
  • understand the differences and commonalities between:
    • software development
    • data analysis

Course organization

  1. (brief) recap on R basics
  2. Dynamic reproducible reporting with R Markdown
  3. Data manipulation with dplyr
  4. Data visualization with ggplot2

Each time:

  • some key theoretical concepts
  • practical exercises to develop your skills and autonomy

General advice

  • Google (or any other web search engine) is your friend!

  • DRY vs WET:

    • Don’t Repeat Yourself!
    • or Write Everything Twice (if you have time to spare)

=> use function()
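
A minimal sketch of the DRY idea, using base R only. The vectors `a` and `b` and the helper name `rescale01` are illustrative assumptions, not part of the course material. The WET version copies the same rescaling formula for each vector; the DRY version writes it once as a function.

```r
# WET: the same rescaling logic is written out for every vector
a <- c(1, 5, 10)
b <- c(2, 4, 8)
a_scaled <- (a - min(a)) / (max(a) - min(a))
b_scaled <- (b - min(b)) / (max(b) - min(b))

# DRY: write the logic once as a function, then reuse it
# (rescale01 is a hypothetical helper name chosen for this example)
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)       # c(min, max) of the input
  (x - rng[1]) / (rng[2] - rng[1])    # map the values onto [0, 1]
}

a_dry <- rescale01(a)
b_dry <- rescale01(b)
```

If the formula ever needs to change, the function version only has to be fixed in one place.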

R brush-up

  • RStudio: use up-to-date modern tools

  • use RStudio projects

=> live demo

  • loops and functions (DRY)
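
As a rough illustration of loops in base R, the sketch below computes the mean of each element of a list, first with an explicit `for` loop and then with `vapply()`. The list `values` and its contents are made up for the example.

```r
# Hypothetical input: a named list of numeric vectors
values <- list(first = c(1, 2, 3), second = c(10, 20), third = c(5, 5, 5, 5))

# Explicit loop: preallocate the result, then fill it element by element
means_loop <- numeric(length(values))
for (i in seq_along(values)) {
  means_loop[i] <- mean(values[[i]])
}

# Same computation with vapply(), which applies mean() to each element
# and checks that every result is a single numeric value
means_apply <- vapply(values, mean, numeric(1))
```

Both approaches give the same numbers; `vapply()` also keeps the names of the list.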

Brush up practical

Open BADAS1_practical.html and follow along…

tidyverse

Tidy + Universe

Hadley Wickham <3

tidyverse: a collection of tidy R packages

tidy data

  1. each column represents a different variable
  2. each row represents one observation
  3. different observation types are stored in different tables (i.e. data.frames)

=> tidyverse: a collection of packages for working with/analyzing tidy data
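
To make the tidy data rules concrete, here is a small base R sketch. The table, its column names (`country`, `y2018`, `y2019`, `year`, `cases`) and all values are invented for illustration. In the wide table the year is hidden in the column names; in the tidy table each variable has its own column and each row is one observation.

```r
# Not tidy: the variable "year" is spread across two column names
messy <- data.frame(
  country = c("A", "B"),
  y2018   = c(10, 20),
  y2019   = c(12, 24)
)

# Tidy: one column per variable, one row per (country, year) observation
tidy <- data.frame(
  country = rep(messy$country, times = 2),
  year    = rep(c(2018, 2019), each = nrow(messy)),
  cases   = c(messy$y2018, messy$y2019)
)

# With tidy data, a grouped summary is a single call
totals <- aggregate(cases ~ year, data = tidy, FUN = sum)
```

The reshaping is done by hand here to keep the example dependency-free; in practice a tidyverse package would usually handle this step.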

Other resources: