SAVOR: Part III

Data manipulation with dplyr and %>%

Boris Hejblum

June 10, 2025

dplyr

dplyr improves on capabilities already in R:

  • redefines data “wrangling” in R
  • simpler, more intuitive syntax
  • faster, better performance for large data tables

dplyr practical

Open SAVOR_practical3.html and complete exercises 1 to 3

Naming things is (still) hard

A classical example

dat <- group_by(mydata, group)
dat2 <- summarise(dat,
  mean_var = mean(var, na.rm = TRUE))

Forward pipe operator in R: %>%

magrittr by Stefan Milton Bache

Fewer things to name

A classical example

dat <- group_by(mydata, group)
dat2 <- summarise(dat,
  mean_var = mean(var, na.rm = TRUE))

Forward pipe operator

data_summarized <- mydata %>% 
  group_by(group) %>% 
  summarise(mean_var = mean(var, na.rm = TRUE))
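
To make the comparison concrete, here is a minimal sketch run on a small made-up table. The contents of `mydata` are an assumption for illustration only (the slides do not define them). The pipe passes its left-hand side as the first argument of the call on its right, so `x %>% f(y)` is read as `f(x, y)`.

```r
library(dplyr)

# Hypothetical toy data standing in for `mydata` (not from the course material)
mydata <- data.frame(
  group = c("a", "a", "b", "b"),
  var   = c(1, 3, NA, 10)
)

# Nested form: has to be read from the inside out
nested <- summarise(group_by(mydata, group),
                    mean_var = mean(var, na.rm = TRUE))

# Piped form: reads top to bottom, with no intermediate object to name
piped <- mydata %>%
  group_by(group) %>%
  summarise(mean_var = mean(var, na.rm = TRUE))

# Both forms compute the same group means
all.equal(nested$mean_var, piped$mean_var)
piped$mean_var  # 2 for group "a", 10 for group "b" (the NA is dropped)
```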

When not to use %>%

Piping is well suited to short sequences of function calls

  • avoid pipes that are too long ⇒ create intermediate results (with good names!)
  • the pipe does not deal well with multiple inputs or multiple outputs
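
As an illustration of the two points above, the sketch below combines two inputs. The tables `patients` and `measures` and all their columns are hypothetical. Each input gets its own short pipe and a descriptive name, and the step that takes two inputs is written as a plain function call rather than forced into a single long pipe.

```r
library(dplyr)

# Two hypothetical input tables (names and values are made up)
patients <- data.frame(id = 1:3, group = c("a", "a", "b"))
measures <- data.frame(id = 1:3, var = c(1.5, NA, 4))

# One short pipe per input, each stored under a meaningful name
patients_known <- patients %>% filter(!is.na(group))
measures_complete <- measures %>% filter(!is.na(var))

# The two-input step is clearer as an ordinary call
merged <- inner_join(patients_known, measures_complete, by = "id")

# A new short pipe starts from the named intermediate result
group_means <- merged %>%
  group_by(group) %>%
  summarise(mean_var = mean(var))

group_means$mean_var  # 1.5 for group "a", 4 for group "b"
```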

Note

R now provides its own pipe: |> (available since R 4.1.0). It is very similar to %>% and can be used instead in most cases. It has the advantage of reducing code dependencies. However, dplyr directly imports the magrittr pipe operator %>%, and most online documentation still uses it.
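
A short sketch of the two pipes side by side, assuming R 4.1.0 or later and a made-up `mydata` table. One difference worth knowing is that the native pipe requires an explicit function call on its right-hand side (`x |> sqrt()`, not `x |> sqrt`).

```r
# Native pipe with base R only: no package needs to be loaded
total <- c(1, 4, 9) |> sqrt() |> sum()
total  # 6

library(dplyr)

# Hypothetical toy data (not from the course material)
mydata <- data.frame(group = c("a", "a", "b"), var = c(1, 3, NA))

# Same dplyr pipeline written with each pipe
with_magrittr <- mydata %>%
  group_by(group) %>%
  summarise(n_obs = n())

with_native <- mydata |>
  group_by(group) |>
  summarise(n_obs = n())

# Both give 2 observations for group "a" and 1 for group "b"
all.equal(with_magrittr$n_obs, with_native$n_obs)
```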

More resources

dplyr & %>% practical

👉 Your turn!

Open SAVOR_practical3.html and complete exercises 4 to 10