class: center, middle, inverse, title-slide .title[ # Tools for parallelizing R code easily ] .subtitle[ ## R-Toulouse ] .author[ ###
Boris Hejblum
] .date[ ### April 9
th
, 2026 ] --- <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.11.1/styles/rose-pine-dawn.min.css"> <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/highlight.min.js"></script> <script>hljs.highlightAll();</script> <style> /* ── colour palette ─────────────────────────────────── */ :root { --accent: #9FD6D3; /* teal */ --accent2: #FF5599; /* pink */ --accent3: #BCAFC2; /* teal-pink */ --dark: #0f172a; /* slate-900 */ --mid: #334155; /* slate-700 */ --light: #f8fafc; /* slate-50 */ --code-bg: #1e2d3d; } main-container { max-width: 1400px; } /* ── inverse / title slides ─────────────────────────── */ .inverse { background: linear-gradient(140deg, var(--accent2) 15%, var(--accent3) 35%, var(--accent) 55%, white 90%); color: var(--light); link-color: var(--accent); border-top: 0px solid transparent !important; /* removes it top ribbon*/ } /* remove metropolis title slide bar */ .mline { display: none !important; } .title-slide .remark-slide-number { display: none; } .inverse .remark-slide-number { display: none; } .title-slide.remark-slide-content::after { background-image: url("logo.svg"); top: 565px; right: 7px; width: 120px; height: 120px; opacity: 0.95; } .inverse h1 { color: black; font-size: 2.4em; font-weight: 800; letter-spacing: -1px; line-height: 1.15; } .inverse h2, .inverse p { color: #cbd5e1; font-weight: 300; } /* ── body slides ─────────────────────────────────────── */ .remark-slide-content { font-size: 17px; line-height: 1.6; padding: 1.5em 3em; color: var(--mid); border-top: 6px solid var(--accent1); border-bottom: 0px solid var(--accent2); } .remark-slide-content::after { content: ""; position: absolute; top: 5px; right: 5px; width: 80px; height: 80px; background-image: url("logo.svg"); background-size: contain; background-repeat: no-repeat; opacity: 0.75; /* optional */ } .remark-slide-content h2 { border-bottom: 3px solid var(--accent); padding-bottom: .25em; margin-bottom: .8em; color: var(--dark); font-size: 1.5em; } .remark-slide-content strong { color: var(--dark); } .remark-slide-content em { color: var(--accent2); } /* ── code ────────────────────────────────────────────── */ .remark-code, .remark-inline-code { font-family: 'Fira Code', 'Cascadia Code', 'Source Code Pro', monospace; font-size: 0.82em; } .remark-code-line-highlighted { background-color: rgba(20,184,166,.18); } pre { margin: .6em 0; } code.remark-inline-code { background: #e2e8f0; color: #0f766e; border-radius: 3px; padding: 1px 5px; } /* ── exercise callout ────────────────────────────────── */ .exercise { background: #f0fdf9; border-left: 5px solid var(--accent); border-radius: 0 8px 8px 0; padding: .9em 1.4em; margin: .8em 0; } .exercise p { margin: .3em 0; } .exercise code.remark-inline-code { background: #ccfbf1; } /* ── note box ─────────────────────────────────── */ .note { background: #ff55992b; border-left: 5px solid var(--accent2); border-radius: 0 8px 8px 0; padding: .7em 1.2em; margin: .6em 0; font-size: .92em; } /* ── info / note box ─────────────────────────────────── */ .info { background: #bcafc285; border-left: 5px solid var(--accent3); border-radius: 0 8px 8px 0; padding: .7em 1.2em; margin: .6em 0; font-size: .92em; } /* ── package badge ───────────────────────────────────── */ .pkg { display: inline-block; background: var(--accent); color: white !important; font-weight: 700; font-size: 0.78em; padding: 2px 9px; border-radius: 4px; font-family: monospace; letter-spacing: .3px; } /* ── comparison table ────────────────────────────────── */ .remark-slide-content table { border-collapse: collapse; width: 100%; font-size: .9em; } .remark-slide-content th { background: var(--dark); color: var(--accent2); padding: .5em .9em; font-weight: 600; text-align: left; } .remark-slide-content td { padding: .4em .9em; border-bottom: 1px solid #e2e8f0; } .remark-slide-content tr:nth-child(even) td { background: #f8fafc; } /* ── slide number ────────────────────────────────────── */ .remark-slide-number { color: var(--accent3); font-size: 13px; } /* ── pull-left / pull-right fine-tuning ──────────────── */ .pull-left { float: left; width: 47%; } .pull-right { float: right; width: 47%; } a, a > code { color: var(--accent2); } a:hover { color: var(--accent); } </style> ## First things first: required packages Install the missing packages ``` r req_pkgs <- c("future.apply", "pbapply", "mirai", "microbenchmark") install.packages(req_pkgs[!req_pkgs %in% installed.packages()[, "Package"]]) ``` Load them ``` r library(future) library(future.apply) library(pbapply) library(mirai) library(microbenchmark) ``` --- ## Introduction to parallel execution Parallel code execution enables **faster computation** by leveraging multiple CPU cores. The *total* amount of work is not reduced — but it gets done **quicker**. Many algorithms are ***"embarrassingly parallel"***: they can be decomposed into completely independent calculations. In Statistics, it is natural to *parallelize over observations and/or parameters*. Typically, **`for`-loops** are the most common and straightforward candidates for parallelization. --- ## Parallel computation — steps **Operations required to run a parallel computation:** 1. Start `\(m\)` *worker* processes (CPUs) -- 2. Send the necessary data and functions to the *workers* -- 3. Split the task into `\(m\)` sub-tasks of similar magnitude -- 4. Wait for all *workers* to finish -- 5. Gather and combine the results from the different *workers* -- 6. Stop the *worker* processes --- ## Communication protocols Several protocols for communicating between *workers* are available, depending on the OS: | Protocol | Platform | Notes | |---|---|---| | **Fork** | Unix (Mac & Linux) | Fast — shares memory with parent process | | **PSOCK** | All (incl. Windows) | Spawns fresh R sessions; more overhead | | **NNG sockets** | All | C-level; ultra-low latency (used by `mirai`) | | **MPI** | HPC clusters | For distributed multi-node computing | > `pbapply`, `future` and `mirai` expose a unified syntax regardless of the underlying backend. --- ## Parallel computation in
Three modern packages can cover the vast majority of our use-cases, with just a *few* ***straightforward*** *and* ***simple*** *lines of code*: .pull-left[ ### <span class="pkg">parallel</span> *(base
)* Ships with R ≥ 2.14. Lower-level; used internally by the packages below ### <span class="pkg">pbapply</span> Adds a **progress bar** to any `*apply` call. The `cl` argument enables transparent parallelism — great for interactive scripts. ] .pull-right[ ### <span class="pkg">future.apply</span> Drop-in `*apply` replacements — just swap `sapply` for `future_sapply`. Declare a **plan** once; the same code runs sequentially or in parallel. ### <span class="pkg">mirai</span> Minimal, ultra-low overhead. Built on **NNG** (nanomsg next generation — C sockets). True async/non-blocking tasks (plays nice with `Shiny`) ] --- class: inverse, center, middle count: false ## <span style="color: var(--myaccent3)">`future.apply`</span> ### Parallel apply --- ## First parallel function .exercise[ 👉 **Your turn!** 1. Check available cores: `future::availableCores()` 2. Declare a parallel plan — leave **at least one core free**: `future::plan(multisession, workers = XX))` 3. Use `future.apply::future_sapply()` to compute `log()` of `\(n\)` numbers in parallel 4. Wrap into a function `mylog_parallel()` 5. Compare execution times on `1:100`: `microbenchmark(mylog_parallel(1:100), mylog(1:100), log(1:100), times = 10)` ] --- ## First parallel function — solution ``` r future::availableCores()-1 #nb available cpus n_workers <- 3 mylog_parallel <- function(x) { future_sapply(1:length(x), FUN = function(i){ log(x[i]) } ) } plan(multisession, workers = n_workers) mylog_parallel(1:100) mb <- microbenchmark(mylog_parallel(1:100), mylog(1:100), log(1:100), times = 10) plan(sequential) ``` ``` #> Unit: nanoseconds #> expr min lq mean median uq max neval cld #> mylog_parallel(1:100) 7845842 7921282 8515622.1 7985549.5 9247632 9923763 10 a #> mylog(1:100) 3731 3936 129035.2 4100.0 6765 1246523 10 b #> log(1:100) 492 615 1184.9 1168.5 1681 2214 10 b ``` --- ## Parallelization is not always effective **The parallel code is much slower…** Because each individual computation is too quick,
spends more time on **inter-process communication** than on actual computation. .note[ ⚠️ **Rule of thumb:** one iteration of the loop must be long enough for parallelization to pay off. Communication overhead is roughly constant — it only becomes negligible when the task itself takes time. ] --- ## A realistic example **Bootstrap confidence intervals** for the mean of each of **200 variables** (e.g. omics measurements), using **B = 2 000 bootstraps** A single bootstrap is fast, but repeating it 200 × 2 000 times makes this *embarrassingly parallel*, ***and slow enough for parallelism to pay off***. ``` r set.seed(942026) n_obs <- 50 # observations per variable n_vars <- 200 # number of variables B <- 2000 # bootstrap resamples mydata <- matrix(rnorm(n_obs * n_vars), nrow = n_obs, ncol = n_vars) ``` -- Computing a **bootstrap CI** at 5% for one variable: 1. resample B times 2. compute the mean each time 3. finally take the 2.5 % and 97.5 % quantiles of the overall results. ``` r boot_ci <- function(col, B = 2000) { means <- replicate(B, mean(sample(col, replace = TRUE))) quantile(means, c(0.025, 0.975)) } ``` --- ## `apply` and `for`-loops .pull-left[ Experienced R users often reach for `*apply()` functions: ``` r bootci_apply <- function(x, B = 2000){ apply(X = x, MARGIN = 2, FUN = boot_ci, B = B) } system.time(bootci_apply(mydata)) ``` ``` #> user system elapsed #> 1.980 0.029 2.011 ``` ] .pull-right[ Yet, a well-written `for`-loop is *equally fast*: ``` r bootci_for <- function(x, B = 2000) { ans <- matrix(NA_real_, nrow = 2, ncol = ncol(x)) for (i in 1:ncol(x)){ ans[, i] <- boot_ci(x[, i], B) } return(ans) } system.time(bootci_for(mydata)) ``` ``` #> user system elapsed #> 1.973 0.021 1.997 ``` ] --- ## Efficient parallelization — `future.apply` .exercise[ 👉 **Your turn!** Parallelize the bootstrap CI computation using `future_sapply()`. Does it improve computation time compared to the sequential versions this time ? ] ``` r bootci_future <- function(x, B = 2000) { future_sapply(1:ncol(x), FUN = function(i) { boot_ci(x[, i], B) }, future.seed = TRUE) #ensures parallel-safe RNG } n_workers <- 3 #future::availableCores() - 1 system.time(bootci_future(mydata)) plan(multisession, workers = n_workers) system.time(bootci_future(mydata)) plan(sequential) ``` --- ## `future.apply` – benchmark .pull-left[ ``` r plan(multisession, workers = n_workers) bootci_future(mydata) # warm-up call mb <- microbenchmark( "apply" = bootci_apply(mydata), "for-loop" = bootci_for(mydata), "future.apply" = bootci_future(mydata), times = 20 ) plan(sequential) # clean-up ``` ] .pull-right[ <img src="data:image/png;base64,#parallelisation_Rtoul_files/figure-html/bootci_future_mbplot-1.png" alt="" width="504" /> ] --- class: inverse, center, middle count: false ## <span style="color: var(--myaccent3)">`pbapply`</span> ### Parallel apply **with a progress bar** --- ## `pbapply` — progress bar + parallelizm `pbapply` is a thin wrapper that adds a **progress bar** to any `*apply` call. Its `cl` argument accepts either: - an integer (number of cores) → creates a PSOCK cluster internally, or - an existing `cluster` object from `parallel::makeCluster()`. ``` r library(pbapply) ?pblapply # same signature as lapply + cl = ``` .exercise[ 👉 **Your turn!** Rewrite `bootci_apply` using `pbapply::pbsapply()` with `cl = future::availableCores() - 1`. Add it to the benchmark. Does it behave differently from `future.apply`? ] --- ## `pbapply` — solution ``` r bootci_pb <- function(x, cl, B = 2000) { pbapply::pbsapply(1:ncol(x), function(i){ boot_ci(x[, i], B) }, cl = cl) } system.time(bootci_pb(mydata, cl = n_workers)) ``` ``` #> user system elapsed #> 0.656 0.085 0.755 ``` ``` r pboptions(nout=25) #default: nout=100 pboptions(type="none") #default: type="timer" system.time(bootci_pb(mydata, cl = n_workers)) ``` ``` #> user system elapsed #> 0.672 0.094 0.769 ``` The progress bar induces some overhead, that can get noticed for fast subtasks --- ## `pbapply` – benchmark .pull-left[ ``` r plan(multisession, workers = n_workers) bootci_future(mydata) # warm-up call mb2 <- microbenchmark( "apply" = bootci_apply(mydata), "for-loop" = bootci_for(mydata), "future.apply" = bootci_future(mydata), "pbapply" = bootci_pb(mydata, cl = n_workers), times = 20 ) plan(sequential) # clean-up ``` ] .pull-right[ <img src="data:image/png;base64,#parallelisation_Rtoul_files/figure-html/unnamed-chunk-5-1.png" alt="" width="504" /> ] --- class: inverse, center, middle count: false ## <span style="color:#14b8a6">`mirai`</span> ### Minimal, async, lightning-fast --- ## `mirai` — why another package? `mirai` (Japanese for *"future"*) takes a different architectural approach: | | `future` | `mirai` | |---|---|---| | **Backend** | R sessions (PSOCK/Fork) | NNG C sockets | | **Overhead** | Moderate | Minimal | | **Style** | Synchronous (`plan` + `*apply`) | Async-first (`mirai()` returns immediately) | | **Best for** | General parallel work | High-frequency tasks, Shiny, production | | **OS** | All | All | .info[ 💡 `mirai` tasks are **non-blocking**: your R session remains free while workers compute. Results are fetched explicitly when you need them. ] --- ## `mirai` — core API **Set up workers** (analogous to `plan(multisession(workers = n))`): ```r library(mirai) daemons(n = 3) # start 3 persistent workers ``` **Launch an async task** — pass extra data as named arguments: ```r m <- sapply(1:ncol(mydata), function(i) mirai(boot_ci(mydata[, i]), .args=list(boot_ci=boot_ci, mydata=mydata, i=i))) # returns immediately — R is not blocked! ``` **Retrieve the result** — blocks only at this point: ```r sapply(m, function(mm) mm[]) # waits and returns the value ``` **Shut down workers:** ```r daemons(0) ``` --- ## `mirai` — your turn! .exercise[ 👉 **Your turn!** 1. Install and load `mirai` 2. Start daemons: `daemons(n_workers)` 3. Write `bootci_mirai()` 4. Add it to the benchmark — compare with `future.apply` and `pbapply` 5. Shut down: `daemons(0)` ] --- ## `mirai` — solution ``` r bootci_mirai <- function(x, B = 2000) { ms <- lapply(1:ncol(x), function(i) { mirai(boot_ci(col = x[, i], B = B), .args=list(boot_ci=boot_ci, x=x, i=i, B=B)) }) sapply(X = ms, FUN = function(m) m[]) } system.time(bootci_mirai(mydata)) ``` ``` #> user system elapsed #> 0.281 0.536 5.062 ``` ``` r daemons(n_workers) system.time(bootci_mirai(mydata)) ``` ``` #> user system elapsed #> 0.012 0.041 0.751 ``` ``` r daemons(0) # clean up workers ``` --- ## `mirai` — full benchmark .pull-left[ ``` r plan(multisession, workers = n_workers) bootci_future(mydata) # warm-up call bootci_pb(mydata, cl = n_workers) # warm-up call mb3_a <- microbenchmark( "apply" = bootci_apply(mydata), "for-loop" = bootci_for(mydata), "future.apply" = bootci_future(mydata), "pbapply" = bootci_pb(mydata, cl = n_workers), times = 20 ) plan(sequential) #cleanup daemons(n_workers) bootci_mirai(mydata) # warm-up call mb3_b <- microbenchmark( "mirai" = bootci_mirai(mydata), times = 20 ) daemons(0) #cleanup mb3 <- rbind.data.frame(mb3_a, mb3_b) ``` ] .pull-right[ <img src="data:image/png;base64,#parallelisation_Rtoul_files/figure-html/mirai_mbplot-1.png" alt="" width="504" /> ] --- ## Summary & conclusion | Situation | Recommended | |---|---| | Quick script, want a progress bar | `pbapply` | | General parallel work, drop-in replacement | `future` or `future.apply` | | High-throughput / async / Shiny integration | `mirai` | parallelization **can** speed up computation — but the gain depends on the ratio between: - **communication time** (fixed cost) *VS* - **computation time per task** (must dominate) --- ## Going Further More details on R packages and advanced R programming: 👉 Online lecture notes from [endeav
course: *a course on code efficiency and software development with R*](https://endeavr.borishejblum.science/r-code-parallelization) [`futurize`](https://futurize.futureverse.org/) provides an ever lighter API to leverage `future` parallelization. A list of additional, less straightforward (and not necessarly more efficient computationally),
packages for parallel computations: - [`doParallel`](https://github.com/RevolutionAnalytics/doparallel) - [`foreach`](https://github.com/RevolutionAnalytics/foreach) - [`itertools`](https://CRAN.R-project.org/package=itertools) - [`batchtools`](https://batchtools.mlr-org.com)