Tools for parallelizing R code easily

class: center, middle, inverse, title-slide

.title[
# Tools for parallelizing R code easily
]
.subtitle[
## R-Toulouse
]
.author[
### <a href="https://borishejblum.science">Boris Hejblum</a>
]
.date[
### April 9<sup>th</sup>, 2026
]

---

main-container {
  max-width: 1400px;
}

/* ── inverse / title slides ─────────────────────────── */
.inverse {
  background: linear-gradient(140deg, var(--accent2) 15%, var(--accent3) 35%, var(--accent) 55%, white 90%);
  color: var(--light);
  link-color: var(--accent);
  border-top: 0px solid transparent !important; /* removes it top ribbon*/
  
}

/* remove metropolis title slide bar */
.mline {
  display: none !important;
}

.title-slide .remark-slide-number {
  display: none;
}

.inverse .remark-slide-number {
  display: none;
}

.title-slide.remark-slide-content::after {
  background-image: url("logo.svg");
  top: 565px;
  right: 7px;
  width: 120px;
  height: 120px;
  opacity: 0.95;
}

.inverse h1 {
  color: black;
  font-size: 2.4em;
  font-weight: 800;
  letter-spacing: -1px;
  line-height: 1.15;
}
.inverse h2, .inverse p { color: #cbd5e1; font-weight: 300; }

/* ── body slides ─────────────────────────────────────── */
.remark-slide-content {
  font-size: 17px;
  line-height: 1.6;
  padding: 1.5em 3em;
  color: var(--mid);
  border-top: 6px solid var(--accent1);
  border-bottom: 0px solid var(--accent2);
}
.remark-slide-content::after {
  content: "";
  position: absolute;
  top: 5px;
  right: 5px;
  width: 80px;
  height: 80px;
  background-image: url("logo.svg");
  background-size: contain;
  background-repeat: no-repeat;
  opacity: 0.75;   /* optional */
}

.remark-slide-content h2 {
  border-bottom: 3px solid var(--accent);
  padding-bottom: .25em;
  margin-bottom: .8em;
  color: var(--dark);
  font-size: 1.5em;
}
.remark-slide-content strong { color: var(--dark); }
.remark-slide-content em     { color: var(--accent2); }

/* ── code ────────────────────────────────────────────── */
.remark-code, .remark-inline-code {
  font-family: 'Fira Code', 'Cascadia Code', 'Source Code Pro', monospace;
  font-size: 0.82em;
}
.remark-code-line-highlighted { background-color: rgba(20,184,166,.18); }
pre { margin: .6em 0; }
code.remark-inline-code {
  background: #e2e8f0;
  color: #0f766e;
  border-radius: 3px;
  padding: 1px 5px;
}

/* ── exercise callout ────────────────────────────────── */
.exercise {
  background: #f0fdf9;
  border-left: 5px solid var(--accent);
  border-radius: 0 8px 8px 0;
  padding: .9em 1.4em;
  margin: .8em 0;
}
.exercise p { margin: .3em 0; }
.exercise code.remark-inline-code { background: #ccfbf1; }

/* ── note box ─────────────────────────────────── */
.note {
  background: #ff55992b;
  border-left: 5px solid var(--accent2);
  border-radius: 0 8px 8px 0;
  padding: .7em 1.2em;
  margin: .6em 0;
  font-size: .92em;
}

/* ── info / note box ─────────────────────────────────── */
.info {
  background: #bcafc285;
  border-left: 5px solid var(--accent3);
  border-radius: 0 8px 8px 0;
  padding: .7em 1.2em;
  margin: .6em 0;
  font-size: .92em;
}

/* ── package badge ───────────────────────────────────── */
.pkg {
  display: inline-block;
  background: var(--accent);
  color: white !important;
  font-weight: 700;
  font-size: 0.78em;
  padding: 2px 9px;
  border-radius: 4px;
  font-family: monospace;
  letter-spacing: .3px;
}

/* ── comparison table ────────────────────────────────── */
.remark-slide-content table {
  border-collapse: collapse;
  width: 100%;
  font-size: .9em;
}
.remark-slide-content th {
  background: var(--dark);
  color: var(--accent2);
  padding: .5em .9em;
  font-weight: 600;
  text-align: left;
}
.remark-slide-content td {
  padding: .4em .9em;
  border-bottom: 1px solid #e2e8f0;
}
.remark-slide-content tr:nth-child(even) td { background: #f8fafc; }

/* ── slide number ────────────────────────────────────── */
.remark-slide-number { color: var(--accent3); font-size: 13px; }

/* ── pull-left / pull-right fine-tuning ──────────────── */
.pull-left  { float: left;  width: 47%; }
.pull-right { float: right; width: 47%; }

a, a > code {
  color: var(--accent2);
}

a:hover {
  color: var(--accent);
}

</style>

## First things first: required packages

Install the missing packages

``` r
req_pkgs <- c("future.apply", "pbapply", "mirai", "microbenchmark")
install.packages(req_pkgs[!req_pkgs %in% installed.packages()[, "Package"]])
```

Load them

``` r
library(future)
library(future.apply)
library(pbapply)
library(mirai)
library(microbenchmark)
```

---

## Introduction to parallel execution

Parallel code execution enables **faster computation** by leveraging multiple CPU cores.  
The *total* amount of work is not reduced — but it gets done **quicker**.

Many algorithms are ***"embarrassingly parallel"***: they can be decomposed into completely independent calculations.  
In Statistics, it is natural to *parallelize over observations and/or parameters*.

Typically, **`for`-loops** are the most common and straightforward candidates for parallelization.

---

## Parallel computation — steps

**Operations required to run a parallel computation:**

1. Start `\(m\)` *worker* processes (CPUs)

2. Send the necessary data and functions to the *workers*

3. Split the task into `\(m\)` sub-tasks of similar magnitude

4. Wait for all *workers* to finish

5. Gather and combine the results from the different *workers*

6. Stop the *worker* processes

---

## Communication protocols

Several protocols for communicating between *workers* are available, depending on the OS:

| Protocol | Platform | Notes |
|---|---|---|
| **Fork** | Unix (Mac & Linux) | Fast — shares memory with parent process |
| **PSOCK** | All (incl. Windows) | Spawns fresh R sessions; more overhead |
| **NNG sockets** | All | C-level; ultra-low latency (used by `mirai`) |
| **MPI** | HPC clusters | For distributed multi-node computing |

> `pbapply`, `future` and `mirai` expose a unified syntax regardless of the underlying backend.

---

## Parallel computation in <svg aria-hidden="true" role="img" viewBox="0 0 581 512" style="height:1em;width:1.13em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"/></svg>

Three modern packages can cover the vast majority of our use-cases, with just a *few* ***straightforward*** *and* ***simple*** *lines of code*:

.pull-left[
### <span class="pkg">parallel</span> *(base <svg aria-hidden="true" role="img" viewBox="0 0 581 512" style="height:1em;width:1.13em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"/></svg>)*
Ships with R ≥ 2.14. Lower-level; used internally by the packages below

### <span class="pkg">pbapply</span>
Adds a **progress bar** to any `*apply` call.  
The `cl` argument enables transparent parallelism — great for interactive scripts.
]

.pull-right[
### <span class="pkg">future.apply</span>
Drop-in `*apply` replacements — just swap `sapply` for `future_sapply`.  
Declare a **plan** once; the same code runs sequentially or in parallel.

### <span class="pkg">mirai</span>
Minimal, ultra-low overhead.  
Built on **NNG** (nanomsg next generation — C sockets).  
True async/non-blocking tasks (plays nice with `Shiny`)
]

---

class: inverse, center, middle
count: false

## <span style="color: var(--myaccent3)">`future.apply`</span>

### Parallel apply

---

## First parallel function

.exercise[
👉 **Your turn!**

1. Check available cores: `future::availableCores()`
2. Declare a parallel plan — leave **at least one core free**:  
   `future::plan(multisession, workers = XX))`
3. Use `future.apply::future_sapply()` to compute `log()` of `\(n\)` numbers in parallel
4. Wrap into a function `mylog_parallel()`
5. Compare execution times on `1:100`:  
   `microbenchmark(mylog_parallel(1:100), mylog(1:100), log(1:100), times = 10)`
]

---

## First parallel function — solution

``` r
future::availableCores()-1 #nb available cpus
n_workers <- 3

mylog_parallel <- function(x) {
  future_sapply(1:length(x), 
                FUN = function(i){
                  log(x[i])
                }
  )
}

plan(multisession, workers = n_workers)
mylog_parallel(1:100)
mb <- microbenchmark(mylog_parallel(1:100), mylog(1:100), log(1:100), times = 10)
plan(sequential)
```

```
#> Unit: nanoseconds
#>                   expr     min      lq      mean    median      uq     max neval cld
#>  mylog_parallel(1:100) 7845842 7921282 8515622.1 7985549.5 9247632 9923763    10  a 
#>           mylog(1:100)    3731    3936  129035.2    4100.0    6765 1246523    10   b
#>             log(1:100)     492     615    1184.9    1168.5    1681    2214    10   b
```

---

## Parallelization is not always effective

**The parallel code is much slower…**

Because each individual computation is too quick, <svg aria-hidden="true" role="img" viewBox="0 0 581 512" style="height:1em;width:1.13em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"/></svg> spends more time on **inter-process communication** than on actual computation.

.note[
⚠️ **Rule of thumb:** one iteration of the loop must be long enough for parallelization to pay off.  
Communication overhead is roughly constant — it only becomes negligible when the task itself takes time.
]

---

## A realistic example

**Bootstrap confidence intervals** for the mean of each of **200 variables** (e.g. omics measurements), using **B = 2 000 bootstraps**

A single bootstrap is fast, but repeating it 200 × 2 000 times makes this *embarrassingly parallel*, ***and slow enough for parallelism to pay off***.

``` r
set.seed(942026)
n_obs  <- 50     # observations per variable
n_vars <- 200    # number of variables
B      <- 2000   # bootstrap resamples

mydata <- matrix(rnorm(n_obs * n_vars), nrow = n_obs, ncol = n_vars)
```

Computing a **bootstrap CI** at 5% for one variable: 
 1. resample B times
 2. compute the mean each time
 3. finally take the 2.5 % and 97.5 % quantiles of the overall results.

``` r
boot_ci <- function(col, B = 2000) {
  means <- replicate(B, mean(sample(col, replace = TRUE)))
  quantile(means, c(0.025, 0.975))
}
```

---

## `apply` and `for`-loops

.pull-left[
Experienced R users often reach for `*apply()` functions:

``` r
bootci_apply <- function(x, B = 2000){
  apply(X = x, MARGIN = 2, FUN = boot_ci, B = B)
}
system.time(bootci_apply(mydata))
```

```
#>    user  system elapsed 
#>   1.980   0.029   2.011
```
]
.pull-right[
Yet, a well-written `for`-loop is *equally fast*:

``` r
bootci_for <- function(x, B = 2000) {
  ans <- matrix(NA_real_, nrow = 2, ncol = ncol(x))
  for (i in 1:ncol(x)){
    ans[, i] <- boot_ci(x[, i], B)
  } 
  return(ans)
}
system.time(bootci_for(mydata))
```

```
#>    user  system elapsed 
#>   1.973   0.021   1.997
```
]

---

## Efficient parallelization — `future.apply`

.exercise[
👉 **Your turn!**

Parallelize the bootstrap CI computation using `future_sapply()`.  
Does it improve computation time compared to the sequential versions this time ?
]

``` r
bootci_future <- function(x, B = 2000) {
  future_sapply(1:ncol(x), FUN = function(i) {
    boot_ci(x[, i], B)
  },
  future.seed = TRUE) #ensures parallel-safe RNG
}

n_workers <- 3 #future::availableCores() - 1
system.time(bootci_future(mydata))
plan(multisession, workers = n_workers)
system.time(bootci_future(mydata))
plan(sequential)
```

---

## `future.apply` – benchmark

.pull-left[

``` r
plan(multisession, workers = n_workers)
bootci_future(mydata) # warm-up call
mb <- microbenchmark(
  "apply" = bootci_apply(mydata),
  "for-loop" = bootci_for(mydata),
  "future.apply" = bootci_future(mydata),
  times = 20
)
plan(sequential) # clean-up
```
]

.pull-right[
<img src="data:image/png;base64,#parallelisation_Rtoul_files/figure-html/bootci_future_mbplot-1.png" alt="" width="504" />
]

---
class: inverse, center, middle
count: false

## <span style="color: var(--myaccent3)">`pbapply`</span>

### Parallel apply **with a progress bar**

---

## `pbapply` — progress bar + parallelizm

`pbapply` is a thin wrapper that adds a **progress bar** to any `*apply` call.  
Its `cl` argument accepts either:
- an integer (number of cores) → creates a PSOCK cluster internally, or  
- an existing `cluster` object from `parallel::makeCluster()`.

``` r
library(pbapply)
?pblapply   # same signature as lapply + cl =
```

.exercise[
👉 **Your turn!**

Rewrite `bootci_apply` using `pbapply::pbsapply()` with `cl = future::availableCores() - 1`.  
Add it to the benchmark. Does it behave differently from `future.apply`?
]

---

## `pbapply` — solution

``` r
bootci_pb <- function(x, cl, B = 2000) {
  pbapply::pbsapply(1:ncol(x), function(i){
    boot_ci(x[, i], B)
  },
  cl = cl)
}

system.time(bootci_pb(mydata, cl = n_workers))
```

```
#>    user  system elapsed 
#>   0.656   0.085   0.755
```

``` r
pboptions(nout=25) #default: nout=100
pboptions(type="none") #default: type="timer"
system.time(bootci_pb(mydata, cl = n_workers))
```

```
#>    user  system elapsed 
#>   0.672   0.094   0.769
```

The progress bar induces some overhead, that can get noticed for fast subtasks

---

## `pbapply` – benchmark

.pull-left[

``` r
plan(multisession, workers = n_workers)
bootci_future(mydata) # warm-up call
mb2 <- microbenchmark(
  "apply" = bootci_apply(mydata),
  "for-loop" = bootci_for(mydata),
  "future.apply" = bootci_future(mydata),
  "pbapply" = bootci_pb(mydata, cl = n_workers),
  times = 20
)
plan(sequential) # clean-up
```
]

.pull-right[ 
<img src="data:image/png;base64,#parallelisation_Rtoul_files/figure-html/unnamed-chunk-5-1.png" alt="" width="504" />
]
---
class: inverse, center, middle
count: false

## <span style="color:#14b8a6">`mirai`</span>

### Minimal, async, lightning-fast

---

## `mirai` — why another package?

`mirai` (Japanese for *"future"*) takes a different architectural approach:

| | `future` | `mirai` |
|---|---|---|
| **Backend** | R sessions (PSOCK/Fork) | NNG C sockets |
| **Overhead** | Moderate | Minimal |
| **Style** | Synchronous (`plan` + `*apply`) | Async-first (`mirai()` returns immediately) |
| **Best for** | General parallel work | High-frequency tasks, Shiny, production |
| **OS** | All | All |

.info[
💡 `mirai` tasks are **non-blocking**: your R session remains free while workers compute.  
Results are fetched explicitly when you need them.
]

---

## `mirai` — core API

**Set up workers** (analogous to `plan(multisession(workers = n))`):
```r
library(mirai)
daemons(n = 3)          # start 3 persistent workers
```

**Launch an async task** — pass extra data as named arguments:
```r
m <- sapply(1:ncol(mydata), function(i) mirai(boot_ci(mydata[, i]), .args=list(boot_ci=boot_ci, mydata=mydata, i=i)))
# returns immediately — R is not blocked!
```

**Retrieve the result** — blocks only at this point:
```r
sapply(m, function(mm) mm[])          # waits and returns the value
```

**Shut down workers:**
```r
daemons(0)
```

---

## `mirai` — your turn!

.exercise[
👉 **Your turn!**

1. Install and load `mirai`
2. Start daemons: `daemons(n_workers)`
3. Write `bootci_mirai()`
4. Add it to the benchmark — compare with `future.apply` and `pbapply`
5. Shut down: `daemons(0)`
]

---

## `mirai` — solution

``` r
bootci_mirai <- function(x, B = 2000) {
  ms <- lapply(1:ncol(x), function(i) {
    mirai(boot_ci(col = x[, i], B = B), 
          .args=list(boot_ci=boot_ci, x=x, i=i, B=B))
  })
  sapply(X = ms, FUN = function(m) m[])
}

system.time(bootci_mirai(mydata))
```

```
#>    user  system elapsed 
#>   0.281   0.536   5.062
```

``` r
daemons(n_workers)
system.time(bootci_mirai(mydata))
```

```
#>    user  system elapsed 
#>   0.012   0.041   0.751
```

``` r
daemons(0)   # clean up workers
```

---

## `mirai` — full benchmark
.pull-left[

``` r
plan(multisession, workers = n_workers)
bootci_future(mydata) # warm-up call
bootci_pb(mydata, cl = n_workers)  # warm-up call
mb3_a <- microbenchmark(
  "apply" = bootci_apply(mydata),
  "for-loop" = bootci_for(mydata),
  "future.apply" = bootci_future(mydata),
  "pbapply"  = bootci_pb(mydata, cl = n_workers),
  times = 20
)
plan(sequential) #cleanup

daemons(n_workers)
bootci_mirai(mydata) # warm-up call
mb3_b <- microbenchmark(
  "mirai" = bootci_mirai(mydata),
  times = 20
)
daemons(0) #cleanup

mb3 <- rbind.data.frame(mb3_a, mb3_b)
```
]

.pull-right[
<img src="data:image/png;base64,#parallelisation_Rtoul_files/figure-html/mirai_mbplot-1.png" alt="" width="504" />
]

---

## Summary & conclusion

| Situation | Recommended |
|---|---|
| Quick script, want a progress bar | `pbapply` |
| General parallel work, drop-in replacement | `future` or `future.apply` |
| High-throughput / async / Shiny integration | `mirai` |

parallelization **can** speed up computation — but the gain depends on the ratio between:
- **communication time** (fixed cost) *VS*  
- **computation time per task** (must dominate)

---

## Going Further

More details on R packages and advanced R programming:  
👉 Online lecture notes from [endeav<svg aria-hidden="true" role="img" viewBox="0 0 581 512" style="height:1em;width:1.13em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"/></svg> course: *a course on code efficiency and software development with R*](https://endeavr.borishejblum.science/r-code-parallelization)

[`futurize`](https://futurize.futureverse.org/) provides an ever lighter API to leverage `future` parallelization.

A list of additional, less straightforward (and not necessarly more efficient computationally), <svg aria-hidden="true" role="img" viewBox="0 0 581 512" style="height:1em;width:1.13em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"/></svg> packages for parallel computations:  
 - [`doParallel`](https://github.com/RevolutionAnalytics/doparallel)
 - [`foreach`](https://github.com/RevolutionAnalytics/foreach)
 - [`itertools`](https://CRAN.R-project.org/package=itertools)
 - [`batchtools`](https://batchtools.mlr-org.com)