Run DAEQTL mapping in parallel

The parallelization supported by daeqtl_mapping() relies on two packages of the Futureverse: future and future.apply.

Running on your local computer

Here’s an example that performs DAEQTL mapping in parallel across 3 cores. The data table snp_pairs is split into 3 chunks (argument .n_chunks) so that each worker processes one of those chunks. Because the plan is multisession, each worker is a background R session rather than a forked process.

library(daeqtlr)

n_workers <- 3L
future::plan(future::multisession, workers = n_workers)

snp_pairs <- read_snp_pairs(file = daeqtlr_example("snp_pairs.csv"))
zygosity <- read_snp_zygosity(file = daeqtlr_example("zygosity.csv"))
ae <- read_ae_ratios(file = daeqtlr_example("ae.csv"))

mapping_dt <- daeqtl_mapping(
  snp_pairs = snp_pairs,
  zygosity = zygosity,
  ae = ae
)

# This is not mandatory but it's a good practice to reset the parallelization
# setup to sequential.
future::plan("sequential")
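
To make the chunking described above more concrete, here is a minimal sketch of the general split-and-dispatch pattern with future.apply. It is not daeqtlr's internal implementation: toy_pairs, its columns, and the per-chunk function are hypothetical stand-ins, and the splitting uses parallel::splitIndices() purely for illustration.

```r
library(future)
library(future.apply)

# Toy stand-in for a table of SNP pairs (hypothetical columns).
toy_pairs <- data.frame(id = 1:10, value = (1:10)^2)

n_workers <- 2L
plan(multisession, workers = n_workers)

# Split the row indices into `n_workers` contiguous chunks.
idx <- parallel::splitIndices(nrow(toy_pairs), n_workers)
chunks <- lapply(idx, function(i) toy_pairs[i, , drop = FALSE])

# Dispatch the chunks to the workers; the "work" here is a trivial sum.
chunk_results <- future_lapply(chunks, function(chunk) {
  data.frame(id = chunk$id, total = chunk$id + chunk$value)
})

# Bind the per-chunk results back into a single table.
result <- do.call(rbind, chunk_results)
rownames(result) <- NULL

# Reset the parallelization setup to sequential.
plan(sequential)
```

With as many chunks as workers, each background session handles one chunk, which mirrors the one-chunk-per-worker arrangement described for snp_pairs.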

Running on two remote machines

This example shows how to run the mapping on two remote computers, e.g. rey and r2d2.

The code below requires the following setup:

SSH access to the remote machines, e.g. rey and r2d2.
daeqtlr already installed on those machines.
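
The two requirements above can be checked from R before setting up the cluster. The sketch below is only an assumption about how one might do it: remote_has_daeqtlr() and ssh_check_args() are hypothetical helpers (not part of daeqtlr or future), and they assume that key-based SSH login works and that Rscript is on the PATH of the remote machine.

```r
# Hypothetical helper: arguments for a non-interactive `ssh` call that asks the
# remote R installation whether daeqtlr can be loaded.
ssh_check_args <- function(host) {
  remote_cmd <- "Rscript -e 'cat(requireNamespace(\"daeqtlr\", quietly = TRUE))'"
  c("-o", "BatchMode=yes", host, shQuote(remote_cmd))
}

# Hypothetical helper: TRUE if `host` is reachable over SSH and daeqtlr is
# installed there, FALSE otherwise (including when ssh itself fails).
remote_has_daeqtlr <- function(host) {
  out <- tryCatch(
    suppressWarnings(
      system2("ssh", ssh_check_args(host), stdout = TRUE, stderr = FALSE)
    ),
    error = function(e) character(0)
  )
  "TRUE" %in% out
}

# Usage (needs real, reachable hosts):
# vapply(c("rey", "r2d2"), remote_has_daeqtlr, logical(1))
```

BatchMode=yes makes ssh fail instead of prompting for a password, so the check cannot hang waiting for input. Once both machines return TRUE, the mapping can be run as follows.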

library(daeqtlr)

workers <- c(rep("rey", times = 64L), rep("r2d2", times = 40L))
# `future::plan` may take one or two minutes to complete.
plan <- future::plan(future::cluster, workers = workers, homogeneous = FALSE)

snp_pairs <- read_snp_pairs(file = daeqtlr_example("snp_pairs.csv"))
zygosity <- read_snp_zygosity(file = daeqtlr_example("zygosity.csv"))
ae <- read_ae_ratios(file = daeqtlr_example("ae.csv"))

mapping_dt <- daeqtl_mapping(
  snp_pairs = snp_pairs,
  zygosity = zygosity,
  ae = ae
)

# This is not mandatory but it's a good practice to reset the parallelization
# setup to sequential.
future::plan("sequential")

If you want to run this code from one of those machines, replace the hostname of that machine with "localhost". For example, to run the code above from rey:

library(daeqtlr)

workers <- c(rep("localhost", times = 64L), rep("r2d2", times = 40L))
# `future::plan` may take one or two minutes to complete.
plan <- future::plan(future::cluster, workers = workers, homogeneous = FALSE)

snp_pairs <- read_snp_pairs(file = daeqtlr_example("snp_pairs.csv"))
zygosity <- read_snp_zygosity(file = daeqtlr_example("zygosity.csv"))
ae <- read_ae_ratios(file = daeqtlr_example("ae.csv"))

mapping_dt <- daeqtl_mapping(
  snp_pairs = snp_pairs,
  zygosity = zygosity,
  ae = ae
)

# This is not mandatory but it's a good practice to reset the parallelization
# setup to sequential.
future::plan("sequential")
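
The workers vector used in the two remote examples can also be built programmatically. The following is a minimal sketch using a hypothetical helper, make_workers() (not part of daeqtlr or future), which expands per-host worker counts and swaps the launching machine for "localhost". It assumes that the name reported by Sys.info() matches the hostname used for SSH, which may not hold on every system; in that case pass local_host explicitly.

```r
# Hypothetical helper: expand a named vector of per-host worker counts into the
# character vector passed as `workers`, replacing the machine the code is
# launched from with "localhost".
make_workers <- function(n_per_host, local_host = Sys.info()[["nodename"]]) {
  hosts <- names(n_per_host)
  hosts[hosts == local_host] <- "localhost"
  rep(hosts, times = n_per_host)
}

# Same setup as above, launched from rey.
workers <- make_workers(c(rey = 64L, r2d2 = 40L), local_host = "rey")
table(workers)
```

The resulting vector can be passed to future::plan(future::cluster, workers = workers, homogeneous = FALSE) exactly as in the examples above.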