Gets linkage disequilibrium (LD) data for variants from the Ensembl REST API. Queries can be made in four ways, by:

Genomic window centred on variants:

get_ld_variants_by_window(variant_id, genomic_window_size, ...)

Pairs of variants:

get_ld_variants_by_pair(variant_id1, variant_id2, ...)

Genomic range:

get_ld_variants_by_range(genomic_range, ...)

All pair combinations of variants:

get_ld_variants_by_pair_combn(variant_id, ...)

get_ld_variants_by_window(
  variant_id,
  genomic_window_size = 500L,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

get_ld_variants_by_pair(
  variant_id1,
  variant_id2,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

get_ld_variants_by_range(
  genomic_range,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

get_ld_variants_by_pair_combn(
  variant_id,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

variant_id

Variant identifiers, e.g., 'rs123'. This argument is used with either get_ld_variants_by_window() or get_ld_variants_by_pair_combn(). In the case of get_ld_variants_by_pair_combn(), all pairwise combinations of the elements of variant_id define the pairs of variants to query. Note that this argument is not the same as variant_id1 or variant_id2, which are used with get_ld_variants_by_pair().
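To make "all pairwise combinations" concrete, the following minimal sketch enumerates the pairs that three identifiers give rise to, using only base R. It illustrates the pairing itself; how get_ld_variants_by_pair_combn() builds its pairs internally is not stated here, so treat the use of combn() as an assumption.

```r
# Illustration only: base R, not the package's internal code.
variant_id <- c("rs6978506", "rs12718102", "rs13307200")

# combn() returns a 2 x choose(n, 2) matrix; each column is one unordered pair.
pairs <- utils::combn(variant_id, 2)

# Reshape into one row per pair, mirroring the variant_id1/variant_id2 layout.
pair_table <- data.frame(
  variant_id1 = pairs[1, ],
  variant_id2 = pairs[2, ],
  stringsAsFactors = FALSE
)
print(pair_table)
```

Three identifiers yield three pairs; in general n identifiers yield n * (n - 1) / 2 pairs, so the number of queries grows quadratically with the length of variant_id.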

genomic_window_size

An integer vector specifying the genomic window size in kilobases (kb) around the variant indicated in variant_id. This argument is used with get_ld_variants_by_window(). At the moment, the Ensembl REST API does not allow values greater than 500 kb. A window size of 500 means looking 250 kb upstream and 250 kb downstream of the variant passed as variant_id. The minimum value for this argument is 1L, not 0L.
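The window arithmetic can be sketched as below. flank_bp() is a hypothetical helper written for this illustration (it is not part of the package); it checks the 1 to 500 kb limits described above and returns the distance, in base pairs, covered on each side of the variant.

```r
# Hypothetical helper, not part of the package: converts a window size in kb
# into the flanking distance (in bp) on each side of the variant.
flank_bp <- function(genomic_window_size) {
  stopifnot(
    is.numeric(genomic_window_size),
    all(genomic_window_size >= 1),   # minimum is 1L, not 0L
    all(genomic_window_size <= 500)  # current Ensembl REST API limit
  )
  # kb -> bp, then split evenly between upstream and downstream.
  genomic_window_size * 1000 / 2
}

flank_bp(500L)  # 250000 bp, i.e. 250 kb on each side
flank_bp(1L)    # 500 bp on each side
```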

species_name

The species name, i.e., the scientific name with all letters in lowercase and spaces replaced by underscores. Examples: 'homo_sapiens' (human), 'ovis_aries' (domestic sheep) or 'capra_hircus' (goat).
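As a small illustration of this naming convention, the sketch below converts scientific names into the expected format with base R. to_species_name() is a hypothetical helper for this example only, and it does not check that the resulting name is actually known to Ensembl.

```r
# Hypothetical helper, not part of the package: lowercases a scientific name
# and replaces spaces with underscores.
to_species_name <- function(scientific_name) {
  tolower(gsub(" ", "_", trimws(scientific_name), fixed = TRUE))
}

to_species_name(c("Homo sapiens", "Ovis aries", "Capra hircus"))
```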

population

Population for which to compute linkage disequilibrium. See get_populations() for how to find the populations available for a species.

d_prime

\(D'\) is a measure of linkage disequilibrium. d_prime defines a cut-off threshold: only variants whose \(D' \ge \)d_prime are returned.

r_squared

\(r^2\) is a measure of linkage disequilibrium. r_squared defines a cut-off threshold: only variants whose \(r^2 \ge \)r_squared are returned. The lower bound for r_squared is 0.05, not 0; the upper bound is 1.
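The effect of the two thresholds can be sketched on a small table. The values below are the rounded figures printed in the Examples section for 'rs123'; keep_ld() is a hypothetical helper that only reproduces the cut-off semantics (\(D' \ge \)d_prime and \(r^2 \ge \)r_squared), and it makes no claim about where the package or the API applies the filter.

```r
# Rounded values copied from the first example's printed output.
ld <- data.frame(
  variant_id2 = c("rs12536724", "rs114", "rs115", "rs122", "rs124", "rs10239961"),
  r_squared   = c(0.255, 0.475, 0.721, 0.722, 0.722, 0.255),
  d_prime     = c(1.00, 0.703, 1.00, 1.00, 1.00, 1.00),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of the package: keeps rows that meet both
# thresholds, after checking the documented bounds of r_squared.
keep_ld <- function(ld, d_prime = 0, r_squared = 0.05) {
  stopifnot(r_squared >= 0.05, r_squared <= 1)
  ld[ld$d_prime >= d_prime & ld$r_squared >= r_squared, , drop = FALSE]
}

strong <- keep_ld(ld, d_prime = 0.9, r_squared = 0.5)
strong$variant_id2
```

With the default thresholds (d_prime = 0, r_squared = 0.05) all six rows are kept; raising both thresholds leaves only the variants in strong LD with 'rs123'.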

verbose

Whether to be verbose about the HTTP requests and the status of their responses.

warnings

Whether to show warnings.

progress_bar

Whether to show a progress bar.

variant_id1

The first variant of a pair of variants. Used with variant_id2. Note that this argument is not the same as variant_id. This argument is to be used with function get_ld_variants_by_pair().

variant_id2

The second variant of a pair of variants. Used with variant_id1. Note that this argument is not the same as variant_id. This argument is to be used with function get_ld_variants_by_pair().

genomic_range

Genomic range formatted as a string "chr:start..end", e.g., "X:1..10000". See the function genomic_range() for an easy way to create these ranges from vectors of start and end positions. This argument is used with get_ld_variants_by_range().
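The "chr:start..end" format can be assembled by hand, as in this minimal sketch. make_range() is a hypothetical stand-in written for illustration; the package's own genomic_range() may differ in signature and validation, and the start <= end check here is an assumption.

```r
# Hypothetical helper, not the package's genomic_range(): builds
# "chr:start..end" strings from parallel vectors.
make_range <- function(chr, start, end) {
  stopifnot(
    length(chr) == length(start),
    length(start) == length(end),
    all(start <= end)  # assumed requirement
  )
  sprintf("%s:%d..%d", chr, as.integer(start), as.integer(end))
}

make_range(c("X", "7"), c(1, 100000), c(10000, 100500))
```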

Value

A tibble of 6 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species. It is the scientific name formatted in lowercase with spaces replaced by underscores, e.g., 'homo_sapiens'.

population

Population for which linkage disequilibrium was computed.

variant_id1

First variant identifier.

variant_id2

Second variant identifier.

d_prime

\(D'\) between the two variants.

r_squared

\(r^2\) between the two variants.
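Because each row describes one pair of variants, a result in this shape can be rearranged into a symmetric matrix of \(r^2\) values. The sketch below does this with base R on the rounded values printed in the last example; it is one possible way to post-process the output, not something the package does for you. The diagonal is set to 1 on the grounds that a variant is in complete LD with itself.

```r
# Rounded values copied from the pair-combination example's printed output.
ld <- data.frame(
  variant_id1 = c("rs6978506", "rs6978506", "rs12718102"),
  variant_id2 = c("rs12718102", "rs13307200", "rs13307200"),
  r_squared   = c(0.111, 0.320, 0.266),
  stringsAsFactors = FALSE
)

ids <- unique(c(ld$variant_id1, ld$variant_id2))
m <- matrix(NA_real_, nrow = length(ids), ncol = length(ids),
            dimnames = list(ids, ids))
diag(m) <- 1

# Fill both triangles: a two-column character matrix indexes by row/column name.
m[cbind(ld$variant_id1, ld$variant_id2)] <- ld$r_squared
m[cbind(ld$variant_id2, ld$variant_id1)] <- ld$r_squared
print(m)
```

Pairs absent from the result, for example because they fall below the r_squared threshold, stay NA in the matrix.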

Examples

# Retrieve variants in LD by a window size of 1kb:
# 1kb: 500 bp upstream and 500 bp downstream of variant.
get_ld_variants_by_window('rs123', genomic_window_size = 1L)
#> # A tibble: 6 × 6
#>   species_name population              variant_id1 variant_id2 r_squared d_prime
#>   <chr>        <chr>                   <chr>       <chr>           <dbl>   <dbl>
#> 1 homo_sapiens 1000GENOMES:phase_3:CEU rs123       rs12536724      0.255   1.00
#> 2 homo_sapiens 1000GENOMES:phase_3:CEU rs123       rs114           0.475   0.703
#> 3 homo_sapiens 1000GENOMES:phase_3:CEU rs123       rs115           0.721   1.00
#> 4 homo_sapiens 1000GENOMES:phase_3:CEU rs123       rs122           0.722   1.00
#> 5 homo_sapiens 1000GENOMES:phase_3:CEU rs123       rs124           0.722   1.00
#> 6 homo_sapiens 1000GENOMES:phase_3:CEU rs123       rs10239961      0.255   1.00

# Retrieve LD measures for pairs of variants:
get_ld_variants_by_pair(
  variant_id1 = c('rs123', 'rs35439278'),
  variant_id2 = c('rs122', 'rs35174522')
)
#> # A tibble: 2 × 6
#>   species_name population              variant_id1 variant_id2 r_squared d_prime
#>   <chr>        <chr>                   <chr>       <chr>           <dbl>   <dbl>
#> 1 homo_sapiens 1000GENOMES:phase_3:CEU rs123       rs122          0.722     1.00
#> 2 homo_sapiens 1000GENOMES:phase_3:CEU rs35439278  rs35174522     0.0973    1.00

# Retrieve variants in LD within a genomic range
get_ld_variants_by_range('7:100000..100500')
#> # A tibble: 1 × 6
#>   species_name population              variant_id1 variant_id2 r_squared d_prime
#>   <chr>        <chr>                   <chr>       <chr>           <dbl>   <dbl>
#> 1 homo_sapiens 1000GENOMES:phase_3:CEU rs35439278  rs35174522     0.0973    1.00

# Retrieve all pair combinations of variants in LD
get_ld_variants_by_pair_combn(c('rs6978506', 'rs12718102', 'rs13307200'))
#> # A tibble: 3 × 6
#>   species_name population              variant_id1 variant_id2 r_squared d_prime
#>   <chr>        <chr>                   <chr>       <chr>           <dbl>   <dbl>
#> 1 homo_sapiens 1000GENOMES:phase_3:CEU rs6978506   rs12718102      0.111   0.999
#> 2 homo_sapiens 1000GENOMES:phase_3:CEU rs6978506   rs13307200      0.320   1.00
#> 3 homo_sapiens 1000GENOMES:phase_3:CEU rs12718102  rs13307200      0.266   0.875