This function implements the Align-GVGD (A-GVGD) method described in Tavtigian et al. (2006).
A-GVGD combines multiple sequence alignment of orthologous sequences with the Grantham distance to classify missense variants, i.e. to distinguish human disease susceptibility missense changes from changes of little clinical significance.
The biochemical variation at each alignment position is converted to a Grantham Variation score (GV) and the difference between these properties and those of the variant amino acid being assessed are calculated and a Grantham Difference score generated (GD). The predicted effect is classed as C0, C15, C25, C35, C45, C55, or C65, with C65 most likely to interfere with function and C0 least likely.
Usage
agvgd(
alignment,
poi,
sub,
mode = c("recycle", "expand_grid"),
sort = FALSE,
keep_self = TRUE,
digits = 2L
)
Arguments
- alignment
A character matrix or an alignment object obtained with
read_alignment()
. Rows are expected to be sequences of single characters (protein residues), and columns the alignment positions. The first row must be the reference sequence, i.e. the sequence whose substitutions will be evaluated against.- poi
A whole number indicating the position of interest (POI).
- sub
A character vector of protein residue substitutions to be classified. The amino acids must be provided as one-letter symbols.
- mode
If both
poi
andsub
contain more than one element,mode
specifies how these two inputs are combined. Ifmode = 'recycle'
the shortest vector is recycled to match the length of the longest. Ifmode = 'expand_grid'
, all combinations between elements ofpoi
andsub
are combined.- sort
Whether to sort the output by
gd
, or not. Default isFALSE
.- keep_self
Whether to keep those results in the output that correspond to residues being the same in
ref
andsub
. Default isTRUE
. But it will be useful to change it toFALSE
if want to compare the results with those provided by http://agvgd.hci.utah.edu/ that filters them out.- digits
Integer indicating the number of decimal places to be used in rounding
gv
andgd
values. Default is2
. Note that the calculation of theprediction
variable won't be affected by rounding ofgv
andgd
, as it is calculated prior to the rounding.
Value
A tibble whose observations refer to the combination alignment position and amino acid substitution; consists of seven variables:
- res
Position of the amino acid residue in the reference protein (first sequence in the alignment). This position corresponds to
poi
minus the gaps in the alignment.- poi
Position of interest, i.e. the alignment position at which the amino acid substitution is being assessed.
- ref
Reference amino acid, i.e. the amino acid in the first sequence of the alignment, at the position of interest.
- sub
Amino acid substitution being assessed.
- gv
Grantham variation score.
- gd
Grantham difference score.
- prediction
Predicted effect of the amino acid substitution. This is classed as C0, C15, C25, C35, C45, C55, or C65, with C65 most likely to interfere with function and C0 least likely.
References
Tavtigian, S.V., Deffenbaugh, A. M., Yin, L., Judkins, T., Scholl, T., Samollow, P.B., de Silva, D., Zharkikh, A., Thomas, A. Comprehensive statistical study of 452 BRCA1 missense substitutions with classification of eight recurrent substitutions as neutral. Journal of Medical Genetics 43, 295--305 (2006). doi:10.1136/jmg.2005.033878 .
Mathe, E., Olivier, M., Kato, S., Ishioka, C., Hainaut, P., Tavtigian, S.V. Computational approaches for predicting the biological effect of p53 missense mutations: a comparison of three sequence analysis based methods. Nucleic Acids Research 34, 1317--1325 (2006). doi:10.1093/nar/gkj518 .
Examples
# Read an alignment into R, e.g. the alignment for gene ATM.
alignment_ATM <- read_alignment(gene = 'ATM')
# Predict the impact of changing the first residue (Met) to a Serine (S).
agvgd(alignment = alignment_ATM, poi = 1, sub = 'S')
#> # A tibble: 1 × 7
#> res poi ref sub gv gd prediction
#> <int> <int> <chr> <chr> <dbl> <dbl> <chr>
#> 1 1 1 M S 0 135. C65
# `poi` can be a vector of positions, e.g., 3 thru 10, allow for prediction
# of multiple positions at once.
agvgd(alignment = alignment_ATM, poi = 3:10, sub = 'S')
#> # A tibble: 8 × 7
#> res poi ref sub gv gd prediction
#> <int> <int> <chr> <chr> <dbl> <dbl> <chr>
#> 1 2 3 S S 87.4 0 C0
#> 2 3 4 L S 31.8 123. C35
#> 3 4 5 V S 96.2 99.1 C15
#> 4 5 6 L S 92.4 57.8 C0
#> 5 6 7 N S 108. 6.2 C0
#> 6 7 8 D S 223. 50.2 C0
#> 7 8 9 L S 0 144. C65
#> 8 9 10 L S 102. 95.9 C15
# `poi` expects a position in the frame of reference of the alignment, i.e.
# an alignment position (a column index). However, if you know instead
# the residue position in the reference sequence (first sequence in the
# alignment), then you may use the function `res_to_poi()`
# to convert from residue position to alignment position.
#
# Example: The second residue in the reference sequence of the ATM alignment
# is a Serine, after a Methionine. In the alignment, there is a gap between
# the two residues, so the alignment is 3 but the residue position on the
# protein is 2.
(poi2 <- res_to_poi(alignment_ATM, 2))
#> [1] 3
agvgd(alignment = alignment_ATM, poi = poi2, sub = 'A')
#> # A tibble: 1 × 7
#> res poi ref sub gv gd prediction
#> <int> <int> <chr> <chr> <dbl> <dbl> <chr>
#> 1 2 3 S A 87.4 65.6 C0
# Because changes are context-dependent, i.e. they depend on the residue
# variation observed at a given alignment position, the same reference
# residue when replaced with the same substitution will in general have
# a different predicted impact.
agvgd(alignment = alignment_ATM, poi = 9:10, sub = 'S')
#> # A tibble: 2 × 7
#> res poi ref sub gv gd prediction
#> <int> <int> <chr> <chr> <dbl> <dbl> <chr>
#> 1 8 9 L S 0 144. C65
#> 2 9 10 L S 102. 95.9 C15
# Use the ancillary function `amino_acids()` to get a vector of one-letter
# residue substitutions if you want to quickly assess the impact of all
# possible substitutions.
agvgd(alignment = alignment_ATM, poi = 1, sub = amino_acids())
#> # A tibble: 20 × 7
#> res poi ref sub gv gd prediction
#> <int> <int> <chr> <chr> <dbl> <dbl> <chr>
#> 1 1 1 M S 0 135. C65
#> 2 1 1 M R 0 91.6 C65
#> 3 1 1 M L 0 14.3 C0
#> 4 1 1 M P 0 86.6 C65
#> 5 1 1 M T 0 81.0 C65
#> 6 1 1 M A 0 84.4 C65
#> 7 1 1 M V 0 21.5 C15
#> 8 1 1 M G 0 127. C65
#> 9 1 1 M I 0 10.1 C0
#> 10 1 1 M F 0 28.5 C25
#> 11 1 1 M Y 0 35.2 C35
#> 12 1 1 M C 0 196. C65
#> 13 1 1 M H 0 86.3 C65
#> 14 1 1 M Q 0 101. C65
#> 15 1 1 M N 0 141. C65
#> 16 1 1 M K 0 94.5 C65
#> 17 1 1 M D 0 160. C65
#> 18 1 1 M E 0 126. C65
#> 19 1 1 M M 0 0 C0
#> 20 1 1 M W 0 66.6 C65
# Parameter `mode` gives you flexibility on how to combine `poi` and `sub`.
agvgd(alignment = alignment_ATM, poi = 3:4, sub = c('A', 'V'))
#> # A tibble: 2 × 7
#> res poi ref sub gv gd prediction
#> <int> <int> <chr> <chr> <dbl> <dbl> <chr>
#> 1 2 3 S A 87.4 65.6 C0
#> 2 3 4 L V 31.8 0 C0
# Use 'expand_grid' for all combinations.
agvgd(alignment = alignment_ATM, poi = 3:4, sub = c('A', 'V'), mode = 'expand_grid')
#> # A tibble: 4 × 7
#> res poi ref sub gv gd prediction
#> <int> <int> <chr> <chr> <dbl> <dbl> <chr>
#> 1 2 3 S A 87.4 65.6 C0
#> 2 2 3 S V 87.4 82.7 C15
#> 3 3 4 L A 31.8 64.4 C25
#> 4 3 4 L V 31.8 0 C0