Align-GVGD (A-GVGD) — agvgd • agvgd

This function implements the Align-GVGD (A-GVGD) method described in Tavtigian et al. (2006).

A-GVGD combines multiple sequence alignment of orthologous sequences with the Grantham distance to classify missense variants, i.e. to distinguish human disease susceptibility missense changes from changes of little clinical significance.

The biochemical variation at each alignment position is converted to a Grantham Variation score (GV) and the difference between these properties and those of the variant amino acid being assessed are calculated and a Grantham Difference score generated (GD). The predicted effect is classed as C0, C15, C25, C35, C45, C55, or C65, with C65 most likely to interfere with function and C0 least likely.

Usage

agvgd(
  alignment,
  poi,
  sub,
  mode = c("recycle", "expand_grid"),
  sort = FALSE,
  keep_self = TRUE,
  digits = 2L
)

Arguments

alignment: A character matrix or an alignment object obtained with read_alignment(). Rows are expected to be sequences of single characters (protein residues), and columns the alignment positions. The first row must be the reference sequence, i.e. the sequence whose substitutions will be evaluated against.
poi: A whole number indicating the position of interest (POI).
sub: A character vector of protein residue substitutions to be classified. The amino acids must be provided as one-letter symbols.
mode: If both poi and sub contain more than one element, mode specifies how these two inputs are combined. If mode = 'recycle' the shortest vector is recycled to match the length of the longest. If mode = 'expand_grid', all combinations between elements of poi and sub are combined.
sort: Whether to sort the output by gd, or not. Default is FALSE.
keep_self: Whether to keep those results in the output that correspond to residues being the same in ref and sub. Default is TRUE. But it will be useful to change it to FALSE if want to compare the results with those provided by http://agvgd.hci.utah.edu/ that filters them out.
digits: Integer indicating the number of decimal places to be used in rounding gv and gd values. Default is 2. Note that the calculation of the prediction variable won't be affected by rounding of gv and gd, as it is calculated prior to the rounding.

Value

A tibble whose observations refer to the combination alignment position and amino acid substitution; consists of seven variables:

res: Position of the amino acid residue in the reference protein (first sequence in the alignment). This position corresponds to poi minus the gaps in the alignment.
poi: Position of interest, i.e. the alignment position at which the amino acid substitution is being assessed.
ref: Reference amino acid, i.e. the amino acid in the first sequence of the alignment, at the position of interest.
sub: Amino acid substitution being assessed.
gv: Grantham variation score.
gd: Grantham difference score.
prediction: Predicted effect of the amino acid substitution. This is classed as C0, C15, C25, C35, C45, C55, or C65, with C65 most likely to interfere with function and C0 least likely.

References

Tavtigian, S.V., Deffenbaugh, A. M., Yin, L., Judkins, T., Scholl, T., Samollow, P.B., de Silva, D., Zharkikh, A., Thomas, A. Comprehensive statistical study of 452 BRCA1 missense substitutions with classification of eight recurrent substitutions as neutral. Journal of Medical Genetics 43, 295--305 (2006). doi:10.1136/jmg.2005.033878 .
Mathe, E., Olivier, M., Kato, S., Ishioka, C., Hainaut, P., Tavtigian, S.V. Computational approaches for predicting the biological effect of p53 missense mutations: a comparison of three sequence analysis based methods. Nucleic Acids Research 34, 1317--1325 (2006). doi:10.1093/nar/gkj518 .

Examples

# Read an alignment into R, e.g. the alignment for gene ATM.
alignment_ATM <- read_alignment(gene = 'ATM')

# Predict the impact of changing the first residue (Met) to a Serine (S).
agvgd(alignment = alignment_ATM, poi = 1, sub = 'S')
#> # A tibble: 1 × 7
#>     res   poi ref   sub      gv    gd prediction
#>   <int> <int> <chr> <chr> <dbl> <dbl> <chr>     
#> 1     1     1 M     S         0  135. C65       

# `poi` can be a vector of positions, e.g., 3 thru 10, allow for prediction
# of multiple positions at once.
agvgd(alignment = alignment_ATM, poi = 3:10, sub = 'S')
#> # A tibble: 8 × 7
#>     res   poi ref   sub      gv    gd prediction
#>   <int> <int> <chr> <chr> <dbl> <dbl> <chr>     
#> 1     2     3 S     S      87.4   0   C0        
#> 2     3     4 L     S      31.8 123.  C35       
#> 3     4     5 V     S      96.2  99.1 C15       
#> 4     5     6 L     S      92.4  57.8 C0        
#> 5     6     7 N     S     108.    6.2 C0        
#> 6     7     8 D     S     223.   50.2 C0        
#> 7     8     9 L     S       0   144.  C65       
#> 8     9    10 L     S     102.   95.9 C15       

# `poi` expects a position in the frame of reference of the alignment, i.e.
# an alignment position (a column index). However, if you know instead
# the residue position in the reference sequence (first sequence in the
# alignment), then you may use the function `res_to_poi()`
# to convert from residue position to alignment position.
#
# Example: The second residue in the reference sequence of the ATM alignment
# is a Serine, after a Methionine. In the alignment, there is a gap between
# the two residues, so the alignment is 3 but the residue position on the
# protein is 2.
(poi2 <- res_to_poi(alignment_ATM, 2))
#> [1] 3
agvgd(alignment = alignment_ATM, poi = poi2, sub = 'A')
#> # A tibble: 1 × 7
#>     res   poi ref   sub      gv    gd prediction
#>   <int> <int> <chr> <chr> <dbl> <dbl> <chr>     
#> 1     2     3 S     A      87.4  65.6 C0        

# Because changes are context-dependent, i.e. they depend on the residue
# variation observed at a given alignment position, the same reference
# residue when replaced with the same substitution will in general have
# a different predicted impact.
agvgd(alignment = alignment_ATM, poi = 9:10, sub = 'S')
#> # A tibble: 2 × 7
#>     res   poi ref   sub      gv    gd prediction
#>   <int> <int> <chr> <chr> <dbl> <dbl> <chr>     
#> 1     8     9 L     S        0  144.  C65       
#> 2     9    10 L     S      102.  95.9 C15       

# Use the ancillary function `amino_acids()` to get a vector of one-letter
# residue substitutions if you want to quickly assess the impact of all
# possible substitutions.
agvgd(alignment = alignment_ATM, poi = 1, sub = amino_acids())
#> # A tibble: 20 × 7
#>      res   poi ref   sub      gv    gd prediction
#>    <int> <int> <chr> <chr> <dbl> <dbl> <chr>     
#>  1     1     1 M     S         0 135.  C65       
#>  2     1     1 M     R         0  91.6 C65       
#>  3     1     1 M     L         0  14.3 C0        
#>  4     1     1 M     P         0  86.6 C65       
#>  5     1     1 M     T         0  81.0 C65       
#>  6     1     1 M     A         0  84.4 C65       
#>  7     1     1 M     V         0  21.5 C15       
#>  8     1     1 M     G         0 127.  C65       
#>  9     1     1 M     I         0  10.1 C0        
#> 10     1     1 M     F         0  28.5 C25       
#> 11     1     1 M     Y         0  35.2 C35       
#> 12     1     1 M     C         0 196.  C65       
#> 13     1     1 M     H         0  86.3 C65       
#> 14     1     1 M     Q         0 101.  C65       
#> 15     1     1 M     N         0 141.  C65       
#> 16     1     1 M     K         0  94.5 C65       
#> 17     1     1 M     D         0 160.  C65       
#> 18     1     1 M     E         0 126.  C65       
#> 19     1     1 M     M         0   0   C0        
#> 20     1     1 M     W         0  66.6 C65       

# Parameter `mode` gives you flexibility on how to combine `poi` and `sub`.
agvgd(alignment = alignment_ATM, poi = 3:4, sub = c('A', 'V'))
#> # A tibble: 2 × 7
#>     res   poi ref   sub      gv    gd prediction
#>   <int> <int> <chr> <chr> <dbl> <dbl> <chr>     
#> 1     2     3 S     A      87.4  65.6 C0        
#> 2     3     4 L     V      31.8   0   C0        

# Use 'expand_grid' for all combinations.
agvgd(alignment = alignment_ATM, poi = 3:4, sub = c('A', 'V'), mode = 'expand_grid')
#> # A tibble: 4 × 7
#>     res   poi ref   sub      gv    gd prediction
#>   <int> <int> <chr> <chr> <dbl> <dbl> <chr>     
#> 1     2     3 S     A      87.4  65.6 C0        
#> 2     2     3 S     V      87.4  82.7 C15       
#> 3     3     4 L     A      31.8  64.4 C25       
#> 4     3     4 L     V      31.8   0   C0