Introduction

One of the greatest challenges in the study of cancer risk is the assessment of unclassified missense variants.

Biochemical and cell-based transcriptional assays can assess the structural and functional defects associated with missense variants. A complementary approach is bioinformatic analysis based on multiple sequence alignment data and protein structure prediction.

Align-GVGD is one such method that uses protein multiple sequence alignment (PMSA) data to provide cancer risk estimates.

In this vignette we show how to use agvgd to reproduce the results obtained by Lee et al. (2010)¹ in their study of missense variations in the BRCT domain of the BRCA1 gene.

In that paper, Align-GVGD prediction scores form part of the cross-validation of the structural and functional assays and are shown in the column labeled AG of Figure 3.

Reproducing the AGVGD scores

Data sets

To reproduce the Align-GVGD prediction scores, we need two data sets:

  • The protein sequence alignment of BRCA1, which includes the BRCT domain
  • The list of the 117 BRCA1 missense variants studied¹

Both data sets are bundled with agvgd.

To read in the alignment of BRCA1, use the function read_alignment() with the name of the gene, "BRCA1":

brca1_alignment <- read_alignment("BRCA1")
print(brca1_alignment, line_width = 200)
#> B1_Hsap_NP_009225    1 MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLN-QKKGPSQCPLCKNDITKRSLQESTRFSQLVEELLKIICAFQLDTGLEYANSYNFAKKENN--SPEHL--KDEVSIIQSMGYRNRAKRLLQSEP--ENPSLQETSLSVQLSNLGTV-RTLRTKQRIQPQKTS--VYIELGSDSSE-D
#> B1_Ptro_AAG43492     1 MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLN-QKKGPSQCPLCKNDITKRSLQESTRFSQLVEELLKIICAFQLDTGLEYANSYNFAKKENN--SPEHL--KDEVSIIQSMGYRNRAKRLLQSEP--ENPSLQETSLSVQLSNLGTV-RTLRTKQRIQPQKKS--VYIELGSDSSE-D
#> B1_Ggor_AAT44835     1 MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLN-QKKGPSQCPLCKNDITKRSLQESTRFSQLVEELLKIICAFQLDTGLEYANSYNFAKKENN--SPEHL--KDEVSIIQSTGYRSRAKRLLQSEP--ENPSLQETGLSVQVSNLGTV-RTLRTKQRIQPQKKS--VYIELGSDSSE-D
#> B1_Ppyg_AAT44834     1 MDLSAVRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLN-QKKGPSQCPLCKNDITKRSLQESTRFSQLVEELLKIICAFQLDTGLQYANSYNFAKKENN--SPEHL--KDEVSIIQSMGYRNRAKRLLQSEP--ENPSLQETSPSVQLSNLGTV-RTLRTKQRIQPQKKS--VYIELGSDSSE-D
#> B1_Mmul_AAT44833     1 MDLSAVRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCRFCMLKLLN-QKKGPSQCPLCKNDITKRSLQESTRFSQLVEELLKIIHAFQLDTGLQFANSYNFAKKENH--SPEHL--KDEVSIIQSMGYRNRAKRLLQSEP--ENPSLQETSLSVPLSNLGIV-RTLRTKQQIQPQKKS--VYIELGSDSSE-D
#> B1_Mmus_AAD00168     1 MDLSAVQIQEVQNVLHAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLN-QKKGPSQCPLCKNEITKRSLQGSTRFSQLAEELLRIMAAFELDTGMQLTNGFSFSKKRNN--SCERL--NEEASIIQSVGYRNRVRRLPQVEP--GNATLKD-SLGVQLSNLGIV-RSVKKNRQTQPRKKS--VYIELDSDSSE-E
#> B1_Cfam_AAC48663     1 MDLSADRVEEVQNVLNAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLN-QRKGPSQCPLCKNDITKRSLQESTRFSQLVEELLKIIHAFELDTGLQFADSYNFSKKENN--SPEHL--KEEVSIIQSMGYRNRAKRLRQSEP--ENPTL-ETSLSVQLSNLGIV-RSLRTKQQIQPQNKS--VYIELGSDSSE-D
#> B1_Btau_NP_848668    1 MDLSADHVEEVQNVLNAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLN-QKKGPSQCPLCKNDITKRSLQESTRFSQLVEELLKIIHAFELDTGLQFANSYNFSRKEDN--SPEHL--KEEVSIIQSMGYRNRAKRLWQSEP--ENPTLQETSLTVELSNLGIV-RSLRTKQQTQSQNKS--VYIELGSDSSE-D
#> B1_Mdom_temp1.pep    1 MDLPTVTIEEVKNVLIGMQKILECPICLELIKEPVSTTCDHIFCRFCMLKLLS-KKKGPSQCPLCKNNITKRSLRESTRFNQLVEGLLKTIRAFELDTGFQFSNTQDFSKWERR--TPEPL--KKEAATIQSIGYRNRSKRFKASES--ENSTL-ESSLGVQLYDLGIR-KGSLRKQKKCIKNNA--VYIKLGSDSSE-D
#> B1_Ggal_NP_989500    1 MDLSVIAIGDVQNVLSAMQKNLECPVCLDVIKEPVSTKCDHVFCRFCMFKLLSRKKKGVIQCPLCKTEVTKRSLKENSRFKQLIEGLLEAISAFELDTGVKFLSSRYFPKTSTEVATAELL--GNNSSVIQSKGFRNRKRGAKENRQ--DSCTL-EANVDPQLTDNRVKGSSVRSKKQKCGIEKG--VLIELGTDSSE-E
#> B1_Xlav_AAL13037     1 MTCSRMDIEGICSVISVMQKNLECPICLELMKEPVATKCDHIFCKFCMLQLLSKKKKGTVPCPLCKTEVTRRSLQESHRFKLLVEGQLKIIKAFEFDSGYKFFPSQEHTKGLDS--TIEDVLVKEDQSIVHCKGYRNRKKGVFNRKTYEETGML-SVSKAEEQF-AKEVTRLIPCRQK-KPKKEAALIFSNCVPDSSDGD
#> B1_Tnig_AAR89523     1 ME--APTATDVKKRISLLWETLQCPICLDLMSEPVSTKCDHQFCRFCMLKLLSNTKQNKANCPVCKSKITKRSLQESPGFQRLVSGLQEIILAYENDTGTNYFTGLS---------------------------------------------------------------------KQAQPPHVA--------------- 
#> 
#> B1_Hsap_NP_009225  201 TVNKATYCSVGDQELLQITPQGTRDEISLDS------AKKAACEFSETDVTNTEHHQPSNNDLNTTEKRAAERHPEKYQGSSVSNLHVEP--CGTNT---HASSLQHENSSLLLTKDRMNVEKAEFCNKSKQPGLARSQHNRWAGSKETCNDRRTPSTE--KKVDLNADPLCERKEWNKQKL-PCSENPRDTEDVPWIT-
#> B1_Ptro_AAG43492   201 TVNKATYCSVGDQELLQITPQGTRDEISLDS------AKKAACEFSETDVTNTEHHQPSNNDLNTTEKRATERHPEKYQGSSVSNLHVEP--CGTNT---HASSLQHENSSLLLTKDRMNVEKAEFCNKSKQPGLARSQHNRWAGSKETCNDRRTPSTE--KKVDLNADPLCERKEWNKQKL-PCSENPRDTEDVPWIT-
#> B1_Ggor_AAT44835   201 TVNKATYCSVGDQELLQITPQGTRDEISLDS------AKKAACEFSETDVTNTEHHQPSNNDLNTTEKRATERHPEKYQGSSVSNLHVEP--CGTNT---HASSLQHENSSLLLTKDRMNVEKAEFCNKNKQPGLARSQHNRWAGSKETCNDRRTPSTE--KKVDLNADPLCERNEWNKQKL-PCSENPRDTEDVPWIT-
#> B1_Ppyg_AAT44834   201 TVNKATYCSVGDQELLQITPQGTSDEISLDS------AKKAACEFSETDVTNTEHHQPSNNDLNTTEKRATERHPEKYQGSSVSNLHVEP--CGTNT---HASSLQHENSSLLLTKDRMNVEKAEFCNKSKQPGLARSQHNRWAGSKETCNDRQTPSTE--KKVDLNADPLCERKEWNKQKL-PCSENPRDTEDVPWIT-
#> B1_Mmul_AAT44833   201 TVNKATYCSVGDQELLQITPQGTRDETSLDS------AKKAACEFSEKDITNTEHHQSSNNDLNTTEKHATERHPEKYQGSSVSNLHVEP--CGTNT---HASSLQHENS-LLLTKDRMNVEKAEFCNKSKQPGLARSQHNRWTGSKETCNDRQTPSTE--KKVDLNANALYERKEWNKQKL-PCSENPRDAEDVPWIT-
#> B1_Mmus_AAD00168   201 TVTKPGDCSVRDQELLQTAPQEAGDEGKLHS------AEEAACEFSE-GIRNIEHHQCS-DDLNPTENHATERHPEKCQSISISNVCVEP--CGTDA---HASSLQPETSSLLLIEDRMNAEKAEFCNKSKQPGIAVSQQSRWAASKGTCNDRQVPSTG--EKVGPNADSLSDREKWTHPQS-LCPENSGATTDVPWIT-
#> B1_Cfam_AAC48663   201 TVNKASSCSVGDDEL-EITSQGARAEASLNP------AKKAACEFS-GDITNIEHHQSSNKDLTTTEKHATKKHPEKYQGISVSNLHVEP--CGTNT---HASSLQHENSSLLLTKHRMNVEKAEICNNSKQPGLARSQQSRWAESKETCNDRQIPSTE--KKVVVNADLLCGRKELNKQKP-PHSDSPRDSQDVPWIT-
#> B1_Btau_NP_848668  201 TVNKASYFSVGDHELLEITPQGAKAKTNLNP------AEKAACEFSEKDITNTEHHQLSIKDLITTQKHATETHPEKYQGISVSDFHVEP--CGTDT---HASSLQHENSSLLLTENRLNVEKAEFCNKSKQPVLVKSQQSRWAESKGTCKDRQIPSTE--KKIVLNTDPLYRRKELRKQKP-ACPDSPGDSQDVPWVT-
#> B1_Mdom_temp1.pep  201 GVKNAICNSVKDQGLCQTSPKGTR----LKS------KEKAEYEFSERAIKSLQQHQSNTVDVHVINENATEGHSEESRGVSSSDLNMKP--WNTDI---HASSLPPEITSVLTNTVSMNIEKAELCDKSKRPGLARSQQISQDNSKEKCSAGKTSYAE--VPHELNPHHLYERQELEEQPECPKYPRGNPQNCLSGTK-
#> B1_Ggal_NP_989500  201 HFILASSTGLEDKEELEEPKSAEKYGSSCNTQPLKLGAKEIILPNVIGETDFLKEALDKKSMLNITEHIKCNQVNTIEGQSSPLNVFDADLLTGQRDGIGNASPLKND-TSFLKNAEEMDVEETQCSHKNQELDLEDSSEGRLDKIKE--KDICVPSVEDVEMCEPMDDSLLEKEPPVEKPLQPKIPHCPTLNEVSTKG-
#> B1_Xlav_AAL13037   201 LLNKENGLRNDCSPL-----HYEKEDTQIPEMEEMVESDLAECEFAESAGSNLLGFDG----------------PEGIPEISAETSINAAGNCDFYGRKTEQFPNDHHCSFKQNIADAEQNKRNQHCGNVPFAPMGKSNLDEKETVETDFDNQHNDS------------------------------NPENNDPLGKVTK
#> B1_Tnig_AAR89523   201 ------------------------------------------------DIKAQHHNKVSVMDASCAEDDYEEALPK-----SQSSTTAAQDGFARLMGLKDTSPLTTGLDSGLGEAPPTCDKKMYSPTKVENVPLEPA----FIPDEDERSDLQTPSKKKSKK-DLEPDKILDQR------------------------- 
#> 
#> B1_Hsap_NP_009225  401 -LNSSIQKVNEWFSRSDELLGSDDSHDGESESNAKVADVLDVLNEVDEYSGSS--EKIDLLASDPHEALICKSERVHSKSVESN-IEDKIFGKTYRKKASLPNLSHVTE---NLIIGAFVT--EPQIIQERPLTN----KLKRKRRPTSGLHPEDFIKKADLA-VQKTPEMINQGTNQT---EQNGQVMNITNSGHE--N
#> B1_Ptro_AAG43492   401 -LNSSIQKVNEWFSRSDELLGSDDSHDGGSESNAKVADVLDVLNEVDEYSGSS--EKIDLLASDPHEALICKSERVHSKSVESN-TEDKIFGKTYRRKASLPNLSHVTE---NLIIGAFVT--EPQIIQERPLTN----KLKRKRRATSGLHPEDFIKKADLA-VQKTPEMINQGTNQM---EQNGQVMNITNSGHE--N
#> B1_Ggor_AAT44835   401 -LNSSIQKVNEWFSRSDELLGSDDSHDGGSESNAKVADVLDVLNEVDEYSGSS--EKIDLLASDPHEALICKSERVHSKSVESN-IEDKIFGKTYRRKASLPSLSHVTE---NLIIGAFVT--EPQIIQERPLTN----KLKRKRRATSGLHPEDFIKKADLA-VQKTPEMINQGTNQM---EQNGQVMNITNSGHE--N
#> B1_Ppyg_AAT44834   401 -LNSSIQKVNEWFSRSDELLGSDDSHDGRSESNAKVADVLDVLNEVDEYSGSS--EKIDLLASDPHEALICKSERVHSKSVESN-IEDKIFGKTYRRKASLPNLSHVTE---NLIIGAFVT--EPQIIQERPLTN----KLKRKRRATSGLHPEDFIKKADLA-VQKTPEMINQGTNQM---EQNGQVMNITNSGHE--N
#> B1_Mmul_AAT44833   401 -LNSSIQKVNEWFSRSDELLSSDDSHDGGSESNAKVADVLDVLNEVDEYSGSS--EKIDLLASDPHEPLICKSERVHSSSVESN-IKDKIFGKTYRRKANLPNLSHVTE---NLIIGALVT--ESQIMQERPLTN----KLKRKRRTTSGLHPEDFIKKADLA-VQKTPEIINQGTNQM---EQNGQVMNITNSAHE--N
#> B1_Mmus_AAD00168   401 -LNSSVEKVNEWFSRTGEMLTSDSASARRHESNAEAAVVLEVSNEVDGGFSSS--RKTDLVTPDPHHTLMCKSGRDFSKPVEDN-ISDKIFGKSYQRKGSRPHLNHVTE-----IIGTFIT--EPQITQEQPFTN----KLKRKR--STSLQPEDFIKKADSAGVQRTPDNINQGTDLM---EPNEQAVSTTSNCQE--N
#> B1_Cfam_AAC48663   401 -LNSSIRKVNEWFSRSDEILTSDDSHDRGSELNTEVGGAVEVPNEVGEYSGSS--EKIDLMASDPQDAFICESERVHTKPVGGN-IEDKIFGKTYRRKASLPKVSHTTE---VLTIGACAI--EPQTMQTHPFMN----KAEHKRRTTSSLHPEDFIKKVELGIVPKTPEKLIEGINQI---KRDGHVINITNNGPE--N
#> B1_Btau_NP_848668  401 -LNNSIQKVNDWFSRSDEILTSDDSCDGGSESNNEVAGAVEIPNKVDGYSGSS--EKINLMASDPHGTLI--HERVHSKPVESN-IEDKIFGKTYRRKSSLPNFSHIAE---DLILGAFTV--EPQITQEQPLTN----KLKCKRRGTSGLQPEDFIKKVDLTIVPKTPEKMTEGTDQT---EQKCHGMNITSDGHE--N
#> B1_Mdom_temp1.pep  401 -LKSSIQKVNDWLSRSNDILVSDYSSVRIHEQNAEMASVLEIGHPDTTDGNSSISGKTDLVADSTDGAWLHMSERSCPRQAENNNIEDKIFGKTYHRKSVHTNLNYVTE---NLIVGAVAS--DCLIPPEHVKQT----RLKRKRKTISDLQPEDFIKKTDTEFTHKSPEKKIHAVDQILEQEQNGQVMNTVNGHLE--Q
#> B1_Ggal_NP_989500  401 -LNQSIQKVNEWFSKSSRILSSSSSQNDHAEA-TDASGEGDI-SLSDKDSCIS--EKTNPIVDSVEFAVIERNKR-WTKQTTYS-IEDKIFGKTYERGRKSNPSTILRD-----ILPATK--KEDAAAEEGCLNNSRKDRLKRKRKSACILQPEDFIKKKDLEEADRCPQGIKSSLGDA---EKE--------KCDENSA
#> B1_Xlav_AAL13037   401 LMRRSTERVNEWLLKTNQ----DFSTLSAEEDPILDALALQNKETSDKRSCSS--DDSELMPVLHKHAEKGISGGGFDKPA-VG-VKDKIFCKVYKRERKAMPPNNITCVAEVHHDSALETGKENMTLEYGTGMS----HLMSKRKMVYSLNPENTSKKNDLANINGSINVFPDCIS-----DANLELEDKSEADSNSAD
#> B1_Tnig_AAR89523   401 -QKKSLEKVAEWLMNVP------------SEQSLEMENPEEDGDDSDSRSSTS--T-IDLGQL--------HRGTNPTRGRAKA-LEDQVFGAVYKRERRGKEMVKPTE--AALEVARFNLSVENTSEDEN----------------RDNKQEEHFIREREK----------NTGSNVL---EGEVEFLEDCRGSLEPTH 
#> 
#> B1_Hsap_NP_009225  601 KTKGDSIQNEKNPNP---IESLEKESAFKTKAEPISSSISNMELELNIHNSKAPKKNRLRRKS---STRHIHALELVVSRNLSP-----------------------PNCTELQIDSCSSSEEIKK-KKYNQMPVRHSRNLQLMEGKEPATGAKKSNKPNEQTSKRHDSDTFPELKLT----NAPGSFTKCSNTSELKEF
#> B1_Ptro_AAG43492   601 KTKGDSIQNEKNPNP---IESLEKESAFKTKAEPISSSISNMELELNIHNSKAPKKNRLRRKS---STRHIHALELVVSRNLSP-----------------------PNCTELQIDSCSSSEEIKK-KKYNQMPVRHSRNLQLMEDKEPATGVKKSNKPNEQTSKRHDSDTFPELKLT----NAPGSFTNCSNTSELKEF
#> B1_Ggor_AAT44835   601 KTKGDSIQNEKNPNP---IESLEKESAFKTKAEPISSCISNMELELNIHNSKAPKKNRLRRKS---STRHIHALELVVSRNLSP-----------------------PNCTELQIDSCSSSEEIKK-KKYNQMPVRHSRNLQLMEDKEPATGAKKSNKPNEQTSKRHDSDTFPELKLT----NAPGSFTNCSNTSELKEF
#> B1_Ppyg_AAT44834   601 KTKGDSIQNEKNPNP---IESLEKESAFKTKAEPISSSISNMELELNIHNSKAPKKNRLRRKS---STRHIHALELVVSRNLSP-----------------------PNCTELQIDSCSSSEEIKK-KKYNQMPVRHSRNLQLMEDKEPATGAKKSNKPNEQTSKRHDSDTFPELKLT----NAPGSFTNCSNTSELKEF
#> B1_Mmul_AAT44833   601 KTKGDSIQNEKNPNA---IESLEEESAFKTKAEPISSSINNMELELNIHNSKAPKKNRLRRKS---STRHIHALELVVSRNLSP-----------------------PNCTELQIDSCSSSEEIKK-KNYNQMPVRHSRNLQLMEDKESATGAKKSNKPNEQTSKRHASDTFPELKLT----KVPGSFTNCSNTSELKEF
#> B1_Mmus_AAD00168   601 QIAGSNLQKEKSAHP---TESLRKEPASTAGAKSISNSVSDLEVELNVHSSKAPKKNRLRRKS---SIRCALPLE-PISRNPSP-----------------------PTCAELQIDSCGSSEETKK-NHSNQQPAGHLREPQLIEDTEPAADAKK-NEPNEHIRKRRASDAFPEEKLM----NKAGLLTSCSSPRKSQGP
#> B1_Cfam_AAC48663   601 ETEGDYVQKEKNANP---TESLEKESAFRTKTEPMSSRISNMELELNSSSSKAPKKNRLRRKS---SARHTCALEFVVNRNLNP-----------------------PDHSELQIESCSSSEEMKK-QHLDQVPVRHNKTLQLMQDKEPAGRAKKSSKPGEQINKRLASHAFPELTLT----NVSGFFANYSSSSKPQEC
#> B1_Btau_NP_848668  601 KTKRDYVQKEQNANP---AESLEKESVFRTEAEPISISISNMELELNIHRSKAPK-NRLKRKS---STRKIPELELVVSRNPSL-----------------------PNHTELPIDSSSSNEEMKK-KHSSQMPVRQSQKLQLIGDKELTAGAK-NNKTYEQINKRLASDAFPELKLT----NTPGYFTNCSS--KPEEF
#> B1_Mdom_temp1.pep  601 KALVDGHVEEVNDALASELLPVEKESTFRTGTDSAAGSINHGGLKLNGRNAKMTKKDKLRKKS---SARIVHELELIVDKNPSS-----------------------SNETELQIDSYPSSEEIRKGNNSEQKQIRRSRRLQLLSEE-IAMETKKAYEPDEQAEKSCVNEVFPDLKMG----NIPACDTVSLTTESDQML
#> B1_Ggal_NP_989500  601 VKENPLLEKRKGSTL---AEFKERGLQWKNAAEKVSGKCSDGQLELNNSDQKSTKNACSTAKGCRHSTRTRCAIHL-VDRNPGS-----------------------FDLAEPLINSYPSNEEPSK-ADCERRQVRRSRRLQLLSEE-ITKETGKMR----VIKEAKNSDSGPEGSVFGVERNVLVHNSQCKDLRKQQDI
#> B1_Xlav_AAL13037   601 ACQSDIVHSSNTQIKQQCVESLANAGETRKKQE-LSCERSQEEKDFT----------------------QSGALGPKTRSQKSP-----------------------YGHSELHIESSQISNE-PS-NVTKQVEVRRSRRLLMLPKG-PGNKSNSNA--VKEMNEQENIAQIPEFRVKDTNKNNEISDTVPSQLKKKDTY
#> B1_Tnig_AAR89523   601 MSENDENKEDEVPHP---VSVIEEQQAETKGKRRTRSALQHVDSDLLKCTQKEPENTEPKR------T-QKRSRGIKSERAKSARTSKPLVLVAVENGEGGPKIGPRSEEVQVHIENYPSSGD--Q-EVPSGRSTRKSRRLRGFTKE-DTGKERSRSS----VPEKEHSSKHPKFEC-ETLNNVKSLDY----------- 
#> 
#> B1_Hsap_NP_009225  801 VNPSLPREEKEEKLETVK---VSNNAEDPKDLMLSGERVLQT-ERSVESSSISLVPGTDYGTQESISLLEVSTLGKAKTEPNKCVSQCAAFENPKGLIHGCSKD----NRNDTEGFKYPLGHEVNHS-RETSIEME---ESELDAQYLQNTFKVSKRQSFAPFSNPGNAEEECATFSAHSGSLKKQSPKVTFECEQKEEN
#> B1_Ptro_AAG43492   801 VNPSLPREEEEEKLETVK---VSNNAEDPKDLMLSGERVLQT-ERSVESSSISLVPGTDYGTQESISLLEVSTLGKAKTEPNKCVSQCAAFENPKGLIHGCSKD----TRNDTEGFKYPLGHEVNHS-RETSIEME---ESELDAQYLQNTFKVSKRQSFALFSNPGNPEEECATFSAHCRSLKKQSPKVTFEREQKEQN
#> B1_Ggor_AAT44835   801 VNPSLPREEKEEKLETVK---VSNNAEDPKDLMLSGERVLQT-ERSVESSSISLVPGTDYGTQESISLLEVSTLGKAKTEPHKCVSQCAAFENPKGLIHGCSKD----TRNDTEGFKYPLGHEVNHS-RETSIEME---ESELDAQYLQNTFKVSKRQSFALFSNPGNPEEECATFSAHSRSLKKQSPKVTFECEQKEEN
#> B1_Ppyg_AAT44834   801 VNPSLPREEKEEKLGTVK---VSNNAKDPKDLMLSGERVLQT-ERSVESSSISLVPGTDYGTQESISLLEVSTLGKAKTEPNKCVSQCAAFENPKELIHGCFKD----TRNDTEGFKYPLGHEVNHS-QETSIEME---ESELDTQYLQNTFKVSKRQSFALFSNPGNPEEECATFSAHSRSLKKQSPKVTFECEQKEEN
#> B1_Mmul_AAT44833   801 VNPSLSREEKEEKLETVK---VSNNAKDPKDLMLSGERVLQT-ERSVESSSISLVPDTDYGTQESISLLEVSTLGKAKTERNKCMSQCAAFENPKELIHGCSED----TRNDTEGFKYPLGSEVNHS-QETSIEIE---ESELDTQYLQNTFKVSKRQSFALFSNPGNPEEECATFSAHSRSLKKQSPKVTSECEQKEEN
#> B1_Mmus_AAD00168   801 VNPS-PQRTGTEQLETRQ---MSDSAKELGDRVLGGEPSGKTTDRSEESTSVSLVPDTDYDTQNSVSVLDAHTVRYARTGSAQCMTQFVASENPKELVHG-SNN----AGSGTEGLKPPLRHALNLS-QE-KVEME---DSELDTQYLQNTFQVSKRQSFALFSKPRSPQKDCA----HSVPSKELSPKVTAKGKQK-ER
#> B1_Cfam_AAC48663   801 INPGLRREEIEESRRMTQ---VSDSTRDPKELVLSGGRGLQT-ERSVESTSISLVLDTDYGTQDSISLLEADTLRKAKTVSNQQANLCATIENPKEPIHGCSKD----TRNDTEGFVVPLTCKDNHT-QETSIEME---ESELDTQCLRNMFKVSKRQSFALFSYPRDPEEDCVTVCPRSGAFGKQGPKVTLECGQKEES
#> B1_Btau_NP_848668  801 VHPSLQREE---NLGTIQ---VSNSTKDPKDLILREGKALQI-ERSVESTNISLVPDTDYSTQDSISLLEAKTPEKAKTAPNPCVSLCTATKNLKELIHRDFKD----TKNNTEGFQDLLGHDINYVIQETSREME---DSELDTQYLQNTFKASKRQTFALFSNPGNPQKECATVFAHSGSLRDQSPRDPLKCRQKEDS
#> B1_Mdom_temp1.pep  801 ASCSVTEEGHEKSLEAVQ------SSQDQEDLAISGGEGSQG-QRAKGNLEALEVPDTDWDTQDSTSLFPANTPQNSKAGPNPHRSQCGIMETPKELLDGCSSEN---TGSTTEDLRGLMRQGVKNA-SETTTEME---DSELDTQYLQNTFKRSKRQTFALGSSP---RQECMKPCAISQALHQRGLHNATDCGDHEKE
#> B1_Ggal_NP_989500  801 LSYMSLADRNGADLEANG---IQISSKNSDDMAK--NRSFFN-PTFSCQLSNFNSPSSKAGSQEGEMLGKLFLPQSPSKTVLHAASILTEEKRSWSCTV-FSQDKGCCSRNVPKDFRIGKSPMAKNA-SEFTMEAE---DSELDMQYLRNIFRSSKRQSFSLYPTP---MKACTTDDVASE-------KLNTSCPDQVEE
#> B1_Xlav_AAL13037   801 FPPCTSEDT--GELEN-------------------GIPVRKI-SDQNASLDNPIDPCKDEYSDT------------------------------------------------------LLHVAGDNV-QPDYMETE---ESELETQHIVKMFKTSKRTSFILESKEAENENRVNSAVEISQV------ETSNELPNVAEC
#> B1_Tnig_AAR89523   801 ---------KEQKWQADKNGCIYSQDMEEIENMDSGEKTSSR-PEEGSEQTLFEVPNTETLFQAACSVAE---------------------------------------------STAQPSNTARLL-TELEMENEQKNDSEQDTEQLVKSFKATKRKSFHLGSRPDVKRSRSLV-------------------QESDQS 
#> 
#> B1_Hsap_NP_009225 1001 QGKNESNIKPVQTVNITA-GFPVVGQK-DKPVDNAKCSIKGGSRFCLSSQFRGNETGLITPNKHGLLQNPYRIPPLFPIK--SFVKTKCKKNLLEENFEEHSMSP----EREMGNEN-IPSTVSTISRNNIRENVFKEASSSNINEVGSSTNEVGSSINEIGSSDENIQAELGRNRGPK--LNAMLRLGVLQPEVYKQSL
#> B1_Ptro_AAG43492  1001 QGKNESNIKPVQTVNITA-GFPVVCQK-DKPVDYAKCSIKGGSRFCLSSQFRGNETGLITPNKHGLLQNPYHIPPLFPIK--SFVKTKCKKNLLEENFEEHSMSP----EREMGNEN-IPSTVSTISRNNIRENVFKEASSSNINEVGSSTNEVGSSINEVGSSDENIQAELGRNRGPK--LNAMLRLGVLQPEVYKQSL
#> B1_Ggor_AAT44835  1001 QGKNESNIKPVQTVNITA-GFPVVCQK-DKPVDYAKCSIKGGSRFCLSSQFRGNETGLITPNKHGLLQNPYHIPPLFPIK--SFVKTKCKKNLLEENFEEHSMSP----EREMGNEN-IPSTVSTISRNNIRENVFKEASSSNINEVGSSTNEVGSSINEVGSSDENIQAELGRNRGPK--LNAMLRLGVLQPEVYKQSL
#> B1_Ppyg_AAT44834  1001 QGKNESNIKPVQTANITA-GFPVVCQK-DKPVDYAKCSIKGGSRFCLSSQFRGNETGLITPNKHGLSQNPYHIPPLFPIK--SFVKTKCKKNLLEENSEEHSMSP----EREMGNEN-IPSTVSIISRNNIRENVFKEASSSNINEVGSSTNEVGSSINEVGSSDENIQAELGRSRGPK--LNAMLRLGVLQPEVYKQSF
#> B1_Mmul_AAT44833  1001 QGKKESNIKPVQTVNITA-GFSVVCQK-DKPVDNAKCSIKGGSRFCLSSQFRGNETGLITPNKHGLLQNPYHIPPLFPVK--SFVKTKCNKNLLEENSEEHSVSP----ERAVGNENIIPSTVSTISHNNIRENAFKEASSSNINEVGSSTNEVGSSINEVGPSDENIQAELGRNRGPK--LNAVLRLGLLQPEVCKQSL
#> B1_Mmus_AAD00168  1001 QGQEEFEISHVQAVAATV-GLPVLCQE-GKLAADTMCD--RGSRLCPSSHYRSGENGLSATGKSGISQNSHFKQSVSPIR--SSIKTDNRKPLTEGRFERHTSST----EMAVGNENILQSTVHTVSLNN-RGNACQEAGSG--------------SIHEVCSTGDSFPGQLGRNRGPK--VNTVPPLDSMQPGVCQQSV
#> B1_Cfam_AAC48663  1001 QGKKESEIRHVQAVHTNA-GFSAVSQKAKKPGDFAKCSIKGVSRLCLSSQFKGKETELLIANYHGISQNPYHIPPLSPIR--SCVKTLCQENLSEEKFEQHSMSP----ERAVGNERVIQSTVSTISQNNIRECASKEVGSSSVNEVVSSTNEVGSSVNEVGSSGENIQAELGRNRGPK--LNAMLRLGLMQPEVCKQSL
#> B1_Btau_NP_848668 1001 QGKSESKSQHVQAICTTV-HFPVADQQDRTPGDDAKCSAKEVTRVCQSSQLRGHKTELVFANKQGVSEKPNLIPSLSPIK--SSVKTICKKSP-SEKFEEPVTSP----EKTLGSESIIQSAVSTISQNNIQESTFKEVSSNSVNEVGSSTNEVGSSVNEVGSSGENIQAEPGRNREPK--LRALLGLGLTQPEVYKQSL
#> B1_Mdom_temp1.pep 1001 KLGNRESNKPVQAKSAVM-NLAVVCQIERKPSDCASVSRI----CHIDPLHGGNDCEFIAGNNEEISQVPNQKQSVSPAG-SSTSKIIYTKKLLEENLDE--ISP----ETAVGNEILAQSSLSLVSPSNSRDCVSKVADLNRFIGIGSNGEGSQAEKHKNKESELNTLPKLKLVQPQV--CQQSFPQDNFSKEPEREEK
#> B1_Ggal_NP_989500 1001 RNSKYLKTENLQEEKTTAENLSSVCEKFETCESACVSPV---------SCFVSSAACVHTVENQDVSKVANHGNLTTLLRICAARNEDGNRPQKGEQGSEKTLSTGIGVESKL-RLSPVRSNRSQSDQSNTEEHAFQRTGLNAV--------------SETYFSSESNQVEKAEVVDDKGLMQHFQPSPMLCPTACQQNP
#> B1_Xlav_AAL13037  1001 RHTSLLSSAKEQSGALLKQGSPSSEPKKTSPIHMLKKTESKHSKMSRNRRGKVKPSSNSAKNTTGQPDNLNN-PTQGVTGSLYNKQVMSDFPMRLNVDEEHTNASAKGSQSSVADKSTAHSN------------------------------------------------------------------------------
#> B1_Tnig_AAR89523  1001 AGAEENRYVCSVDPSAPKHAEPAAAGKTDKVLVDS--------------------------------------------------QNMPGSDLISDSHLASLKRKASGLYSGCSAEGGCASASSPLPPNLESKHAGQSSKDSAICFATEKPSQISGS-------------------------QANFMMEDTQSSTLLQSV 
#> 
#> B1_Hsap_NP_009225 1201 PGSNCKHPEIK---------KQ-EYEEVVQTVNTDFSPYLISDNLEQPMGSSHASQVCSETPDDLL-DDGEIKEDTSFAENDIKESSAVFSK----SVQKGELSRSP-SPFTHTH--LAQGYRRGAKKLESSEENLSSEDEELPCFQHLLFGKVNNIPSQSTRHST---VATECLSKNTEENLLSLKNS--LNDCSNQVI
#> B1_Ptro_AAG43492  1201 PESNCKHPEIK---------KQ-EYEEVVQTVNTDFSPCLISDNLEQPMGSSHASQVCSETPDDLL-DDGEIKEDTSFAENDIKESSAVFSK----SVQRGELSRSP-SPFTHTH--LAQGYRRGAKKLESSEENLSSEDEELPCFQHLLFGKVSNIPSQSTRHST---VATECLSKNTEENLLSLKNS--LNDCSNQVI
#> B1_Ggor_AAT44835  1201 PGSNCKHPEIK---------KQ-EYEEVVQTVNTDFSPCLISDNLEQPMGSSHASQVCSETPDDLL-DDGEIKEDTSFAKNDIKESSAVFSK----NVQRGELSRSP-SPFTHTH--LAQGYRRGAKKLESSEENLSSEDEELPCFQHLLFGKVSNIPSQSTRHST---VATECLSKNTEENLLSLKNS--LNDCSNQVI
#> B1_Ppyg_AAT44834  1201 PGSNGKHPEIK---------KQ-EYEEVLQTVNTDFSPCLISDNLEQPMRSSHASQVCSETPNDLL-DDGEIKEDTSFAENDIKESSAVFSK----SVQRGELSRSP-SPFTHTH--LAQGYRRGAKKLESSEENLSSEDEELPCFQHLLFGKVSNIPSQSTRHST---VATECLSKNTEENLLSLKNS--LNDYSNQVI
#> B1_Mmul_AAT44833  1201 PISNCKHPEIK---------KQ-EHEELVQTVNTDFSPCLISDNLEQPMGSSHASEVCSETPDDLL-DDGEIKEDTSFAANDIKESSAVFSK----SIQRGELSRSP-SPFTHTH--LAQGYQKEAKKLESSEENLSSEDEELPCFQHLLFGKVSNIPSQTTRHST---VATECLSKNTEENLLSLKNS--LTDCSNQVI
#> B1_Mmus_AAD00168  1201 PVSD-KYLEIK---------KQ-EGEAVC----ADFSPCLFSDHLEQSM-SGKVFQVCSETPDDLL-DDVEIQGHTSFGEGDIMERSAVFNG----SILRRESSRSP-SPVTHAS--KSQSLHRASRKLESSEESDSTEDEDLPCFQHLL-SRISNTPELTRCRSA---V-TQGIPEKAEGTQAPWKGS--SSDCNNEVI
#> B1_Cfam_AAC48663  1201 SLSNCKHPEMK---------WQGQSEGAVLSVSADFSPCLISDNPEQPMGSSRSSQVCSETPDDLL-NGDKIKGKVSFAESDIKEKSAVFSK----SVQSGEFSRSP-SPSDHTR--LAQGYQRGTKKLESSEENMSSEEEELPCFQHLIFGKVTNMPSQSTSHNA---VAAEGLSNKTEENLDSLKNS--LSDISNQVP
#> B1_Btau_NP_848668 1201 PVSNCHHPEIK---------RQGENEDMPQAVKADFSPCLISDNLEQPTGSRHASQVCSETPDNLL-NDDEIKENSHFAESDIKERSAVFSE----SVQKGEFRGSP-GPFTHTH--LAQGHQRGAGKLES-EETVSSEDEELPCFQQLLFGKVTSTLSPSTGCNT---VATEGLSKETEGNLESLKSG--LNDCSGQVT
#> B1_Mdom_temp1.pep 1201 EENVKLTPAI----------------------SADSSPCL-----EQTKENTHFTQVWSETPD-LLDSDGELKENTSFAESDIKEQSAVFGKNG--KSSQVRKSRKNLSPLVHRNPSLSWKSRRQARKLESSEEEASSEEGELPSFQDLIFGKAASTPFQPTKNKT---IAKEFSANEAKENLAFLNRN-NMSVNNLQIP
#> B1_Ggal_NP_989500 1201 AEFNCELTEKKIITRERSLVKG-NEERVIQTVSTGLSEFSVREALEESLKGHSDFTDLSETPDGLLCSDNDTEESASFYVTNRKDTSAVFVKRSGAAWVK-EVNDSVVSCKPRSE--GIQRFRRRAQKLQSSDEE-SSDDEDLPSFQELMFGKSVSTPLQIQKQVT--------SVVQSSANPSTLPCSECLNE-NNEQK
#> B1_Xlav_AAL13037  1201 -------------------------------VNTDF-----------------ISDINSATPDGLL-HYMDKAEGNSSPWDTTRDKNAALLE----------------SCLPSAY--GSTGEKTPIPKVQSSEEGSSQ-------------GLFNQKPKCSSSGKADQKNSSKNPGKSQCRILSDFSGSSNNGKCSSQEM
#> B1_Tnig_AAR89523  1201 KADAAKEP---------------------------------------------LNAPSSLTPSGL-----------------------------QTSVPGGEMTHSQSS---REL--STRRKRTKAQKLDCLSDSSDCAEEEFPCLAEIL--NETASPGEHATR----------------------PPACPSPDCVN--- 
#> 
#> B1_Hsap_NP_009225 1401 LAKASQE---HHLSEETKCSASLFSSQCSELEDLTANTNTQDPFLI--GSSKQMRHQSESQGVGLSDKELVSDDE-ERGT--GLEEN--N-QEEQSMDSNLGEA-ASGCESETSVSEDCSGLSSQSDILTTQQRDTMQHNLIKLQQEMAELEAVLEQHGSQPSNSYPSIISDS-SALEDLRNPEQSTSEKAVLTSQKSSE
#> B1_Ptro_AAG43492  1401 LAKASQE---HHLSEETKCSASLFSSQCSELEDLTANTNTQDPFLI--GSSKQMRHQSESQGVGLSDKELVSDDE-ERGT--GLEEN--N-QEEQSMDSNLGEA-ASGCESETSVSEDCSGLSSQSDILTTQQRDTMQDNLIKLQQEMAELEAVLEQHGSQPSNSYPSIISDS-SALEDLQNPEQSTSEKAVLTSQKSSE
#> B1_Ggor_AAT44835  1401 LAKTSQE---HHLSEETKCSASLFSSQCSELEDLTANTNTQDPFLI--GSSKQMRHQSESQGVGLSDKELVSDDE-ERGT--GLEEN--N-QEEQSMDSNLGEA-ASGCESETSVSEDCSGLSSQSDILTTQQRDTMQDNLIKLQQEMAELEAVLEQHGSQPSNSYPSIISDS-SALEDLRNPEQSTSEKAVLTSQKSSE
#> B1_Ppyg_AAT44834  1401 LVKASQE---HHLSEETKCSASLFSSQCSELEDLTANTNTQDRFFI--GSSKQMRHQSESQGVGLSDKELVSDDE-ERGT--DLEEN--N-QEEQGVDSNLGEA-ASGYESETSVSEDCSGLSSQSDILTTQQRDTMQDNLIKLQQEMAELEAVLEQHGSQPSNSYPSIISDS-SALEDLRNPEQSTSEKAVLTSQKSSE
#> B1_Mmul_AAT44833  1401 LAKASQE---HHLSEETKCSGSLFSSQCSELEDLTANTNTQDPFLI--GSSKRMRHQSESQGVGLSDKELVSDDE-ERGT--GLEED--N-QEEQSVDSNLGEA-ASGYESETSVSEDCSRLSSQSEILTTQQRDTMQDNLIKLQQEMAELEAVLEQHGSQPSNSYPSIITDS-SALEDLRNPEQSTSEKAVLTSQKSSE
#> B1_Mmus_AAD00168  1401 MIEASQE---HQFSEDPRCSGRMFSSQNSAAQGSTANANSQDSNFI--PPSKQRSHQCGNEEAFLSDKELISDNE-EMAT--CLEED--N-DQEE--DSIIPDSEASGYESETNLSEDC----SQSDILTTQQRATMKYNLIKLQQEMAHLEAVLEQRGNQPSGHSPSLLADP-CALEDLPDLEPNMSGAAILTSKNINE
#> B1_Cfam_AAC48663  1401 SAKASQE---HHLSEEARCSGSLFSSQCSALEDLTVNTNTQDPFSMFDPTSKQVRHQSENLDV-LNDKELVSDDDDEREP--GLEED--SPQEEQSVDSDLGEV-ASGYESETSLSEDCSRLSSQSDILTTQQRDTMQDNLIKLQQEMAELEAVLEQHESQPSNSSPSLIADS-CSPEDLLNPEQNASER-VLTSEKSSD
#> B1_Btau_NP_848668 1401 SAKVSQE---HHLNEEARCSGSLFSSQCSAMEDLTTNTNTQDPFLMFERPSKQV-YQSESEEV-LSDKELVSDDE-ERET--GLEED--SCQEEQSVDSDLGEA-VSDHVSETSLSEDGVGLSSQSDILTTQQRDTMQDNLLKLQQEMAELEAVLERHGSQPSHSSASLTADS-RGPEHLLNLEQDTSERAILTSEKSRD
#> B1_Mdom_temp1.pep 1401 SGEASQD---HDSSQESECSGSLFSSHSNSLEDLAGKANDKDSALVF-GSSHQESSHSRTQEI-QSGSEMPKASQDKEERETDLDED--HHPKDQGVDSNLVVE-GSGYDSETSHPGDTSQLSFQSDILTTQQRDTMQDNLKKLQQEMAVLEAVLEQHSSQVTNNSSSQEPGL-CPSMDQPDLKQTNSERGA---EENVD
#> B1_Ggal_NP_989500 1401 TLEAALSNECASPSQESECSVNLFSSQSNMSEESVDGAQELKKTLT--QVSNVKKSKEAPQSCSGGLKRLKNN----------LNDE--Y-QEDPNMGANLGE--ASGYDSETSRVEDSHEPFSQGEILSTQQKNAMQNNLKKLQQEMAVLEAVLKQHGSQDAEVLPLCRELPYCSIGGTLGLERMRQETENVSEHDS--
#> B1_Xlav_AAL13037  1401 LSKAA----FPSLSQESQCSVSLFSTQSNMSQQSVDEDHKQD----------------------VLQDELVSPNKTKRTAISGTEETQISPLRDQYQNPDIDE--ASECESEASHTGDSSILSSQDELLNTQQRNYMKDSLKKLQQEMAALEAVLGQHGTQKLEAETTCIPSS-EHVTEMQEATEEEEEETYQGENLFVK
#> B1_Tnig_AAR89523  1401 ---------------SSQASVDLF-----------------------------------------------------------------------------------GTPHECAVN-DVASSQFSSEVLVTQQKIEMKKELVRLEKLMALVSEVLHEKEASPA------------------------KDMLDKTKQKITG 
#> 
#> B1_Hsap_NP_009225 1601 YPISQNPEGLSADKFEVS-ADSSTSKNKEPGVERSSPSKCPSLDDRWYMHSCSGSLQNRNYP-SQEELIKVVDVEEQQLEESGPHDLTETSYLPRQDLEGTPYLESGISLFSDDPESDPSEDRAPESARVGNIPSSTSALKVPQLKVAESAQSPAAAHTTDTAGY---NAMEESVSREKPELTASTERVNKRMSMVVSGL
#> B1_Ptro_AAG43492  1601 YPISQNPEGLSADKFEVS-ADSSTSKNKEPGVERSSPSKCPSLDDRWYMHSCSGSLQNTNYP-SQEELIKVVDVEEQQLEESGPHDLTETSYLPRQDLEGTPYLESGISLFSDDPESDPSEDKAPESAHVGNIPSSTSALKVPQLKVAESAQSPAAAHTTNTAGY---NAMEESVSREKPELTASTERVNKRMSMVVSGL
#> B1_Ggor_AAT44835  1601 YPISQNPEGLSADKFEVS-ADSSTSKNKEPGVERSSPSKCPSLDDRWYMHSCSGSLQNRNYP-SQEELIKVVDVEEQQLEESGPHDLTETSYLPRQDLEGTPYLESGISLFSDDPESDPSEDRAPESAHVGNIPSSTSALKVPQLKVAESAQSPAAAHTTNTAGY---HAMEESVSREKPELTASTERVNKRMSMVVSGL
#> B1_Ppyg_AAT44834  1601 YPISQNPEGLSADKFEVS-ADSSTNKNKEPGVERSSPSKCPSLDDRWYMHSCSGSLQNGNYP-SQEELIKVVDVEKQQLEESGPHDLTEPSYLPRQDLEGTPYLESGISLFSDDPESDASEDRAPESAHVGSIPSSTSALKVPQLKVAESAQSPAAAQTTNTAGY---NAMEESVSREKPELTASTERVNKRMSMVVSGL
#> B1_Mmul_AAT44833  1601 YPINQNPEGLSADKFEVS-ADSSTSKNKEPGVERSSPSKCQSLEDRWYVHSSSGSLQNGNYP-SQEELIKVVDVETQQLEKSGPHDLMEPSYLPRQDLDGTPYLESGISLFSDDPESDPSEDRAPESAHVGSIPSSTSALKVPQWQVAESAQSPAAAHNTNTAGY---NAMEESVSRENPKLTASTERVNKRMSLVVSGL
#> B1_Mmus_AAD00168  1601 NPVSQNLKSACDDKFQLQHLEGPTSGDDESGMGRPSPFKSPLAGSRGSAHGCSRHLQKRNSP-SQEELLQPAGSE----ASSEPHNSTGQSCLPRRELEGTPYLGSGISLFS---SRDPESESPKEPAHIGTTPASTSALKIPQGQVAFRSAAAAGAD----------KAVVGIVSKIKPELTSSEERADRDISMVVSGL
#> B1_Cfam_AAC48663  1601 SPISQNPESLSTDKFQVF-LDSSTSKNGEPGMIRSSPSQSRLLDTRWYVHSCPRSLQDTNCP-SQKELTKVVSMEEQQPTESEARDLMEQSYLSRPDLEGAPYLESGISLFSDDPESDPSSHRASELAHVSSMPTSTSALKLPQFQVEESAKSTAAVHIASTAGY---NKSEDSVGIEKPEVISSTRGVNKRISMVASGL
#> B1_Btau_NP_848668 1601 YSRSQNPESLSADKFPVS-LDSSTNKNKEPGMERSSASKFQLSYNRWYMHS-SRSLQDRNCP-SQKEPINVADMEEQQLAKREAQDLMG-SFLPRQDQEGTPYLKSGISLFSHEPESDPSEDRAAEPAHVHSMPPSASALKLSQFRVEESTKNPAAAHIANTTRC---NLREESMSKEKPEVISSTERSKKRLSMVASGL
#> B1_Mdom_temp1.pep 1601 HHKSQNVQSFSADKSRAF-PSDSNSKNEEAGVEGSSFSKSQMPNRVWSPLSRSRTPWEGDSP-S-EESGKVTTRKEEHKRQ------PNEQGVLKQNLEITTEPESGVKVNLKNIKSNPHQNKDPALRNIFNLPTSTSALRSSQLQTIEATNTTAPCAPAC-------NVRGQLKEDSASERLKKGGTGNRKISLVSSGL
#> B1_Ggal_NP_989500 1601 --------------------ETKLTKASVLPVLCGNVTKNPNSSSFSVKHPCPQTAEAT----------------------------------------------------------DSSAVAQGDNKSNVQVCKSKRSVCFPTSVLHNVA--------------------------GKENAASSGTTCRTEMSIVASGL
#> B1_Xlav_AAL13037  1601 TPPE-----------HVANGSTLIGQSTDNGLQGRNRPVSPSFLCQTRIEKDTPEIVCPAMGISKKSLLTQKELPSQHKQEIRP-DLETVNVTPTNPIDAQESLRQS----SKNKRGSRMLHNKRSTSNQGSFCLSVEAAESPQIPKQNKAEFGIARKSTSPTFASPARAKVPSVGFKSPVVS-----SRRNLSFVASGL
#> B1_Tnig_AAR89523  1601 SDVDHVLSCGQGEVFNQETFPEEEEQDANASLNDGKGAARPTGSKHSSITEL----------------------------------------------NSRISNTVGLSSAAKTLKSDGSPSDGHE---------------------------------------------------DKENNTPERARSLARMLLVTSGL 
#> 
#> B1_Hsap_NP_009225 1801 TPEEFMLVYKFARKHHITLTNLITEETTHVVMKTDAEFVCERTLKYFLGIAGGKWVVSYFWVTQSIKERKMLNEHDFEVRGDVVNGRNHQGPKRARESQDR-----KIFRGLEICCYGPFTNMPTDQLEWMVQLCGASVVKELSSFTLGTGVHPIVVVQPDAWTEDNGFHAIGQMCEAPVVTREWVLDSVALYQCQELDT
#> B1_Ptro_AAG43492  1801 TPEEFMLVYKFARKHHITLTNLITEETTHVVMKTDAEFVCERTLKYFLGIAGGKWVVSYFWVTQSIKERKMLNEHDFEVRGDVVNGRNHQGPKRARESQDR-----KIFRGLEICCYGPFTNMPTDQLEWMVQLCGASVVKELSSFTLGTGVHPIVVVQPDAWTEDNGFHAIGQMCEAPVVTREWVLDSVALYQCQELDT
#> B1_Ggor_AAT44835  1801 TPEEFMLVYKFARKHHITLTNLITEETTHVVMKTDAEFVCERTLKYFLGIAGGKWVVSYFWVTQSIKEGKMLNEHDFEVRGDVVNGRNHQGPKRARESQDR-----KIFRGLDICCYGPFTNMPTDQLEWMVQLCGASVVKELSSFTLGTGVHPIVVVQPDAWTEDNGFHAIGQMCEAPVVTREWVLDSVALYQCQELDT
#> B1_Ppyg_AAT44834  1801 TPEEFMLVYKFARKHHITLTNLITEETTHVVMKTDAEFVCERTLKYFLGIAGGKWVVSYFWVTQSIKERKMLNEHDFEVRGDVVNGRNHQGPKRARESQDR-----KIFRGLEICCYGPFTNMPTDQLEWIVQLCGASVVKELSSFTLGTGVHPIVVVQPDAWTEDNGFHAIGQMCEAPVVTREWVLDSVALYQCQELDT
#> B1_Mmul_AAT44833  1801 TPEEFMLVYKFARRYHIALTNLISEETTHVVMKTDAEFVCERTLKYFLGIAGGKWVVSYFWVTQSIKERKMLNEHDFEVRGDVVNGRNHQGPKRARESPDR-----KIFRGLEICCYGPFTNMPTDQLEWMVQLCGASVVKELSSFTLGTGFHPIVVVQPDAWTEDNGFHAIGQMCEAPVVTREWVLDSVALYQCQELDT
#> B1_Mmus_AAD00168  1801 TPKEVMTVQKFAEKYRLTLTDAITEETTHVIIKTDAEFVCERTLKYFLGIAGGKWIVSYSWVVRSIQERRLLNVHEFEVKGDVVTGRNHQGPRRSRESRE------KLFKGLQVYCCEPFTNMPKDELERMLQLCGASVVKELPSLTHDTGAHLVVIVQPSAWTEDSNCPDIGQLCKARLVMWDWVLDSLSSYRCRDLDA
#> B1_Cfam_AAC48663  1801 TPKEFMLVHKFARKHHISLTNLISEETTHVIMKTDAEFVCERTLKYFLGIAGGKWVVSYFWVTQSIKERKILDEHDFEVRGDVVNGRNHQGPKRARESQDRESQDRKIFRGLEICCYGPFTNMPTDQLEWMVHLCGASVVKEPSLFTLSKGTHPVVVVQPDAWTEDSGFHAIGQMCEAPVVTREWVLDSVALYQCQELDT
#> B1_Btau_NP_848668 1801 TPKELMLVQKFARKHHVTLTNLITEETTHVIMKTDPEFVCERTLKYFLGIAGGKWVVSYFWVTQSIKEGKMLDEHDFEVRGDVVNGRNHQGPKRARESRDK-----KIFKGLEICCYGPFTNMPTDQLEWMVQLCGASVVKEPSSFTPDQGTHPVVVVQPDAWTEDAGFHVIGQMCEAPVVTREWVLDSVALYQCQELDT
#> B1_Mdom_temp1.pep 1801 TPKENMLVQKFARKTHSTVSHQITEGTTHVIMKTDAEFVCERTLKYFLGIAGGKWVVSFLWVVQSFKEGKMLPECDFEVRGDVINGRNHRGPERARESQGM-----KIFRGLEICCYGPFTDMSTDQLEWMVQLCGASVVKKPSSLRFRVGSSPVVVVQPDAWEDDSSFQEIGLVCEAPVVTREWVLDSVACYQRQELDT
#> B1_Ggal_NP_989500 1801 NQSEHLMVQKFARKTQSTFSNHITDGTTHVIMKTDEELVCERTLKYFLGIAGRKWVVSYQWIIQSFKEGRILDEEHFEVKGDVINGRNHQGPKRARQSPAE-----KIFKDFEICCCGPFTDMTTGHLEWIVELCGASVVKQLHLFTHKVNSTAVVVVQPDAWMEGTSYEAIQRKNNVAVVTREWVLDSVACFECQELDA
#> B1_Xlav_AAL13037  1801 NQCEMALVQRFSKTTQSILSSRITDSTTHVIMKTDAELVCERTLKYFQGIASRKWVVSYEWVVQSFREGQILDEYDFEVKGDVINGRNHRGPRRSRLGSDG-----LLLIDFEICFFGSFTDMTLDDLEWMVSECGATVVKKLQFFKKKHNVTSLVIVQPDASTEVRDYTEIRKKHKALVVTREWLMDSVATYRLQKFDA
#> B1_Tnig_AAR89523  1801 GPSQQITVKKFAKRIGATVVSQVTPEVTHVVMHTDEQLVCERTLKYFLGIAGRKWVVSFQWISECIKQKKLLNETLFEVRGDVVNGFDHQGPMKARATADN----NLLMKGYSICFQGPFTDMTTAEMELMVELCGATVVQDPLLLDGKRTSHQLIVVQSGS----ESSRSVSG--KATVVTRGWLLDSVATYTIQNLKN 
#> 
#> B1_Hsap_NP_009225 2001 YLIPQIPHSHY-------
#> B1_Ptro_AAG43492  2001 YLIPQIPHSHY-------
#> B1_Ggor_AAT44835  2001 YLIPQIPHSHY-------
#> B1_Ppyg_AAT44834  2001 YLIPQIPHSHY-------
#> B1_Mmul_AAT44833  2001 YLIPQIPHSHY-------
#> B1_Mmus_AAD00168  2001 YLVQNITCDSSEPQDSND
#> B1_Cfam_AAC48663  2001 YLIPQIPRTAADSSQPCV
#> B1_Btau_NP_848668 2001 YLVP--------------
#> B1_Mdom_temp1.pep 2001 YLISQTSPSLC-------
#> B1_Ggal_NP_989500 2001 YLVSQD------------
#> B1_Xlav_AAL13037  2001 YLA---------------
#> B1_Tnig_AAR89523  2001 YRADLRAA----------
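
In the printed alignment, each row is one sequence and each column is one alignment position, with `-` marking a gap. As a minimal sketch of that layout, the base R code below builds a toy alignment as a character matrix and pulls out single columns. The sequences and the names `toy_alignment`, `aln_matrix`, `col2` and `col3` are made up for illustration; this is not how agvgd stores alignments internally.

```r
# Toy alignment: three hypothetical sequences of equal aligned width.
# "-" marks a gap, as in the printed BRCA1 alignment.
toy_alignment <- c(
  seq_a = "MD-LSA",
  seq_b = "MDKLSA",
  seq_c = "ME--TA"
)

# Split each sequence into single characters and stack them as matrix rows.
aln_matrix <- do.call(rbind, strsplit(toy_alignment, split = ""))

# Dimensions are: number of sequences x number of alignment positions.
dim(aln_matrix)

# Residues (or gaps) observed at alignment positions 2 and 3.
col2 <- aln_matrix[, 2]
col3 <- aln_matrix[, 3]
col2
col3
```

Looking at one column at a time is what makes alignment positions, rather than residue numbers, the natural coordinate for comparing sequences.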

To read in the 117 BRCA1 missense variants, i.e., the 117 substitutions, we use the function read_substitutions():

path <- system.file("extdata/lee2010_sub.txt", package = 'agvgd', mustWork = TRUE)
missense_variants <- read_substitutions(path)
missense_variants
#> # A tibble: 117 × 4
#>      res   poi ref   sub  
#>    <int> <int> <chr> <chr>
#>  1  1647    NA N     K    
#>  2  1651    NA S     F    
#>  3  1652    NA M     T    
#>  4  1652    NA M     I    
#>  5  1653    NA V     M    
#>  6  1655    NA S     F    
#>  7  1656    NA G     D    
#>  8  1662    NA F     S    
#>  9  1663    NA M     L    
#> 10  1663    NA M     K    
#> # ℹ 107 more rows
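
Each row describes one substitution: `ref` is the reference amino acid, `res` its residue number and `sub` the substituted amino acid. For illustration only, the sketch below splits strings written in the common shorthand `<ref><res><sub>` (for example `M1652T`) into those three columns using base R. The helper `parse_substitution()` is hypothetical; nothing is assumed here about the format of the bundled `lee2010_sub.txt` file or about how `read_substitutions()` parses it.

```r
# Hypothetical helper (not part of agvgd): split shorthand such as "M1652T"
# into residue number, reference amino acid and substituted amino acid.
parse_substitution <- function(x) {
  # One-letter amino acid, one or more digits, one-letter amino acid.
  pattern <- "^([A-Z])([0-9]+)([A-Z])$"
  stopifnot(all(grepl(pattern, x)))
  data.frame(
    res = as.integer(sub(pattern, "\\2", x)),
    ref = sub(pattern, "\\1", x),
    sub = sub(pattern, "\\3", x),
    stringsAsFactors = FALSE
  )
}

# The first three variants listed in the tibble, written in shorthand.
parsed <- parse_substitution(c("N1647K", "S1651F", "M1652T"))
parsed
```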

Each missense variant is listed with its residue number in the protein sequence (column res in missense_variants). As we'll see next, the function agvgd() instead identifies variants by their alignment positions (column poi in missense_variants).

The difference between res and poi is that res counts residues along the primary sequence of the reference protein, whereas poi counts positions in the alignment, gaps included.
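
To make the distinction concrete, here is a minimal base R sketch of one way a residue number can be mapped to an alignment position: find the positions of the non-gap characters in the aligned reference sequence, then pick the k-th one for residue k. The helper `residue_to_position()` and the toy sequence are hypothetical, and whether agvgd's own res_to_poi() is implemented this way is an assumption.

```r
# Hypothetical helper (not part of agvgd): map residue numbers of an aligned
# sequence to alignment positions by skipping gap characters.
residue_to_position <- function(aligned_seq, res, gap = "-") {
  chars <- strsplit(aligned_seq, split = "")[[1]]
  # Alignment positions that hold an actual residue, in order.
  non_gap_positions <- which(chars != gap)
  # The k-th non-gap position is where residue k sits (NA if out of range).
  non_gap_positions[res]
}

# Toy aligned reference: 7 residues spread over 10 alignment positions.
toy_reference <- "MD--LSA-LR"
toy_poi <- residue_to_position(toy_reference, c(1, 2, 3, 6))
toy_poi
```

Under this reading, poi minus res equals the number of gap characters that precede the residue in the aligned reference sequence.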

So, before we proceed, we will update the missense_variants tibble and replace the NA values with the corresponding poi values. For that we use the function res_to_poi():

missense_variants$poi <- res_to_poi(brca1_alignment, missense_variants$res)
missense_variants
#> # A tibble: 117 × 4
#>      res   poi ref   sub  
#>    <int> <int> <chr> <chr>
#>  1  1647  1790 N     K    
#>  2  1651  1794 S     F    
#>  3  1652  1795 M     T    
#>  4  1652  1795 M     I    
#>  5  1653  1796 V     M    
#>  6  1655  1798 S     F    
#>  7  1656  1799 G     D    
#>  8  1662  1805 F     S    
#>  9  1663  1806 M     L    
#> 10  1663  1806 M     K    
#> # ℹ 107 more rows

Running agvgd()

Run Align-GVGD with the function agvgd():

scores <- agvgd(alignment = brca1_alignment,
      poi = missense_variants$poi,
      sub = missense_variants$sub)

print(scores, n = Inf)
#> # A tibble: 117 × 7
#>       res   poi ref   sub      gv     gd prediction
#>     <int> <int> <chr> <chr> <dbl>  <dbl> <chr>     
#>   1  1647  1790 N     K     177.    0    C0        
#>   2  1651  1794 S     F     144.   21.3  C0        
#>   3  1652  1795 M     T      30.3  81.0  C25       
#>   4  1652  1795 M     I      30.3   0    C0        
#>   5  1653  1796 V     M       0    21.5  C15       
#>   6  1655  1798 S     F       0   155.   C65       
#>   7  1656  1799 G     D       0    93.8  C65       
#>   8  1662  1805 F     S     161.   25.1  C0        
#>   9  1663  1806 M     L      96.2   0    C0        
#>  10  1663  1806 M     K      96.2  57.1  C0        
#>  11  1664  1807 L     P      92.4  28.9  C0        
#>  12  1665  1808 V     M       0    21.5  C15       
#>  13  1669  1812 A     S      99.1   0    C0        
#>  14  1682  1825 E     K     117.   36.7  C0        
#>  15  1682  1825 E     V     117.   43.3  C0        
#>  16  1685  1828 T     I       0    89.3  C65       
#>  17  1685  1828 T     A       0    58.0  C55       
#>  18  1689  1832 M     T      10.1  81.0  C55       
#>  19  1689  1832 M     R      10.1  90.6  C55       
#>  20  1691  1834 T     K       0    77.7  C65       
#>  21  1691  1834 T     I       0    89.3  C65       
#>  22  1692  1835 D     H       0    81.2  C65       
#>  23  1692  1835 D     N       0    23.0  C15       
#>  24  1692  1835 D     Y       0   160.   C65       
#>  25  1695  1838 F     L      21.8   0    C0        
#>  26  1696  1839 V     L       0    31.8  C25       
#>  27  1697  1840 C     R       0   180.   C65       
#>  28  1699  1842 R     W       0   101.   C65       
#>  29  1699  1842 R     Q       0    42.8  C35       
#>  30  1699  1842 R     L       0   102.   C65       
#>  31  1700  1843 T     A       0    58.0  C55       
#>  32  1706  1849 G     A       0    60    C55       
#>  33  1706  1849 G     E       0    97.8  C65       
#>  34  1708  1851 A     V       0    64.4  C55       
#>  35  1708  1851 A     E       0   107.   C65       
#>  36  1713  1856 V     A      29.6  64.4  C25       
#>  37  1714  1857 V     G       0   109.   C65       
#>  38  1715  1858 S     R       0   109.   C65       
#>  39  1715  1858 S     N       0    46.2  C45       
#>  40  1715  1858 S     C       0   112.   C65       
#>  41  1718  1861 W     C       0   214.   C65       
#>  42  1718  1861 W     S       0   177.   C65       
#>  43  1720  1863 T     A     142.    1.01 C0        
#>  44  1722  1865 S     F     112.  125.   C15       
#>  45  1726  1869 R     G     131.    0    C0        
#>  46  1730  1873 N     S     108.    2.79 C0        
#>  47  1733  1876 D     G     172.   51.7  C0        
#>  48  1734  1877 F     S       0   155.   C65       
#>  49  1736  1879 V     A       0    64.4  C55       
#>  50  1736  1879 V     G       0   109.   C65       
#>  51  1738  1881 G     E       0    97.8  C65       
#>  52  1738  1881 G     R       0   125.   C65       
#>  53  1739  1882 D     Y       0   160.   C65       
#>  54  1739  1882 D     V       0   152.   C65       
#>  55  1739  1882 D     E       0    44.6  C35       
#>  56  1739  1882 D     G       0    93.8  C65       
#>  57  1741  1884 V     G      29.6 109.   C35       
#>  58  1746  1889 H     N       0    68.4  C65       
#>  59  1749  1892 P     R       0   103.   C65       
#>  60  1751  1894 R     P      26    96.5  C35       
#>  61  1751  1894 R     Q      26    38.2  C0        
#>  62  1752  1895 A     V      99.1  63.6  C0        
#>  63  1752  1895 A     P      99.1   1.7  C0        
#>  64  1753  1896 R     T       0    71.0  C65       
#>  65  1761  1909 F     S      30.3 135.   C45       
#>  66  1763  1911 G     V      93.8  77.6  C15       
#>  67  1764  1912 L     P      35.7  85.7  C25       
#>  68  1766  1914 I     S      29.6 123.   C35       
#>  69  1771  1919 P     L      73.4  97.8  C15       
#>  70  1771  1919 P     R      73.4  95.1  C15       
#>  71  1773  1921 T     I       0    89.3  C65       
#>  72  1773  1921 T     S       0    57.8  C55       
#>  73  1775  1923 M     R       0    91.6  C65       
#>  74  1775  1923 M     K       0    94.5  C65       
#>  75  1778  1926 D     G     134.    0    C0        
#>  76  1778  1926 D     Y     134.   88.6  C15       
#>  77  1778  1926 D     N     134.    2.03 C0        
#>  78  1780  1928 L     P      14.3  86.6  C45       
#>  79  1783  1931 M     I      10.1   0    C0        
#>  80  1783  1931 M     T      10.1  81.0  C55       
#>  81  1783  1931 M     L      10.1   4.86 C0        
#>  82  1785  1933 Q     H     100.    0    C0        
#>  83  1787  1935 C     S       0   112.   C65       
#>  84  1788  1936 G     V       0   109.   C65       
#>  85  1788  1936 G     D       0    93.8  C65       
#>  86  1789  1937 A     S       0    99.1  C65       
#>  87  1794  1942 E     D     106.    0    C0        
#>  88  1803  1951 G     A      87.3  49.4  C0        
#>  89  1804  1952 V     D     155.   61.5  C0        
#>  90  1805  1953 H     P      91.6  16.3  C0        
#>  91  1806  1954 P     A     156.    0    C0        
#>  92  1808  1956 V     A      29.6  64.4  C25       
#>  93  1809  1957 V     F      29.6  21.3  C0        
#>  94  1809  1957 V     A      29.6  64.4  C25       
#>  95  1810  1958 V     G       0   109.   C65       
#>  96  1811  1959 Q     R       0    42.8  C35       
#>  97  1818  1966 D     G     251.    0    C0        
#>  98  1819  1967 N     S     152.    0    C0        
#>  99  1823  1971 A     T     160.    0    C0        
#> 100  1826  1974 Q     H     172.    0    C0        
#> 101  1830  1978 A     T      64.4  49.4  C0        
#> 102  1833  1981 V     M       0    21.5  C15       
#> 103  1835  1983 R     P     101.   92.7  C15       
#> 104  1836  1984 E     K     113.   46.1  C0        
#> 105  1837  1985 W     R       0   101.   C65       
#> 106  1837  1985 W     G       0   184.   C65       
#> 107  1837  1985 W     C       0   214.   C65       
#> 108  1838  1986 V     E      31.8 121.   C35       
#> 109  1841  1989 S     N       0    46.2  C45       
#> 110  1841  1989 S     R       0   109.   C65       
#> 111  1843  1991 A     P      99.1   1.7  C0        
#> 112  1844  1992 L     R     217.   24.8  C0        
#> 113  1851  1999 D     E     101.    0    C0        
#> 114  1853  2001 Y     C       0   194.   C65       
#> 115  1854  2002 L     P     102.   79.5  C15       
#> 116  1856  2004 P     S     238.    0    C0        
#> 117  1859  2007 P     R     354.    0    C0
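
A quick way to summarise such a result is to count the variants per prediction class. The sketch below is self-contained, so it uses only the first ten rows of the table above, copied by hand into a data frame. The names first_ten, class_levels and class_counts are illustrative, not objects provided by agvgd.

```r
# First ten rows of the scores table above, copied by hand
first_ten <- data.frame(
  res = c(1647, 1651, 1652, 1652, 1653, 1655, 1656, 1662, 1663, 1663),
  sub = c("K", "F", "T", "I", "M", "F", "D", "S", "L", "K"),
  prediction = c("C0", "C0", "C25", "C0", "C15", "C65", "C65", "C0", "C0", "C0")
)

# The seven prediction classes that appear in the output above; fixing the
# factor levels keeps classes with zero counts visible in the tabulation
class_levels <- c("C0", "C15", "C25", "C35", "C45", "C55", "C65")
class_counts <- table(factor(first_ten$prediction, levels = class_levels))
class_counts
# C0: 6, C15: 1, C25: 1, C65: 2; the remaining classes have zero counts
```

With the full result at hand, the same tabulation would presumably be applied to scores$prediction directly.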

References