Introduction
One of the greatest challenges in the study of cancer risk is the assessment of unclassified missense variants.
Besides using biochemical and cell-based transcriptional assays to assess the structural and functional defects associated with missense variants, one can also use bioinformatics analysis based on multiple sequence alignment data and protein structure prediction to approach this problem.
Align-GVGD is one such method that uses protein multiple sequence alignment (PMSA) data to provide cancer risk estimates.
In this vignette we show how to use agvgd to reproduce the results obtained by Lee et al. (2010)¹ in their study of missense variants in the BRCT domain of the BRCA1 gene.
In Lee’s paper, Align-GVGD prediction scores are part of their cross-validation of structural and functional assays, as indicated in the column labeled AG in Figure 3 of said paper.
Reproducing the AGVGD scores
Data sets
To reproduce the AGVGD prediction scores, we need two data sets:
- The protein sequence alignment of BRCA1, which includes the BRCT domain
- The list of the 117 BRCA1 missense variants studied¹
Both these data sets are already bundled with agvgd.
To read in the alignment of BRCA1, use the function read_alignment() with the name of the gene, "BRCA1":
brca1_alignment <- read_alignment("BRCA1")
print(brca1_alignment, line_width = 200)
#> B1_Hsap_NP_009225 1 MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLN-QKKGPSQCPLCKNDITKRSLQESTRFSQLVEELLKIICAFQLDTGLEYANSYNFAKKENN--SPEHL--KDEVSIIQSMGYRNRAKRLLQSEP--ENPSLQETSLSVQLSNLGTV-RTLRTKQRIQPQKTS--VYIELGSDSSE-D
#> B1_Ptro_AAG43492 1 MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLN-QKKGPSQCPLCKNDITKRSLQESTRFSQLVEELLKIICAFQLDTGLEYANSYNFAKKENN--SPEHL--KDEVSIIQSMGYRNRAKRLLQSEP--ENPSLQETSLSVQLSNLGTV-RTLRTKQRIQPQKKS--VYIELGSDSSE-D
#> B1_Ggor_AAT44835 1 MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLN-QKKGPSQCPLCKNDITKRSLQESTRFSQLVEELLKIICAFQLDTGLEYANSYNFAKKENN--SPEHL--KDEVSIIQSTGYRSRAKRLLQSEP--ENPSLQETGLSVQVSNLGTV-RTLRTKQRIQPQKKS--VYIELGSDSSE-D
#> B1_Ppyg_AAT44834 1 MDLSAVRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLN-QKKGPSQCPLCKNDITKRSLQESTRFSQLVEELLKIICAFQLDTGLQYANSYNFAKKENN--SPEHL--KDEVSIIQSMGYRNRAKRLLQSEP--ENPSLQETSPSVQLSNLGTV-RTLRTKQRIQPQKKS--VYIELGSDSSE-D
#> B1_Mmul_AAT44833 1 MDLSAVRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCRFCMLKLLN-QKKGPSQCPLCKNDITKRSLQESTRFSQLVEELLKIIHAFQLDTGLQFANSYNFAKKENH--SPEHL--KDEVSIIQSMGYRNRAKRLLQSEP--ENPSLQETSLSVPLSNLGIV-RTLRTKQQIQPQKKS--VYIELGSDSSE-D
#> B1_Mmus_AAD00168 1 MDLSAVQIQEVQNVLHAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLN-QKKGPSQCPLCKNEITKRSLQGSTRFSQLAEELLRIMAAFELDTGMQLTNGFSFSKKRNN--SCERL--NEEASIIQSVGYRNRVRRLPQVEP--GNATLKD-SLGVQLSNLGIV-RSVKKNRQTQPRKKS--VYIELDSDSSE-E
#> B1_Cfam_AAC48663 1 MDLSADRVEEVQNVLNAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLN-QRKGPSQCPLCKNDITKRSLQESTRFSQLVEELLKIIHAFELDTGLQFADSYNFSKKENN--SPEHL--KEEVSIIQSMGYRNRAKRLRQSEP--ENPTL-ETSLSVQLSNLGIV-RSLRTKQQIQPQNKS--VYIELGSDSSE-D
#> B1_Btau_NP_848668 1 MDLSADHVEEVQNVLNAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLN-QKKGPSQCPLCKNDITKRSLQESTRFSQLVEELLKIIHAFELDTGLQFANSYNFSRKEDN--SPEHL--KEEVSIIQSMGYRNRAKRLWQSEP--ENPTLQETSLTVELSNLGIV-RSLRTKQQTQSQNKS--VYIELGSDSSE-D
#> B1_Mdom_temp1.pep 1 MDLPTVTIEEVKNVLIGMQKILECPICLELIKEPVSTTCDHIFCRFCMLKLLS-KKKGPSQCPLCKNNITKRSLRESTRFNQLVEGLLKTIRAFELDTGFQFSNTQDFSKWERR--TPEPL--KKEAATIQSIGYRNRSKRFKASES--ENSTL-ESSLGVQLYDLGIR-KGSLRKQKKCIKNNA--VYIKLGSDSSE-D
#> B1_Ggal_NP_989500 1 MDLSVIAIGDVQNVLSAMQKNLECPVCLDVIKEPVSTKCDHVFCRFCMFKLLSRKKKGVIQCPLCKTEVTKRSLKENSRFKQLIEGLLEAISAFELDTGVKFLSSRYFPKTSTEVATAELL--GNNSSVIQSKGFRNRKRGAKENRQ--DSCTL-EANVDPQLTDNRVKGSSVRSKKQKCGIEKG--VLIELGTDSSE-E
#> B1_Xlav_AAL13037 1 MTCSRMDIEGICSVISVMQKNLECPICLELMKEPVATKCDHIFCKFCMLQLLSKKKKGTVPCPLCKTEVTRRSLQESHRFKLLVEGQLKIIKAFEFDSGYKFFPSQEHTKGLDS--TIEDVLVKEDQSIVHCKGYRNRKKGVFNRKTYEETGML-SVSKAEEQF-AKEVTRLIPCRQK-KPKKEAALIFSNCVPDSSDGD
#> B1_Tnig_AAR89523 1 ME--APTATDVKKRISLLWETLQCPICLDLMSEPVSTKCDHQFCRFCMLKLLSNTKQNKANCPVCKSKITKRSLQESPGFQRLVSGLQEIILAYENDTGTNYFTGLS---------------------------------------------------------------------KQAQPPHVA---------------
#>
#> B1_Hsap_NP_009225 201 TVNKATYCSVGDQELLQITPQGTRDEISLDS------AKKAACEFSETDVTNTEHHQPSNNDLNTTEKRAAERHPEKYQGSSVSNLHVEP--CGTNT---HASSLQHENSSLLLTKDRMNVEKAEFCNKSKQPGLARSQHNRWAGSKETCNDRRTPSTE--KKVDLNADPLCERKEWNKQKL-PCSENPRDTEDVPWIT-
#> B1_Ptro_AAG43492 201 TVNKATYCSVGDQELLQITPQGTRDEISLDS------AKKAACEFSETDVTNTEHHQPSNNDLNTTEKRATERHPEKYQGSSVSNLHVEP--CGTNT---HASSLQHENSSLLLTKDRMNVEKAEFCNKSKQPGLARSQHNRWAGSKETCNDRRTPSTE--KKVDLNADPLCERKEWNKQKL-PCSENPRDTEDVPWIT-
#> B1_Ggor_AAT44835 201 TVNKATYCSVGDQELLQITPQGTRDEISLDS------AKKAACEFSETDVTNTEHHQPSNNDLNTTEKRATERHPEKYQGSSVSNLHVEP--CGTNT---HASSLQHENSSLLLTKDRMNVEKAEFCNKNKQPGLARSQHNRWAGSKETCNDRRTPSTE--KKVDLNADPLCERNEWNKQKL-PCSENPRDTEDVPWIT-
#> B1_Ppyg_AAT44834 201 TVNKATYCSVGDQELLQITPQGTSDEISLDS------AKKAACEFSETDVTNTEHHQPSNNDLNTTEKRATERHPEKYQGSSVSNLHVEP--CGTNT---HASSLQHENSSLLLTKDRMNVEKAEFCNKSKQPGLARSQHNRWAGSKETCNDRQTPSTE--KKVDLNADPLCERKEWNKQKL-PCSENPRDTEDVPWIT-
#> B1_Mmul_AAT44833 201 TVNKATYCSVGDQELLQITPQGTRDETSLDS------AKKAACEFSEKDITNTEHHQSSNNDLNTTEKHATERHPEKYQGSSVSNLHVEP--CGTNT---HASSLQHENS-LLLTKDRMNVEKAEFCNKSKQPGLARSQHNRWTGSKETCNDRQTPSTE--KKVDLNANALYERKEWNKQKL-PCSENPRDAEDVPWIT-
#> B1_Mmus_AAD00168 201 TVTKPGDCSVRDQELLQTAPQEAGDEGKLHS------AEEAACEFSE-GIRNIEHHQCS-DDLNPTENHATERHPEKCQSISISNVCVEP--CGTDA---HASSLQPETSSLLLIEDRMNAEKAEFCNKSKQPGIAVSQQSRWAASKGTCNDRQVPSTG--EKVGPNADSLSDREKWTHPQS-LCPENSGATTDVPWIT-
#> B1_Cfam_AAC48663 201 TVNKASSCSVGDDEL-EITSQGARAEASLNP------AKKAACEFS-GDITNIEHHQSSNKDLTTTEKHATKKHPEKYQGISVSNLHVEP--CGTNT---HASSLQHENSSLLLTKHRMNVEKAEICNNSKQPGLARSQQSRWAESKETCNDRQIPSTE--KKVVVNADLLCGRKELNKQKP-PHSDSPRDSQDVPWIT-
#> B1_Btau_NP_848668 201 TVNKASYFSVGDHELLEITPQGAKAKTNLNP------AEKAACEFSEKDITNTEHHQLSIKDLITTQKHATETHPEKYQGISVSDFHVEP--CGTDT---HASSLQHENSSLLLTENRLNVEKAEFCNKSKQPVLVKSQQSRWAESKGTCKDRQIPSTE--KKIVLNTDPLYRRKELRKQKP-ACPDSPGDSQDVPWVT-
#> B1_Mdom_temp1.pep 201 GVKNAICNSVKDQGLCQTSPKGTR----LKS------KEKAEYEFSERAIKSLQQHQSNTVDVHVINENATEGHSEESRGVSSSDLNMKP--WNTDI---HASSLPPEITSVLTNTVSMNIEKAELCDKSKRPGLARSQQISQDNSKEKCSAGKTSYAE--VPHELNPHHLYERQELEEQPECPKYPRGNPQNCLSGTK-
#> B1_Ggal_NP_989500 201 HFILASSTGLEDKEELEEPKSAEKYGSSCNTQPLKLGAKEIILPNVIGETDFLKEALDKKSMLNITEHIKCNQVNTIEGQSSPLNVFDADLLTGQRDGIGNASPLKND-TSFLKNAEEMDVEETQCSHKNQELDLEDSSEGRLDKIKE--KDICVPSVEDVEMCEPMDDSLLEKEPPVEKPLQPKIPHCPTLNEVSTKG-
#> B1_Xlav_AAL13037 201 LLNKENGLRNDCSPL-----HYEKEDTQIPEMEEMVESDLAECEFAESAGSNLLGFDG----------------PEGIPEISAETSINAAGNCDFYGRKTEQFPNDHHCSFKQNIADAEQNKRNQHCGNVPFAPMGKSNLDEKETVETDFDNQHNDS------------------------------NPENNDPLGKVTK
#> B1_Tnig_AAR89523 201 ------------------------------------------------DIKAQHHNKVSVMDASCAEDDYEEALPK-----SQSSTTAAQDGFARLMGLKDTSPLTTGLDSGLGEAPPTCDKKMYSPTKVENVPLEPA----FIPDEDERSDLQTPSKKKSKK-DLEPDKILDQR-------------------------
#>
#> B1_Hsap_NP_009225 401 -LNSSIQKVNEWFSRSDELLGSDDSHDGESESNAKVADVLDVLNEVDEYSGSS--EKIDLLASDPHEALICKSERVHSKSVESN-IEDKIFGKTYRKKASLPNLSHVTE---NLIIGAFVT--EPQIIQERPLTN----KLKRKRRPTSGLHPEDFIKKADLA-VQKTPEMINQGTNQT---EQNGQVMNITNSGHE--N
#> B1_Ptro_AAG43492 401 -LNSSIQKVNEWFSRSDELLGSDDSHDGGSESNAKVADVLDVLNEVDEYSGSS--EKIDLLASDPHEALICKSERVHSKSVESN-TEDKIFGKTYRRKASLPNLSHVTE---NLIIGAFVT--EPQIIQERPLTN----KLKRKRRATSGLHPEDFIKKADLA-VQKTPEMINQGTNQM---EQNGQVMNITNSGHE--N
#> B1_Ggor_AAT44835 401 -LNSSIQKVNEWFSRSDELLGSDDSHDGGSESNAKVADVLDVLNEVDEYSGSS--EKIDLLASDPHEALICKSERVHSKSVESN-IEDKIFGKTYRRKASLPSLSHVTE---NLIIGAFVT--EPQIIQERPLTN----KLKRKRRATSGLHPEDFIKKADLA-VQKTPEMINQGTNQM---EQNGQVMNITNSGHE--N
#> B1_Ppyg_AAT44834 401 -LNSSIQKVNEWFSRSDELLGSDDSHDGRSESNAKVADVLDVLNEVDEYSGSS--EKIDLLASDPHEALICKSERVHSKSVESN-IEDKIFGKTYRRKASLPNLSHVTE---NLIIGAFVT--EPQIIQERPLTN----KLKRKRRATSGLHPEDFIKKADLA-VQKTPEMINQGTNQM---EQNGQVMNITNSGHE--N
#> B1_Mmul_AAT44833 401 -LNSSIQKVNEWFSRSDELLSSDDSHDGGSESNAKVADVLDVLNEVDEYSGSS--EKIDLLASDPHEPLICKSERVHSSSVESN-IKDKIFGKTYRRKANLPNLSHVTE---NLIIGALVT--ESQIMQERPLTN----KLKRKRRTTSGLHPEDFIKKADLA-VQKTPEIINQGTNQM---EQNGQVMNITNSAHE--N
#> B1_Mmus_AAD00168 401 -LNSSVEKVNEWFSRTGEMLTSDSASARRHESNAEAAVVLEVSNEVDGGFSSS--RKTDLVTPDPHHTLMCKSGRDFSKPVEDN-ISDKIFGKSYQRKGSRPHLNHVTE-----IIGTFIT--EPQITQEQPFTN----KLKRKR--STSLQPEDFIKKADSAGVQRTPDNINQGTDLM---EPNEQAVSTTSNCQE--N
#> B1_Cfam_AAC48663 401 -LNSSIRKVNEWFSRSDEILTSDDSHDRGSELNTEVGGAVEVPNEVGEYSGSS--EKIDLMASDPQDAFICESERVHTKPVGGN-IEDKIFGKTYRRKASLPKVSHTTE---VLTIGACAI--EPQTMQTHPFMN----KAEHKRRTTSSLHPEDFIKKVELGIVPKTPEKLIEGINQI---KRDGHVINITNNGPE--N
#> B1_Btau_NP_848668 401 -LNNSIQKVNDWFSRSDEILTSDDSCDGGSESNNEVAGAVEIPNKVDGYSGSS--EKINLMASDPHGTLI--HERVHSKPVESN-IEDKIFGKTYRRKSSLPNFSHIAE---DLILGAFTV--EPQITQEQPLTN----KLKCKRRGTSGLQPEDFIKKVDLTIVPKTPEKMTEGTDQT---EQKCHGMNITSDGHE--N
#> B1_Mdom_temp1.pep 401 -LKSSIQKVNDWLSRSNDILVSDYSSVRIHEQNAEMASVLEIGHPDTTDGNSSISGKTDLVADSTDGAWLHMSERSCPRQAENNNIEDKIFGKTYHRKSVHTNLNYVTE---NLIVGAVAS--DCLIPPEHVKQT----RLKRKRKTISDLQPEDFIKKTDTEFTHKSPEKKIHAVDQILEQEQNGQVMNTVNGHLE--Q
#> B1_Ggal_NP_989500 401 -LNQSIQKVNEWFSKSSRILSSSSSQNDHAEA-TDASGEGDI-SLSDKDSCIS--EKTNPIVDSVEFAVIERNKR-WTKQTTYS-IEDKIFGKTYERGRKSNPSTILRD-----ILPATK--KEDAAAEEGCLNNSRKDRLKRKRKSACILQPEDFIKKKDLEEADRCPQGIKSSLGDA---EKE--------KCDENSA
#> B1_Xlav_AAL13037 401 LMRRSTERVNEWLLKTNQ----DFSTLSAEEDPILDALALQNKETSDKRSCSS--DDSELMPVLHKHAEKGISGGGFDKPA-VG-VKDKIFCKVYKRERKAMPPNNITCVAEVHHDSALETGKENMTLEYGTGMS----HLMSKRKMVYSLNPENTSKKNDLANINGSINVFPDCIS-----DANLELEDKSEADSNSAD
#> B1_Tnig_AAR89523 401 -QKKSLEKVAEWLMNVP------------SEQSLEMENPEEDGDDSDSRSSTS--T-IDLGQL--------HRGTNPTRGRAKA-LEDQVFGAVYKRERRGKEMVKPTE--AALEVARFNLSVENTSEDEN----------------RDNKQEEHFIREREK----------NTGSNVL---EGEVEFLEDCRGSLEPTH
#>
#> B1_Hsap_NP_009225 601 KTKGDSIQNEKNPNP---IESLEKESAFKTKAEPISSSISNMELELNIHNSKAPKKNRLRRKS---STRHIHALELVVSRNLSP-----------------------PNCTELQIDSCSSSEEIKK-KKYNQMPVRHSRNLQLMEGKEPATGAKKSNKPNEQTSKRHDSDTFPELKLT----NAPGSFTKCSNTSELKEF
#> B1_Ptro_AAG43492 601 KTKGDSIQNEKNPNP---IESLEKESAFKTKAEPISSSISNMELELNIHNSKAPKKNRLRRKS---STRHIHALELVVSRNLSP-----------------------PNCTELQIDSCSSSEEIKK-KKYNQMPVRHSRNLQLMEDKEPATGVKKSNKPNEQTSKRHDSDTFPELKLT----NAPGSFTNCSNTSELKEF
#> B1_Ggor_AAT44835 601 KTKGDSIQNEKNPNP---IESLEKESAFKTKAEPISSCISNMELELNIHNSKAPKKNRLRRKS---STRHIHALELVVSRNLSP-----------------------PNCTELQIDSCSSSEEIKK-KKYNQMPVRHSRNLQLMEDKEPATGAKKSNKPNEQTSKRHDSDTFPELKLT----NAPGSFTNCSNTSELKEF
#> B1_Ppyg_AAT44834 601 KTKGDSIQNEKNPNP---IESLEKESAFKTKAEPISSSISNMELELNIHNSKAPKKNRLRRKS---STRHIHALELVVSRNLSP-----------------------PNCTELQIDSCSSSEEIKK-KKYNQMPVRHSRNLQLMEDKEPATGAKKSNKPNEQTSKRHDSDTFPELKLT----NAPGSFTNCSNTSELKEF
#> B1_Mmul_AAT44833 601 KTKGDSIQNEKNPNA---IESLEEESAFKTKAEPISSSINNMELELNIHNSKAPKKNRLRRKS---STRHIHALELVVSRNLSP-----------------------PNCTELQIDSCSSSEEIKK-KNYNQMPVRHSRNLQLMEDKESATGAKKSNKPNEQTSKRHASDTFPELKLT----KVPGSFTNCSNTSELKEF
#> B1_Mmus_AAD00168 601 QIAGSNLQKEKSAHP---TESLRKEPASTAGAKSISNSVSDLEVELNVHSSKAPKKNRLRRKS---SIRCALPLE-PISRNPSP-----------------------PTCAELQIDSCGSSEETKK-NHSNQQPAGHLREPQLIEDTEPAADAKK-NEPNEHIRKRRASDAFPEEKLM----NKAGLLTSCSSPRKSQGP
#> B1_Cfam_AAC48663 601 ETEGDYVQKEKNANP---TESLEKESAFRTKTEPMSSRISNMELELNSSSSKAPKKNRLRRKS---SARHTCALEFVVNRNLNP-----------------------PDHSELQIESCSSSEEMKK-QHLDQVPVRHNKTLQLMQDKEPAGRAKKSSKPGEQINKRLASHAFPELTLT----NVSGFFANYSSSSKPQEC
#> B1_Btau_NP_848668 601 KTKRDYVQKEQNANP---AESLEKESVFRTEAEPISISISNMELELNIHRSKAPK-NRLKRKS---STRKIPELELVVSRNPSL-----------------------PNHTELPIDSSSSNEEMKK-KHSSQMPVRQSQKLQLIGDKELTAGAK-NNKTYEQINKRLASDAFPELKLT----NTPGYFTNCSS--KPEEF
#> B1_Mdom_temp1.pep 601 KALVDGHVEEVNDALASELLPVEKESTFRTGTDSAAGSINHGGLKLNGRNAKMTKKDKLRKKS---SARIVHELELIVDKNPSS-----------------------SNETELQIDSYPSSEEIRKGNNSEQKQIRRSRRLQLLSEE-IAMETKKAYEPDEQAEKSCVNEVFPDLKMG----NIPACDTVSLTTESDQML
#> B1_Ggal_NP_989500 601 VKENPLLEKRKGSTL---AEFKERGLQWKNAAEKVSGKCSDGQLELNNSDQKSTKNACSTAKGCRHSTRTRCAIHL-VDRNPGS-----------------------FDLAEPLINSYPSNEEPSK-ADCERRQVRRSRRLQLLSEE-ITKETGKMR----VIKEAKNSDSGPEGSVFGVERNVLVHNSQCKDLRKQQDI
#> B1_Xlav_AAL13037 601 ACQSDIVHSSNTQIKQQCVESLANAGETRKKQE-LSCERSQEEKDFT----------------------QSGALGPKTRSQKSP-----------------------YGHSELHIESSQISNE-PS-NVTKQVEVRRSRRLLMLPKG-PGNKSNSNA--VKEMNEQENIAQIPEFRVKDTNKNNEISDTVPSQLKKKDTY
#> B1_Tnig_AAR89523 601 MSENDENKEDEVPHP---VSVIEEQQAETKGKRRTRSALQHVDSDLLKCTQKEPENTEPKR------T-QKRSRGIKSERAKSARTSKPLVLVAVENGEGGPKIGPRSEEVQVHIENYPSSGD--Q-EVPSGRSTRKSRRLRGFTKE-DTGKERSRSS----VPEKEHSSKHPKFEC-ETLNNVKSLDY-----------
#>
#> B1_Hsap_NP_009225 801 VNPSLPREEKEEKLETVK---VSNNAEDPKDLMLSGERVLQT-ERSVESSSISLVPGTDYGTQESISLLEVSTLGKAKTEPNKCVSQCAAFENPKGLIHGCSKD----NRNDTEGFKYPLGHEVNHS-RETSIEME---ESELDAQYLQNTFKVSKRQSFAPFSNPGNAEEECATFSAHSGSLKKQSPKVTFECEQKEEN
#> B1_Ptro_AAG43492 801 VNPSLPREEEEEKLETVK---VSNNAEDPKDLMLSGERVLQT-ERSVESSSISLVPGTDYGTQESISLLEVSTLGKAKTEPNKCVSQCAAFENPKGLIHGCSKD----TRNDTEGFKYPLGHEVNHS-RETSIEME---ESELDAQYLQNTFKVSKRQSFALFSNPGNPEEECATFSAHCRSLKKQSPKVTFEREQKEQN
#> B1_Ggor_AAT44835 801 VNPSLPREEKEEKLETVK---VSNNAEDPKDLMLSGERVLQT-ERSVESSSISLVPGTDYGTQESISLLEVSTLGKAKTEPHKCVSQCAAFENPKGLIHGCSKD----TRNDTEGFKYPLGHEVNHS-RETSIEME---ESELDAQYLQNTFKVSKRQSFALFSNPGNPEEECATFSAHSRSLKKQSPKVTFECEQKEEN
#> B1_Ppyg_AAT44834 801 VNPSLPREEKEEKLGTVK---VSNNAKDPKDLMLSGERVLQT-ERSVESSSISLVPGTDYGTQESISLLEVSTLGKAKTEPNKCVSQCAAFENPKELIHGCFKD----TRNDTEGFKYPLGHEVNHS-QETSIEME---ESELDTQYLQNTFKVSKRQSFALFSNPGNPEEECATFSAHSRSLKKQSPKVTFECEQKEEN
#> B1_Mmul_AAT44833 801 VNPSLSREEKEEKLETVK---VSNNAKDPKDLMLSGERVLQT-ERSVESSSISLVPDTDYGTQESISLLEVSTLGKAKTERNKCMSQCAAFENPKELIHGCSED----TRNDTEGFKYPLGSEVNHS-QETSIEIE---ESELDTQYLQNTFKVSKRQSFALFSNPGNPEEECATFSAHSRSLKKQSPKVTSECEQKEEN
#> B1_Mmus_AAD00168 801 VNPS-PQRTGTEQLETRQ---MSDSAKELGDRVLGGEPSGKTTDRSEESTSVSLVPDTDYDTQNSVSVLDAHTVRYARTGSAQCMTQFVASENPKELVHG-SNN----AGSGTEGLKPPLRHALNLS-QE-KVEME---DSELDTQYLQNTFQVSKRQSFALFSKPRSPQKDCA----HSVPSKELSPKVTAKGKQK-ER
#> B1_Cfam_AAC48663 801 INPGLRREEIEESRRMTQ---VSDSTRDPKELVLSGGRGLQT-ERSVESTSISLVLDTDYGTQDSISLLEADTLRKAKTVSNQQANLCATIENPKEPIHGCSKD----TRNDTEGFVVPLTCKDNHT-QETSIEME---ESELDTQCLRNMFKVSKRQSFALFSYPRDPEEDCVTVCPRSGAFGKQGPKVTLECGQKEES
#> B1_Btau_NP_848668 801 VHPSLQREE---NLGTIQ---VSNSTKDPKDLILREGKALQI-ERSVESTNISLVPDTDYSTQDSISLLEAKTPEKAKTAPNPCVSLCTATKNLKELIHRDFKD----TKNNTEGFQDLLGHDINYVIQETSREME---DSELDTQYLQNTFKASKRQTFALFSNPGNPQKECATVFAHSGSLRDQSPRDPLKCRQKEDS
#> B1_Mdom_temp1.pep 801 ASCSVTEEGHEKSLEAVQ------SSQDQEDLAISGGEGSQG-QRAKGNLEALEVPDTDWDTQDSTSLFPANTPQNSKAGPNPHRSQCGIMETPKELLDGCSSEN---TGSTTEDLRGLMRQGVKNA-SETTTEME---DSELDTQYLQNTFKRSKRQTFALGSSP---RQECMKPCAISQALHQRGLHNATDCGDHEKE
#> B1_Ggal_NP_989500 801 LSYMSLADRNGADLEANG---IQISSKNSDDMAK--NRSFFN-PTFSCQLSNFNSPSSKAGSQEGEMLGKLFLPQSPSKTVLHAASILTEEKRSWSCTV-FSQDKGCCSRNVPKDFRIGKSPMAKNA-SEFTMEAE---DSELDMQYLRNIFRSSKRQSFSLYPTP---MKACTTDDVASE-------KLNTSCPDQVEE
#> B1_Xlav_AAL13037 801 FPPCTSEDT--GELEN-------------------GIPVRKI-SDQNASLDNPIDPCKDEYSDT------------------------------------------------------LLHVAGDNV-QPDYMETE---ESELETQHIVKMFKTSKRTSFILESKEAENENRVNSAVEISQV------ETSNELPNVAEC
#> B1_Tnig_AAR89523 801 ---------KEQKWQADKNGCIYSQDMEEIENMDSGEKTSSR-PEEGSEQTLFEVPNTETLFQAACSVAE---------------------------------------------STAQPSNTARLL-TELEMENEQKNDSEQDTEQLVKSFKATKRKSFHLGSRPDVKRSRSLV-------------------QESDQS
#>
#> B1_Hsap_NP_009225 1001 QGKNESNIKPVQTVNITA-GFPVVGQK-DKPVDNAKCSIKGGSRFCLSSQFRGNETGLITPNKHGLLQNPYRIPPLFPIK--SFVKTKCKKNLLEENFEEHSMSP----EREMGNEN-IPSTVSTISRNNIRENVFKEASSSNINEVGSSTNEVGSSINEIGSSDENIQAELGRNRGPK--LNAMLRLGVLQPEVYKQSL
#> B1_Ptro_AAG43492 1001 QGKNESNIKPVQTVNITA-GFPVVCQK-DKPVDYAKCSIKGGSRFCLSSQFRGNETGLITPNKHGLLQNPYHIPPLFPIK--SFVKTKCKKNLLEENFEEHSMSP----EREMGNEN-IPSTVSTISRNNIRENVFKEASSSNINEVGSSTNEVGSSINEVGSSDENIQAELGRNRGPK--LNAMLRLGVLQPEVYKQSL
#> B1_Ggor_AAT44835 1001 QGKNESNIKPVQTVNITA-GFPVVCQK-DKPVDYAKCSIKGGSRFCLSSQFRGNETGLITPNKHGLLQNPYHIPPLFPIK--SFVKTKCKKNLLEENFEEHSMSP----EREMGNEN-IPSTVSTISRNNIRENVFKEASSSNINEVGSSTNEVGSSINEVGSSDENIQAELGRNRGPK--LNAMLRLGVLQPEVYKQSL
#> B1_Ppyg_AAT44834 1001 QGKNESNIKPVQTANITA-GFPVVCQK-DKPVDYAKCSIKGGSRFCLSSQFRGNETGLITPNKHGLSQNPYHIPPLFPIK--SFVKTKCKKNLLEENSEEHSMSP----EREMGNEN-IPSTVSIISRNNIRENVFKEASSSNINEVGSSTNEVGSSINEVGSSDENIQAELGRSRGPK--LNAMLRLGVLQPEVYKQSF
#> B1_Mmul_AAT44833 1001 QGKKESNIKPVQTVNITA-GFSVVCQK-DKPVDNAKCSIKGGSRFCLSSQFRGNETGLITPNKHGLLQNPYHIPPLFPVK--SFVKTKCNKNLLEENSEEHSVSP----ERAVGNENIIPSTVSTISHNNIRENAFKEASSSNINEVGSSTNEVGSSINEVGPSDENIQAELGRNRGPK--LNAVLRLGLLQPEVCKQSL
#> B1_Mmus_AAD00168 1001 QGQEEFEISHVQAVAATV-GLPVLCQE-GKLAADTMCD--RGSRLCPSSHYRSGENGLSATGKSGISQNSHFKQSVSPIR--SSIKTDNRKPLTEGRFERHTSST----EMAVGNENILQSTVHTVSLNN-RGNACQEAGSG--------------SIHEVCSTGDSFPGQLGRNRGPK--VNTVPPLDSMQPGVCQQSV
#> B1_Cfam_AAC48663 1001 QGKKESEIRHVQAVHTNA-GFSAVSQKAKKPGDFAKCSIKGVSRLCLSSQFKGKETELLIANYHGISQNPYHIPPLSPIR--SCVKTLCQENLSEEKFEQHSMSP----ERAVGNERVIQSTVSTISQNNIRECASKEVGSSSVNEVVSSTNEVGSSVNEVGSSGENIQAELGRNRGPK--LNAMLRLGLMQPEVCKQSL
#> B1_Btau_NP_848668 1001 QGKSESKSQHVQAICTTV-HFPVADQQDRTPGDDAKCSAKEVTRVCQSSQLRGHKTELVFANKQGVSEKPNLIPSLSPIK--SSVKTICKKSP-SEKFEEPVTSP----EKTLGSESIIQSAVSTISQNNIQESTFKEVSSNSVNEVGSSTNEVGSSVNEVGSSGENIQAEPGRNREPK--LRALLGLGLTQPEVYKQSL
#> B1_Mdom_temp1.pep 1001 KLGNRESNKPVQAKSAVM-NLAVVCQIERKPSDCASVSRI----CHIDPLHGGNDCEFIAGNNEEISQVPNQKQSVSPAG-SSTSKIIYTKKLLEENLDE--ISP----ETAVGNEILAQSSLSLVSPSNSRDCVSKVADLNRFIGIGSNGEGSQAEKHKNKESELNTLPKLKLVQPQV--CQQSFPQDNFSKEPEREEK
#> B1_Ggal_NP_989500 1001 RNSKYLKTENLQEEKTTAENLSSVCEKFETCESACVSPV---------SCFVSSAACVHTVENQDVSKVANHGNLTTLLRICAARNEDGNRPQKGEQGSEKTLSTGIGVESKL-RLSPVRSNRSQSDQSNTEEHAFQRTGLNAV--------------SETYFSSESNQVEKAEVVDDKGLMQHFQPSPMLCPTACQQNP
#> B1_Xlav_AAL13037 1001 RHTSLLSSAKEQSGALLKQGSPSSEPKKTSPIHMLKKTESKHSKMSRNRRGKVKPSSNSAKNTTGQPDNLNN-PTQGVTGSLYNKQVMSDFPMRLNVDEEHTNASAKGSQSSVADKSTAHSN------------------------------------------------------------------------------
#> B1_Tnig_AAR89523 1001 AGAEENRYVCSVDPSAPKHAEPAAAGKTDKVLVDS--------------------------------------------------QNMPGSDLISDSHLASLKRKASGLYSGCSAEGGCASASSPLPPNLESKHAGQSSKDSAICFATEKPSQISGS-------------------------QANFMMEDTQSSTLLQSV
#>
#> B1_Hsap_NP_009225 1201 PGSNCKHPEIK---------KQ-EYEEVVQTVNTDFSPYLISDNLEQPMGSSHASQVCSETPDDLL-DDGEIKEDTSFAENDIKESSAVFSK----SVQKGELSRSP-SPFTHTH--LAQGYRRGAKKLESSEENLSSEDEELPCFQHLLFGKVNNIPSQSTRHST---VATECLSKNTEENLLSLKNS--LNDCSNQVI
#> B1_Ptro_AAG43492 1201 PESNCKHPEIK---------KQ-EYEEVVQTVNTDFSPCLISDNLEQPMGSSHASQVCSETPDDLL-DDGEIKEDTSFAENDIKESSAVFSK----SVQRGELSRSP-SPFTHTH--LAQGYRRGAKKLESSEENLSSEDEELPCFQHLLFGKVSNIPSQSTRHST---VATECLSKNTEENLLSLKNS--LNDCSNQVI
#> B1_Ggor_AAT44835 1201 PGSNCKHPEIK---------KQ-EYEEVVQTVNTDFSPCLISDNLEQPMGSSHASQVCSETPDDLL-DDGEIKEDTSFAKNDIKESSAVFSK----NVQRGELSRSP-SPFTHTH--LAQGYRRGAKKLESSEENLSSEDEELPCFQHLLFGKVSNIPSQSTRHST---VATECLSKNTEENLLSLKNS--LNDCSNQVI
#> B1_Ppyg_AAT44834 1201 PGSNGKHPEIK---------KQ-EYEEVLQTVNTDFSPCLISDNLEQPMRSSHASQVCSETPNDLL-DDGEIKEDTSFAENDIKESSAVFSK----SVQRGELSRSP-SPFTHTH--LAQGYRRGAKKLESSEENLSSEDEELPCFQHLLFGKVSNIPSQSTRHST---VATECLSKNTEENLLSLKNS--LNDYSNQVI
#> B1_Mmul_AAT44833 1201 PISNCKHPEIK---------KQ-EHEELVQTVNTDFSPCLISDNLEQPMGSSHASEVCSETPDDLL-DDGEIKEDTSFAANDIKESSAVFSK----SIQRGELSRSP-SPFTHTH--LAQGYQKEAKKLESSEENLSSEDEELPCFQHLLFGKVSNIPSQTTRHST---VATECLSKNTEENLLSLKNS--LTDCSNQVI
#> B1_Mmus_AAD00168 1201 PVSD-KYLEIK---------KQ-EGEAVC----ADFSPCLFSDHLEQSM-SGKVFQVCSETPDDLL-DDVEIQGHTSFGEGDIMERSAVFNG----SILRRESSRSP-SPVTHAS--KSQSLHRASRKLESSEESDSTEDEDLPCFQHLL-SRISNTPELTRCRSA---V-TQGIPEKAEGTQAPWKGS--SSDCNNEVI
#> B1_Cfam_AAC48663 1201 SLSNCKHPEMK---------WQGQSEGAVLSVSADFSPCLISDNPEQPMGSSRSSQVCSETPDDLL-NGDKIKGKVSFAESDIKEKSAVFSK----SVQSGEFSRSP-SPSDHTR--LAQGYQRGTKKLESSEENMSSEEEELPCFQHLIFGKVTNMPSQSTSHNA---VAAEGLSNKTEENLDSLKNS--LSDISNQVP
#> B1_Btau_NP_848668 1201 PVSNCHHPEIK---------RQGENEDMPQAVKADFSPCLISDNLEQPTGSRHASQVCSETPDNLL-NDDEIKENSHFAESDIKERSAVFSE----SVQKGEFRGSP-GPFTHTH--LAQGHQRGAGKLES-EETVSSEDEELPCFQQLLFGKVTSTLSPSTGCNT---VATEGLSKETEGNLESLKSG--LNDCSGQVT
#> B1_Mdom_temp1.pep 1201 EENVKLTPAI----------------------SADSSPCL-----EQTKENTHFTQVWSETPD-LLDSDGELKENTSFAESDIKEQSAVFGKNG--KSSQVRKSRKNLSPLVHRNPSLSWKSRRQARKLESSEEEASSEEGELPSFQDLIFGKAASTPFQPTKNKT---IAKEFSANEAKENLAFLNRN-NMSVNNLQIP
#> B1_Ggal_NP_989500 1201 AEFNCELTEKKIITRERSLVKG-NEERVIQTVSTGLSEFSVREALEESLKGHSDFTDLSETPDGLLCSDNDTEESASFYVTNRKDTSAVFVKRSGAAWVK-EVNDSVVSCKPRSE--GIQRFRRRAQKLQSSDEE-SSDDEDLPSFQELMFGKSVSTPLQIQKQVT--------SVVQSSANPSTLPCSECLNE-NNEQK
#> B1_Xlav_AAL13037 1201 -------------------------------VNTDF-----------------ISDINSATPDGLL-HYMDKAEGNSSPWDTTRDKNAALLE----------------SCLPSAY--GSTGEKTPIPKVQSSEEGSSQ-------------GLFNQKPKCSSSGKADQKNSSKNPGKSQCRILSDFSGSSNNGKCSSQEM
#> B1_Tnig_AAR89523 1201 KADAAKEP---------------------------------------------LNAPSSLTPSGL-----------------------------QTSVPGGEMTHSQSS---REL--STRRKRTKAQKLDCLSDSSDCAEEEFPCLAEIL--NETASPGEHATR----------------------PPACPSPDCVN---
#>
#> B1_Hsap_NP_009225 1401 LAKASQE---HHLSEETKCSASLFSSQCSELEDLTANTNTQDPFLI--GSSKQMRHQSESQGVGLSDKELVSDDE-ERGT--GLEEN--N-QEEQSMDSNLGEA-ASGCESETSVSEDCSGLSSQSDILTTQQRDTMQHNLIKLQQEMAELEAVLEQHGSQPSNSYPSIISDS-SALEDLRNPEQSTSEKAVLTSQKSSE
#> B1_Ptro_AAG43492 1401 LAKASQE---HHLSEETKCSASLFSSQCSELEDLTANTNTQDPFLI--GSSKQMRHQSESQGVGLSDKELVSDDE-ERGT--GLEEN--N-QEEQSMDSNLGEA-ASGCESETSVSEDCSGLSSQSDILTTQQRDTMQDNLIKLQQEMAELEAVLEQHGSQPSNSYPSIISDS-SALEDLQNPEQSTSEKAVLTSQKSSE
#> B1_Ggor_AAT44835 1401 LAKTSQE---HHLSEETKCSASLFSSQCSELEDLTANTNTQDPFLI--GSSKQMRHQSESQGVGLSDKELVSDDE-ERGT--GLEEN--N-QEEQSMDSNLGEA-ASGCESETSVSEDCSGLSSQSDILTTQQRDTMQDNLIKLQQEMAELEAVLEQHGSQPSNSYPSIISDS-SALEDLRNPEQSTSEKAVLTSQKSSE
#> B1_Ppyg_AAT44834 1401 LVKASQE---HHLSEETKCSASLFSSQCSELEDLTANTNTQDRFFI--GSSKQMRHQSESQGVGLSDKELVSDDE-ERGT--DLEEN--N-QEEQGVDSNLGEA-ASGYESETSVSEDCSGLSSQSDILTTQQRDTMQDNLIKLQQEMAELEAVLEQHGSQPSNSYPSIISDS-SALEDLRNPEQSTSEKAVLTSQKSSE
#> B1_Mmul_AAT44833 1401 LAKASQE---HHLSEETKCSGSLFSSQCSELEDLTANTNTQDPFLI--GSSKRMRHQSESQGVGLSDKELVSDDE-ERGT--GLEED--N-QEEQSVDSNLGEA-ASGYESETSVSEDCSRLSSQSEILTTQQRDTMQDNLIKLQQEMAELEAVLEQHGSQPSNSYPSIITDS-SALEDLRNPEQSTSEKAVLTSQKSSE
#> B1_Mmus_AAD00168 1401 MIEASQE---HQFSEDPRCSGRMFSSQNSAAQGSTANANSQDSNFI--PPSKQRSHQCGNEEAFLSDKELISDNE-EMAT--CLEED--N-DQEE--DSIIPDSEASGYESETNLSEDC----SQSDILTTQQRATMKYNLIKLQQEMAHLEAVLEQRGNQPSGHSPSLLADP-CALEDLPDLEPNMSGAAILTSKNINE
#> B1_Cfam_AAC48663 1401 SAKASQE---HHLSEEARCSGSLFSSQCSALEDLTVNTNTQDPFSMFDPTSKQVRHQSENLDV-LNDKELVSDDDDEREP--GLEED--SPQEEQSVDSDLGEV-ASGYESETSLSEDCSRLSSQSDILTTQQRDTMQDNLIKLQQEMAELEAVLEQHESQPSNSSPSLIADS-CSPEDLLNPEQNASER-VLTSEKSSD
#> B1_Btau_NP_848668 1401 SAKVSQE---HHLNEEARCSGSLFSSQCSAMEDLTTNTNTQDPFLMFERPSKQV-YQSESEEV-LSDKELVSDDE-ERET--GLEED--SCQEEQSVDSDLGEA-VSDHVSETSLSEDGVGLSSQSDILTTQQRDTMQDNLLKLQQEMAELEAVLERHGSQPSHSSASLTADS-RGPEHLLNLEQDTSERAILTSEKSRD
#> B1_Mdom_temp1.pep 1401 SGEASQD---HDSSQESECSGSLFSSHSNSLEDLAGKANDKDSALVF-GSSHQESSHSRTQEI-QSGSEMPKASQDKEERETDLDED--HHPKDQGVDSNLVVE-GSGYDSETSHPGDTSQLSFQSDILTTQQRDTMQDNLKKLQQEMAVLEAVLEQHSSQVTNNSSSQEPGL-CPSMDQPDLKQTNSERGA---EENVD
#> B1_Ggal_NP_989500 1401 TLEAALSNECASPSQESECSVNLFSSQSNMSEESVDGAQELKKTLT--QVSNVKKSKEAPQSCSGGLKRLKNN----------LNDE--Y-QEDPNMGANLGE--ASGYDSETSRVEDSHEPFSQGEILSTQQKNAMQNNLKKLQQEMAVLEAVLKQHGSQDAEVLPLCRELPYCSIGGTLGLERMRQETENVSEHDS--
#> B1_Xlav_AAL13037 1401 LSKAA----FPSLSQESQCSVSLFSTQSNMSQQSVDEDHKQD----------------------VLQDELVSPNKTKRTAISGTEETQISPLRDQYQNPDIDE--ASECESEASHTGDSSILSSQDELLNTQQRNYMKDSLKKLQQEMAALEAVLGQHGTQKLEAETTCIPSS-EHVTEMQEATEEEEEETYQGENLFVK
#> B1_Tnig_AAR89523 1401 ---------------SSQASVDLF-----------------------------------------------------------------------------------GTPHECAVN-DVASSQFSSEVLVTQQKIEMKKELVRLEKLMALVSEVLHEKEASPA------------------------KDMLDKTKQKITG
#>
#> B1_Hsap_NP_009225 1601 YPISQNPEGLSADKFEVS-ADSSTSKNKEPGVERSSPSKCPSLDDRWYMHSCSGSLQNRNYP-SQEELIKVVDVEEQQLEESGPHDLTETSYLPRQDLEGTPYLESGISLFSDDPESDPSEDRAPESARVGNIPSSTSALKVPQLKVAESAQSPAAAHTTDTAGY---NAMEESVSREKPELTASTERVNKRMSMVVSGL
#> B1_Ptro_AAG43492 1601 YPISQNPEGLSADKFEVS-ADSSTSKNKEPGVERSSPSKCPSLDDRWYMHSCSGSLQNTNYP-SQEELIKVVDVEEQQLEESGPHDLTETSYLPRQDLEGTPYLESGISLFSDDPESDPSEDKAPESAHVGNIPSSTSALKVPQLKVAESAQSPAAAHTTNTAGY---NAMEESVSREKPELTASTERVNKRMSMVVSGL
#> B1_Ggor_AAT44835 1601 YPISQNPEGLSADKFEVS-ADSSTSKNKEPGVERSSPSKCPSLDDRWYMHSCSGSLQNRNYP-SQEELIKVVDVEEQQLEESGPHDLTETSYLPRQDLEGTPYLESGISLFSDDPESDPSEDRAPESAHVGNIPSSTSALKVPQLKVAESAQSPAAAHTTNTAGY---HAMEESVSREKPELTASTERVNKRMSMVVSGL
#> B1_Ppyg_AAT44834 1601 YPISQNPEGLSADKFEVS-ADSSTNKNKEPGVERSSPSKCPSLDDRWYMHSCSGSLQNGNYP-SQEELIKVVDVEKQQLEESGPHDLTEPSYLPRQDLEGTPYLESGISLFSDDPESDASEDRAPESAHVGSIPSSTSALKVPQLKVAESAQSPAAAQTTNTAGY---NAMEESVSREKPELTASTERVNKRMSMVVSGL
#> B1_Mmul_AAT44833 1601 YPINQNPEGLSADKFEVS-ADSSTSKNKEPGVERSSPSKCQSLEDRWYVHSSSGSLQNGNYP-SQEELIKVVDVETQQLEKSGPHDLMEPSYLPRQDLDGTPYLESGISLFSDDPESDPSEDRAPESAHVGSIPSSTSALKVPQWQVAESAQSPAAAHNTNTAGY---NAMEESVSRENPKLTASTERVNKRMSLVVSGL
#> B1_Mmus_AAD00168 1601 NPVSQNLKSACDDKFQLQHLEGPTSGDDESGMGRPSPFKSPLAGSRGSAHGCSRHLQKRNSP-SQEELLQPAGSE----ASSEPHNSTGQSCLPRRELEGTPYLGSGISLFS---SRDPESESPKEPAHIGTTPASTSALKIPQGQVAFRSAAAAGAD----------KAVVGIVSKIKPELTSSEERADRDISMVVSGL
#> B1_Cfam_AAC48663 1601 SPISQNPESLSTDKFQVF-LDSSTSKNGEPGMIRSSPSQSRLLDTRWYVHSCPRSLQDTNCP-SQKELTKVVSMEEQQPTESEARDLMEQSYLSRPDLEGAPYLESGISLFSDDPESDPSSHRASELAHVSSMPTSTSALKLPQFQVEESAKSTAAVHIASTAGY---NKSEDSVGIEKPEVISSTRGVNKRISMVASGL
#> B1_Btau_NP_848668 1601 YSRSQNPESLSADKFPVS-LDSSTNKNKEPGMERSSASKFQLSYNRWYMHS-SRSLQDRNCP-SQKEPINVADMEEQQLAKREAQDLMG-SFLPRQDQEGTPYLKSGISLFSHEPESDPSEDRAAEPAHVHSMPPSASALKLSQFRVEESTKNPAAAHIANTTRC---NLREESMSKEKPEVISSTERSKKRLSMVASGL
#> B1_Mdom_temp1.pep 1601 HHKSQNVQSFSADKSRAF-PSDSNSKNEEAGVEGSSFSKSQMPNRVWSPLSRSRTPWEGDSP-S-EESGKVTTRKEEHKRQ------PNEQGVLKQNLEITTEPESGVKVNLKNIKSNPHQNKDPALRNIFNLPTSTSALRSSQLQTIEATNTTAPCAPAC-------NVRGQLKEDSASERLKKGGTGNRKISLVSSGL
#> B1_Ggal_NP_989500 1601 --------------------ETKLTKASVLPVLCGNVTKNPNSSSFSVKHPCPQTAEAT----------------------------------------------------------DSSAVAQGDNKSNVQVCKSKRSVCFPTSVLHNVA--------------------------GKENAASSGTTCRTEMSIVASGL
#> B1_Xlav_AAL13037 1601 TPPE-----------HVANGSTLIGQSTDNGLQGRNRPVSPSFLCQTRIEKDTPEIVCPAMGISKKSLLTQKELPSQHKQEIRP-DLETVNVTPTNPIDAQESLRQS----SKNKRGSRMLHNKRSTSNQGSFCLSVEAAESPQIPKQNKAEFGIARKSTSPTFASPARAKVPSVGFKSPVVS-----SRRNLSFVASGL
#> B1_Tnig_AAR89523 1601 SDVDHVLSCGQGEVFNQETFPEEEEQDANASLNDGKGAARPTGSKHSSITEL----------------------------------------------NSRISNTVGLSSAAKTLKSDGSPSDGHE---------------------------------------------------DKENNTPERARSLARMLLVTSGL
#>
#> B1_Hsap_NP_009225 1801 TPEEFMLVYKFARKHHITLTNLITEETTHVVMKTDAEFVCERTLKYFLGIAGGKWVVSYFWVTQSIKERKMLNEHDFEVRGDVVNGRNHQGPKRARESQDR-----KIFRGLEICCYGPFTNMPTDQLEWMVQLCGASVVKELSSFTLGTGVHPIVVVQPDAWTEDNGFHAIGQMCEAPVVTREWVLDSVALYQCQELDT
#> B1_Ptro_AAG43492 1801 TPEEFMLVYKFARKHHITLTNLITEETTHVVMKTDAEFVCERTLKYFLGIAGGKWVVSYFWVTQSIKERKMLNEHDFEVRGDVVNGRNHQGPKRARESQDR-----KIFRGLEICCYGPFTNMPTDQLEWMVQLCGASVVKELSSFTLGTGVHPIVVVQPDAWTEDNGFHAIGQMCEAPVVTREWVLDSVALYQCQELDT
#> B1_Ggor_AAT44835 1801 TPEEFMLVYKFARKHHITLTNLITEETTHVVMKTDAEFVCERTLKYFLGIAGGKWVVSYFWVTQSIKEGKMLNEHDFEVRGDVVNGRNHQGPKRARESQDR-----KIFRGLDICCYGPFTNMPTDQLEWMVQLCGASVVKELSSFTLGTGVHPIVVVQPDAWTEDNGFHAIGQMCEAPVVTREWVLDSVALYQCQELDT
#> B1_Ppyg_AAT44834 1801 TPEEFMLVYKFARKHHITLTNLITEETTHVVMKTDAEFVCERTLKYFLGIAGGKWVVSYFWVTQSIKERKMLNEHDFEVRGDVVNGRNHQGPKRARESQDR-----KIFRGLEICCYGPFTNMPTDQLEWIVQLCGASVVKELSSFTLGTGVHPIVVVQPDAWTEDNGFHAIGQMCEAPVVTREWVLDSVALYQCQELDT
#> B1_Mmul_AAT44833 1801 TPEEFMLVYKFARRYHIALTNLISEETTHVVMKTDAEFVCERTLKYFLGIAGGKWVVSYFWVTQSIKERKMLNEHDFEVRGDVVNGRNHQGPKRARESPDR-----KIFRGLEICCYGPFTNMPTDQLEWMVQLCGASVVKELSSFTLGTGFHPIVVVQPDAWTEDNGFHAIGQMCEAPVVTREWVLDSVALYQCQELDT
#> B1_Mmus_AAD00168 1801 TPKEVMTVQKFAEKYRLTLTDAITEETTHVIIKTDAEFVCERTLKYFLGIAGGKWIVSYSWVVRSIQERRLLNVHEFEVKGDVVTGRNHQGPRRSRESRE------KLFKGLQVYCCEPFTNMPKDELERMLQLCGASVVKELPSLTHDTGAHLVVIVQPSAWTEDSNCPDIGQLCKARLVMWDWVLDSLSSYRCRDLDA
#> B1_Cfam_AAC48663 1801 TPKEFMLVHKFARKHHISLTNLISEETTHVIMKTDAEFVCERTLKYFLGIAGGKWVVSYFWVTQSIKERKILDEHDFEVRGDVVNGRNHQGPKRARESQDRESQDRKIFRGLEICCYGPFTNMPTDQLEWMVHLCGASVVKEPSLFTLSKGTHPVVVVQPDAWTEDSGFHAIGQMCEAPVVTREWVLDSVALYQCQELDT
#> B1_Btau_NP_848668 1801 TPKELMLVQKFARKHHVTLTNLITEETTHVIMKTDPEFVCERTLKYFLGIAGGKWVVSYFWVTQSIKEGKMLDEHDFEVRGDVVNGRNHQGPKRARESRDK-----KIFKGLEICCYGPFTNMPTDQLEWMVQLCGASVVKEPSSFTPDQGTHPVVVVQPDAWTEDAGFHVIGQMCEAPVVTREWVLDSVALYQCQELDT
#> B1_Mdom_temp1.pep 1801 TPKENMLVQKFARKTHSTVSHQITEGTTHVIMKTDAEFVCERTLKYFLGIAGGKWVVSFLWVVQSFKEGKMLPECDFEVRGDVINGRNHRGPERARESQGM-----KIFRGLEICCYGPFTDMSTDQLEWMVQLCGASVVKKPSSLRFRVGSSPVVVVQPDAWEDDSSFQEIGLVCEAPVVTREWVLDSVACYQRQELDT
#> B1_Ggal_NP_989500 1801 NQSEHLMVQKFARKTQSTFSNHITDGTTHVIMKTDEELVCERTLKYFLGIAGRKWVVSYQWIIQSFKEGRILDEEHFEVKGDVINGRNHQGPKRARQSPAE-----KIFKDFEICCCGPFTDMTTGHLEWIVELCGASVVKQLHLFTHKVNSTAVVVVQPDAWMEGTSYEAIQRKNNVAVVTREWVLDSVACFECQELDA
#> B1_Xlav_AAL13037 1801 NQCEMALVQRFSKTTQSILSSRITDSTTHVIMKTDAELVCERTLKYFQGIASRKWVVSYEWVVQSFREGQILDEYDFEVKGDVINGRNHRGPRRSRLGSDG-----LLLIDFEICFFGSFTDMTLDDLEWMVSECGATVVKKLQFFKKKHNVTSLVIVQPDASTEVRDYTEIRKKHKALVVTREWLMDSVATYRLQKFDA
#> B1_Tnig_AAR89523 1801 GPSQQITVKKFAKRIGATVVSQVTPEVTHVVMHTDEQLVCERTLKYFLGIAGRKWVVSFQWISECIKQKKLLNETLFEVRGDVVNGFDHQGPMKARATADN----NLLMKGYSICFQGPFTDMTTAEMELMVELCGATVVQDPLLLDGKRTSHQLIVVQSGS----ESSRSVSG--KATVVTRGWLLDSVATYTIQNLKN
#>
#> B1_Hsap_NP_009225 2001 YLIPQIPHSHY-------
#> B1_Ptro_AAG43492 2001 YLIPQIPHSHY-------
#> B1_Ggor_AAT44835 2001 YLIPQIPHSHY-------
#> B1_Ppyg_AAT44834 2001 YLIPQIPHSHY-------
#> B1_Mmul_AAT44833 2001 YLIPQIPHSHY-------
#> B1_Mmus_AAD00168 2001 YLVQNITCDSSEPQDSND
#> B1_Cfam_AAC48663 2001 YLIPQIPRTAADSSQPCV
#> B1_Btau_NP_848668 2001 YLVP--------------
#> B1_Mdom_temp1.pep 2001 YLISQTSPSLC-------
#> B1_Ggal_NP_989500 2001 YLVSQD------------
#> B1_Xlav_AAL13037 2001 YLA---------------
#> B1_Tnig_AAR89523 2001 YRADLRAA----------
To read in the 117 BRCA1 missense variants, i.e., the 117 substitutions, we use the function read_substitutions():
path <- system.file("extdata/lee2010_sub.txt", package = 'agvgd', mustWork = TRUE)
missense_variants <- read_substitutions(path)
missense_variants
#> # A tibble: 117 × 4
#> res poi ref sub
#> <int> <int> <chr> <chr>
#> 1 1647 NA N K
#> 2 1651 NA S F
#> 3 1652 NA M T
#> 4 1652 NA M I
#> 5 1653 NA V M
#> 6 1655 NA S F
#> 7 1656 NA G D
#> 8 1662 NA F S
#> 9 1663 NA M L
#> 10 1663 NA M K
#> # ℹ 107 more rows
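The four columns above can be pictured with a small base R sketch. It assumes, purely for illustration, that each substitution is written in the common one-letter notation (reference residue, residue number, substituted residue, e.g. N1647K); the actual layout of lee2010_sub.txt is not shown in this vignette, and parse_substitution() is a hypothetical helper, not part of agvgd.

```r
# Hypothetical helper (not part of agvgd): split strings such as "N1647K"
# into reference residue, residue number and substituted residue.
parse_substitution <- function(x) {
  pattern <- "^([A-Z])([0-9]+)([A-Z])$"
  stopifnot(all(grepl(pattern, x)))
  data.frame(
    res = as.integer(sub(pattern, "\\2", x)),
    poi = NA_integer_,  # alignment position, not known at this stage
    ref = sub(pattern, "\\1", x),
    sub = sub(pattern, "\\3", x),
    stringsAsFactors = FALSE
  )
}

# First three variants from the table above, written in that notation
parsed <- parse_substitution(c("N1647K", "S1651F", "M1652T"))
print(parsed)
```

As in the tibble returned by read_substitutions(), the poi column is left as NA here because the alignment position cannot be derived from the substitution alone.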
The list of missense variants indicates the residue number in the protein sequence, i.e. column res in missense_variants. As we'll see next, the function agvgd() instead uses alignment positions (column poi in missense_variants) to refer to those residues.
The difference between res and poi is that res counts residues in the primary sequence of the reference protein, while poi refers to positions in the alignment, accounting for gaps.
So, before we proceed, we will update the missense_variants tibble and replace the NA values with the corresponding poi values. For that we use the function res_to_poi():
missense_variants$poi <- res_to_poi(brca1_alignment, missense_variants$res)
missense_variants
#> # A tibble: 117 × 4
#> res poi ref sub
#> <int> <int> <chr> <chr>
#> 1 1647 1790 N K
#> 2 1651 1794 S F
#> 3 1652 1795 M T
#> 4 1652 1795 M I
#> 5 1653 1796 V M
#> 6 1655 1798 S F
#> 7 1656 1799 G D
#> 8 1662 1805 F S
#> 9 1663 1806 M L
#> 10 1663 1806 M K
#> # ℹ 107 more rows
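The relationship between res and poi can be sketched with a toy example in base R. The sequence and the helper toy_res_to_poi() below are made up for illustration and are not the implementation of res_to_poi(); the sketch only assumes that gaps are written as "-" and that residues are counted along a single reference sequence.

```r
# Toy gapped reference sequence (made up; gaps are "-")
ref_seq <- "MD-LS--ALR"

# Hypothetical helper: alignment positions of the requested residues,
# where residues are numbered along the ungapped sequence.
toy_res_to_poi <- function(seq, res) {
  chars <- strsplit(seq, "")[[1]]
  non_gap <- which(chars != "-")  # alignment position of each residue
  non_gap[res]
}

# Residues 1, 2, 3 and 5 of the ungapped sequence "MDLSALR"
toy_positions <- toy_res_to_poi(ref_seq, c(1, 2, 3, 5))
print(toy_positions)
```

This prints 1 2 4 8: every gap that precedes a residue shifts its alignment position by one. By the same logic, the offset of 143 between res 1647 and poi 1790 in the table above would correspond to the gap characters that precede that residue in the reference row of the alignment.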
Running agvgd()
Run Align-GVGD with the function agvgd():
scores <- agvgd(alignment = brca1_alignment,
                poi = missense_variants$poi,
                sub = missense_variants$sub)
print(scores, n = Inf)
#> # A tibble: 117 × 7
#> res poi ref sub gv gd prediction
#> <int> <int> <chr> <chr> <dbl> <dbl> <chr>
#> 1 1647 1790 N K 177. 0 C0
#> 2 1651 1794 S F 144. 21.3 C0
#> 3 1652 1795 M T 30.3 81.0 C25
#> 4 1652 1795 M I 30.3 0 C0
#> 5 1653 1796 V M 0 21.5 C15
#> 6 1655 1798 S F 0 155. C65
#> 7 1656 1799 G D 0 93.8 C65
#> 8 1662 1805 F S 161. 25.1 C0
#> 9 1663 1806 M L 96.2 0 C0
#> 10 1663 1806 M K 96.2 57.1 C0
#> 11 1664 1807 L P 92.4 28.9 C0
#> 12 1665 1808 V M 0 21.5 C15
#> 13 1669 1812 A S 99.1 0 C0
#> 14 1682 1825 E K 117. 36.7 C0
#> 15 1682 1825 E V 117. 43.3 C0
#> 16 1685 1828 T I 0 89.3 C65
#> 17 1685 1828 T A 0 58.0 C55
#> 18 1689 1832 M T 10.1 81.0 C55
#> 19 1689 1832 M R 10.1 90.6 C55
#> 20 1691 1834 T K 0 77.7 C65
#> 21 1691 1834 T I 0 89.3 C65
#> 22 1692 1835 D H 0 81.2 C65
#> 23 1692 1835 D N 0 23.0 C15
#> 24 1692 1835 D Y 0 160. C65
#> 25 1695 1838 F L 21.8 0 C0
#> 26 1696 1839 V L 0 31.8 C25
#> 27 1697 1840 C R 0 180. C65
#> 28 1699 1842 R W 0 101. C65
#> 29 1699 1842 R Q 0 42.8 C35
#> 30 1699 1842 R L 0 102. C65
#> 31 1700 1843 T A 0 58.0 C55
#> 32 1706 1849 G A 0 60 C55
#> 33 1706 1849 G E 0 97.8 C65
#> 34 1708 1851 A V 0 64.4 C55
#> 35 1708 1851 A E 0 107. C65
#> 36 1713 1856 V A 29.6 64.4 C25
#> 37 1714 1857 V G 0 109. C65
#> 38 1715 1858 S R 0 109. C65
#> 39 1715 1858 S N 0 46.2 C45
#> 40 1715 1858 S C 0 112. C65
#> 41 1718 1861 W C 0 214. C65
#> 42 1718 1861 W S 0 177. C65
#> 43 1720 1863 T A 142. 1.01 C0
#> 44 1722 1865 S F 112. 125. C15
#> 45 1726 1869 R G 131. 0 C0
#> 46 1730 1873 N S 108. 2.79 C0
#> 47 1733 1876 D G 172. 51.7 C0
#> 48 1734 1877 F S 0 155. C65
#> 49 1736 1879 V A 0 64.4 C55
#> 50 1736 1879 V G 0 109. C65
#> 51 1738 1881 G E 0 97.8 C65
#> 52 1738 1881 G R 0 125. C65
#> 53 1739 1882 D Y 0 160. C65
#> 54 1739 1882 D V 0 152. C65
#> 55 1739 1882 D E 0 44.6 C35
#> 56 1739 1882 D G 0 93.8 C65
#> 57 1741 1884 V G 29.6 109. C35
#> 58 1746 1889 H N 0 68.4 C65
#> 59 1749 1892 P R 0 103. C65
#> 60 1751 1894 R P 26 96.5 C35
#> 61 1751 1894 R Q 26 38.2 C0
#> 62 1752 1895 A V 99.1 63.6 C0
#> 63 1752 1895 A P 99.1 1.7 C0
#> 64 1753 1896 R T 0 71.0 C65
#> 65 1761 1909 F S 30.3 135. C45
#> 66 1763 1911 G V 93.8 77.6 C15
#> 67 1764 1912 L P 35.7 85.7 C25
#> 68 1766 1914 I S 29.6 123. C35
#> 69 1771 1919 P L 73.4 97.8 C15
#> 70 1771 1919 P R 73.4 95.1 C15
#> 71 1773 1921 T I 0 89.3 C65
#> 72 1773 1921 T S 0 57.8 C55
#> 73 1775 1923 M R 0 91.6 C65
#> 74 1775 1923 M K 0 94.5 C65
#> 75 1778 1926 D G 134. 0 C0
#> 76 1778 1926 D Y 134. 88.6 C15
#> 77 1778 1926 D N 134. 2.03 C0
#> 78 1780 1928 L P 14.3 86.6 C45
#> 79 1783 1931 M I 10.1 0 C0
#> 80 1783 1931 M T 10.1 81.0 C55
#> 81 1783 1931 M L 10.1 4.86 C0
#> 82 1785 1933 Q H 100. 0 C0
#> 83 1787 1935 C S 0 112. C65
#> 84 1788 1936 G V 0 109. C65
#> 85 1788 1936 G D 0 93.8 C65
#> 86 1789 1937 A S 0 99.1 C65
#> 87 1794 1942 E D 106. 0 C0
#> 88 1803 1951 G A 87.3 49.4 C0
#> 89 1804 1952 V D 155. 61.5 C0
#> 90 1805 1953 H P 91.6 16.3 C0
#> 91 1806 1954 P A 156. 0 C0
#> 92 1808 1956 V A 29.6 64.4 C25
#> 93 1809 1957 V F 29.6 21.3 C0
#> 94 1809 1957 V A 29.6 64.4 C25
#> 95 1810 1958 V G 0 109. C65
#> 96 1811 1959 Q R 0 42.8 C35
#> 97 1818 1966 D G 251. 0 C0
#> 98 1819 1967 N S 152. 0 C0
#> 99 1823 1971 A T 160. 0 C0
#> 100 1826 1974 Q H 172. 0 C0
#> 101 1830 1978 A T 64.4 49.4 C0
#> 102 1833 1981 V M 0 21.5 C15
#> 103 1835 1983 R P 101. 92.7 C15
#> 104 1836 1984 E K 113. 46.1 C0
#> 105 1837 1985 W R 0 101. C65
#> 106 1837 1985 W G 0 184. C65
#> 107 1837 1985 W C 0 214. C65
#> 108 1838 1986 V E 31.8 121. C35
#> 109 1841 1989 S N 0 46.2 C45
#> 110 1841 1989 S R 0 109. C65
#> 111 1843 1991 A P 99.1 1.7 C0
#> 112 1844 1992 L R 217. 24.8 C0
#> 113 1851 1999 D E 101. 0 C0
#> 114 1853 2001 Y C 0 194. C65
#> 115 1854 2002 L P 102. 79.5 C15
#> 116 1856 2004 P S 238. 0 C0
#> 117 1859 2007 P R 354. 0 C0
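To get a quick overview of how the variants distribute across prediction classes, the prediction column can be tabulated. The sketch below hard-codes the first ten rows of the output above so that it runs without agvgd; with the package loaded, the same table() call on scores$prediction should give the counts for all 117 variants.

```r
# First ten rows of the scores table above, copied by hand
first_ten <- data.frame(
  res = c(1647L, 1651L, 1652L, 1652L, 1653L, 1655L, 1656L, 1662L, 1663L, 1663L),
  sub = c("K", "F", "T", "I", "M", "F", "D", "S", "L", "K"),
  prediction = c("C0", "C0", "C25", "C0", "C15", "C65", "C65", "C0", "C0", "C0"),
  stringsAsFactors = FALSE
)

# Number of variants per Align-GVGD class
class_counts <- table(first_ten$prediction)
print(class_counts)
```

For these ten rows, six variants fall in class C0, two in C65, and one each in C15 and C25.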