Getting PGS Performance Metrics
PGS performance metrics
Performance metrics assess the validity of a PGS in a Sample Set. This assessment is performed on samples not used for score development.
Performance metrics are retrieved with the function get_performance_metrics(). The returned data is provided as an S4 object of class performance_metrics.
Common metrics include:
- standardized effect sizes: odds ratios or hazard ratios, and regression coefficients \(\beta\), see slot pgs_effect_sizes;
- classification accuracy metrics: area under the receiver operating characteristic curve, C-index and area under the precision-recall curve, see slot pgs_classification_metrics;
- other relevant metrics: calibration (\(\chi^2\)), see slot pgs_other_metrics.
The covariates used in the model (most commonly age, sex and genetic principal components to account for population structure) are also recorded for each set of metrics. These are listed in the covariates column of the slot performance_metrics.
Getting PGS performance metrics
In the PGS Catalog, each set of performance metrics is assigned an identifier that starts with the prefix "PPM". To retrieve the performance metrics associated with one assessment of a polygenic score, you can use its identifier directly:
library(quincunx)
get_performance_metrics(ppm_id = 'PPM000001')
#> An object of class "performance_metrics"
#> Slot "performance_metrics":
#> # A tibble: 1 × 5
#> ppm_id pgs_id reported_trait covariates comments
#> <chr> <chr> <chr> <chr> <chr>
#> 1 PPM000001 PGS000001 All breast cancer NA NA
#>
#> Slot "publications":
#> # A tibble: 1 × 8
#> ppm_id pgp_id pubmed_id publication_date publication title autho…¹ doi
#> <chr> <chr> <chr> <date> <chr> <chr> <chr> <chr>
#> 1 PPM000001 PGP000001 25855707 2015-04-08 J Natl Can… Pred… Mavadd… 10.1…
#> # … with abbreviated variable name ¹author_fullname
#>
#> Slot "sample_sets":
#> # A tibble: 1 × 2
#> ppm_id pss_id
#> <chr> <chr>
#> 1 PPM000001 PSS000001
#>
#> Slot "samples":
#> # A tibble: 1 × 16
#> ppm_id pss_id sampl…¹ stage sampl…² sampl…³ sampl…⁴ sampl…⁵ pheno…⁶ ances…⁷
#> <chr> <chr> <int> <chr> <int> <int> <int> <dbl> <chr> <chr>
#> 1 PPM000001 PSS00… 1 eval 67054 33673 33381 NA All br… Europe…
#> # … with 6 more variables: ancestry <chr>, country <chr>,
#> # ancestry_additional_description <chr>, study_id <chr>, pubmed_id <chr>,
#> # cohorts_additional_description <chr>, and abbreviated variable names
#> # ¹sample_id, ²sample_size, ³sample_cases, ⁴sample_controls,
#> # ⁵sample_percent_male, ⁶phenotype_description, ⁷ancestry_category
#> # ℹ Use `colnames()` to see all variable names
#>
#> Slot "demographics":
#> # A tibble: 0 × 12
#> # … with 12 variables: ppm_id <chr>, pss_id <chr>, sample_id <int>,
#> # variable <chr>, estimate_type <chr>, estimate <dbl>, unit <chr>,
#> # variability_type <chr>, variability <dbl>, interval_type <chr>,
#> # interval_lower <dbl>, interval_upper <dbl>
#> # ℹ Use `colnames()` to see all variable names
#>
#> Slot "cohorts":
#> # A tibble: 33 × 5
#> ppm_id pss_id sample_id cohort_symbol cohort_name
#> <chr> <chr> <int> <chr> <chr>
#> 1 PPM000001 PSS000001 1 ABCFS Australian Breast Cancer Family …
#> 2 PPM000001 PSS000001 1 MCCS Melbourne Collaborative Cohort S…
#> 3 PPM000001 PSS000001 1 HMBCS Hannover-Minsk Breast Cancer Stu…
#> 4 PPM000001 PSS000001 1 LMBC Leuven Multidisciplinary Breast …
#> 5 PPM000001 PSS000001 1 MTLGEBCS Montreal Gene-Environment Breast…
#> 6 PPM000001 PSS000001 1 CGPS Copenhagen General Population St…
#> 7 PPM000001 PSS000001 1 KBCP Kuopio Breast Cancer Project
#> 8 PPM000001 PSS000001 1 OBCS Oulu Breast Cancer Study
#> 9 PPM000001 PSS000001 1 CECILE CECILE Breast Cancer Study
#> 10 PPM000001 PSS000001 1 BBCC Bavarian Breast Cancer Cases and…
#> # … with 23 more rows
#> # ℹ Use `print(n = ...)` to see more rows
#>
#> Slot "pgs_effect_sizes":
#> # A tibble: 1 × 11
#> ppm_id effec…¹ estim…² estim…³ estim…⁴ unit varia…⁵ varia…⁶ inter…⁷ inter…⁸
#> <chr> <int> <chr> <chr> <dbl> <chr> <chr> <dbl> <chr> <dbl>
#> 1 PPM0000… 1 Odds R… OR 1.55 NA se NA ci 1.52
#> # … with 1 more variable: interval_upper <dbl>, and abbreviated variable names
#> # ¹effect_size_id, ²estimate_type_long, ³estimate_type, ⁴estimate,
#> # ⁵variability_type, ⁶variability, ⁷interval_type, ⁸interval_lower
#> # ℹ Use `colnames()` to see all variable names
#>
#> Slot "pgs_classification_metrics":
#> # A tibble: 1 × 11
#> ppm_id class…¹ estim…² estim…³ estim…⁴ unit varia…⁵ varia…⁶ inter…⁷ inter…⁸
#> <chr> <int> <chr> <chr> <dbl> <chr> <chr> <dbl> <chr> <dbl>
#> 1 PPM0000… 1 Concor… C-index 0.622 NA se NA ci 0.619
#> # … with 1 more variable: interval_upper <dbl>, and abbreviated variable names
#> # ¹classification_metrics_id, ²estimate_type_long, ³estimate_type, ⁴estimate,
#> # ⁵variability_type, ⁶variability, ⁷interval_type, ⁸interval_lower
#> # ℹ Use `colnames()` to see all variable names
#>
#> Slot "pgs_other_metrics":
#> # A tibble: 0 × 11
#> # … with 11 variables: ppm_id <chr>, other_metrics_id <int>,
#> # estimate_type_long <chr>, estimate_type <chr>, estimate <dbl>, unit <chr>,
#> # variability_type <chr>, variability <dbl>, interval_type <chr>,
#> # interval_lower <dbl>, interval_upper <dbl>
#> # ℹ Use `colnames()` to see all variable names
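Each slot of the returned object is a table, and the tables share the ppm_id column, so individual slots can be extracted with the @ operator and combined on that key. The sketch below illustrates this offline with a hypothetical stand-in class (pm_demo, not part of quincunx) holding only two of the slots, filled with values transcribed from the output printed above; with a real performance_metrics object the same @ and merge() calls would be expected to apply.

```r
library(methods)

# Hypothetical stand-in class with two of the slots shown above, so that
# the snippet runs without querying the PGS Catalog.
setClass(
  "pm_demo",
  representation(
    performance_metrics = "data.frame",
    pgs_effect_sizes = "data.frame"
  )
)

# Values transcribed from the printed output for PPM000001.
pm <- new(
  "pm_demo",
  performance_metrics = data.frame(
    ppm_id = "PPM000001",
    pgs_id = "PGS000001",
    reported_trait = "All breast cancer"
  ),
  pgs_effect_sizes = data.frame(
    ppm_id = "PPM000001",
    estimate_type = "OR",
    estimate = 1.55,
    interval_lower = 1.52
  )
)

# List the slot names, then extract one slot with the `@` operator.
slotNames(pm)
pm@pgs_effect_sizes

# The slots share the ppm_id key, so they can be combined with merge().
combined <- merge(pm@performance_metrics, pm@pgs_effect_sizes, by = "ppm_id")
combined[, c("ppm_id", "reported_trait", "estimate_type", "estimate")]
```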
Searching by PGS identifier
Alternatively, you can search by the associated PGS identifier, e.g. "PGS000001":
get_performance_metrics(pgs_id = 'PGS000001')
#> An object of class "performance_metrics"
#> Slot "performance_metrics":
#> # A tibble: 13 × 5
#> ppm_id pgs_id reported_trait covar…¹ comme…²
#> <chr> <chr> <chr> <chr> <chr>
#> 1 PPM000001 PGS000001 All breast cancer NA NA
#> 2 PPM000011 PGS000001 Invasive breast cancer study,… NA
#> 3 PPM000114 PGS000001 Breast cancer in BRCA1 mutation carriers Countr… NA
#> 4 PPM000117 PGS000001 Breast cancer in BRCA2 mutation carriers Countr… NA
#> 5 PPM000944 PGS000001 Metachronous contralateral breast cancer Country NA
#> 6 PPM000945 PGS000001 Invasive metachronous contralateral brea… Country NA
#> 7 PPM000961 PGS000001 Metachronous contralateral breast cancer Country NA
#> 8 PPM000962 PGS000001 Invasive metachronous contralateral brea… Country NA
#> 9 PPM002150 PGS000001 Breast cancer in CHEK2 mutation carriers Year o… Only 7…
#> 10 PPM002151 PGS000001 Breast cancer in CHEK2 mutation carriers Year o… Only 7…
#> 11 PPM002152 PGS000001 Breast cancer in CHEK2 mutation carriers… Year o… Only 7…
#> 12 PPM002153 PGS000001 Breast cancer in CHEK2 mutation carriers… Year o… Only 7…
#> 13 PPM002154 PGS000001 Breast cancer in CHEK2 mutation carriers… Year o… Only 7…
#> # … with abbreviated variable names ¹covariates, ²comments
#>
#> Slot "publications":
#> # A tibble: 13 × 8
#> ppm_id pgp_id pubmed_id publication_date publicat…¹ title autho…² doi
#> <chr> <chr> <chr> <date> <chr> <chr> <chr> <chr>
#> 1 PPM000001 PGP000001 25855707 2015-04-08 J Natl Ca… Pred… Mavadd… 10.1…
#> 2 PPM000011 PGP000002 30554720 2018-12-13 Am J Hum … Poly… Mavadd… 10.1…
#> 3 PPM000114 PGP000033 28376175 2017-07-01 J Natl Ca… Eval… Kuchen… 10.1…
#> 4 PPM000117 PGP000033 28376175 2017-07-01 J Natl Ca… Eval… Kuchen… 10.1…
#> 5 PPM000944 PGP000109 33022221 2020-10-05 Am J Hum … Brea… Kramer… 10.1…
#> 6 PPM000945 PGP000109 33022221 2020-10-05 Am J Hum … Brea… Kramer… 10.1…
#> 7 PPM000961 PGP000109 33022221 2020-10-05 Am J Hum … Brea… Kramer… 10.1…
#> 8 PPM000962 PGP000109 33022221 2020-10-05 Am J Hum … Brea… Kramer… 10.1…
#> 9 PPM002150 PGP000198 33372680 2020-12-29 J Natl Ca… Perf… Borde J 10.1…
#> 10 PPM002151 PGP000198 33372680 2020-12-29 J Natl Ca… Perf… Borde J 10.1…
#> 11 PPM002152 PGP000198 33372680 2020-12-29 J Natl Ca… Perf… Borde J 10.1…
#> 12 PPM002153 PGP000198 33372680 2020-12-29 J Natl Ca… Perf… Borde J 10.1…
#> 13 PPM002154 PGP000198 33372680 2020-12-29 J Natl Ca… Perf… Borde J 10.1…
#> # … with abbreviated variable names ¹publication, ²author_fullname
#>
#> Slot "sample_sets":
#> # A tibble: 13 × 2
#> ppm_id pss_id
#> <chr> <chr>
#> 1 PPM000001 PSS000001
#> 2 PPM000011 PSS000004
#> 3 PPM000114 PSS000070
#> 4 PPM000117 PSS000071
#> 5 PPM000944 PSS000484
#> 6 PPM000945 PSS000486
#> 7 PPM000961 PSS000484
#> 8 PPM000962 PSS000486
#> 9 PPM002150 PSS001054
#> 10 PPM002151 PSS001054
#> 11 PPM002152 PSS001054
#> 12 PPM002153 PSS001054
#> 13 PPM002154 PSS001054
#>
#> Slot "samples":
#> # A tibble: 13 × 16
#> ppm_id pss_id sampl…¹ stage sampl…² sampl…³ sampl…⁴ sampl…⁵ pheno…⁶ ances…⁷
#> <chr> <chr> <int> <chr> <int> <int> <int> <dbl> <chr> <chr>
#> 1 PPM0000… PSS00… 1 eval 67054 33673 33381 NA "All b… Europe…
#> 2 PPM0000… PSS00… 1 eval 29751 11428 18323 0 "Invas… Europe…
#> 3 PPM0001… PSS00… 1 eval 15252 7797 7455 0 "BRCA1… Europe…
#> 4 PPM0001… PSS00… 1 eval 8211 4330 3881 0 "BRCA2… Europe…
#> 5 PPM0009… PSS00… 1 eval 56068 1027 55041 0 "Women… Europe…
#> 6 PPM0009… PSS00… 1 eval 56068 923 55145 0 "Women… Europe…
#> 7 PPM0009… PSS00… 1 eval 56068 1027 55041 0 "Women… Europe…
#> 8 PPM0009… PSS00… 1 eval 56068 923 55145 0 "Women… Europe…
#> 9 PPM0021… PSS00… 1 eval 760 561 199 0 "All w… Europe…
#> 10 PPM0021… PSS00… 1 eval 760 561 199 0 "All w… Europe…
#> 11 PPM0021… PSS00… 1 eval 760 561 199 0 "All w… Europe…
#> 12 PPM0021… PSS00… 1 eval 760 561 199 0 "All w… Europe…
#> 13 PPM0021… PSS00… 1 eval 760 561 199 0 "All w… Europe…
#> # … with 6 more variables: ancestry <chr>, country <chr>,
#> # ancestry_additional_description <chr>, study_id <chr>, pubmed_id <chr>,
#> # cohorts_additional_description <chr>, and abbreviated variable names
#> # ¹sample_id, ²sample_size, ³sample_cases, ⁴sample_controls,
#> # ⁵sample_percent_male, ⁶phenotype_description, ⁷ancestry_category
#> # ℹ Use `colnames()` to see all variable names
#>
#> Slot "demographics":
#> # A tibble: 0 × 12
#> # … with 12 variables: ppm_id <chr>, pss_id <chr>, sample_id <int>,
#> # variable <chr>, estimate_type <chr>, estimate <dbl>, unit <chr>,
#> # variability_type <chr>, variability <dbl>, interval_type <chr>,
#> # interval_lower <dbl>, interval_upper <dbl>
#> # ℹ Use `colnames()` to see all variable names
#>
#> Slot "cohorts":
#> # A tibble: 218 × 5
#> ppm_id pss_id sample_id cohort_symbol cohort_name
#> <chr> <chr> <int> <chr> <chr>
#> 1 PPM000001 PSS000001 1 ABCFS Australian Breast Cancer Family …
#> 2 PPM000001 PSS000001 1 MCCS Melbourne Collaborative Cohort S…
#> 3 PPM000001 PSS000001 1 HMBCS Hannover-Minsk Breast Cancer Stu…
#> 4 PPM000001 PSS000001 1 LMBC Leuven Multidisciplinary Breast …
#> 5 PPM000001 PSS000001 1 MTLGEBCS Montreal Gene-Environment Breast…
#> 6 PPM000001 PSS000001 1 CGPS Copenhagen General Population St…
#> 7 PPM000001 PSS000001 1 KBCP Kuopio Breast Cancer Project
#> 8 PPM000001 PSS000001 1 OBCS Oulu Breast Cancer Study
#> 9 PPM000001 PSS000001 1 CECILE CECILE Breast Cancer Study
#> 10 PPM000001 PSS000001 1 BBCC Bavarian Breast Cancer Cases and…
#> # … with 208 more rows
#> # ℹ Use `print(n = ...)` to see more rows
#>
#> Slot "pgs_effect_sizes":
#> # A tibble: 13 × 11
#> ppm_id effec…¹ estim…² estim…³ estim…⁴ unit varia…⁵ varia…⁶ inter…⁷ inter…⁸
#> <chr> <int> <chr> <chr> <dbl> <chr> <chr> <dbl> <chr> <dbl>
#> 1 PPM000… 1 Odds R… OR 1.55 NA se NA ci 1.52
#> 2 PPM000… 1 Odds R… OR 1.46 NA se NA ci 1.42
#> 3 PPM000… 1 Hazard… HR 1.13 NA se NA ci 1.1
#> 4 PPM000… 1 Hazard… HR 1.22 NA se NA ci 1.17
#> 5 PPM000… 1 Hazard… HR 1.21 NA se NA ci 1.14
#> 6 PPM000… 1 Hazard… HR 1.21 NA se NA ci 1.13
#> 7 PPM000… 1 Hazard… HR 1.21 NA se NA ci 1.14
#> 8 PPM000… 1 Hazard… HR 1.21 NA se NA ci 1.13
#> 9 PPM002… 1 Hazard… HR 1.71 NA se NA ci 1.36
#> 10 PPM002… 1 Hazard… HR 2.29 NA se NA ci 1.56
#> 11 PPM002… 1 Hazard… HR 1.43 NA se NA ci 1.04
#> 12 PPM002… 1 Hazard… HR 2.32 NA se NA ci 1.69
#> 13 PPM002… 1 Hazard… HR 1.59 NA se NA ci 1.07
#> # … with 1 more variable: interval_upper <dbl>, and abbreviated variable names
#> # ¹effect_size_id, ²estimate_type_long, ³estimate_type, ⁴estimate,
#> # ⁵variability_type, ⁶variability, ⁷interval_type, ⁸interval_lower
#> # ℹ Use `colnames()` to see all variable names
#>
#> Slot "pgs_classification_metrics":
#> # A tibble: 2 × 11
#> ppm_id class…¹ estim…² estim…³ estim…⁴ unit varia…⁵ varia…⁶ inter…⁷ inter…⁸
#> <chr> <int> <chr> <chr> <dbl> <chr> <chr> <dbl> <chr> <dbl>
#> 1 PPM0000… 1 Concor… C-index 0.622 NA se NA ci 0.619
#> 2 PPM0000… 1 Area U… AUROC 0.603 NA se NA ci NA
#> # … with 1 more variable: interval_upper <dbl>, and abbreviated variable names
#> # ¹classification_metrics_id, ²estimate_type_long, ³estimate_type, ⁴estimate,
#> # ⁵variability_type, ⁶variability, ⁷interval_type, ⁸interval_lower
#> # ℹ Use `colnames()` to see all variable names
#>
#> Slot "pgs_other_metrics":
#> # A tibble: 0 × 11
#> # … with 11 variables: ppm_id <chr>, other_metrics_id <int>,
#> # estimate_type_long <chr>, estimate_type <chr>, estimate <dbl>, unit <chr>,
#> # variability_type <chr>, variability <dbl>, interval_type <chr>,
#> # interval_lower <dbl>, interval_upper <dbl>
#> # ℹ Use `colnames()` to see all variable names
As you can see, searching by 'PGS000001' returns multiple PPM identifiers (PPM000001 included). This is because a PGS may have been assessed several times independently, each assessment resulting in its own performance metrics entry with its own identifier.
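To make the one-to-many relationship concrete, the following base R sketch tabulates the assessments by publication. It does not query the catalog: the ppm_id and pgp_id columns are transcribed from the publications slot printed above, and the data frame name is only illustrative. On a real result, the same table() call could be applied to the pgp_id column of the publications slot.

```r
# ppm_id and pgp_id columns transcribed from the "publications" slot above.
publications <- data.frame(
  ppm_id = c(
    "PPM000001", "PPM000011", "PPM000114", "PPM000117",
    "PPM000944", "PPM000945", "PPM000961", "PPM000962",
    "PPM002150", "PPM002151", "PPM002152", "PPM002153", "PPM002154"
  ),
  pgp_id = c(
    "PGP000001", "PGP000002", "PGP000033", "PGP000033",
    rep("PGP000109", 4), rep("PGP000198", 5)
  )
)

# Number of distinct assessments (PPM entries) of PGS000001.
length(unique(publications$ppm_id))
#> [1] 13

# Number of assessments contributed by each publication.
per_publication <- table(publications$pgp_id)
per_publication
```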
Vectorised search
The function get_performance_metrics() is vectorised over ppm_id and pgs_id, so you can retrieve performance metrics for a set of polygenic scores by providing a vector of identifiers (e.g. PGS000042 through PGS000046):
ppm <- get_performance_metrics(pgs_id = sprintf("PGS%06d", 42:46))
print(ppm@performance_metrics, n = Inf)
#> # A tibble: 26 × 5
#> ppm_id pgs_id reported_trait covar…¹ comme…²
#> <chr> <chr> <chr> <chr> <chr>
#> 1 PPM000101 PGS000042 Coeliac disease in HLA-DQ2.5 carriers NA NA
#> 2 PPM000102 PGS000043 Venous thromboembolism age, s… NA
#> 3 PPM000103 PGS000043 Venous thromboembolism age, 1… NA
#> 4 PPM001639 PGS000043 Thromboembolic disease event in individu… Age at… Includ…
#> 5 PPM001640 PGS000043 Thromboembolic disease event in individu… Diseas… Includ…
#> 6 PPM001641 PGS000043 Thromboembolic disease event in in indiv… Age at… Includ…
#> 7 PPM001939 PGS000043 Venous Thromboembolism Age, s… 273 of…
#> 8 PPM001940 PGS000043 Venous Thromboembolism Age, s… 273 of…
#> 9 PPM001941 PGS000043 Venous Thromboembolism NA 273 of…
#> 10 PPM001942 PGS000043 Venous Thromboembolism Age, o… 273 of…
#> 11 PPM001943 PGS000043 Venous Thromboembolism in individuals wi… Age, s… 273 of…
#> 12 PPM001944 PGS000043 Venous Thromboembolism in individuals wi… Age, s… 273 of…
#> 13 PPM000104 PGS000044 Elevated serum prostate-specific antigen… cancer… NA
#> 14 PPM000105 PGS000044 aggressive prostate cancer (Gleason scor… NA NA
#> 15 PPM000106 PGS000045 Breast cancer in BRCA1 mutation carriers Countr… NA
#> 16 PPM000107 PGS000045 Breast cancer in BRCA2 mutation carriers Countr… NA
#> 17 PPM000120 PGS000045 Breast cancer in male carriers of BRCA1/… 3 PCs … PGS pr…
#> 18 PPM002155 PGS000045 Breast cancer in CHEK2 mutation carriers Year o… Only 8…
#> 19 PPM002156 PGS000045 Breast cancer in CHEK2 mutation carriers Year o… Only 8…
#> 20 PPM002157 PGS000045 Breast cancer in CHEK2 mutation carriers… Year o… Only 8…
#> 21 PPM002158 PGS000045 Breast cancer in CHEK2 mutation carriers… Year o… Only 8…
#> 22 PPM002159 PGS000045 Breast cancer in CHEK2 mutation carriers… Year o… Only 8…
#> 23 PPM014912 PGS000045 Breast cancer in BRAC1 PV carriers NA effect…
#> 24 PPM000108 PGS000046 Breast cancer in BRCA1 mutation carriers Countr… NA
#> 25 PPM000109 PGS000046 Breast cancer in BRCA2 mutation carriers Countr… NA
#> 26 PPM000121 PGS000046 Breast cancer in male carriers of BRCA1/… 3 PCs … PGS pr…
#> # … with abbreviated variable names ¹covariates, ²comments
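The identifier vector used in the call above is built with sprintf(), which zero-pads each number to six digits. The short base R sketch below shows what that expression expands to, and how the same pattern could be used for a non-contiguous set of PPM identifiers (the particular numbers are only illustrative).

```r
# Zero-pad the numeric part to six digits and add the "PGS" prefix.
pgs_ids <- sprintf("PGS%06d", 42:46)
pgs_ids
#> [1] "PGS000042" "PGS000043" "PGS000044" "PGS000045" "PGS000046"

# The same pattern builds PPM identifiers, here for a non-contiguous set.
ppm_ids <- sprintf("PPM%06d", c(1L, 11L, 114L))
ppm_ids
#> [1] "PPM000001" "PPM000011" "PPM000114"
```

Vectors built this way can be passed to the pgs_id or ppm_id argument respectively.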