
PGS performance metrics

Performance metrics assess the validity of a PGS in a Sample Set. This assessment is performed on samples not used for score development.

Performance metrics are retrieved with the function get_performance_metrics(). The returned data is provided as an S4 object of class performance_metrics.

Common metrics include:

  • standardized effect sizes: odds ratios or hazard ratios, and regression coefficients \(\beta\), see slot pgs_effect_sizes;
  • classification accuracy metrics: area under the receiver operating characteristic curve, C-index and area under the precision-recall curve, see slot pgs_classification_metrics;
  • other relevant metrics: calibration (\(\chi^2\)), see slot pgs_other_metrics.
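To make the classification metrics concrete, here is a minimal base R sketch of the area under the ROC curve, computed as the probability that a randomly chosen case has a higher score than a randomly chosen control (ties count one half). The function name `auroc()` and the toy scores are hypothetical and are not part of quincunx or the PGS Catalog. For a binary outcome without censoring this quantity coincides with the C-index; the survival C-index additionally accounts for censored follow-up.

```r
# Hypothetical helper, for illustration only (not a quincunx function).
# AUROC = P(score of a random case > score of a random control), ties count 0.5.
auroc <- function(score, is_case) {
  cases <- score[is_case]
  controls <- score[!is_case]
  # Compare every case with every control
  wins <- outer(cases, controls, ">")
  ties <- outer(cases, controls, "==")
  mean(wins + 0.5 * ties)
}

# Toy data: two cases and two controls
score <- c(0.8, 0.6, 0.5, 0.7)
is_case <- c(TRUE, TRUE, FALSE, FALSE)
auroc(score, is_case)
#> [1] 0.75
```

Three of the four case/control pairs are correctly ordered, hence 0.75; a value of 0.5 corresponds to no discrimination.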

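The covariates used in the model (most commonly age, sex and genetic principal components to account for population structure) are also recorded for each set of metrics, in the covariates column of the slot performance_metrics. Demographic summaries of the evaluated samples, when available, are found in the slot demographics.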
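In the printed output, covariates appear as a single free-text string per entry (or NA when none were reported). If you need them as a vector, a minimal sketch follows. The helper `split_covariates()` is hypothetical, and it assumes that covariates are separated by commas, which will not hold for every catalogue entry.

```r
# Hypothetical helper (not part of quincunx).
# Assumes a comma-separated free-text entry; NA means no covariates recorded.
split_covariates <- function(x) {
  if (is.na(x)) return(character(0))
  trimws(strsplit(x, ",", fixed = TRUE)[[1]])
}

split_covariates("age, sex, genetic principal components")
# returns c("age", "sex", "genetic principal components")
split_covariates(NA_character_)
# returns character(0)
```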

Getting PGS performance metrics

In the PGS Catalog, each set of performance metrics is catalogued with an identifier that starts with the prefix "PPM". To retrieve the performance metrics of one assessment of a polygenic score, use that identifier directly:

library(quincunx)

get_performance_metrics(ppm_id = 'PPM000001')
#> An object of class "performance_metrics"
#> Slot "performance_metrics":
#> # A tibble: 1 × 5
#>   ppm_id    pgs_id    reported_trait    covariates comments
#>   <chr>     <chr>     <chr>             <chr>      <chr>   
#> 1 PPM000001 PGS000001 All breast cancer NA         NA      
#> 
#> Slot "publications":
#> # A tibble: 1 × 8
#>   ppm_id    pgp_id    pubmed_id publication_date publication title autho…¹ doi  
#>   <chr>     <chr>     <chr>     <date>           <chr>       <chr> <chr>   <chr>
#> 1 PPM000001 PGP000001 25855707  2015-04-08       J Natl Can… Pred… Mavadd… 10.1…
#> # … with abbreviated variable name ¹​author_fullname
#> 
#> Slot "sample_sets":
#> # A tibble: 1 × 2
#>   ppm_id    pss_id   
#>   <chr>     <chr>    
#> 1 PPM000001 PSS000001
#> 
#> Slot "samples":
#> # A tibble: 1 × 16
#>   ppm_id    pss_id sampl…¹ stage sampl…² sampl…³ sampl…⁴ sampl…⁵ pheno…⁶ ances…⁷
#>   <chr>     <chr>    <int> <chr>   <int>   <int>   <int>   <dbl> <chr>   <chr>  
#> 1 PPM000001 PSS00…       1 eval    67054   33673   33381      NA All br… Europe…
#> # … with 6 more variables: ancestry <chr>, country <chr>,
#> #   ancestry_additional_description <chr>, study_id <chr>, pubmed_id <chr>,
#> #   cohorts_additional_description <chr>, and abbreviated variable names
#> #   ¹​sample_id, ²​sample_size, ³​sample_cases, ⁴​sample_controls,
#> #   ⁵​sample_percent_male, ⁶​phenotype_description, ⁷​ancestry_category
#> # ℹ Use `colnames()` to see all variable names
#> 
#> Slot "demographics":
#> # A tibble: 0 × 12
#> # … with 12 variables: ppm_id <chr>, pss_id <chr>, sample_id <int>,
#> #   variable <chr>, estimate_type <chr>, estimate <dbl>, unit <chr>,
#> #   variability_type <chr>, variability <dbl>, interval_type <chr>,
#> #   interval_lower <dbl>, interval_upper <dbl>
#> # ℹ Use `colnames()` to see all variable names
#> 
#> Slot "cohorts":
#> # A tibble: 33 × 5
#>    ppm_id    pss_id    sample_id cohort_symbol cohort_name                      
#>    <chr>     <chr>         <int> <chr>         <chr>                            
#>  1 PPM000001 PSS000001         1 ABCFS         Australian Breast Cancer Family …
#>  2 PPM000001 PSS000001         1 MCCS          Melbourne Collaborative Cohort S…
#>  3 PPM000001 PSS000001         1 HMBCS         Hannover-Minsk Breast Cancer Stu…
#>  4 PPM000001 PSS000001         1 LMBC          Leuven Multidisciplinary Breast …
#>  5 PPM000001 PSS000001         1 MTLGEBCS      Montreal Gene-Environment Breast…
#>  6 PPM000001 PSS000001         1 CGPS          Copenhagen General Population St…
#>  7 PPM000001 PSS000001         1 KBCP          Kuopio Breast Cancer Project     
#>  8 PPM000001 PSS000001         1 OBCS          Oulu Breast Cancer Study         
#>  9 PPM000001 PSS000001         1 CECILE        CECILE Breast Cancer Study       
#> 10 PPM000001 PSS000001         1 BBCC          Bavarian Breast Cancer Cases and…
#> # … with 23 more rows
#> # ℹ Use `print(n = ...)` to see more rows
#> 
#> Slot "pgs_effect_sizes":
#> # A tibble: 1 × 11
#>   ppm_id   effec…¹ estim…² estim…³ estim…⁴ unit  varia…⁵ varia…⁶ inter…⁷ inter…⁸
#>   <chr>      <int> <chr>   <chr>     <dbl> <chr> <chr>     <dbl> <chr>     <dbl>
#> 1 PPM0000…       1 Odds R… OR         1.55 NA    se           NA ci         1.52
#> # … with 1 more variable: interval_upper <dbl>, and abbreviated variable names
#> #   ¹​effect_size_id, ²​estimate_type_long, ³​estimate_type, ⁴​estimate,
#> #   ⁵​variability_type, ⁶​variability, ⁷​interval_type, ⁸​interval_lower
#> # ℹ Use `colnames()` to see all variable names
#> 
#> Slot "pgs_classification_metrics":
#> # A tibble: 1 × 11
#>   ppm_id   class…¹ estim…² estim…³ estim…⁴ unit  varia…⁵ varia…⁶ inter…⁷ inter…⁸
#>   <chr>      <int> <chr>   <chr>     <dbl> <chr> <chr>     <dbl> <chr>     <dbl>
#> 1 PPM0000…       1 Concor… C-index   0.622 NA    se           NA ci        0.619
#> # … with 1 more variable: interval_upper <dbl>, and abbreviated variable names
#> #   ¹​classification_metrics_id, ²​estimate_type_long, ³​estimate_type, ⁴​estimate,
#> #   ⁵​variability_type, ⁶​variability, ⁷​interval_type, ⁸​interval_lower
#> # ℹ Use `colnames()` to see all variable names
#> 
#> Slot "pgs_other_metrics":
#> # A tibble: 0 × 11
#> # … with 11 variables: ppm_id <chr>, other_metrics_id <int>,
#> #   estimate_type_long <chr>, estimate_type <chr>, estimate <dbl>, unit <chr>,
#> #   variability_type <chr>, variability <dbl>, interval_type <chr>,
#> #   interval_lower <dbl>, interval_upper <dbl>
#> # ℹ Use `colnames()` to see all variable names
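Every slot of the returned object is a table keyed by ppm_id, so slots can be combined with an ordinary join. The sketch below uses base R data frames as stand-ins for two of the slots printed above (values copied from that output); with a live object you would pass ppm@performance_metrics and ppm@pgs_classification_metrics instead. The variable names are illustrative.

```r
# Stand-ins for two slots of the object printed above (values from that output).
performance_metrics <- data.frame(
  ppm_id = "PPM000001",
  pgs_id = "PGS000001",
  reported_trait = "All breast cancer"
)
classification <- data.frame(
  ppm_id = "PPM000001",
  estimate_type = "C-index",
  estimate = 0.622,
  interval_lower = 0.619
)

# All slots share the ppm_id column, so they can be joined on it
merged <- merge(performance_metrics, classification, by = "ppm_id")
merged[, c("pgs_id", "reported_trait", "estimate_type", "estimate")]
```

A join on ppm_id keeps the trait label next to each metric, which is convenient once an object holds many entries.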

Searching by PGS identifier

Alternatively, you can search by the associated PGS identifier, i.e. "PGS000001":

get_performance_metrics(pgs_id = 'PGS000001')
#> An object of class "performance_metrics"
#> Slot "performance_metrics":
#> # A tibble: 13 × 5
#>    ppm_id    pgs_id    reported_trait                            covar…¹ comme…²
#>    <chr>     <chr>     <chr>                                     <chr>   <chr>  
#>  1 PPM000001 PGS000001 All breast cancer                         NA      NA     
#>  2 PPM000011 PGS000001 Invasive breast cancer                    study,… NA     
#>  3 PPM000114 PGS000001 Breast cancer in BRCA1 mutation carriers  Countr… NA     
#>  4 PPM000117 PGS000001 Breast cancer in BRCA2 mutation carriers  Countr… NA     
#>  5 PPM000944 PGS000001 Metachronous contralateral breast cancer  Country NA     
#>  6 PPM000945 PGS000001 Invasive metachronous contralateral brea… Country NA     
#>  7 PPM000961 PGS000001 Metachronous contralateral breast cancer  Country NA     
#>  8 PPM000962 PGS000001 Invasive metachronous contralateral brea… Country NA     
#>  9 PPM002150 PGS000001 Breast cancer in CHEK2 mutation carriers  Year o… Only 7…
#> 10 PPM002151 PGS000001 Breast cancer in CHEK2 mutation carriers  Year o… Only 7…
#> 11 PPM002152 PGS000001 Breast cancer in CHEK2 mutation carriers… Year o… Only 7…
#> 12 PPM002153 PGS000001 Breast cancer in CHEK2 mutation carriers… Year o… Only 7…
#> 13 PPM002154 PGS000001 Breast cancer in CHEK2 mutation carriers… Year o… Only 7…
#> # … with abbreviated variable names ¹​covariates, ²​comments
#> 
#> Slot "publications":
#> # A tibble: 13 × 8
#>    ppm_id    pgp_id    pubmed_id publication_date publicat…¹ title autho…² doi  
#>    <chr>     <chr>     <chr>     <date>           <chr>      <chr> <chr>   <chr>
#>  1 PPM000001 PGP000001 25855707  2015-04-08       J Natl Ca… Pred… Mavadd… 10.1…
#>  2 PPM000011 PGP000002 30554720  2018-12-13       Am J Hum … Poly… Mavadd… 10.1…
#>  3 PPM000114 PGP000033 28376175  2017-07-01       J Natl Ca… Eval… Kuchen… 10.1…
#>  4 PPM000117 PGP000033 28376175  2017-07-01       J Natl Ca… Eval… Kuchen… 10.1…
#>  5 PPM000944 PGP000109 33022221  2020-10-05       Am J Hum … Brea… Kramer… 10.1…
#>  6 PPM000945 PGP000109 33022221  2020-10-05       Am J Hum … Brea… Kramer… 10.1…
#>  7 PPM000961 PGP000109 33022221  2020-10-05       Am J Hum … Brea… Kramer… 10.1…
#>  8 PPM000962 PGP000109 33022221  2020-10-05       Am J Hum … Brea… Kramer… 10.1…
#>  9 PPM002150 PGP000198 33372680  2020-12-29       J Natl Ca… Perf… Borde J 10.1…
#> 10 PPM002151 PGP000198 33372680  2020-12-29       J Natl Ca… Perf… Borde J 10.1…
#> 11 PPM002152 PGP000198 33372680  2020-12-29       J Natl Ca… Perf… Borde J 10.1…
#> 12 PPM002153 PGP000198 33372680  2020-12-29       J Natl Ca… Perf… Borde J 10.1…
#> 13 PPM002154 PGP000198 33372680  2020-12-29       J Natl Ca… Perf… Borde J 10.1…
#> # … with abbreviated variable names ¹​publication, ²​author_fullname
#> 
#> Slot "sample_sets":
#> # A tibble: 13 × 2
#>    ppm_id    pss_id   
#>    <chr>     <chr>    
#>  1 PPM000001 PSS000001
#>  2 PPM000011 PSS000004
#>  3 PPM000114 PSS000070
#>  4 PPM000117 PSS000071
#>  5 PPM000944 PSS000484
#>  6 PPM000945 PSS000486
#>  7 PPM000961 PSS000484
#>  8 PPM000962 PSS000486
#>  9 PPM002150 PSS001054
#> 10 PPM002151 PSS001054
#> 11 PPM002152 PSS001054
#> 12 PPM002153 PSS001054
#> 13 PPM002154 PSS001054
#> 
#> Slot "samples":
#> # A tibble: 13 × 16
#>    ppm_id   pss_id sampl…¹ stage sampl…² sampl…³ sampl…⁴ sampl…⁵ pheno…⁶ ances…⁷
#>    <chr>    <chr>    <int> <chr>   <int>   <int>   <int>   <dbl> <chr>   <chr>  
#>  1 PPM0000… PSS00…       1 eval    67054   33673   33381      NA "All b… Europe…
#>  2 PPM0000… PSS00…       1 eval    29751   11428   18323       0 "Invas… Europe…
#>  3 PPM0001… PSS00…       1 eval    15252    7797    7455       0 "BRCA1… Europe…
#>  4 PPM0001… PSS00…       1 eval     8211    4330    3881       0 "BRCA2… Europe…
#>  5 PPM0009… PSS00…       1 eval    56068    1027   55041       0 "Women… Europe…
#>  6 PPM0009… PSS00…       1 eval    56068     923   55145       0 "Women… Europe…
#>  7 PPM0009… PSS00…       1 eval    56068    1027   55041       0 "Women… Europe…
#>  8 PPM0009… PSS00…       1 eval    56068     923   55145       0 "Women… Europe…
#>  9 PPM0021… PSS00…       1 eval      760     561     199       0 "All w… Europe…
#> 10 PPM0021… PSS00…       1 eval      760     561     199       0 "All w… Europe…
#> 11 PPM0021… PSS00…       1 eval      760     561     199       0 "All w… Europe…
#> 12 PPM0021… PSS00…       1 eval      760     561     199       0 "All w… Europe…
#> 13 PPM0021… PSS00…       1 eval      760     561     199       0 "All w… Europe…
#> # … with 6 more variables: ancestry <chr>, country <chr>,
#> #   ancestry_additional_description <chr>, study_id <chr>, pubmed_id <chr>,
#> #   cohorts_additional_description <chr>, and abbreviated variable names
#> #   ¹​sample_id, ²​sample_size, ³​sample_cases, ⁴​sample_controls,
#> #   ⁵​sample_percent_male, ⁶​phenotype_description, ⁷​ancestry_category
#> # ℹ Use `colnames()` to see all variable names
#> 
#> Slot "demographics":
#> # A tibble: 0 × 12
#> # … with 12 variables: ppm_id <chr>, pss_id <chr>, sample_id <int>,
#> #   variable <chr>, estimate_type <chr>, estimate <dbl>, unit <chr>,
#> #   variability_type <chr>, variability <dbl>, interval_type <chr>,
#> #   interval_lower <dbl>, interval_upper <dbl>
#> # ℹ Use `colnames()` to see all variable names
#> 
#> Slot "cohorts":
#> # A tibble: 218 × 5
#>    ppm_id    pss_id    sample_id cohort_symbol cohort_name                      
#>    <chr>     <chr>         <int> <chr>         <chr>                            
#>  1 PPM000001 PSS000001         1 ABCFS         Australian Breast Cancer Family …
#>  2 PPM000001 PSS000001         1 MCCS          Melbourne Collaborative Cohort S…
#>  3 PPM000001 PSS000001         1 HMBCS         Hannover-Minsk Breast Cancer Stu…
#>  4 PPM000001 PSS000001         1 LMBC          Leuven Multidisciplinary Breast …
#>  5 PPM000001 PSS000001         1 MTLGEBCS      Montreal Gene-Environment Breast…
#>  6 PPM000001 PSS000001         1 CGPS          Copenhagen General Population St…
#>  7 PPM000001 PSS000001         1 KBCP          Kuopio Breast Cancer Project     
#>  8 PPM000001 PSS000001         1 OBCS          Oulu Breast Cancer Study         
#>  9 PPM000001 PSS000001         1 CECILE        CECILE Breast Cancer Study       
#> 10 PPM000001 PSS000001         1 BBCC          Bavarian Breast Cancer Cases and…
#> # … with 208 more rows
#> # ℹ Use `print(n = ...)` to see more rows
#> 
#> Slot "pgs_effect_sizes":
#> # A tibble: 13 × 11
#>    ppm_id  effec…¹ estim…² estim…³ estim…⁴ unit  varia…⁵ varia…⁶ inter…⁷ inter…⁸
#>    <chr>     <int> <chr>   <chr>     <dbl> <chr> <chr>     <dbl> <chr>     <dbl>
#>  1 PPM000…       1 Odds R… OR         1.55 NA    se           NA ci         1.52
#>  2 PPM000…       1 Odds R… OR         1.46 NA    se           NA ci         1.42
#>  3 PPM000…       1 Hazard… HR         1.13 NA    se           NA ci         1.1 
#>  4 PPM000…       1 Hazard… HR         1.22 NA    se           NA ci         1.17
#>  5 PPM000…       1 Hazard… HR         1.21 NA    se           NA ci         1.14
#>  6 PPM000…       1 Hazard… HR         1.21 NA    se           NA ci         1.13
#>  7 PPM000…       1 Hazard… HR         1.21 NA    se           NA ci         1.14
#>  8 PPM000…       1 Hazard… HR         1.21 NA    se           NA ci         1.13
#>  9 PPM002…       1 Hazard… HR         1.71 NA    se           NA ci         1.36
#> 10 PPM002…       1 Hazard… HR         2.29 NA    se           NA ci         1.56
#> 11 PPM002…       1 Hazard… HR         1.43 NA    se           NA ci         1.04
#> 12 PPM002…       1 Hazard… HR         2.32 NA    se           NA ci         1.69
#> 13 PPM002…       1 Hazard… HR         1.59 NA    se           NA ci         1.07
#> # … with 1 more variable: interval_upper <dbl>, and abbreviated variable names
#> #   ¹​effect_size_id, ²​estimate_type_long, ³​estimate_type, ⁴​estimate,
#> #   ⁵​variability_type, ⁶​variability, ⁷​interval_type, ⁸​interval_lower
#> # ℹ Use `colnames()` to see all variable names
#> 
#> Slot "pgs_classification_metrics":
#> # A tibble: 2 × 11
#>   ppm_id   class…¹ estim…² estim…³ estim…⁴ unit  varia…⁵ varia…⁶ inter…⁷ inter…⁸
#>   <chr>      <int> <chr>   <chr>     <dbl> <chr> <chr>     <dbl> <chr>     <dbl>
#> 1 PPM0000…       1 Concor… C-index   0.622 NA    se           NA ci        0.619
#> 2 PPM0000…       1 Area U… AUROC     0.603 NA    se           NA ci       NA    
#> # … with 1 more variable: interval_upper <dbl>, and abbreviated variable names
#> #   ¹​classification_metrics_id, ²​estimate_type_long, ³​estimate_type, ⁴​estimate,
#> #   ⁵​variability_type, ⁶​variability, ⁷​interval_type, ⁸​interval_lower
#> # ℹ Use `colnames()` to see all variable names
#> 
#> Slot "pgs_other_metrics":
#> # A tibble: 0 × 11
#> # … with 11 variables: ppm_id <chr>, other_metrics_id <int>,
#> #   estimate_type_long <chr>, estimate_type <chr>, estimate <dbl>, unit <chr>,
#> #   variability_type <chr>, variability <dbl>, interval_type <chr>,
#> #   interval_lower <dbl>, interval_upper <dbl>
#> # ℹ Use `colnames()` to see all variable names
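The pgs_effect_sizes slot above mixes odds ratios and hazard ratios, which are different measures and should not be summarised together. The sketch below copies the 13 estimates from that output and summarises them per estimate_type; it is a descriptive illustration, not a meta-analysis, since the entries concern different traits and samples. Taking log() puts each estimate on the scale of the corresponding regression coefficient \(\beta\) (log-odds for logistic models, log-hazard for Cox models).

```r
# Estimates copied from the pgs_effect_sizes slot printed above (PGS000001).
effect_sizes <- data.frame(
  estimate_type = c("OR", "OR", rep("HR", 11)),
  estimate = c(1.55, 1.46, 1.13, 1.22, 1.21, 1.21, 1.21, 1.21,
               1.71, 2.29, 1.43, 2.32, 1.59)
)

# beta = log(OR) or log(HR): the coefficient on the additive model scale
effect_sizes$beta <- log(effect_sizes$estimate)

# Summarise each type of effect size separately
table(effect_sizes$estimate_type)
tapply(effect_sizes$estimate, effect_sizes$estimate_type, median)
```

This gives 11 hazard ratios (median 1.22) and 2 odds ratios (median 1.505).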

Searching by 'PGS000001' returns 13 PPM identifiers, PPM000001 among them. This is because a PGS may have been assessed more than once, and each assessment results in its own performance metrics entry with its own identifier.
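Distinct performance metrics entries can point to the same sample set, so the number of PPM identifiers is not the number of independent samples. A minimal sketch, using the ppm_id/pss_id pairs copied from the sample_sets slot printed above, that lists the sample sets used by more than one entry:

```r
# ppm_id/pss_id pairs copied from the sample_sets slot printed above.
sample_sets <- data.frame(
  ppm_id = c("PPM000001", "PPM000011", "PPM000114", "PPM000117",
             "PPM000944", "PPM000945", "PPM000961", "PPM000962",
             "PPM002150", "PPM002151", "PPM002152", "PPM002153", "PPM002154"),
  pss_id = c("PSS000001", "PSS000004", "PSS000070", "PSS000071",
             "PSS000484", "PSS000486", "PSS000484", "PSS000486",
             rep("PSS001054", 5))
)

# Number of performance metrics entries per sample set
ppm_per_pss <- table(sample_sets$pss_id)
# Keep only sample sets shared by several entries
ppm_per_pss[ppm_per_pss > 1]
```

Here PSS000484 and PSS000486 are each used twice and PSS001054 five times, so the 13 entries rest on 7 distinct sample sets.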

The function get_performance_metrics() is vectorised over ppm_id and pgs_id, so you can readily retrieve performance metrics for a set of polygenic scores by providing a vector of identifiers (e.g. PGS000042 through PGS000046):

ppm <- get_performance_metrics(pgs_id = sprintf("PGS%06d", 42:46))
print(ppm@performance_metrics, n = Inf)
#> # A tibble: 26 × 5
#>    ppm_id    pgs_id    reported_trait                            covar…¹ comme…²
#>    <chr>     <chr>     <chr>                                     <chr>   <chr>  
#>  1 PPM000101 PGS000042 Coeliac disease in HLA-DQ2.5 carriers     NA      NA     
#>  2 PPM000102 PGS000043 Venous thromboembolism                    age, s… NA     
#>  3 PPM000103 PGS000043 Venous thromboembolism                    age, 1… NA     
#>  4 PPM001639 PGS000043 Thromboembolic disease event in individu… Age at… Includ…
#>  5 PPM001640 PGS000043 Thromboembolic disease event in individu… Diseas… Includ…
#>  6 PPM001641 PGS000043 Thromboembolic disease event in in indiv… Age at… Includ…
#>  7 PPM001939 PGS000043 Venous Thromboembolism                    Age, s… 273 of…
#>  8 PPM001940 PGS000043 Venous Thromboembolism                    Age, s… 273 of…
#>  9 PPM001941 PGS000043 Venous Thromboembolism                    NA      273 of…
#> 10 PPM001942 PGS000043 Venous Thromboembolism                    Age, o… 273 of…
#> 11 PPM001943 PGS000043 Venous Thromboembolism in individuals wi… Age, s… 273 of…
#> 12 PPM001944 PGS000043 Venous Thromboembolism in individuals wi… Age, s… 273 of…
#> 13 PPM000104 PGS000044 Elevated serum prostate-specific antigen… cancer… NA     
#> 14 PPM000105 PGS000044 aggressive prostate cancer (Gleason scor… NA      NA     
#> 15 PPM000106 PGS000045 Breast cancer in BRCA1 mutation carriers  Countr… NA     
#> 16 PPM000107 PGS000045 Breast cancer in BRCA2 mutation carriers  Countr… NA     
#> 17 PPM000120 PGS000045 Breast cancer in male carriers of BRCA1/… 3 PCs … PGS pr…
#> 18 PPM002155 PGS000045 Breast cancer in CHEK2 mutation carriers  Year o… Only 8…
#> 19 PPM002156 PGS000045 Breast cancer in CHEK2 mutation carriers  Year o… Only 8…
#> 20 PPM002157 PGS000045 Breast cancer in CHEK2 mutation carriers… Year o… Only 8…
#> 21 PPM002158 PGS000045 Breast cancer in CHEK2 mutation carriers… Year o… Only 8…
#> 22 PPM002159 PGS000045 Breast cancer in CHEK2 mutation carriers… Year o… Only 8…
#> 23 PPM014912 PGS000045 Breast cancer in BRAC1 PV carriers        NA      effect…
#> 24 PPM000108 PGS000046 Breast cancer in BRCA1 mutation carriers  Countr… NA     
#> 25 PPM000109 PGS000046 Breast cancer in BRCA2 mutation carriers  Countr… NA     
#> 26 PPM000121 PGS000046 Breast cancer in male carriers of BRCA1/… 3 PCs … PGS pr…
#> # … with abbreviated variable names ¹​covariates, ²​comments
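With the object from the call above, the number of assessments per score is simply table(ppm@performance_metrics$pgs_id). So that the sketch runs without a network connection, it rebuilds only the pgs_id column of the 26 rows printed above; sprintf("PGS%06d", ...) is the same zero-padding used in the query to turn 42 into "PGS000042".

```r
# Offline stand-in for ppm@performance_metrics$pgs_id (26 rows printed above).
pgs_id <- rep(sprintf("PGS%06d", 42:46), times = c(1, 11, 2, 9, 3))

# Number of performance metrics entries (assessments) per PGS
assessments_per_pgs <- table(pgs_id)
assessments_per_pgs
```

PGS000043 has the most entries (11), while PGS000042 has a single one.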