An S4 class to represent a set of PGS Catalog Sample Sets
Source: R/class-sample_sets.R (sample_sets-class.Rd)
The sample_sets object consists of four tables (slots) that together form a relational database of a subset of PGS Catalog sample sets. Each sample set is an observation (row) in the sample_sets table (the first table).
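To make the relational layout concrete, here is a minimal sketch that mirrors the four slots as SQLite tables, using Python's standard library purely as an illustration (it is not part of the R package). Only the key columns documented below are modelled. Storing sample_id as an INTEGER and declaring foreign keys from the child tables back to samples and sample_sets are assumptions about how the tables relate, not something the class definition states.

```python
import sqlite3

# Hypothetical SQLite rendering of the four slots. Only the documented key
# columns are modelled; the INTEGER type for sample_id and the foreign keys
# are assumptions, not part of the R class definition.
SCHEMA = """
CREATE TABLE sample_sets (
    pss_id TEXT PRIMARY KEY
);
CREATE TABLE samples (
    pss_id    TEXT    NOT NULL REFERENCES sample_sets (pss_id),
    sample_id INTEGER NOT NULL,
    PRIMARY KEY (pss_id, sample_id)
);
CREATE TABLE demographics (
    pss_id    TEXT    NOT NULL,
    sample_id INTEGER NOT NULL,
    variable  TEXT    NOT NULL,
    PRIMARY KEY (pss_id, sample_id, variable),
    FOREIGN KEY (pss_id, sample_id) REFERENCES samples (pss_id, sample_id)
);
CREATE TABLE cohorts (
    pss_id        TEXT    NOT NULL,
    sample_id     INTEGER NOT NULL,
    cohort_symbol TEXT    NOT NULL,
    PRIMARY KEY (pss_id, sample_id, cohort_symbol),
    FOREIGN KEY (pss_id, sample_id) REFERENCES samples (pss_id, sample_id)
);
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database holding the four tables."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    con.executescript(SCHEMA)
    return con


con = make_db()
con.execute("INSERT INTO sample_sets VALUES ('PSS000042')")
con.execute("INSERT INTO samples VALUES ('PSS000042', 1)")
names = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(names)  # ['cohorts', 'demographics', 'sample_sets', 'samples']
```

With these constraints, inserting a second samples row with the same (pss_id, sample_id), or one whose pss_id is absent from sample_sets, raises sqlite3.IntegrityError, which is the uniqueness and linkage the key columns are meant to express.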
Slots
sample_sets
A table of sample sets. Each sample set (row) is uniquely identified by the column pss_id. Columns:
- pss_id: A PGS Sample Set identifier. Example: "PSS000042".
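Because pss_id alone is the key of this table, a quick sanity check is that it has no duplicates. The sketch below is a hypothetical helper, not part of the R package; the pattern "PSS" followed by six digits is an assumption inferred from the single example "PSS000042".

```python
import re
from collections import Counter

# Assumed identifier format, inferred from the example "PSS000042".
PSS_ID_PATTERN = re.compile(r"PSS\d{6}")


def check_sample_sets_key(pss_ids: list[str]) -> list[str]:
    """Hypothetical check of the sample_sets key column; returns problems found."""
    problems = []
    for pss_id, n in Counter(pss_ids).items():
        if n > 1:
            problems.append(f"duplicate pss_id: {pss_id}")
    for pss_id in pss_ids:
        if not PSS_ID_PATTERN.fullmatch(pss_id):
            problems.append(f"malformed pss_id: {pss_id}")
    return problems


print(check_sample_sets_key(["PSS000042", "PSS000043"]))  # []
print(check_sample_sets_key(["PSS000042", "PSS000042", "PSS42"]))
# ['duplicate pss_id: PSS000042', 'malformed pss_id: PSS42']
```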
samples
A table of samples. Each sample (row) is uniquely identified by the combination of values from the columns pss_id and sample_id. Columns:
- pss_id: A PGS Sample Set identifier. Example: "PSS000042".
- sample_id: Sample identifier. This is a surrogate key that identifies each sample.
- stage: Sample stage. This should always be Evaluation ("eval").
- sample_size: Number of individuals included in the sample.
- sample_cases: Number of cases.
- sample_controls: Number of controls.
- sample_percent_male: Percentage of male participants.
- phenotype_description: Detailed phenotype description.
- ancestry_category: Author-reported ancestry, mapped to the best-matching ancestry category from the NHGRI-EBI GWAS Catalog framework (see ancestry_categories for possible values).
- ancestry: A more detailed description of sample ancestry, usually matching the most specific description given by the authors (e.g. French, Chinese).
- country: Author-reported countries of recruitment (if available).
- ancestry_additional_description: Any additional description not captured in the other columns (e.g. founder or genetically isolated populations, or further description of admixed samples).
- study_id: Associated GWAS Catalog study accession identifier, e.g. "GCST002735".
- pubmed_id: PubMed identifier.
- cohorts_additional_description: Any additional description about the samples (e.g. sub-cohort information).
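Since samples is keyed on the pair (pss_id, sample_id), uniqueness has to be checked on the pair rather than on sample_id alone, so the same sample_id value may appear under different sample sets. The hypothetical validator below (not part of the R package) checks that composite key and the documented expectation that stage is always "eval"; the rows are made up for illustration.

```python
from collections import Counter


def check_samples(rows: list[dict]) -> list[str]:
    """Hypothetical validator for the samples table.

    Checks that (pss_id, sample_id) is unique and that stage is "eval".
    """
    problems = []
    key_counts = Counter((r["pss_id"], r["sample_id"]) for r in rows)
    for key, n in key_counts.items():
        if n > 1:
            problems.append(f"duplicate (pss_id, sample_id): {key}")
    for r in rows:
        if r.get("stage") != "eval":
            problems.append(
                f"unexpected stage {r.get('stage')!r} for {r['pss_id']}/{r['sample_id']}")
    return problems


# Hypothetical rows: sample_id 1 appears under two different sample sets,
# which the composite key allows.
rows = [
    {"pss_id": "PSS000042", "sample_id": 1, "stage": "eval"},
    {"pss_id": "PSS000043", "sample_id": 1, "stage": "eval"},
]
print(check_samples(rows))  # []
```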
demographics
A table of sample demographic variables. Each demographic variable (row) is uniquely identified by the combination of values from the columns pss_id, sample_id, and variable. Columns:
- pss_id: A PGS Sample Set identifier. Example: "PSS000042".
- sample_id: Sample identifier. This is a surrogate key that identifies each sample.
- variable: Demographic variable. The remaining columns describe this variable.
- estimate_type: Type of statistical estimate for the variable.
- estimate: The value of the statistical estimate.
- unit: Unit of the variable.
- variability_type: Measure of statistical dispersion for the variable, e.g. standard error (se) or standard deviation (sd).
- variability: The value of the measure of dispersion.
- interval_type: Type of statistical interval for the variable: range, iqr (interquartile range), or ci (confidence interval).
- interval_lower: Interval lower bound.
- interval_upper: Interval upper bound.
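Each demographics row bundles up to three optional summaries of one variable: a point estimate, a dispersion measure, and an interval. The sketch below renders one row as a readable string, skipping any part whose type column is missing (None here, NA in R). It is a hypothetical helper; the variable name "age" and all values in the example row are made up.

```python
def describe(row: dict) -> str:
    """Render one demographics row as a short summary string.

    Parts whose type column is missing or None are skipped.
    """
    parts = []
    if row.get("estimate_type") is not None:
        parts.append(f"{row['estimate_type']} {row['estimate']}")
    if row.get("variability_type") is not None:
        parts.append(f"{row['variability_type']} {row['variability']}")
    if row.get("interval_type") is not None:
        parts.append(
            f"{row['interval_type']} [{row['interval_lower']}, {row['interval_upper']}]")
    unit = f" ({row['unit']})" if row.get("unit") else ""
    return f"{row['variable']}{unit}: " + "; ".join(parts)


# Hypothetical row; the variable name and all values are invented.
row = {
    "pss_id": "PSS000042", "sample_id": 1, "variable": "age",
    "estimate_type": "mean", "estimate": 56.2, "unit": "years",
    "variability_type": "sd", "variability": 8.1,
    "interval_type": "range", "interval_lower": 40, "interval_upper": 70,
}
print(describe(row))  # age (years): mean 56.2; sd 8.1; range [40, 70]
```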
cohorts
A table of cohorts. Each cohort (row) is uniquely identified by the combination of values from the columns pss_id, sample_id, and cohort_symbol. Columns:
- pss_id: A PGS Sample Set identifier. Example: "PSS000042".
- sample_id: Sample identifier. This is a surrogate key that identifies each sample.
- cohort_symbol: Cohort symbol.
- cohort_name: Cohort full name.
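The cohorts, demographics, and samples tables share the (pss_id, sample_id) columns, so they can be linked by joining on that pair. The sketch below is a hypothetical inner join of cohorts onto samples; the cohort symbols, names, and sample size are invented placeholders.

```python
def join_on_sample_key(samples: list[dict], cohorts: list[dict]) -> list[dict]:
    """Inner-join cohort rows to their sample on (pss_id, sample_id)."""
    by_key = {(s["pss_id"], s["sample_id"]): s for s in samples}
    joined = []
    for c in cohorts:
        sample = by_key.get((c["pss_id"], c["sample_id"]))
        if sample is not None:  # inner join: cohort rows without a matching sample are dropped
            joined.append({**sample, **c})
    return joined


# Hypothetical rows; cohort symbols, names, and the sample size are invented.
samples = [{"pss_id": "PSS000042", "sample_id": 1, "sample_size": 1000}]
cohorts = [
    {"pss_id": "PSS000042", "sample_id": 1, "cohort_symbol": "COHORT_A", "cohort_name": "Cohort A"},
    {"pss_id": "PSS000042", "sample_id": 1, "cohort_symbol": "COHORT_B", "cohort_name": "Cohort B"},
]
for row in join_on_sample_key(samples, cohorts):
    print(row["cohort_symbol"], row["sample_size"])
# COHORT_A 1000
# COHORT_B 1000
```

A sample can be drawn from several cohorts, so the join is one-to-many and repeats the sample's columns for each cohort. In R, the same operation on the slot tables would typically be written with dplyr::inner_join(x, y, by = c("pss_id", "sample_id")).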