polygenic - the polygenic scores toolkit
Index
Summary
Polygenic is a toolkit for a wide range of polygenic scores analysis tasks. The most important use cases include computing scores for samples in vcf files, building scores for GWAS results or fetching scores from repositories.
Installation
Installation using pip
pip3 install polygenic
In conda environment
Run conda image
docker run -it conda/miniconda3 /bin/bash
Create python3.8 environment and install polygenic
yes | conda create --name py38 python=3.8
eval "$(conda shell.bash hook)"
conda activate py38
### should be 3.8
python --version
### gcc is missing to build pytabix
apt -qq update
apt -y install build-essential
pip install polygenic
Manual
Tools
pgs
polygenic --vcf [your_vcf_gz] --model [your_model] [other raguments]
Arguments
Required
-
--vcf
vcf.gz file with genotypes (tabix index should be available) -
--model
path to model file
Optional
-
--log_file
log file -
--out_dir
directory for result jsons -
--population
population code -
--models_path
path to a directory containing models -
--af
an indexed vcf.gz file containing allele freq data -
--version
prints version of package
Building models in yml
Index: Model structure Model types Parameters
Model structure
Core structure
Models have two properties which is model
and description
. model
is a specification of computation to be performed and description
is additional information to be included in the result.
model:
description:
Object keys
Each object that is not collection has a set of predefined keys (required or optional) that can be used for computation. For example: diplotype_model
object has a required diplotypes
key.
diplotype_model:
diplotypes:
The computation is first delegated to key specified objects and later aggregated by the top level object itself.
Collections
There is special category of objects that don't have predefined keys but are collections. Each key within collection becomes element of collection. Collections are easy to recognize, because they are specified in plural form like diplotypes
or variants
. Each element of collection will be defined as singular object of collection type. For example key in variants
collection will becomes objects of variant
type.
variants:
rs7041: {diplotype: C/C}
rs4588: {diplotype: T/T}
Variants
Variants can be identified by rsid. Variant value will be computed basing on information provided: diplotype
or effect_allele
.
Accepted sets of fields are:
- diplotypes
diplotype
symbol
- score
effect_allele
effect_size
symbol
Model types
There are currently implemented four types of models:
score_model
diplotype_model
haplotype_model
-
formula_model
The type of model can be specified at the top of yml structure or within themodel
field.
Specification of model type at the top of yml structure
diplotype_model:
description:
model
field
Specification of model type within the model:
diplotype_model:
description:
Parameters
External parameters can be used in formula_model
through @parameters
keyword.
Example parameters file in .json
format:
{"sex": "F"}
Path to file can be provided as argument to polygenic tool:
--parameters /path/to/parameters.json
Example of use of parameters in the formula_model
:
formula_model:
formula:
value: "@female.score_model.value if @parameters.sex == 'F' else @male.score_model.value"
male:
score_model:
variants:
...
female:
score_model:
variants:
Example models
Example diplotype model
This example diplotype model is based on Randolph 2014.
diplotype_model:
diplotypes:
1/1:
variants:
rs7041: {diplotype: C/C}
rs4588: {diplotype: T/T}
1/1s:
variants:
rs7041: {diplotype: C/C}
rs4588: {diplotype: T/G}
1/1f:
variants:
rs7041: {diplotype: C/A}
rs4588: {diplotype: T/G}
1/2:
variants:
rs7041: {diplotype: C/A}
rs4588: {diplotype: T/T}
1s/1s:
variants:
rs7041: {diplotype: C/C}
rs4588: {diplotype: G/G}
1s/1f:
variants:
rs7041: {diplotype: C/A}
rs4588: {diplotype: G/G}
1s/2:
variants:
rs7041: {diplotype: C/A}
rs4588: {diplotype: G/T}
1f/1f:
variants:
rs7041: {diplotype: A/A}
rs4588: {diplotype: G/G}
1f/2:
variants:
rs7041: {diplotype: A/A}
rs4588: {diplotype: G/T}
2/2:
variants:
rs7041: {diplotype: A/A}
rs4588: {diplotype: T/T}
description:
pmid: 24447085
genes: [GC]
result_diplotype_choice:
1/1: Moderate
1/1s: High
1/1f: High
1/2: Low
1s/1s: Very high
1s/1f: Very high
1s/2: Moderate
1f/1f: Very high
1f/2: Moderate
2/2: Very low
Example haplotype model
haplotype_model:
variants:
CYP2D6*1.001:
CYP2D6*1.002:
22-42126963-C-T: {ref: "C", alt: "T", effect_allele: "T"}
CYP2D6*1.003:
22-42128813-G-A: {ref: "G", alt: "A", effect_allele: "A"}
CYP2D6*1.004:
22-42128216-G-T: {ref: "G", alt: "T", effect_allele: "T"}
CYP2D6*1.005:
22-42128922-A-G: {ref: "A", alt: "G", effect_allele: "G"}
CYP2D6*1.006:
22-42129726-A-C: {ref: "A", alt: "C", effect_allele: "C"}
22-42129950-A-C: {ref: "A", alt: "C", effect_allele: "C"}
22-42130482-C-A: {ref: "C", alt: "A", effect_allele: "A"}
Example score model with categories rescaling
score_model:
variants:
rs10012: {effect_allele: G, effect_size: 0.369215857410143}
rs1014971: {effect_allele: T, effect_size: 0.075546961392531}
rs10936599: {effect_allele: C, effect_size: 0.086359830674748}
rs11892031: {effect_allele: C, effect_size: -0.552841968657781}
rs1495741: {effect_allele: A, effect_size: 0.05307844348342}
rs17674580: {effect_allele: C, effect_size: 0.187520720836463}
rs2294008: {effect_allele: T, effect_size: 0.08278537031645}
rs798766: {effect_allele: T, effect_size: 0.093421685162235}
rs9642880: {effect_allele: G, effect_size: 0.093421685162235}
categories:
High risk: {from: 1.371624087, to: 2.581880425, scale_from: 2, scale_to: 3}
Potential risk: {from: 1.169616034, to: 1.371624087, scale_from: 1, scale_to: 2}
Average risk: {from: -0.346748358, to: 1.169616034, scale_from: 0, scale_to: 1}
Low risk: {from: -1.657132197, to: -0.346748358, scale_from: -1, scale_to: 0}
description:
about:
genes: []
result_statement_choice:
Average risk: Avg
Potential risk: Pot
High risk: Hig
Low risk: Low
science_behind_the_test:
test_type: Polygenic Risk Score
trait: Breast cancer
trait_authors:
- taken from the PGS catalog
trait_copyright: Intelliseq all rights reserved
trait_explained: None
trait_heritability: None
trait_pgs_id: PGS000001
trait_pmids:
- 25855707
trait_snp_heritability: None
trait_title: Breast_Cancer
trait_version: 1.0
what_you_can_do_choice:
Average risk:
High risk:
Low risk:
what_your_result_means_choice:
Average risk:
High risk:
Low risk:
Example Formula Model
formula_model:
formula:
brownexp: "math.exp(@brown.score_model.value - 2.0769)"
redexp: "math.exp(@red.score_model.value - 6.3953)"
blackexp: "math.exp(@black.score_model.value - 2.4029)"
sumexp: "@brownexp + @redexp + @blackexp"
brown_prob: "@brownexp / (1 + @sumexp)"
red_prob: "@redexp / (1 + @sumexp)"
black_prob: "@blackexp / (1 + @sumexp)"
blonde_prob: "1 - (@brown_prob + @red_prob + @black_prob)"
brown:
score_model:
variants:
rs796296176: {effect_allele: CA, effect_size: 1.2522}
rs11547464: {effect_allele: A, effect_size: -0.61155}
rs885479: {effect_allele: T, effect_size: 0.2937}
rs1805008: {effect_allele: T, effect_size: -0.50143}
rs1805005: {effect_allele: T, effect_size: 0.21172}
rs1805006: {effect_allele: A, effect_size: 1.9293}
rs1805007: {effect_allele: T, effect_size: -0.32318}
rs1805009: {effect_allele: C, effect_size: 0.60861}
rs1805009: {effect_allele: A, effect_size: 0.25624}
rs2228479: {effect_allele: A, effect_size: -0.054143}
rs1110400: {effect_allele: C, effect_size: -0.56315}
rs28777: {effect_allele: C, effect_size: 0.52168}
rs16891982: {effect_allele: C, effect_size: 0.75284}
rs12821256: {effect_allele: G, effect_size: -0.34957}
rs4959270: {effect_allele: A, effect_size: -0.19171}
rs12203592: {effect_allele: T, effect_size: 1.6475}
rs1042602: {effect_allele: T, effect_size: 0.16092}
rs1800407: {effect_allele: A, effect_size: -0.19111}
rs2402130: {effect_allele: G, effect_size: 0.35821}
rs12913832: {effect_allele: T, effect_size: 1.214}
rs2378249: {effect_allele: C, effect_size: 0.12669}
rs683: {effect_allele: C, effect_size: 0.21172}
red:
score_model:
variants:
rs796296176: {effect_allele: CA, effect_size: 25.508}
rs11547464: {effect_allele: A, effect_size: 2.5381}
rs885479: {effect_allele: T, effect_size: -0.20889}
rs1805008: {effect_allele: T, effect_size: 2.801}
rs1805005: {effect_allele: T, effect_size: 0.93493}
rs1805006: {effect_allele: A, effect_size: 3.65}
rs1805007: {effect_allele: T, effect_size: 3.4408}
rs1805009: {effect_allele: C, effect_size: 4.5868}
rs1805009: {effect_allele: A, effect_size: 22.107}
rs2228479: {effect_allele: A, effect_size: 0.62307}
rs1110400: {effect_allele: C, effect_size: 1.4453}
rs28777: {effect_allele: C, effect_size: 0.70401}
rs16891982: {effect_allele: C, effect_size: -0.41869}
rs12821256: {effect_allele: G, effect_size: -0.57964}
rs4959270: {effect_allele: A, effect_size: 0.24861}
rs12203592: {effect_allele: T, effect_size: 0.90233}
rs1042602: {effect_allele: T, effect_size: 0.45003}
rs1800407: {effect_allele: A, effect_size: -0.27606}
rs2402130: {effect_allele: G, effect_size: 0.28313}
rs12913832: {effect_allele: T, effect_size: -0.093776}
rs2378249: {effect_allele: C, effect_size: 0.76634}
rs683: {effect_allele: C, effect_size: -0.053427}
black:
score_model:
variants:
rs796296176: {effect_allele: CA, effect_size: 2.732}
rs11547464: {effect_allele: A, effect_size: -16.969}
rs885479: {effect_allele: T, effect_size: 0.39983}
rs1805008: {effect_allele: T, effect_size: -0.86062}
rs1805005: {effect_allele: T, effect_size: -0.0029013}
rs1805006: {effect_allele: A, effect_size: -16.088}
rs1805007: {effect_allele: T, effect_size: -1.3757}
rs1805009: {effect_allele: C, effect_size: 0.060631}
rs1805009: {effect_allele: A, effect_size: 3.9824}
rs2228479: {effect_allele: A, effect_size: 0.17012}
rs1110400: {effect_allele: C, effect_size: 0.29143}
rs28777: {effect_allele: C, effect_size: 0.82228}
rs16891982: {effect_allele: C, effect_size: 1.1617}
rs12821256: {effect_allele: G, effect_size: -0.89824}
rs4959270: {effect_allele: A, effect_size: -0.36359}
rs12203592: {effect_allele: T, effect_size: 1.997}
rs1042602: {effect_allele: T, effect_size: 0.065432}
rs1800407: {effect_allele: A, effect_size: -0.49601}
rs2402130: {effect_allele: G, effect_size: 0.26536}
rs12913832: {effect_allele: T, effect_size: 1.9391}
rs2378249: {effect_allele: C, effect_size: -0.089509}
rs683: {effect_allele: C, effect_size: 0.15796}
description:
name: HirisPlex
Description
Model keys glossary
-
model
- generic model that can aggregate results of other model types -
diplotype_model
Required keys:diplotypes
-
description
- all properties to be included in the final results
Updates
2.0.0
- switched to yaml model definitions
- implemented formula, score, haplotype and diplotype model types
- prepared docker image with resources for polygenicmaker
- added gene symbols to description <<<<<<< HEAD
2.0.27
- improved parsing of effect_size (whitespaces and quotes are accepted now)
2.0.28
-
./.
alleles are now treated as missing genotypes