Polygenic is a toolkit for a wide range of polygenic scores analysis tasks. The most important use cases include computing scores for samples in vcf files, building scores for GWAS results or fetching scores from repositories.
pip3 install polygenic
Run conda image
docker run -it conda/miniconda3 /bin/bash
Create python3.8 environment and install polygenic
yes | conda create --name py38 python=3.8
eval "$(conda shell.bash hook)"
conda activate py38
### should be 3.8
python --version
### gcc is missing to build pytabix
apt -qq update
apt -y install build-essential
pip install polygenic
polygenic --vcf [your_vcf_gz] --model [your_model] [other raguments]
-
--vcf
vcf.gz file with genotypes (tabix index should be available) -
--model
path to model file
-
--log_file
log file -
--out_dir
directory for result jsons -
--population
population code -
--models_path
path to a directory containing models -
--af
an indexed vcf.gz file containing allele freq data -
--version
prints version of package
Index: Model structure Model types Parameters
Models have two properties which is model
and description
. model
is a specification of computation to be performed and description
is additional information to be included in the result.
model:
description:
Each object that is not collection has a set of predefined keys (required or optional) that can be used for computation. For example: diplotype_model
object has a required diplotypes
key.
diplotype_model:
diplotypes:
The computation is first delegated to key specified objects and later aggregated by the top level object itself.
There is special category of objects that don't have predefined keys but are collections. Each key within collection becomes element of collection. Collections are easy to recognize, because they are specified in plural form like diplotypes
or variants
. Each element of collection will be defined as singular object of collection type. For example key in variants
collection will becomes objects of variant
type.
variants:
rs7041: {diplotype: C/C}
rs4588: {diplotype: T/T}
Variants can be identified by rsid. Variant value will be computed basing on information provided: diplotype
or effect_allele
.
Accepted sets of fields are:
- diplotypes
diplotype
symbol
- score
effect_allele
effect_size
symbol
There are currently implemented four types of models:
score_model
diplotype_model
haplotype_model
-
formula_model
The type of model can be specified at the top of yml structure or within themodel
field.
diplotype_model:
description:
model:
diplotype_model:
description:
External parameters can be used in formula_model
through @parameters
keyword.
Example parameters file in .json
format:
{"sex": "F"}
Path to file can be provided as argument to polygenic tool:
--parameters /path/to/parameters.json
Example of use of parameters in the formula_model
:
formula_model:
formula:
value: "@female.score_model.value if @parameters.sex == 'F' else @male.score_model.value"
male:
score_model:
variants:
...
female:
score_model:
variants:
This example diplotype model is based on Randolph 2014.
diplotype_model:
diplotypes:
1/1:
variants:
rs7041: {diplotype: C/C}
rs4588: {diplotype: T/T}
1/1s:
variants:
rs7041: {diplotype: C/C}
rs4588: {diplotype: T/G}
1/1f:
variants:
rs7041: {diplotype: C/A}
rs4588: {diplotype: T/G}
1/2:
variants:
rs7041: {diplotype: C/A}
rs4588: {diplotype: T/T}
1s/1s:
variants:
rs7041: {diplotype: C/C}
rs4588: {diplotype: G/G}
1s/1f:
variants:
rs7041: {diplotype: C/A}
rs4588: {diplotype: G/G}
1s/2:
variants:
rs7041: {diplotype: C/A}
rs4588: {diplotype: G/T}
1f/1f:
variants:
rs7041: {diplotype: A/A}
rs4588: {diplotype: G/G}
1f/2:
variants:
rs7041: {diplotype: A/A}
rs4588: {diplotype: G/T}
2/2:
variants:
rs7041: {diplotype: A/A}
rs4588: {diplotype: T/T}
description:
pmid: 24447085
genes: [GC]
result_diplotype_choice:
1/1: Moderate
1/1s: High
1/1f: High
1/2: Low
1s/1s: Very high
1s/1f: Very high
1s/2: Moderate
1f/1f: Very high
1f/2: Moderate
2/2: Very low
haplotype_model:
variants:
CYP2D6*1.001:
CYP2D6*1.002:
22-42126963-C-T: {ref: "C", alt: "T", effect_allele: "T"}
CYP2D6*1.003:
22-42128813-G-A: {ref: "G", alt: "A", effect_allele: "A"}
CYP2D6*1.004:
22-42128216-G-T: {ref: "G", alt: "T", effect_allele: "T"}
CYP2D6*1.005:
22-42128922-A-G: {ref: "A", alt: "G", effect_allele: "G"}
CYP2D6*1.006:
22-42129726-A-C: {ref: "A", alt: "C", effect_allele: "C"}
22-42129950-A-C: {ref: "A", alt: "C", effect_allele: "C"}
22-42130482-C-A: {ref: "C", alt: "A", effect_allele: "A"}
score_model:
variants:
rs10012: {effect_allele: G, effect_size: 0.369215857410143}
rs1014971: {effect_allele: T, effect_size: 0.075546961392531}
rs10936599: {effect_allele: C, effect_size: 0.086359830674748}
rs11892031: {effect_allele: C, effect_size: -0.552841968657781}
rs1495741: {effect_allele: A, effect_size: 0.05307844348342}
rs17674580: {effect_allele: C, effect_size: 0.187520720836463}
rs2294008: {effect_allele: T, effect_size: 0.08278537031645}
rs798766: {effect_allele: T, effect_size: 0.093421685162235}
rs9642880: {effect_allele: G, effect_size: 0.093421685162235}
categories:
High risk: {from: 1.371624087, to: 2.581880425, scale_from: 2, scale_to: 3}
Potential risk: {from: 1.169616034, to: 1.371624087, scale_from: 1, scale_to: 2}
Average risk: {from: -0.346748358, to: 1.169616034, scale_from: 0, scale_to: 1}
Low risk: {from: -1.657132197, to: -0.346748358, scale_from: -1, scale_to: 0}
description:
about:
genes: []
result_statement_choice:
Average risk: Avg
Potential risk: Pot
High risk: Hig
Low risk: Low
science_behind_the_test:
test_type: Polygenic Risk Score
trait: Breast cancer
trait_authors:
- taken from the PGS catalog
trait_copyright: Intelliseq all rights reserved
trait_explained: None
trait_heritability: None
trait_pgs_id: PGS000001
trait_pmids:
- 25855707
trait_snp_heritability: None
trait_title: Breast_Cancer
trait_version: 1.0
what_you_can_do_choice:
Average risk:
High risk:
Low risk:
what_your_result_means_choice:
Average risk:
High risk:
Low risk:
formula_model:
formula:
brownexp: "math.exp(@brown.score_model.value - 2.0769)"
redexp: "math.exp(@red.score_model.value - 6.3953)"
blackexp: "math.exp(@black.score_model.value - 2.4029)"
sumexp: "@brownexp + @redexp + @blackexp"
brown_prob: "@brownexp / (1 + @sumexp)"
red_prob: "@redexp / (1 + @sumexp)"
black_prob: "@blackexp / (1 + @sumexp)"
blonde_prob: "1 - (@brown_prob + @red_prob + @black_prob)"
brown:
score_model:
variants:
rs796296176: {effect_allele: CA, effect_size: 1.2522}
rs11547464: {effect_allele: A, effect_size: -0.61155}
rs885479: {effect_allele: T, effect_size: 0.2937}
rs1805008: {effect_allele: T, effect_size: -0.50143}
rs1805005: {effect_allele: T, effect_size: 0.21172}
rs1805006: {effect_allele: A, effect_size: 1.9293}
rs1805007: {effect_allele: T, effect_size: -0.32318}
rs1805009: {effect_allele: C, effect_size: 0.60861}
rs1805009: {effect_allele: A, effect_size: 0.25624}
rs2228479: {effect_allele: A, effect_size: -0.054143}
rs1110400: {effect_allele: C, effect_size: -0.56315}
rs28777: {effect_allele: C, effect_size: 0.52168}
rs16891982: {effect_allele: C, effect_size: 0.75284}
rs12821256: {effect_allele: G, effect_size: -0.34957}
rs4959270: {effect_allele: A, effect_size: -0.19171}
rs12203592: {effect_allele: T, effect_size: 1.6475}
rs1042602: {effect_allele: T, effect_size: 0.16092}
rs1800407: {effect_allele: A, effect_size: -0.19111}
rs2402130: {effect_allele: G, effect_size: 0.35821}
rs12913832: {effect_allele: T, effect_size: 1.214}
rs2378249: {effect_allele: C, effect_size: 0.12669}
rs683: {effect_allele: C, effect_size: 0.21172}
red:
score_model:
variants:
rs796296176: {effect_allele: CA, effect_size: 25.508}
rs11547464: {effect_allele: A, effect_size: 2.5381}
rs885479: {effect_allele: T, effect_size: -0.20889}
rs1805008: {effect_allele: T, effect_size: 2.801}
rs1805005: {effect_allele: T, effect_size: 0.93493}
rs1805006: {effect_allele: A, effect_size: 3.65}
rs1805007: {effect_allele: T, effect_size: 3.4408}
rs1805009: {effect_allele: C, effect_size: 4.5868}
rs1805009: {effect_allele: A, effect_size: 22.107}
rs2228479: {effect_allele: A, effect_size: 0.62307}
rs1110400: {effect_allele: C, effect_size: 1.4453}
rs28777: {effect_allele: C, effect_size: 0.70401}
rs16891982: {effect_allele: C, effect_size: -0.41869}
rs12821256: {effect_allele: G, effect_size: -0.57964}
rs4959270: {effect_allele: A, effect_size: 0.24861}
rs12203592: {effect_allele: T, effect_size: 0.90233}
rs1042602: {effect_allele: T, effect_size: 0.45003}
rs1800407: {effect_allele: A, effect_size: -0.27606}
rs2402130: {effect_allele: G, effect_size: 0.28313}
rs12913832: {effect_allele: T, effect_size: -0.093776}
rs2378249: {effect_allele: C, effect_size: 0.76634}
rs683: {effect_allele: C, effect_size: -0.053427}
black:
score_model:
variants:
rs796296176: {effect_allele: CA, effect_size: 2.732}
rs11547464: {effect_allele: A, effect_size: -16.969}
rs885479: {effect_allele: T, effect_size: 0.39983}
rs1805008: {effect_allele: T, effect_size: -0.86062}
rs1805005: {effect_allele: T, effect_size: -0.0029013}
rs1805006: {effect_allele: A, effect_size: -16.088}
rs1805007: {effect_allele: T, effect_size: -1.3757}
rs1805009: {effect_allele: C, effect_size: 0.060631}
rs1805009: {effect_allele: A, effect_size: 3.9824}
rs2228479: {effect_allele: A, effect_size: 0.17012}
rs1110400: {effect_allele: C, effect_size: 0.29143}
rs28777: {effect_allele: C, effect_size: 0.82228}
rs16891982: {effect_allele: C, effect_size: 1.1617}
rs12821256: {effect_allele: G, effect_size: -0.89824}
rs4959270: {effect_allele: A, effect_size: -0.36359}
rs12203592: {effect_allele: T, effect_size: 1.997}
rs1042602: {effect_allele: T, effect_size: 0.065432}
rs1800407: {effect_allele: A, effect_size: -0.49601}
rs2402130: {effect_allele: G, effect_size: 0.26536}
rs12913832: {effect_allele: T, effect_size: 1.9391}
rs2378249: {effect_allele: C, effect_size: -0.089509}
rs683: {effect_allele: C, effect_size: 0.15796}
description:
name: HirisPlex
-
model
- generic model that can aggregate results of other model types -
diplotype_model
Required keys:diplotypes
-
description
- all properties to be included in the final results
- switched to yaml model definitions
- implemented formula, score, haplotype and diplotype model types
- prepared docker image with resources for polygenicmaker
- added gene symbols to description <<<<<<< HEAD
- improved parsing of effect_size (whitespaces and quotes are accepted now)
-
./.
alleles are now treated as missing genotypes