Summary Statistics TSV file Validator

A validator for GWAS summary statistics TSV files, both before and after harmonisation, built on pandas_schema. Its purpose is to check files before they are converted to HDF5.

Installation

Python package:

Requires Python 3.
pip install ss-validate

Alternatively, use the docker image:

docker run ebispot/gwas-sumstats-validator ss-validate --help

Running the validator

To run the validator on a file:

ss-validate -f <file_to_validate.tsv> --logfile <logfile_name>

Information and errors are logged to the console; errors are also written to the specified log file. Console output might look like:

(INFO): Filename is good!
(INFO): Validating file...
(ERROR): Length of row 7 is: 16 instead of 15
(ERROR): Please fix the table. Some rows have different numbers of columns to the header
(INFO): Rows with different numbers of columns to the header are not validated
(ERROR): {row: 1, column: "p_value"}: "-99" was not in the range [0, 1)

These errors tell us that row 7 has too many columns (16 instead of 15) and that row 1 does not have a valid p-value.
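The two kinds of error shown above (a row whose length differs from the header, and a p-value outside [0, 1)) can be illustrated with a small standalone sketch. This is not the validator's own implementation, which is built on pandas_schema; the function name `check_rows`, the 1-based row numbering and the message formatting are assumptions chosen to resemble the sample output.

```python
import csv
import io


def check_rows(tsv_text, pvalue_column="p_value"):
    """Return a list of error strings for a small TSV passed in as text.

    Illustrative only: mimics the two checks described above, not the
    real ss-validate code paths.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    errors = []
    # Assumption: data rows are numbered from 1, header excluded.
    for row_number, row in enumerate(reader, start=1):
        # Squareness: every row must have as many fields as the header.
        if len(row) != len(header):
            errors.append(
                f"Length of row {row_number} is: {len(row)} "
                f"instead of {len(header)}"
            )
            continue  # rows of the wrong length are not validated further
        value = row[header.index(pvalue_column)]
        try:
            in_range = 0 <= float(value) < 1
        except ValueError:
            in_range = False  # non-numeric values are treated as invalid
        if not in_range:
            errors.append(
                f'{{row: {row_number}, column: "{pvalue_column}"}}: '
                f'"{value}" was not in the range [0, 1)'
            )
    return errors
```

Calling `check_rows` on a two-column table whose first data row has a p-value of -99 and whose third row has an extra field returns one range error and one length error, in that order.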

Additional options

--linelimit : int, default 1000

Once this number of erroneous rows has been reached, stop looking for more.

--minrows : int, default 100000

The minimum number of rows the file must have in order to validate successfully.

--drop-bad-lines : bool, default False

Drops the lines with errors from the file and writes the remaining lines to a new file called <file_to_validate.tsv.valid>

--stage : {'standard', 'harmonised', 'curated'}, default 'standard'

The stage the file is in: standard format ('standard'), harmonised ('harmonised'), or pre-standard in the custom curated format ('curated'). Leaving this as the default is recommended.
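When the command line is driven from a script, the options above can be assembled programmatically. The sketch below is one way to do that; `build_command` is a hypothetical helper, not part of ss-validate, and it only uses the flags documented above that take a value. `--drop-bad-lines` is left out because this README does not say whether it is passed as a bare flag or with a value.

```python
import subprocess

# Stage names as listed in the options above.
STAGES = ("standard", "harmonised", "curated")


def build_command(path, logfile, linelimit=None, minrows=None, stage=None):
    """Assemble an ss-validate argument list (hypothetical helper)."""
    command = ["ss-validate", "-f", path, "--logfile", logfile]
    if linelimit is not None:
        command += ["--linelimit", str(linelimit)]
    if minrows is not None:
        command += ["--minrows", str(minrows)]
    if stage is not None:
        if stage not in STAGES:
            raise ValueError(f"stage must be one of {STAGES}, got {stage!r}")
        command += ["--stage", stage]
    return command


def run_validator(path, logfile, **options):
    """Run ss-validate as a subprocess; requires the package to be installed."""
    return subprocess.run(build_command(path, logfile, **options), check=False)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.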

Import ss-validate into another Python script

Install as above
Import and use in your Python file

import ss_validate.validator as ssv

# initialise a validator object for your summary statistics and settings 
validator = ssv.Validator(file='sumstats.tsv.gz', filetype='gwas-upload', error_limit=1, logfile='logfile.log')

# validate the headers
validator.validate_headers()

# validate the squareness
validator.validate_file_squareness()

# validate the data
validator.validate_data()
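The three calls above can be wrapped so that later checks are skipped once an earlier one fails. This is only a sketch: `run_checks` is a hypothetical helper, and it assumes each method returns a truthy value on success, which this README does not state. Check the return values in your installed version before relying on it.

```python
def run_checks(validator):
    """Run the documented checks in order and stop at the first failure.

    Assumption (not confirmed by the README): each validate_* method
    returns a truthy value when the check passes.
    Returns the name of the failing step, or None if all steps passed.
    """
    steps = ("validate_headers", "validate_file_squareness", "validate_data")
    for name in steps:
        check = getattr(validator, name)
        if not check():
            return name
    return None
```

With a validator built as shown above, `run_checks(validator)` would return `None` for a clean file under that assumption.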

ss-validate
Release 0.4.5
