Automated data clean up pipeline for genetic data

bioinformatics, quality, control, genetic
pip install pyGenClean==1.8.3



pyGenClean - Automated Data Clean Up

pyGenClean is an informatics tool that facilitates and standardizes the genetic data clean up pipeline for genotyping array data. Used in conjunction with a batch-queuing system, the tool minimizes data manipulation errors, accelerates the completion of the data clean up process, and provides informative graphics and metrics to guide decision making for statistical analysis.

If you use pyGenClean in your project, please cite the published paper describing the tool:

Lemieux Perreault LP, Provost S, Legault MA, Barhdadi A, Dubé MP (2013) pyGenClean: efficient tool for genetic data clean up before association testing. Bioinformatics, 29(13): 1704-1705 [DOI:10.1093/bioinformatics/btt261]


Documentation is available from


Here are the dependencies that must be installed before pyGenClean:


For Linux users, we recommend installing pyGenClean in a Python virtualenv (virtual environment).
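
The virtualenv recommendation above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `install_pygenclean` and the default directory `pyGenClean_env` are hypothetical, the `virtualenv` tool is assumed to be installed already, and the dependencies listed earlier are assumed to be in place before the `pip install` step runs.

```shell
# Hypothetical helper: the function name and default directory are illustrative,
# not part of pyGenClean itself.
PYGENCLEAN_VERSION="1.8.3"   # release pinned in the install command above

install_pygenclean() {
    env_dir="${1:-pyGenClean_env}"          # where to create the virtual environment
    virtualenv "$env_dir" || return 1       # assumes the virtualenv tool is available
    . "$env_dir/bin/activate" || return 1   # activate it in the current shell
    pip install "pyGenClean==${PYGENCLEAN_VERSION}"
}
```

Calling `install_pygenclean ~/envs/pyGenClean` (the path is arbitrary) would create the environment there and install the pinned release into it. Keeping the install inside a virtual environment leaves the system-wide Python packages untouched.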

pyGenClean should also work on Windows and macOS, although it has not been fully tested on those platforms. It has been tried on Windows XP (32-bit) and Windows 7 (64-bit, with a 32-bit Python 2.7 installation) without known problems.

For a step-by-step installation on both Linux and Windows operating systems, see the pyGenClean documentation.