Language data store and linguistic query API

PolyglotDB is a Python package for storing and querying large speech corpora. It builds several kinds of databases and provides a consistent Python API for interacting with them. The online documentation is available at

This package is intended for developers and those experienced with scripting in Python. If you would like to use a graphical interface for querying and interacting with PolyglotDB databases, please see Speech Corpus Tools. Speech Corpus Tools is currently deprecated and is undergoing a significant update to match recent development of PolyglotDB.


McAuliffe, Michael, Elias Stengel-Eskin, Michaela Socolof, Arlie Coles, Sarah Mihuc, and Morgan Sonderegger (2017). PolyglotDB [Computer program]. Version 0.0.1 (alpha), retrieved 28 July 2017 from


McAuliffe, Michael, Elias Stengel-Eskin, Michaela Socolof, and Morgan Sonderegger (2017). Polyglot and Speech Corpus Tools: a system for representing, integrating, and querying speech corpora. In Proceedings of Interspeech 2017.