querier

Querying data-frames


Keywords
data-analysis, data-frames, data-mining, data-science, database, datascience, pandas, query-language, sql
License
BSD-3-Clause
Install
pip install querier==0.4.1


Data Frames are widely used and useful structures for data wrangling. The querier exposes a query language for Python pandas Data Frames, inspired by the querying logic of SQL relational databases.


Contents

Installation | Package description | Contributing | Tests | API Documentation | Dependencies | License

Installation

  • From PyPI:
pip install querier
  • From GitHub, for the development version:
pip install git+https://github.com/thierrymoudiki/querier.git

Package description

There are currently 9 types of operations available in the querier, with no plan to extend that list much further (to keep the mental model relatively simple). These verbs will look familiar to dplyr users, but the implementation (I used numpy, pandas and SQLite3) and the function signatures are different:

  • concat: concatenates 2 Data Frames, either horizontally or vertically
  • delete: deletes rows from a Data Frame based on given criteria
  • drop: drops columns from a Data Frame
  • filtr: filters rows of the Data Frame based on given criteria
  • join: joins 2 Data Frames based on given criteria (available for completeness of the interface, this operation is already straightforward in pandas)
  • select: selects columns from the Data Frame
  • summarize: obtains summaries of data based on grouping columns
  • update: updates a column, using an operation given by the user
  • request: for operations more complex than the previous 8, makes it possible to run a SQL query on the Data Frame
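
To make the nine verbs concrete, here is a rough sketch of what each one corresponds to in plain pandas and SQLite3. It deliberately does not call the querier itself, since the function signatures are not given here. The `tips` and `labels` Data Frames and the `sql_on_frame` helper are hypothetical names invented for illustration, and running SQL through an in-memory SQLite database is only an assumption about how something like `request` could work.

```python
import sqlite3

import pandas as pd

# Toy data, invented for illustration only.
tips = pd.DataFrame(
    {
        "sex": ["F", "M", "F", "M"],
        "smoker": ["no", "yes", "yes", "no"],
        "bill": [10.0, 20.0, 30.0, 40.0],
        "tip": [1.0, 3.0, 5.0, 4.0],
    }
)

# select: keep only some columns
selected = tips[["sex", "tip"]]

# drop: remove columns
dropped = tips.drop(columns=["smoker"])

# filtr: keep the rows matching a criterion
filtered = tips.query("tip > 2 and smoker == 'yes'")

# delete: remove the rows matching a criterion (keep the complement)
remaining = tips[~(tips["tip"] > 2)]

# update: recompute a column with a user-supplied operation
updated = tips.assign(tip=tips["tip"] * 2)

# summarize: aggregate by grouping columns
summary = tips.groupby("sex", as_index=False)["tip"].mean()

# concat: stack two Data Frames vertically (axis=1 would be horizontal)
stacked = pd.concat([tips, tips], axis=0, ignore_index=True)

# join: combine two Data Frames on a key column
labels = pd.DataFrame({"sex": ["F", "M"], "label": ["female", "male"]})
joined = tips.merge(labels, on="sex", how="inner")


def sql_on_frame(df, query, table_name="df"):
    """Hypothetical helper: run a SQL query against a Data Frame.

    The frame is copied into an in-memory SQLite table named `table_name`,
    the query is executed, and the result comes back as a new Data Frame.
    """
    conn = sqlite3.connect(":memory:")
    try:
        df.to_sql(table_name, conn, index=False)
        return pd.read_sql_query(query, conn)
    finally:
        conn.close()


# request: anything more complex, expressed directly in SQL
avg_tip = sql_on_frame(
    tips,
    "SELECT sex, AVG(tip) AS avg_tip FROM df GROUP BY sex ORDER BY sex",
)
```

Copying the frame into SQLite on every call keeps the helper simple, at the cost of a full copy of the data per query; that trade-off is reasonable for small frames but may not suit large ones.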

The following notebooks present examples of use of the querier:

Contributing

Your contributions are welcome and valuable. Please make sure to read the Code of Conduct first.

If you're not comfortable with Git/Version Control yet, please use this form.

In Pull Requests, let's strive to use black for formatting:

pip install black
black --line-length=80 file_submitted_for_pr.py

Tests

TBD

API documentation

https://querier.readthedocs.io/en/latest/

Dependencies

  • Numpy
  • Pandas
  • SQLite3

License

BSD 3-Clause © Thierry Moudiki, 2019.