Create easy-to-use Query objects that can apply on NumPy structured arrays, astropy Table, and Pandas DataFrame.
Tired of writing lots of brackets and keeping track of variable names when filtering table data? Enter
subtable = table[table["population"] >= 20000] subtable = subtable[subtable["population"] / subtable["area"] >= 1000]
subtable = Query("population >= 20000", "population / area >= 1000").filter(table)
You can install
easyquey from conda-forge:
conda install scipy --channel conda-forge
Or from PyPI:
pip install easyquery
Creating Query objects
The most important concept that easyquery introduces is a Query object, which is an object that represents the queries (conditions) that you want to apply to your table data.
For most simple cases a Query object can be created with a simple string:
q1 = Query('population >= 20000') q2 = Query('population / area >= 1000')
The string will be passed to numexpr and you can find a list of supported operators and math functions here.
You can also combine multiple conditions at once:
q3 = Query('population >= 20000', 'area < 300') # satisfies both
A Query object can also be created with a tuple, where the first element of
the tuple should be a callable, and the rest should be the field names that
correspond to the argument list of the callable.
This construction allows you to specify more complex queries or to use functions
that are not supported by numexpr.
q4 below has the same effect as
q4 = Query((lambda x, y: x / y >= 1000, 'population', 'area'))
You can also use
QueryMaker to create some commonly used conditions that
cannot be easily written as simple string.
# for string operations q5 = QueryMaker.equals('name', 'Paris') q6 = QueryMaker.contains('name', 'New') q7 = QueryMaker.startswith('name', 'San') # for checking if the column values are in another list q8 = QueryMaker.in1d('id', [1, 3, 6, 7])
Query objects can be combined with
^ (xor), and cen be
~ (not). Each of these operation returns a new Query object.
q9 = (~q1 | Query('established_year > 1900'))
Using Query objects
A Query object has three major methods:
All of them can operate on NumPy structured arrays, astropy Tables, and pandas DataFrames:
filterreturns a new table that only has entries satisfying the query;
countreturns the number of entries satisfying the query;
maskreturns a bool array for masking the table.
import numpy as np from easyquery import Query t = np.array([(1, 5, 4.5), (1, 1, 6.2), (3, 2, 0.5), (5, 5, -3.5)], dtype=np.dtype([('a', '<i8'), ('b', '<i8'), ('c', '<f8')])) q = Query('a > 3') q.filter(t) q.count(t) q.mask(t)