special_snowflake

Find the unique indices for a dataset


License
Other
Install
pip install special_snowflake==0.0.9

Documentation

Because datasets are often provided with scant metadata, I want to
infer some of the conventional metadata without depending on special
information. One such sort of metadata is the schema of the dataset.

Special snowflake looks for unique identifiers in arbitrary datasets.
Run it like so. ::

    $ snowflake bus_stops.csv
    route.number, stop.id, n.students
    time, n.students, location
    route.name, n.students, location
    route.name, stop.id
    route.number, stop.id, time

By default, you get all of the combinations of up to three columns
inside ``bus_stops.csv`` that function as unique indices on the
full spreadsheet.

Or call it from Python! ::

    from special_snowflake import fromcsv
    from pprint import pprint
    with open('open-data-index.csv') as fp:
        pprint(fromcsv(csv.DictReader(fp), n_columns = 2, only_adjacent = False))

This program finds all of the combinations of one or two columns
inside ``open-data-index.csv`` that function as unique indices on the
full spreadsheet.