obst

A lightweight, low-performance library for saving metadata alongside files and iterating over them


Keywords
FILE, METADATA
License
GPL-3.0
Install
pip install obst==0.13

Documentation


pip install obst

TODO/ISSUES:

  • On second thought, instead of writing obst this way, it might be better to base it entirely on pandas tables: one table holding all the metadata, plus one column with the path to each data file. That means one folder with all the files and one file that can be loaded into a pandas table, which can then be searched and grouped with standard pandas commands.

    • the interface would already be familiar to many users
    • all metadata is in one place, so there is no easy way to select and copy files
    • faster than reading all the metadata files, but also more memory-consuming if many data files exist
    • How to build such a table? Well, that is the problem...
  • Documentation

  • allow numbers in the meta dict and dots in key names

  • maybe build a small server (web service) around obst

  • add a metafile for obst itself in the folder

  • let obj data look optional like a file?

  • add order option for each key to index.order

  • add search/select function

    • use the same access structure for index.search and index.order, e.g. return [[name, value], ...]
  • Work with it and improve interface...

  • add saving created indices to a file and loading them again

  • add update to index (done)

  • the index should ignore files that no longer exist instead of raising an error

  • make the index (optionally) a generator

  • make obst more independent of the structure of the metafiles by generalizing the metafile interface. This would make it possible to choose what the metafiles look like by replacing the metafile class

  • add a default handler for fields that are not JSON serializable
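
The last item could build on the `default` parameter of Python's `json.dumps`, which is called for every value the encoder cannot handle. The following is only a sketch of that idea and not obst code; the helper name `to_jsonable` and the chosen fallbacks are assumptions.

```python
import json
from datetime import datetime
from pathlib import Path


def to_jsonable(value):
    """Hypothetical fallback, called by json.dumps for values it cannot encode."""
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, (set, frozenset)):
        return sorted(value)  # assumes the elements are mutually comparable
    if isinstance(value, Path):
        return str(value)
    # Last resort: store a readable representation instead of raising.
    return repr(value)


meta = {"created": datetime(2020, 1, 2, 3, 4, 5), "tags": {"b", "a"}}
encoded = json.dumps(meta, default=to_jsonable, sort_keys=True)
```

Note that such a conversion is one-way: reading the metafile back yields strings and lists, not the original types.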

Current Interface Example

Creating Obst and Objects

import obst
obst_object = obst.open_obst("path")

Opens a directory, or creates it if it does not exist. Every data file in this directory is assumed to have a metafile.

sub_obst_object = obst_object.sub("sub_dictionary")

Creates a subdirectory under the main path. A sub_obst_object only serves to organize the files for human access. For iterating through the files, it does not matter whether they are in subdirectories or not, as long as they are under the main path.

obj = obst.object(name='filename', unique_id = True)

Creates an object that can be inserted into an obst directory. If unique_id is False, the name must be unique within that obst directory, although files with the same name may exist in different subdirectories. If unique_id is True, a random UUID is added to the filename.
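
How the random UUID is attached to the filename is not specified here. Purely as an illustration, the helper below (hypothetical, not part of obst) appends a `uuid.uuid4()` hex string to a name; the underscore separator is an assumption.

```python
import uuid


def make_filename(name, unique_id=True):
    """Hypothetical helper: return `name`, optionally made unique with a random UUID."""
    if unique_id:
        # uuid4() is random, so two objects with the same name get different filenames.
        return f"{name}_{uuid.uuid4().hex}"
    return name


first = make_filename("filename")
second = make_filename("filename")
plain = make_filename("filename", unique_id=False)
```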

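import pickle

# value1, value23, value4 and data are placeholders for your own values
obj.meta["key1"] = value1
obj.meta["key2"] = {'key3': value23}  # nested dictionaries are allowed
obj.meta["key4"] = value4
obj.data = pickle.dumps(data)  # the data must be bytes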

The meta member is a dictionary that must be JSON serializable. The data must be bytes. A few keys are protected, meaning they are overwritten when the metadata is written to a file:

  • _time
  • _datafile

sub_obst_object.insert(obj)

When obj is inserted into the obst object, two files are created: one for the metadata and one for the data.
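
The on-disk layout is not documented here, so the following is a guess at what an insert could do rather than obst's actual implementation. It writes the bytes to one file and a JSON metafile next to it, overwriting the protected keys `_time` and `_datafile`. The function name `write_object` and the suffixes `.data` and `.meta.json` are assumptions.

```python
import json
import pickle
import tempfile
import time
from pathlib import Path


def write_object(folder, name, meta, data):
    """Hypothetical sketch: store `data` (bytes) and `meta` (dict) as two files."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    data_path = folder / f"{name}.data"
    meta_path = folder / f"{name}.meta.json"
    data_path.write_bytes(data)
    # Protected keys are set on write, replacing any user-supplied values.
    stored = dict(meta, _time=time.time(), _datafile=data_path.name)
    meta_path.write_text(json.dumps(stored))
    return meta_path, data_path


with tempfile.TemporaryDirectory() as tmp:
    meta_path, data_path = write_object(
        tmp, "filename", {"key1": 1, "_datafile": "ignored"}, pickle.dumps([1, 2, 3])
    )
    stored_meta = json.loads(meta_path.read_text())
    restored = pickle.loads(data_path.read_bytes())
```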

Iterating through objects

index = obst_object.index(["key1", "key2.key3"])

To iterate through objects, an index must be created. The argument is a list of the keys whose values the iteration should run over. Sub-keys can be accessed using dot notation.
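
To make the dot notation concrete, here is a minimal sketch of how a key such as "key2.key3" could be resolved against a nested meta dictionary. The helper `get_by_path` is hypothetical and only illustrates the idea; it is not taken from obst.

```python
def get_by_path(meta, dotted_key):
    """Hypothetical helper: resolve "a.b.c" against nested dictionaries."""
    value = meta
    for part in dotted_key.split("."):
        value = value[part]  # raises KeyError if a level is missing
    return value


meta = {"key1": "a", "key2": {"key3": 7}}
top_level = get_by_path(meta, "key1")
nested = get_by_path(meta, "key2.key3")
```

With a scheme like this, a key whose name itself contains a dot cannot be addressed, which is presumably why dots in key names appear as an open item in the TODO list.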

for k23, sub_group in index.order("key2.key3", "key1"):
    for k1, all_objects in sub_group:
        for unopened_obj in all_objects:
            obj = unopened_obj()  # calling it also loads the data file into memory

The index.order function sets the iteration order. All objects that share the same values for the specified keys end up in all_objects. So that not every data file is loaded up front, each object has to be opened explicitly (by calling it) to load it from its file.
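
The nested grouping and the deferred loading can be sketched in plain Python. This is an illustration under assumptions, not obst's implementation: the function `order`, the `(meta, loader)` pairs and the helper `get_by_path` are hypothetical, and the key values are assumed to be sortable.

```python
from itertools import groupby


def get_by_path(meta, dotted_key):
    """Resolve a dotted key such as "key2.key3" in nested dictionaries."""
    value = meta
    for part in dotted_key.split("."):
        value = value[part]
    return value


def order(entries, *keys):
    """Nest (meta, loader) pairs by each key in turn; leaves are lists of loaders."""
    if not keys:
        return [loader for _, loader in entries]
    first, rest = keys[0], keys[1:]

    def value_of(entry):
        return get_by_path(entry[0], first)

    ordered = sorted(entries, key=value_of)
    return [(value, order(list(group), *rest))
            for value, group in groupby(ordered, key=value_of)]


loaded = []  # records which "data files" were actually read


def make_loader(name):
    def load():
        loaded.append(name)  # stands in for reading the data file
        return name
    return load


entries = [
    ({"key1": "a", "key2": {"key3": 2}}, make_loader("x")),
    ({"key1": "b", "key2": {"key3": 1}}, make_loader("y")),
    ({"key1": "a", "key2": {"key3": 1}}, make_loader("z")),
]
result = order(entries, "key2.key3", "key1")
```

Building `result` only touches the metadata; a loader runs, and the data is read, only when it is called.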

def filter_key4(meta):
    if 'key4' in meta:
        if meta['key4'] == 0:
            return True
    return False

index = obst_object.index(["key1", "key2.key3"], filter=filter_key4)

A filter function that operates on the meta dictionary can be defined. If the filter returns False, the corresponding file is not added to the index.

indices = obst_object.indices([["key4"], ["key1", "key2.key3"]], idx_filter = [None,filter_key4])
index0 = indices[0]
index1 = indices[1]

All metafiles are opened when an index is created. To make this a little more efficient, multiple indices can be created at once.
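
Why building several indices at once helps can be shown with a small sketch: every meta dictionary is read once and offered to each index in turn. The function `build_indices`, the generator `read_metas` and the tuple-based index entries are assumptions made for illustration, not obst's internals.

```python
def get_by_path(meta, dotted_key):
    """Resolve a dotted key such as "key2.key3" in nested dictionaries."""
    value = meta
    for part in dotted_key.split("."):
        value = value[part]
    return value


def build_indices(metas, key_lists, filters):
    """Hypothetical sketch: fill several indices in one pass over the metadata."""
    indices = [[] for _ in key_lists]
    for meta in metas:  # each meta dict (i.e. each metafile) is visited once
        for index, keys, accept in zip(indices, key_lists, filters):
            if accept is None or accept(meta):
                index.append(tuple(get_by_path(meta, key) for key in keys))
    return indices


def filter_key4(meta):
    return meta.get("key4") == 0


reads = []  # records every simulated metafile read


def read_metas():
    """Stand-in for opening the metafiles one by one."""
    for meta in (
        {"key1": "a", "key2": {"key3": 1}, "key4": 0},
        {"key1": "b", "key2": {"key3": 2}, "key4": 5},
    ):
        reads.append(meta["key1"])
        yield meta


index0, index1 = build_indices(
    read_metas(),
    [["key4"], ["key1", "key2.key3"]],
    [None, filter_key4],
)
```

Creating the two indices separately would have meant two passes over the metafiles; here `reads` ends up with one entry per metafile.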

References

Logo: https://commons.wikimedia.org/wiki/File:Fruit.svg