dictpy

Searching and Serializing Python Dictionaries/JSON files.


Keywords
json, python, search
License
BSD-3-Clause
Install
pip install dictpy==0.0.7

Documentation

dictpy (Dictionary Python)


Advanced tools for Python dictionaries.

Included Tools:

  • DictSearch: Search large and complex Python dictionaries/JSON files.
  • Serializer: Convert custom Python classes into JSON-serializable dictionaries (safe for conversion to JSON).

Installation

Pip installable package available.

pip install dictpy



Searching (DictSearch)

Imagine you have some big ugly Python dictionary (like the one produced by PubChem when you download the JSON file for CID 6) and you want to extract some specific piece of information. This section will show how DictSearch can make this easy.

To perform a search, pass the Python dictionary and a search target (discussed below) to DictSearch. It finds every object that matches the target and stores the results in .result.

import dictpy

search = dictpy.DictSearch(data=json_data, target=target)
print(search.result)

The returned object is a list[list[tree, obj]]:

  • tree: shows the navigation to get to the data ('.' separated)
    • Keys are recorded for dictionaries
    • Integers are recorded for positions in lists
    • Example: Record.Section.1.Description
      {"Record": {
          "Section": [
              ######,
              {"Description": #####}  # A match to the search!
          ]
      }}
  • obj: the object that is returned
    • Options:
      • Return current object (default)
        • Returns the object you search for
        • Example:
          • search: {"dog": "*"}; returns: {"dog": "golden retriever"}
          • search: "dog"; returns: {"dog": "golden retriever"}
          • search: {"dog": "golden retriever"}; returns: {"dog": "golden retriever"}
      • Return parent object
        • Returns parent object or whole current level
        • To switch to returning parent objects, change return_func.
          search = dictpy.DictSearch(data=json_data, target=target, return_func=dictpy.DictSearch.return_parent_object)
          
        • Example
          • search: {"dog": "*"}; returns:
            {
            "dog": "golden retriever",
            "cat": "bengal",
            "fish": "goldfish"
            }
          • search: "dog"; returns:
            {
            "dog": "golden retriever",
            "cat": "bengal",
            "fish": "goldfish"
            }
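
A tree string can be turned back into the object it points to by walking the data one segment at a time. The sketch below is only an illustration of the path format described above: resolve_tree is a hypothetical helper, not part of dictpy, and it assumes that dictionary keys are strings that do not themselves contain '.'.

```python
from typing import Any


def resolve_tree(data: Any, tree: str) -> Any:
    """Follow a '.'-separated tree string down into nested dicts and lists.

    Hypothetical helper for illustration; not part of dictpy.
    """
    current = data
    for part in tree.split("."):
        if isinstance(current, list):
            # Positions in lists are recorded as integers in the tree.
            current = current[int(part)]
        else:
            # Dictionary keys are recorded as-is (assumed to be strings).
            current = current[part]
    return current


data = {"Record": {"Section": [{"Other": 0}, {"Description": "a match"}]}}
found = resolve_tree(data, "Record.Section.1.Description")
print(found)
```

A key that contains a '.' would be split into two segments by this sketch, so it is only suitable for data where that does not occur.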

How to format target

A target can be a string, int, float, single-level dictionary, or regex (regular expression). Wildcards (*) can also be used for partial dictionary searches.

Example Targets:

  • {"RecordType": "CID"}
    • Will match exactly to both 'key', and 'value' (won't match to list entries)
  • {"RecordNumber": 6}
    • Will match exactly to both 'key', and 'value' (won't match to list entries)
    • With numbers, the default search behavior auto-converts strings to numbers.
      • So this would hit to {"RecordNumber": "6"}
      • To change this behavior set op_convert_str_to_num=False
  • 2526
    • Will look for 2526 in either 'key', 'value' or list entry.
  • 3D Conformer
    • Will look for "3D Conformer" in either 'key', 'value' or list entry.
  • {"MoveToTop": "*"}
    • Will look for "MoveToTop" as a dictionary 'key' and the 'value' can be anything. (won't match to list entries)
  • {"*": "Chemical Safety"}
    • Will look for "Chemical Safety" as a dictionary 'value' and the 'key' can be anything. (won't match to list entries)
  • "^[A-I]{3}$"
    • Regular expression search will match in either 'key', 'value' or list entry.
  • {"^RecordT": "*"}
    • Regular expression search will match for 'key' and 'value' can be anything. (won't match to list entries)

For more examples see tests/test_dict_search.py.
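
To get a feel for what the two regular expression targets above would accept, the patterns can be tried with Python's standard re module before running a search. This only demonstrates the patterns themselves; how DictSearch applies them internally is not shown here, and the candidate strings are made up for illustration.

```python
import re

# Pattern from the target list: exactly three characters, each in A-I.
pattern = re.compile(r"^[A-I]{3}$")
candidates = ["ABC", "HIG", "ABJ", "ABCD", "abc"]
matched = [s for s in candidates if pattern.search(s)]
print(matched)

# Key pattern from the {"^RecordT": "*"} target: keys starting with "RecordT".
key_pattern = re.compile(r"^RecordT")
keys = ["RecordType", "RecordTitle", "RecordNumber", "MyRecordType"]
key_hits = [k for k in keys if key_pattern.search(k)]
print(key_hits)
```

Because both patterns are anchored with ^, re.search and re.match give the same result for them.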

Example

This example will extract data from a JSON file for "1-Chloro-2,4-dinitrobenzene" downloaded from PubChem.

Example JSON File

First, we will load our example above (change "/path/to/data/" to your file location for the file above):

import json

with open("C:/path/to/data/cid_6.json", "r") as f:
    json_data = json.load(f)  # parse the file directly into a Python dictionary

print(json_data)

You will get a massive printout of the 12,000-line JSON file. Next, search it with a dictionary target:

import dictpy

search = dictpy.DictSearch(data=json_data, target={"RecordType": "CID"})
print(search.result)

Print out:

[['Record.RecordType', {'RecordType': 'CID'}]]

Integer search target:

search = dictpy.DictSearch(data=json_data, target=2526)
print(search.result)

Print out:

[
    ['Record.Section.3.Section.1.Section.14.Information.1.Value.Number', 2526],
    ['Record.Section.3.Section.1.Section.14.Information.1.Value.Number', 2526]
]
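
The printout above lists the same entry twice. If repeated entries are not wanted, the result list can be filtered afterwards. The following is a minimal sketch that works on a plain list[list[tree, obj]]; unique_results is a hypothetical helper, not a dictpy function, and it treats two entries as equal when both the tree and a JSON dump of the object are identical.

```python
import json

# Result list copied from the printout above.
result = [
    ['Record.Section.3.Section.1.Section.14.Information.1.Value.Number', 2526],
    ['Record.Section.3.Section.1.Section.14.Information.1.Value.Number', 2526],
]


def unique_results(result):
    """Drop repeated [tree, obj] pairs while keeping the original order."""
    seen = set()
    unique = []
    for tree, obj in result:
        # Objects may be dicts (unhashable), so compare a stable JSON dump.
        key = (tree, json.dumps(obj, sort_keys=True, default=str))
        if key not in seen:
            seen.add(key)
            unique.append([tree, obj])
    return unique


deduped = unique_results(result)
print(deduped)
```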


Serialization (Serializer)

Serializer is useful for turning custom Python classes into JSON-compatible dictionaries.

This serialization class is a useful pre-processing step for complex custom Python classes that contain objects that are not JSON serializable (for example: datetime objects, custom classes, classes from other packages, ObjectIDs, etc.).

Inherit from Serializer in your custom Python class.

import json
import datetime

import dictpy

class Example(dictpy.Serializer):

    def __init__(self, datetime_obj, stuff2):
        self.datetime_obj = datetime_obj  # NOT JSON serializable object
        self.stuff2 = stuff2
        self.stuff3 = None 


example = Example(datetime.time(), "stuff2")

# json_output = json.dumps(example)  # This will fail with NOT JSON serializable objects

dict_of_example = example.as_dict()
dict_of_example = dictpy.Serializer.dict_cleanup(dict_of_example)  # converts NOT JSON serializable objects to strings. 
dict_of_example = dictpy.Serializer.remove_none(dict_of_example)  # Optional: remove None; self.stuff3 removed

json_output = json.dumps(dict_of_example)
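
For comparison, the underlying problem and a rough standard-library equivalent of the cleanup steps can be sketched without dictpy. This is not how Serializer is implemented; it only shows the json.dumps failure on a datetime object and one common workaround (default=str), using a plain dictionary that mirrors the attributes of the Example class above.

```python
import datetime
import json

# Plain-dict stand-in for the attributes of the Example class above.
record = {"datetime_obj": datetime.time(), "stuff2": "stuff2", "stuff3": None}

# json.dumps raises TypeError for objects it does not know how to encode.
try:
    json.dumps(record)
    failed = False
except TypeError:
    failed = True
print(failed)

# Drop None values, then let json fall back to str() for unknown objects.
without_none = {k: v for k, v in record.items() if v is not None}
json_output = json.dumps(without_none, default=str)
print(json_output)
```

The default=str fallback only affects the final JSON text, whereas the Serializer approach above produces a cleaned dictionary that can be inspected or stored before it is dumped.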