SmartyParse

What is SmartyParse?

SmartyParse is a binary packing/unpacking (aka building/parsing) library for arbitrary formats, written for Python >= 3.3. If you have a defined binary format (.tar, .bmp, byte-oriented network packets, etc.) or are developing one, SmartyParse is a way to convert those formats to and from Python objects. Its most direct alternative is Construct, which is admittedly much more mature.

As an explicit warning, this is a very, very new library, and you are likely to run into some bugs. Pull requests are welcome, and I apologize for the sometimes messy source.

What makes SmartyParse different?

SmartyParse, first and foremost, was built to support self-describing formats. Though it is (to an extent) possible to create these in declarative parsing libraries like Construct, it is very tedious, and requires a substantial amount of extra code.

Fundamentally, that means there are three big differences between SmartyParse and Construct:

SmartyParse is highly Pythonic and very intuitive. Construct requires learning a specialized Construct descriptive format.
SmartyParse is imperative. Construct is declarative.
SmartyParse supports running arbitrary callbacks during the parsing process.
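To illustrate what imperative parsing with callbacks means in practice, here is a minimal sketch in plain Python using the standard `struct` module. It is not SmartyParse code: the record layout, the `parse_record` function and the `on_field` hook are hypothetical. The point is that a value read early in the buffer (the type code) decides how later bytes are interpreted, and a callback can observe each field as soon as it is parsed.

```python
import struct
from typing import Any, Callable, Optional


def parse_record(data: bytes,
                 on_field: Optional[Callable[[str, Any], None]] = None) -> dict:
    """Parse a hypothetical self-describing record, one field at a time.

    Assumed layout (illustration only, not a SmartyParse format):
      1 byte   type code: 0 = unsigned 32-bit int payload, 1 = raw bytes
      4 bytes  big-endian unsigned payload length n
      n bytes  payload
    """
    result = {}
    (type_code,) = struct.unpack_from('>B', data, 0)
    result['type'] = type_code
    if on_field is not None:
        on_field('type', type_code)  # callback runs mid-parse

    (length,) = struct.unpack_from('>I', data, 1)
    result['length'] = length
    if on_field is not None:
        on_field('length', length)

    payload = data[5:5 + length]
    if len(payload) != length:
        raise ValueError('buffer shorter than declared length')
    # A value parsed earlier decides how the remaining bytes are read
    if type_code == 0:
        (result['payload'],) = struct.unpack('>I', payload)
    elif type_code == 1:
        result['payload'] = bytes(payload)
    else:
        raise ValueError('unknown type code: %d' % type_code)
    return result


seen = []
record = parse_record(b'\x01' + struct.pack('>I', 5) + b'hello',
                      on_field=lambda name, value: seen.append((name, value)))
print(record)  # {'type': 1, 'length': 5, 'payload': b'hello'}
print(seen)    # [('type', 1), ('length', 5)]
```

In a declarative library the same branching has to be expressed through the library's own switch/conditional constructs; here it is just an `if`.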

Otherwise, Construct and SmartyParse are functionally similar (though for the record, SmartyParse doesn't yet natively support bit-oriented formats, which Construct does).

Installation

SmartyParse is currently in pre-release alpha status. It is available on pip, but you must explicitly allow prerelease versions like this:

pip install --pre smartyparse

SmartyParse has no external dependencies at this time (beyond the standard library), though building it from source will require pandoc and pypandoc:

sudo apt-get install pandoc
pip install pypandoc

Example usage

See /docs for full API documentation.

Declaring a simple length -> data object:

Offset	Length	Description
0	4	Int32 U, n
4	n	Blob

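from smartyparse import SmartyParser
from smartyparse import ParseHelper
from smartyparse import parsers

unknown_blob = SmartyParser()
# Unsigned 32-bit integer holding the byte length n of the data
unknown_blob['length'] = ParseHelper(parsers.Int32(signed=False))
# Variable-length blob whose size is governed by 'length'
unknown_blob['data'] = ParseHelper(parsers.Blob())
# Link the two fields so 'length' describes the size of 'data'
unknown_blob.link_length(data_name='data', length_name='length')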
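For readers who want to see the bytes involved, the following stdlib-only sketch reproduces the length -> data layout from the table above with Python's `struct` module. It illustrates what `link_length` is meant to accomplish and is not SmartyParse's implementation: `pack_blob` and `unpack_blob` are hypothetical helpers, and big-endian byte order is an assumption (check the API documentation in /docs for the byte order SmartyParse actually uses).

```python
import struct


def pack_blob(data: bytes) -> bytes:
    """Prefix data with its length as an unsigned 32-bit int (big-endian assumed)."""
    # The caller never supplies the length; it is derived from the data.
    return struct.pack('>I', len(data)) + data


def unpack_blob(buf: bytes, offset: int = 0):
    """Read one length-prefixed blob; return (fields, offset just past it)."""
    (length,) = struct.unpack_from('>I', buf, offset)
    start = offset + 4
    data = buf[start:start + length]
    if len(data) != length:
        raise ValueError('buffer shorter than declared length')
    return {'length': length, 'data': bytes(data)}, start + length


packed = pack_blob(b'Hello world!')
print(packed)               # b'\x00\x00\x00\x0cHello world!'
print(unpack_blob(packed))  # ({'length': 12, 'data': b'Hello world!'}, 16)
```

Returning the new offset lets several blobs be read back to back from one buffer, which is what nesting requires.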

Nesting that to define a simple file:

Offset	Length	Description
0	4	Magic 'test'
4	4	Int32 U, n
8	n	Blob
8 + n	4	Int32 U, m
12 + n	m	Blob
12 + n + m	4	Int32 U

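test = SmartyParser()
# Fixed 4-byte magic number
test['magic'] = ParseHelper(parsers.Blob(length=4))
# Two length -> data blocks, reusing the parser defined above
test['blob1'] = unknown_blob
test['blob2'] = unknown_blob
# Trailing unsigned 32-bit integer
test['tail'] = ParseHelper(parsers.Int32(signed=False))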

An object to pack into the above:

test_obj = {
    'magic': b'test',
    'blob1': {
        'data': b'Hello world!'
    },
    'blob2': {
        'data': b'Hello, world?'
    },
    'tail': 123
}

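Why the awkward nested dict for the blobs? Each blob is itself a SmartyParser, so its value is a dict keyed by that parser's field names; only 'data' is supplied here because the linked 'length' field is derived from it. SmartyParser objects aren't usually intended for something as simple as a length <-> value pair. The nesting would make a lot more sense with fields like 'header' and 'body', wouldn't it?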

Packing and recycling the above object:

>>> packed = test.pack(test_obj)
>>> test_obj_reloaded = test.unpack(packed)
>>> test_obj == test_obj_reloaded
True
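As a cross-check of the nested layout in the second table, here is a hand-written stdlib sketch that packs and unpacks the same test_obj with `struct`. It is not how SmartyParse works internally: `pack_test` and `unpack_test` are hypothetical, and big-endian integers are again an assumption. Each blob only needs 'data' because the length prefix is computed while packing.

```python
import struct


def pack_test(obj: dict) -> bytes:
    """Pack magic, two length-prefixed blobs and a uint32 tail (big-endian assumed)."""
    magic = obj['magic']
    if len(magic) != 4:
        raise ValueError('magic must be exactly 4 bytes')
    out = bytearray(magic)
    for name in ('blob1', 'blob2'):
        data = obj[name]['data']
        out += struct.pack('>I', len(data)) + data  # length n, then n bytes
    out += struct.pack('>I', obj['tail'])
    return bytes(out)


def unpack_test(buf: bytes) -> dict:
    """Inverse of pack_test; length prefixes are consumed, not returned."""
    obj = {'magic': buf[0:4]}
    offset = 4
    for name in ('blob1', 'blob2'):
        (length,) = struct.unpack_from('>I', buf, offset)
        offset += 4
        obj[name] = {'data': buf[offset:offset + length]}
        offset += length
    (obj['tail'],) = struct.unpack_from('>I', buf, offset)
    return obj


test_obj = {
    'magic': b'test',
    'blob1': {'data': b'Hello world!'},
    'blob2': {'data': b'Hello, world?'},
    'tail': 123,
}
packed = pack_test(test_obj)
print(len(packed))                      # 41  (4 + 4 + 12 + 4 + 13 + 4)
print(unpack_test(packed) == test_obj)  # True
```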

Supporting SmartyParse

SmartyParse is under development as part of the Muse protocol implementation used in the Ethyr encrypted email-like messaging application.

Todo

(In no particular order)

Ensure that SmartyParsers can be created without parsers, so that callbacks can be registered on them before their parsers have been defined. Basically, avoid all of these incredibly annoying "'NoneType' object has no attribute 'set_callback'" errors by allowing on-the-fly parser declaration, instead of setting the actual field itself to None.
Think about register_callback vs set_callback vs add_callback etc. It would be nice to easily and natively support multiple callbacks. HOWEVER, there's an argument to be made that this should be handled elsewhere, since functions can call other functions.
Allow SmartyParsers with a single "visible" object (example: Pascal strings) to be expanded into parent containers, avoiding the awkward double-dict construction
Change SmartyParserObject to use slots for storage, but not for item names (essentially removing attribute-style access, which isn't documented anyway)
Add self-describing format to example usage
Write .bmp library showcase
Move/mirror documentation to readthedocs
Add padding generation method (in addition to constant byte)
Add pip version badge: [![PyPi version](https://pypip.in/v/$REPO/badge.png)](https://github.com/Muterra/py_smartyparse) above.
Support bit orientation
Support endianness of binary blobs (aka transforming from little to big)
Support memoization of static SmartyParsers for extremely performant parsing
Support memoization of partially-static SmartyParsers for better-than-completely-dynamic parsing
Autogeneration of integration test suite from API spec in /doc/
Random self-describing format declaration and testing
Performance testing
Add customized pep8 to codeclimate testing, as per (as yet unpublished) Muterra code style guide
Change logic to allow for delayed execution on callbacks for link_length so the content parser can be dynamically specified
Add utility function for generating a single callback from multiple callables
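The last Todo item could look roughly like the following sketch. `combine_callbacks` is a hypothetical name, not part of SmartyParse; it simply wraps several callables into one that calls each in order with the same arguments, which would let a single callback slot drive multiple handlers.

```python
from typing import Any, Callable


def combine_callbacks(*callbacks: Callable[..., Any]) -> Callable[..., None]:
    """Return one callable that invokes every callback, in order, with the same arguments."""
    def combined(*args: Any, **kwargs: Any) -> None:
        for callback in callbacks:
            callback(*args, **kwargs)
    return combined


log = []
on_length = combine_callbacks(
    lambda value: log.append(('validate', value)),
    lambda value: log.append(('record', value)),
)
on_length(12)
print(log)  # [('validate', 12), ('record', 12)]
```

This simple version discards the individual return values; if a callback's result needs to influence parsing, the combiner would have to decide how to merge them.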

Done!

~~Add passing of parent SmartyParser to callback system.~~ Added in 0.1a4 with the @references(referent) decorator.
~~Clean up callback API.~~ Added in 0.1a4
~~Support for "end flags" for indeterminate-length lists~~ Added in 0.1a5
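To illustrate the idea behind end flags, here is a stdlib-only sketch of an indeterminate-length list of unsigned 32-bit integers terminated by a sentinel value. The sentinel `0xFFFFFFFF`, the helper names and the big-endian layout are all assumptions for illustration; they do not describe SmartyParse's API.

```python
import struct

END_FLAG = 0xFFFFFFFF  # hypothetical sentinel marking the end of the list


def pack_until_flag(values) -> bytes:
    """Pack uint32 values followed by the end flag (big-endian assumed)."""
    out = bytearray()
    for value in values:
        if value == END_FLAG:
            raise ValueError('value collides with the end flag')
        out += struct.pack('>I', value)
    out += struct.pack('>I', END_FLAG)
    return bytes(out)


def unpack_until_flag(buf: bytes, offset: int = 0):
    """Read uint32 values until the end flag; return (values, offset past the flag).

    Raises struct.error if the buffer ends before an end flag is found.
    """
    values = []
    while True:
        (value,) = struct.unpack_from('>I', buf, offset)
        offset += 4
        if value == END_FLAG:
            return values, offset
        values.append(value)


packed = pack_until_flag([1, 2, 3])
print(unpack_until_flag(packed))  # ([1, 2, 3], 16)
```

The length is never stored; the parser only learns it by reading until the flag, which is why the flag value must never appear as data.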

Misc API notes

SmartyParser field names must currently be valid identifier strings (anything you could assign as an attribute). To check validity programmatically, use 'foo'.isidentifier(); SmartyParser itself will raise an error if you try to assign an invalid field name. This restriction is the result of using __slots__ for some memory optimization, a compromise between default dict behavior and memory use. If you're parsing a ton of objects, it will help memory consumption considerably.
Due to floating-point imprecision, floats and doubles can potentially break equivalence (i.e. start == reloaded) when comparing an object before packing with the same object after unpacking.
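Both notes can be checked with the standard library alone. The sketch below uses `str.isidentifier()` for field names and `struct` to round-trip a value through a 32-bit float, which is one way the precision loss described above shows up; comparing with `math.isclose` instead of `==` is a common workaround. None of this is SmartyParse code.

```python
import math
import struct

# Field names must be valid identifiers
print('length'.isidentifier())  # True
print('blob-1'.isidentifier())  # False
print('1st'.isidentifier())     # False

# 0.1 has no exact binary representation, and a 32-bit float keeps fewer
# bits than Python's 64-bit float, so the round trip changes the value.
original = 0.1
reloaded = struct.unpack('>f', struct.pack('>f', original))[0]
print(reloaded)                                         # 0.10000000149011612
print(original == reloaded)                             # False
print(math.isclose(original, reloaded, rel_tol=1e-6))   # True
```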

smartyparse
Release 0.1.3

Release 0.1.3

0.1.3

0.1.1

0.1.0

0.1a4

0.1a3

0.1a2

0.1a1

Documentation

SmartyParse

What is SmartyParse?

What makes SmartyParse different?

Installation

Example usage

Supporting SmartyParse

Todo

Done!

Misc API notes

Stats

Development practices

Releases

Contributors

smartyparse Release 0.1.3

Release 0.1.3 Toggle Dropdown 0.1.3 0.1.1 0.1.0 0.1a4 0.1a3 0.1a2 0.1a1

Documentation

SmartyParse

What is SmartyParse?

What makes SmartyParse different?

Installation

Example usage

Supporting SmartyParse

Todo

Done!

Misc API notes

Stats

Development practices

Releases

Contributors

smartyparse
Release 0.1.3

Release 0.1.3

0.1.3

0.1.1

0.1.0

0.1a4

0.1a3

0.1a2

0.1a1