nommy
A Python byte and bit parser inspired by Rust's nom.
Installation
From the project root directory:
$ python setup.py install
From pip:
$ pip install nommy
Usage
# Parser
Define a class decorated with @nommy.parser whose type hints are listed in the same order as the fields occur in the bytes.
Example:
    import nommy

    @nommy.parser
    class Example:
        magic_str: nommy.string(8)
        some_unsigned_byte: nommy.le_u8
        some_unsigned_16bit: nommy.le_u16
        some_flag: nommy.flag
        next_flag: nommy.flag
        six_bit_unsigned: nommy.le_u(6)

    example, rest_of_bytes = Example.parse(b'CAFEBABE\xff\x12\x34\x9f')

    print(example.magic_str)  # prints "CAFEBABE"
    print(example.some_unsigned_byte)  # prints 255, from \xff
    print(hex(example.some_unsigned_16bit))  # prints 0x3412, because little endian \x12\x34

    # \x9f is binary 10011111
    # The first two bits are the flags, 1 and 0, so True and False
    # The remaining bits are 011111 or 0x1f, the six bit unsigned int, so 31.
    print(example.some_flag)  # "True" from first bit of \x9f
    print(example.next_flag)  # "False" from next bit
    print(example.six_bit_unsigned)  # \x1f or 31
To run this, see examples/readme_example.py
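To see what the decorator does for you, the same twelve bytes can be decoded by hand with only the standard library. This is an illustrative sketch, not nommy's implementation: it assumes bits are consumed most-significant first, as the comments in the example above describe, and the helper name `decode_example` is hypothetical.

```python
def decode_example(data: bytes) -> dict:
    """Hand-decode the README example layout (hypothetical helper, not part of nommy)."""
    packed = data[11]  # one byte holding two flags and a 6-bit integer
    return {
        "magic_str": data[0:8].decode("ascii"),                       # 8-byte fixed string
        "some_unsigned_byte": data[8],                                # single unsigned byte
        "some_unsigned_16bit": int.from_bytes(data[9:11], "little"),  # little-endian u16
        "some_flag": bool((packed >> 7) & 1),                         # most significant bit
        "next_flag": bool((packed >> 6) & 1),                         # second bit
        "six_bit_unsigned": packed & 0b111111,                        # low six bits
        "rest": data[12:],                                            # unparsed remainder
    }


fields = decode_example(b"CAFEBABE\xff\x12\x34\x9f")
print(fields)
```

Every offset, shift, and mask here has to be kept in sync by hand, which is the bookkeeping the type-hinted class removes.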
# Endianness and Signedness
There are several little-endian and big-endian types to use, such as:
    @parser
    class LittleEndianUnsigned:
        eight_bit: le_u8
        sixteen_bit: le_u16
        thirtytwo_bit: le_u32
        sixtyfour_bit: le_u64
        one_bit: le_u(1)
        two_bit: le_u(2)
        ...
        seven_bit: le_u(7)
Signed types are also available: le_i8, le_i16, le_i32, and le_i64. Each of these also has a big-endian counterpart: be_u16, ...
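As a quick reminder of what byte order and signedness change, here are the same bytes interpreted with the standard library's `int.from_bytes`. This sketch does not use nommy; it only shows the values the corresponding types would be expected to produce, assuming signed types use two's complement.

```python
raw = b"\x12\x34"

little = int.from_bytes(raw, "little")  # least significant byte first
big = int.from_bytes(raw, "big")        # most significant byte first

unsigned = int.from_bytes(b"\xff", "little", signed=False)
signed = int.from_bytes(b"\xff", "little", signed=True)  # two's complement

print(hex(little), hex(big), unsigned, signed)
```

The two bytes read as 0x3412 in little-endian order and 0x1234 in big-endian order, and the single byte \xff reads as 255 unsigned or -1 signed.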
# Strings
There are three string types you can parse.
You can parse a static length string:
static_len: string(12)
You can parse a null-terminated string:
null_term: string(None)
You can also parse Pascal strings:
some_str: pascal_string
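The three layouts differ only in how the end of the string is found. The standard-library sketch below makes that explicit. The helper names are hypothetical, and two details are assumptions rather than documented nommy behavior: the Pascal string uses a single length byte, and the bytes are decoded as ASCII.

```python
from typing import Tuple


def read_fixed(data: bytes, length: int) -> Tuple[str, bytes]:
    """Fixed length: the caller knows the size in advance."""
    return data[:length].decode("ascii"), data[length:]


def read_null_terminated(data: bytes) -> Tuple[str, bytes]:
    """Null terminated: read up to, and then skip, the first zero byte."""
    end = data.index(b"\x00")
    return data[:end].decode("ascii"), data[end + 1:]


def read_pascal(data: bytes) -> Tuple[str, bytes]:
    """Pascal string: a length byte followed by that many bytes (assumed 1-byte prefix)."""
    length = data[0]
    return data[1:1 + length].decode("ascii"), data[1 + length:]


buf = b"hello world!" + b"foo\x00" + b"\x03bar" + b"tail"
static_len, buf = read_fixed(buf, 12)
null_term, buf = read_null_terminated(buf)
some_str, buf = read_pascal(buf)
print(static_len, null_term, some_str, buf)
```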
# Flag
You also can trivially extract a bit as a boolean variable:
debug: nommy.flag
# Enum
You can also create an le_enum or be_enum if you want to parse something like a DNS rtype, to have easy named values:
    from enum import Enum
    from nommy import le_enum, parser

    @le_enum(4)  # 4 bit size
    class DNSRType(Enum):
        A = 1
        NS = 2
        MD = 3
        MF = 4
        ...

    @parser
    class DNSRecord:
        rtype: DNSRType
        ...

    data, rest = DNSRecord.parse(b'\x10...')
    assert data == DNSRecord(rtype=DNSRType.A, ...)
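To show why b'\x10' maps to DNSRType.A, here is the lookup done by hand with a plain `enum.Enum`. This is a sketch that does not use nommy, and it assumes the 4-bit field occupies the high nibble of the first byte (most-significant bits first).

```python
from enum import Enum


class DNSRType(Enum):
    A = 1
    NS = 2
    MD = 3
    MF = 4


first_byte = b"\x10"[0]            # 0b00010000
rtype = DNSRType(first_byte >> 4)  # high nibble is 0b0001, looked up by value
print(rtype)
```

Looking up a value with no matching member raises `ValueError` in a plain `Enum`; how nommy reports that case is not covered here.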
# Nested Parser
Parsers can be split up into multiple classes, then combined:
    from nommy import parser, le_u8, string

    @parser
    class Header:
        id: le_u8
        recipient: string(None)
        sender: string(None)

    @parser
    class Body:
        subject: string(None)
        text: string(None)

    @parser
    class Email:
        header: Header
        body: Body
See examples/nested.py
# Repeating
Sometimes a field in a structure gives the number of times a later field repeats. In DNS, for example, QDCOUNT and ANCOUNT give the number of queries and answers in the sections that follow. Nommy supports this with the repeating class, which parses a data type as many times as a previously parsed field (usually one in the header) specifies.
The format is: repeating(SomeDataType, 'integer_field_name')
There is also repeating_until_null, which handles items that keep repeating until a null byte is reached. DNS names, for example, are essentially a run of Pascal strings ended by a null byte.
Examples:
    from nommy import (
        parser, flag, le_u, le_u8, string, repeating, repeating_until_null,
    )

    @parser
    class SomeStruct:
        # Total size, 1 byte.
        some_flag1: flag
        some_flag2: flag
        some_flag3: flag
        some_flag4: flag
        some_four_bit_nibble: le_u(4)

    @parser
    class HasRepeats:
        name_ct: le_u8
        names: repeating(string(None), 'name_ct')
        struct_ct: le_u8
        structs: repeating(SomeStruct, 'struct_ct')
        labels: repeating_until_null(string(4))

    data, rest = HasRepeats.parse(
        # 4 names, null terminated strings
        b'\x04foo\0bar\0baz\0quux\0'
        # 2 structs, 1 byte each
        # First is \xff, so all true flags and 15 value nibble
        # Second is \x0a, so all false flags and 10 value nibble
        b'\x02\xff\x0a'
        # Labels keep going until they hit a null byte
        b'ALFA'
        b'BETA'
        b'GAMA'
        b'DLTA'
        b'\x00'
    )
See examples/readme_repeating_example.py
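The two repetition styles can be written out by hand to show what each one automates. The sketch below uses only the standard library, and both helper names are hypothetical. The first helper reads a count byte followed by that many null-terminated strings. The second reads length-prefixed labels until it meets a zero byte, which is how an uncompressed DNS name is laid out.

```python
from typing import List, Tuple


def read_counted(data: bytes) -> Tuple[List[str], bytes]:
    """Read a one-byte count, then that many null-terminated strings."""
    count, data = data[0], data[1:]
    items = []
    for _ in range(count):
        end = data.index(b"\x00")
        items.append(data[:end].decode("ascii"))
        data = data[end + 1:]
    return items, data


def read_labels_until_null(data: bytes) -> Tuple[List[str], bytes]:
    """Read length-prefixed labels until a zero byte ends the sequence."""
    labels = []
    while data[0] != 0:
        length = data[0]
        labels.append(data[1:1 + length].decode("ascii"))
        data = data[1 + length:]
    return labels, data[1:]  # skip the terminating null byte


payload = b"\x04foo\0bar\0baz\0quux\0" + b"\x03www\x07example\x03com\x00"
names, remaining = read_counted(payload)
labels, remaining = read_labels_until_null(remaining)
print(names, labels, remaining)
```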
You can also reference a field inside a nested parser by writing a dotted path, such as header.payload_ct:
    from nommy import parser, repeating, le_u8, string

    @parser
    class Header:
        id: le_u8
        payload_ct: le_u8

    @parser
    class Payload:
        name: string(None)

    @parser
    class Message:
        header: Header
        string_ct: le_u8
        strings: repeating(string(None), 'string_ct')
        payloads: repeating(Payload, 'header.payload_ct')
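One plausible way to resolve such a dotted name is to walk the already-parsed fields one attribute at a time. The sketch below shows that idea with the standard library only. It is an assumption about the general technique, not a description of nommy's internals, and `resolve` and the sample values are hypothetical.

```python
from functools import reduce
from types import SimpleNamespace


def resolve(parsed, dotted_name: str):
    """Follow each attribute in 'a.b.c' starting from the parsed object."""
    return reduce(getattr, dotted_name.split("."), parsed)


# Stand-in for fields that have already been parsed (hypothetical values).
parsed_so_far = SimpleNamespace(
    header=SimpleNamespace(id=7, payload_ct=3),
    string_ct=2,
)

payload_count = resolve(parsed_so_far, "header.payload_ct")
string_count = resolve(parsed_so_far, "string_ct")
print(payload_count, string_count)
```

A name without a period goes through the same path as a single `getattr` call, so plain and dotted references need no separate handling.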
See examples for more.
For a full example that shows nested parsers with repeating values that closely matches an actual DNS parser, check examples/dns.py
Release Notes
Version | Changes |
---|---|
0.3.3 | Fix first example of readme |
0.3.2 | Fix readme and add examples/readme_repeating_example.py |
0.3.1 | Add repeating_until_null to handle DNS names |
0.3.0 | Added support for nested fields and repeating values. |
0.2.0 | Added enums. |
0.1.0 | Works for major types, with strings and flags. |
0.0.1 | Project created |