nommy

A Python byte and bit parser inspired by Rust's nom.


Keywords
bytes, struct, nom
License
GPL-3.0
Install
pip install nommy==0.3.1

Documentation

Installation

From the project root directory:

$ python setup.py install

From pip:

$ pip install nommy

Usage

# Parser

Define a class decorated with @nommy.parser whose type hints list the fields in the order they occur in the bytes.

Example:

import nommy

@nommy.parser
class Example:
    magic_str: nommy.string(8)
    some_unsigned_byte: nommy.le_u8
    some_unsigned_16bit: nommy.le_u16
    some_flag: nommy.flag
    next_flag: nommy.flag
    six_bit_unsigned: nommy.le_u(6)


example, rest_of_bytes = Example.parse(b'CAFEBABE\xff\x12\x34\x9f')
print(example.magic_str)  # prints "CAFEBABE"
print(example.some_unsigned_byte)  # prints 255, from \xff
print(hex(example.some_unsigned_16bit))  # prints 0x3412, because little endian \x12\x34
# \x9f is binary 10011111
# The first two bits, 1 and 0, become the two flags: True and False
# The remaining bits, 011111 (0x1f), are the six-bit unsigned int: 31
print(example.some_flag)  # "True" from first bit of \x9f
print(example.next_flag)  # "False" from next bit
print(example.six_bit_unsigned)  # \x1f or 31

To run this, see examples/readme_example.py
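
To see where these values come from, the same bytes can be decoded by hand with only the standard library. This is a minimal sketch of the byte layout described above, not nommy's implementation; the helper name decode_example is hypothetical.

```python
def decode_example(data: bytes):
    """Hand-decode the layout of the Example class (illustrative, not part of nommy)."""
    magic_str = data[0:8].decode("ascii")                        # 8-byte fixed string
    some_unsigned_byte = data[8]                                 # one unsigned byte
    some_unsigned_16bit = int.from_bytes(data[9:11], "little")   # 2 bytes, little endian
    last = data[11]                                              # 0x9f == 0b10011111
    some_flag = bool((last >> 7) & 1)                            # highest bit
    next_flag = bool((last >> 6) & 1)                            # second-highest bit
    six_bit_unsigned = last & 0b00111111                         # remaining six bits
    rest = data[12:]                                             # unparsed bytes
    return (magic_str, some_unsigned_byte, some_unsigned_16bit,
            some_flag, next_flag, six_bit_unsigned), rest


fields, rest = decode_example(b'CAFEBABE\xff\x12\x34\x9f')
print(fields)  # ('CAFEBABE', 255, 13330, True, False, 31)
```

Note that 13330 is 0x3412, and that bits are taken from the most significant end of the byte first, which matches the flag order in the example above.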

# Endianness and Signedness

There are several little-endian and big-endian types to use, such as:

from nommy import parser, le_u, le_u8, le_u16, le_u32, le_u64

@parser
class LittleEndianUnsigned:
    eight_bit: le_u8
    sixteen_bit: le_u16
    thirtytwo_bit: le_u32
    sixtyfour_bit: le_u64
    one_bit: le_u(1)
    two_bit: le_u(2)
    ...
    seven_bit: le_u(7)

Signed types are also available: le_i8, le_i16, le_i32, and le_i64. Each of these types has a big-endian counterpart as well: be_u16, ...
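
The difference between these types can be illustrated with the standard library alone. The sketch below assumes that the le_/be_ prefixes mean little-endian and big-endian byte order, and that the i types use two's complement, as the names suggest; it does not call nommy.

```python
raw = b'\x12\x34'

little = int.from_bytes(raw, "little")   # least significant byte first
big = int.from_bytes(raw, "big")         # most significant byte first
print(hex(little), hex(big))             # 0x3412 0x1234

# The same two bytes read as unsigned and as signed (two's complement).
unsigned = int.from_bytes(b'\xfe\xff', "little", signed=False)
signed = int.from_bytes(b'\xfe\xff', "little", signed=True)
print(unsigned, signed)                  # 65534 -2
```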

# Strings

There are three string types you can parse.

You can parse a static length string:

static_len: string(12)

You can parse a null-terminated string:

null_term: string(None)

You can also parse Pascal strings (length-prefixed):

some_str: pascal_string
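
The three layouts differ only in how the end of the string is found. The following standard-library sketch shows each one; the helper names are hypothetical, the ASCII decoding is an assumption, and the one-byte length prefix for Pascal strings is the conventional layout rather than something confirmed for nommy's pascal_string.

```python
def take_fixed(data: bytes, length: int):
    """Fixed-length string: the next `length` bytes."""
    return data[:length].decode("ascii"), data[length:]


def take_null_terminated(data: bytes):
    """Null-terminated string: everything up to the first zero byte."""
    end = data.index(b'\x00')
    return data[:end].decode("ascii"), data[end + 1:]


def take_pascal(data: bytes):
    """Pascal string: a one-byte length followed by that many bytes."""
    length = data[0]
    return data[1:1 + length].decode("ascii"), data[1 + length:]


print(take_fixed(b'hello world!' + b'rest', 12))    # ('hello world!', b'rest')
print(take_null_terminated(b'foo\x00' + b'rest'))   # ('foo', b'rest')
print(take_pascal(b'\x03' + b'bar' + b'rest'))      # ('bar', b'rest')
```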

# Flag

You can also extract a single bit as a boolean:

debug: nommy.flag

# Enum

To parse a field such as a DNS rtype into named values, decorate an Enum with le_enum or be_enum:

from enum import Enum
from nommy import le_enum, parser

@le_enum(4)  # 4 bit size
class DNSRType(Enum):
    A = 1
    NS = 2
    MD = 3
    MF = 4
    ...

@parser
class DNSRecord:
    rtype: DNSRType
    ...

data, rest = DNSRecord.parse(b'\x10...')
assert data == DNSRecord(rtype=DNSRType.A, ...)
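
The byte \x10 maps to DNSRType.A because a 4-bit field takes the top four bits of the byte, which are 0001. A standard-library sketch of that arithmetic follows; RType is a hypothetical stand-in for the DNSRType class above, and the code assumes bits are consumed from the most significant end, as in the flag example earlier.

```python
from enum import Enum


class RType(Enum):  # hypothetical stand-in for DNSRType
    A = 1
    NS = 2
    MD = 3
    MF = 4


first_byte = b'\x10'[0]          # 0b00010000
rtype_bits = first_byte >> 4     # keep only the top four bits
print(RType(rtype_bits))         # RType.A
```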

# Nested Parser

Parsers can be split up into multiple classes, then combined:

from nommy import parser, le_u8, string

@parser
class Header:
    id: le_u8
    recipient: string(None)
    sender: string(None)

@parser
class Body:
    subject: string(None)
    text: string(None)

@parser
class Email:
    header: Header
    body: Body

See examples/nested.py
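
Conceptually, combining parsers means each one consumes its own fields and hands the remaining bytes to the next. The sketch below shows that hand-off with plain functions and dataclasses; it is an illustration under that assumption, not how nommy is implemented, and take_cstr, parse_header, and parse_body are hypothetical names.

```python
from dataclasses import dataclass


def take_cstr(data: bytes):
    """Read a null-terminated ASCII string and return it with the remaining bytes."""
    end = data.index(b'\x00')
    return data[:end].decode("ascii"), data[end + 1:]


@dataclass
class Header:
    id: int
    recipient: str
    sender: str


@dataclass
class Body:
    subject: str
    text: str


def parse_header(data: bytes):
    ident, data = data[0], data[1:]          # one unsigned byte
    recipient, data = take_cstr(data)
    sender, data = take_cstr(data)
    return Header(ident, recipient, sender), data


def parse_body(data: bytes):
    subject, data = take_cstr(data)
    text, data = take_cstr(data)
    return Body(subject, text), data


raw = b'\x01' b'bob\x00' b'alice\x00' b'Hi\x00' b'Lunch?\x00'
header, rest = parse_header(raw)   # the header consumes its fields first
body, rest = parse_body(rest)      # the body continues where the header stopped
print(header)
print(body)
```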

# Repeating

Sometimes a field in a structure specifies how many times a later field repeats. In DNS, for example, QDCOUNT and ANCOUNT give the number of queries and answers in the sections that follow. Nommy supports this with repeating, which parses a data type as many times as a previous field specifies, typically a field in the header.

The format is: repeating(SomeDataType, 'integer_field_name')

There is also repeating_until_null, which handles items that keep repeating until a null byte is reached. DNS names, for example, are essentially repeating pascal strings terminated by a null byte.

Examples:

from nommy import parser, flag, le_u, le_u8, string, repeating, repeating_until_null

@parser
class SomeStruct:
    # Total size, 1 byte.
    some_flag1: flag
    some_flag2: flag
    some_flag3: flag
    some_flag4: flag
    some_four_bit_nibble: le_u(4)

@parser
class HasRepeats:
    name_ct: le_u8
    names: repeating(string(None), 'name_ct')
    struct_ct: le_u8
    structs: repeating(SomeStruct, 'struct_ct')
    labels: repeating_until_null(string(4))

data, rest = HasRepeats.parse(
    # 4 names, null terminated strings
    b'\x04foo\0bar\0baz\0quux\0'
    # 2 structs, 1 byte each
    # First is \xff, so all true flags and 15 value nibble
    # Second is \x0a, so all false flags and 10 value nibble
    b'\x02\xff\x0a'
    # Labels keep going until they hit a null byte
    b'ALFA'
    b'BETA'
    b'GAMA'
    b'DLTA'
    b'\x00'
)

See examples/readme_repeating_example.py
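
The two repetition styles can be sketched with the standard library: one loop runs a fixed number of times taken from an earlier count byte, the other runs until the next byte is zero. This is an illustrative approximation of the behaviour described above, not nommy's code; take_counted and take_until_null are hypothetical helpers.

```python
def take_null_terminated(data: bytes):
    """Read a null-terminated ASCII string and return it with the remaining bytes."""
    end = data.index(b'\x00')
    return data[:end].decode("ascii"), data[end + 1:]


def take_counted(data: bytes, count: int, take_one):
    """Apply `take_one` exactly `count` times."""
    items = []
    for _ in range(count):
        item, data = take_one(data)
        items.append(item)
    return items, data


def take_until_null(data: bytes, take_one):
    """Apply `take_one` until the next byte is zero, then consume that zero."""
    items = []
    while data[0] != 0:
        item, data = take_one(data)
        items.append(item)
    return items, data[1:]


payload = b'\x04foo\0bar\0baz\0quux\0' b'ALFA' b'BETA' b'\x00'
name_ct, payload = payload[0], payload[1:]   # the count byte comes first
names, payload = take_counted(payload, name_ct, take_null_terminated)
labels, payload = take_until_null(payload, lambda d: (d[:4].decode("ascii"), d[4:]))
print(names)   # ['foo', 'bar', 'baz', 'quux']
print(labels)  # ['ALFA', 'BETA']
```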

You can also reference a field inside a nested parser by giving a dotted path, such as header.payload_ct:

from nommy import parser, repeating, le_u8, string

@parser
class Header:
    id: le_u8
    payload_ct: le_u8

@parser
class Payload:
    name: string(None)

@parser
class Message:
    header: Header
    string_ct: le_u8
    strings: repeating(string(None), 'string_ct')
    payloads: repeating(Payload, 'header.payload_ct')

See the examples directory for more.
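
How nommy resolves a dotted name is not described here, but one plausible way to follow a path such as 'header.payload_ct' through already-parsed values is a chain of attribute lookups. The sketch below assumes exactly that; resolve is a hypothetical helper and SimpleNamespace stands in for parsed objects.

```python
from functools import reduce
from types import SimpleNamespace


def resolve(parsed_so_far, dotted_name: str):
    """Follow a dotted name through nested attributes, one segment at a time."""
    return reduce(getattr, dotted_name.split('.'), parsed_so_far)


# Stand-in for a partially parsed Message with a nested Header.
message = SimpleNamespace(header=SimpleNamespace(id=7, payload_ct=3), string_ct=2)
print(resolve(message, 'header.payload_ct'))  # 3
print(resolve(message, 'string_ct'))          # 2
```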

For a full example that shows nested parsers with repeating values that closely matches an actual DNS parser, check examples/dns.py

Release Notes

0.3.3: Fix first example of readme
0.3.2: Fix readme and add examples/readme_repeating_example.py
0.3.1: Add repeating_until_null to handle DNS names
0.3.0: Added support for nested fields and repeating values.
0.2.0: Added enums.
0.1.0: Works for major types, with strings and flags.
0.0.1: Project created