benc: Python Bencode decoding and encoding (Python >= 3.6)
Bencode is a serialization/encoding format used by BitTorrent for transmitting loosely structured data. Unlike JSON or msgpack, its output is consistent between implementations: there is a bijection between a data structure and its encoding, so any given input has one (and only one) (de)serialization.
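To make the contrast with JSON concrete, here is a small sketch using only the standard library `json` module. It shows two different JSON texts that decode to the same data structure, which is the ambiguity bencode avoids. The variable names are illustrative only.

```python
import json

data = {"key": 1234, "abc": 9876}

# Two valid JSON serializations of the same dictionary.
compact = json.dumps(data, separators=(",", ":"))
pretty = json.dumps(data, sort_keys=True, indent=2)

# The texts differ, yet both decode back to the same structure.
texts_differ = compact != pretty
same_structure = json.loads(compact) == json.loads(pretty) == data
```

In bencode the same dictionary has exactly one valid encoding, `d3:abci9876e3:keyi1234ee`, because keys are sorted and no optional whitespace exists.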
Benc
Copyright Peter Tripp (@notpeter) 2018, MIT Licensed
This is a complete re-implementation of bencode for Python 3.6+ only. This allows us to use newer Python constructs, resulting in a more readable code base. It is intentionally built as a single-file implementation which can be vendored into codebases as necessary.
Types
Bencode supports four data types:
- int: -42 -> i-42e
- str: 'spam' -> 4:spam
- list: ['XYZ', 4321] -> l3:XYZi4321ee
- dict: {'XYZ': 4321} -> d3:XYZi4321ee
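The encodings listed above can also be read in the other direction. The following is a minimal decoder sketch for the four types, not benc's actual decoder: `sketch_decode` and `_parse` are hypothetical names, and returning strings as UTF-8 decoded `str` is an assumption. It is deliberately lenient; a strict decoder would also reject non-canonical input such as integers with leading zeros or dictionaries with unsorted keys.

```python
from typing import Any, Tuple


def sketch_decode(data: bytes) -> Any:
    """Decode exactly one bencoded value (illustrative only)."""
    value, end = _parse(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _parse(data: bytes, pos: int) -> Tuple[Any, int]:
    """Return (value, index just past the value) for the item starting at pos."""
    lead = data[pos:pos + 1]
    if lead == b"i":                       # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead.isdigit():                     # string: <byte length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        # Assumption: strings are UTF-8 text.
        return data[start:start + length].decode("utf-8"), start + length
    if lead == b"l":                       # list: l<items>e
        items = []
        pos += 1
        while data[pos:pos + 1] != b"e":
            item, pos = _parse(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                       # dict: d<key><value>...e
        result = {}
        pos += 1
        while data[pos:pos + 1] != b"e":
            key, pos = _parse(data, pos)
            item, pos = _parse(data, pos)
            result[key] = item
        return result, pos + 1
    raise ValueError(f"unexpected byte at offset {pos}")
```

For example, `sketch_decode(b"l3:XYZi4321ee")` returns `['XYZ', 4321]`.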
Gotchas:
- Dictionary keys must be strings; int keys are not supported (as in JSON).
- Unsupported types:
  - null
  - float
- Types which are coerced; the bijection holds only after the initial encoding/decoding:
  - set (list)
  - bool (int)
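The effect of these gotchas on Python values can be sketched without any encoding at all. The function below is hypothetical (it is not part of benc) and models what a value would look like after one encode/decode round trip. The order in which a set's elements are emitted is an assumption; sorted order is used here.

```python
def sketch_normalize(value):
    """Return the value an encode/decode round trip would hand back (assumed)."""
    if value is None or isinstance(value, float):
        raise TypeError(f"{type(value).__name__} has no bencode representation")
    if isinstance(value, bool):
        return int(value)                  # coerced: True -> 1, False -> 0
    if isinstance(value, (int, str)):
        return value
    if isinstance(value, set):
        # Assumption: elements come back as a sorted list; benc may differ.
        return [sketch_normalize(item) for item in sorted(value)]
    if isinstance(value, list):
        return [sketch_normalize(item) for item in value]
    if isinstance(value, dict):
        if not all(isinstance(key, str) for key in value):
            raise TypeError("dictionary keys must be strings")
        return {key: sketch_normalize(item) for key, item in value.items()}
    raise TypeError(f"unsupported type: {type(value).__name__}")


first = sketch_normalize({"flags": {2, 1}, "ok": True})
second = sketch_normalize(first)
```

The first pass changes the types (the set becomes a list, the bool becomes an int); the second pass returns an equal value of the same types, which is the sense in which the bijection holds only after the initial round trip.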
New-ish Python constructs used by this library:
- Single Dispatch Generic Functions (see PEP 443)
- Type Hints (see PEP 483, PEP 484 and PEP 526)
- Byte Literals (see PEP 3112)
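As a rough illustration of how these three constructs fit together, here is a sketch of an encoder organized around `functools.singledispatch`. It is not benc's source: `sketch_bencode` and the helper names are hypothetical, and encoding `str` as UTF-8 is an assumption. Handlers are registered with an explicit type argument so the sketch also runs on Python 3.6.

```python
from functools import singledispatch


@singledispatch
def sketch_bencode(value) -> bytes:
    """Fallback for types with no registered encoder (e.g. None, float)."""
    raise TypeError(f"cannot bencode {type(value).__name__}")


@sketch_bencode.register(int)
def _encode_int(value: int) -> bytes:
    # bool is a subclass of int, so True is dispatched here and becomes i1e.
    return b"i%de" % int(value)


@sketch_bencode.register(str)
def _encode_str(value: str) -> bytes:
    raw = value.encode("utf-8")            # the prefix counts bytes, not characters
    return b"%d:%b" % (len(raw), raw)


@sketch_bencode.register(list)
def _encode_list(value: list) -> bytes:
    return b"l" + b"".join(sketch_bencode(item) for item in value) + b"e"


@sketch_bencode.register(dict)
def _encode_dict(value: dict) -> bytes:
    if not all(isinstance(key, str) for key in value):
        raise TypeError("dictionary keys must be strings")
    # Keys are written in sorted order so the output is unique.
    ordered = sorted(value, key=lambda key: key.encode("utf-8"))
    body = b"".join(sketch_bencode(key) + sketch_bencode(value[key]) for key in ordered)
    return b"d" + body + b"e"
```

With this layout, supporting another type means registering one more function rather than growing an `if`/`elif` chain, and unsupported types fall through to the `TypeError` in the base function.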
Why another bencode library?
The original bencode.py from the Mainline BitTorrent client (written by @ppetru) has not been significantly updated since its initial release in 2004. It was originally licensed under the BitTorrent Open Source License; subsequent releases have been licensed under OSI-approved licenses, notably GPLv3 (see BitTorrent-4.0.0-GPL.tar.gz) and later the Python Software Foundation License Version 2.3 (see BitTorrent-5.3-GPL.tar.gz).
I've taken a TDD approach, starting with a set of test cases from existing bencode implementations. If you need a library that's compatible with 2.7-3.x, this isn't it.
Install
pip3 install -U benc
Usage
from benc import bencode, bedecode
CLI:
# benc '{"key": 1234, "abc": 9876}'
d3:abci9876e3:keyi1234ee
Known issues/todo
- Better documentation of ASCII vs UTF-8 byte literals
- Explicitly expose ASCII-only binary mode.