PyTiJo - Text In JSON Out
Structures semi-structured text. Useful when parsing command line output from Unix systems and networking devices.
What is it
Command line tools often return text written for people rather than structured data. That’s where pytijo tries to help. It lets you define
the payload you wish came back to you, and with a sprinkle of the right
regular expressions, it does!
Installation
With pip:
pip install pytijo
From source:
make install
Usage
Pass your text and a "structure" (a Python dictionary) to the parser module's parse method.
from pytijo import parser
output = """
eth0 Link encap:Ethernet HWaddr 00:11:22:3a:c4:ac
inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:147142475 errors:0 dropped:293854 overruns:0 frame:0
TX packets:136237118 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:17793317674 (17.7 GB) TX bytes:46525697959 (46.5 GB)
eth1 Link encap:Ethernet HWaddr 00:11:33:4a:c8:ad
inet addr:192.168.1.3 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::225:90ff:fe4a:c8ad/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:51085118 errors:0 dropped:251 overruns:0 frame:0
TX packets:3447162 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4999277179 (4.9 GB) TX bytes:657283496 (657.2 MB)
"""
# Raw strings keep the regex backslashes intact.
struct = {
    'interfaces': [{
        # '#id' starts a new interface entry at each match
        '#id': r'(eth\d{1,2})',
        'ipv4_address': r'inet addr:(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})',
        'mac_address': r'HWaddr\s((?:[a-fA-F0-9]{2}[:|\-]?){6})'
    }]
}
parsed = parser.parse(output, struct)
print(parsed)
This returns the Python dictionary:
{
'interfaces': [
{
'id': 'eth0',
'ipv4_address': '192.168.1.2',
'mac_address': '00:11:22:3a:c4:ac'
},
{
'id': 'eth1',
'ipv4_address': '192.168.1.3',
'mac_address': '00:11:33:4a:c8:ad'
}
]
}
You can then do with it as you please, maybe return it as JSON as part of a REST service...
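As a minimal sketch of the JSON step, the parsed dictionary can be serialised with the standard library json module. The dictionary below is copied from the example output above rather than produced by calling pytijo, and the variable name payload is only illustrative.

```python
import json

# Copied from the example output above; in real use this would be
# the return value of parser.parse(output, struct).
parsed = {
    'interfaces': [
        {'id': 'eth0', 'ipv4_address': '192.168.1.2', 'mac_address': '00:11:22:3a:c4:ac'},
        {'id': 'eth1', 'ipv4_address': '192.168.1.3', 'mac_address': '00:11:33:4a:c8:ad'},
    ]
}

# Serialise to a JSON string, ready to be used as an HTTP response body.
payload = json.dumps(parsed, indent=2, sort_keys=True)
print(payload)
```

Because the parsed result contains only dictionaries, lists and strings, it round-trips through JSON without any custom encoder.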
The Struct
Each value in the struct is a dictionary {}, a list [], or a
regular expression string such as [a-z](\d) with one group (to populate
the value). The structure is recursively parsed, populating the dictionary/structure that was provided with values from the input text.
Dictionaries inside a list need an #id or #start key. The difference is that the #start
key/value is dropped from the resulting output. The #id or #start regex marks the beginning and end of each
“chunk” that you’d like parsed. The end of a chunk can also be set with an #end key and regex value. An example is useful here.
For example, take the following structure:
{
    'tables': [
        {
            '#id': r'\[TABLE (\d{1,2})\]',
            'flows': [
                {
                    '#id': r'\[FLOW_ID(\d+)\]',
                    'info': r'info\s+=\s+(.*)'
                }
            ]
        }
    ]
}
This creates a “chunk/block” for each table in the following output:
[TABLE 0] Total entries: 3
[FLOW_ID1]
info = related to table 0 flow 1
[TABLE 1] Total entries: 31
[FLOW_ID1]
info = related to table 1 flow 1
That will be parsed as:
{
'tables': [{
'id': '0',
'flows': [{ 'id': '1', 'info': 'related to table 0 flow 1' }],
}, {
'id': '1',
'flows': [{ 'id': '1', 'info': 'related to table 1 flow 1' }]
}]
}
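To illustrate the chunking idea, here is a simplified sketch using only the standard library re module. It is not pytijo's implementation: the helper names split_chunks and parse_chunks are hypothetical, it handles only a flat struct (no nested lists, no #start or #end), and returning None for a field with no match is an assumption of this sketch.

```python
import re

def split_chunks(text, id_regex):
    """Return (id, chunk) pairs; each chunk runs from one id match
    up to the next id match, or to the end of the text."""
    matches = list(re.finditer(id_regex, text))
    chunks = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append((match.group(1), text[match.start():end]))
    return chunks

def parse_chunks(text, struct):
    """Apply a flat struct (one '#id' regex plus field regexes) to each chunk."""
    results = []
    for chunk_id, chunk in split_chunks(text, struct['#id']):
        entry = {'id': chunk_id}
        for key, regex in struct.items():
            if key == '#id':
                continue
            found = re.search(regex, chunk)
            # Assumption of this sketch: unmatched fields become None.
            entry[key] = found.group(1) if found else None
        results.append(entry)
    return results

output = """\
[TABLE 0] Total entries: 3
[FLOW_ID1]
info = related to table 0 flow 1
[TABLE 1] Total entries: 31
[FLOW_ID1]
info = related to table 1 flow 1
"""

tables = parse_chunks(output, {
    '#id': r'\[TABLE (\d{1,2})\]',
    'info': r'info\s+=\s+(.*)',
})
print(tables)
```

Each field regex is searched only inside its own chunk, which is why the info line of table 1 never ends up in the entry for table 0.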
See tests/test_parser_api.py for more usage examples.