logmole

An Extendable and Versatile Logparsing System


Keywords
arnold, json, log, logs, parser, parsing-library, pattern-matching, python, regular-expressions
License
MIT
Install
pip install logmole==0.9.0


Logmole lets you chain regex patterns in a simple way, so you can build extensions for different types of log files.

What can it do for you?

  • provide a framework to create reusable and modular log parsers based on regular expressions
  • simplify the process of chaining multiple regex patterns
  • create objects and fields dynamically from named capturing groups and representatives
  • help with automatic and robust type conversions
  • offer some pre-built extensions

Installation

Logmole can be installed via pip.

pip install logmole

How to use

The LogContainer

The LogContainer class represents the data matched by its own regex pattern or by the patterns of its sub-containers.

| Attribute | Type | Description |
| --- | --- | --- |
| pattern | str | The regex pattern a container uses to parse the data. You always have to provide a named capturing group. Each match on a named group ends up as its own attribute on the container or on the declared container representative. |
| representative | str | A name that represents one or multiple containers and defines where a container's matched data is stored. |
| sub_containers | list | Defines the association of a container with child containers. |
| assumptions | subclass of BaseAssumptions | An assumptions object that declares actions on matched data. |
| infer_type | bool | If True (default), the declared assumptions are used to convert the type of a match automatically. |

| Method | Returns | Description |
| --- | --- | --- |
| dump(filepath=str, **kwargs) | None | Serializes the LogContainer representation as a JSON-formatted stream to the given filepath. Uses the same signature as json.dump(). |
| get_value(str) | str | Gets the value of an attribute using a dot-separated path such as foo.bar.foobar. |
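
A dot-separated lookup like the one get_value() offers can be pictured as repeated attribute access. The sketch below is not logmole's implementation; resolve_dotted and the SimpleNamespace objects are hypothetical stand-ins used only for illustration.

```python
from functools import reduce
from types import SimpleNamespace


def resolve_dotted(obj, path):
    """Walk `obj` one attribute at a time along a dot-separated path."""
    return reduce(getattr, path.split("."), obj)


# Hypothetical nested object standing in for a parsed container.
log = SimpleNamespace(times=SimpleNamespace(start="19:22:40", end="19:22:46"))

print(resolve_dotted(log, "times.start"))  # 19:22:40
```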

Understand By Example

Let's have a look at some examples to demonstrate the main concepts.

Input Log Content

19:22:40 | INFO     line 8 in <module> | Movie started
19:22:40 | WARNING  line 10 in <module> | Found 10000 ghosts
19:22:41 | DEBUG    line 12 in <module> | Scene contains 3 Monsters
19:22:43 | DEBUG    line 13 in <module> | Scene contains 1 Girl
19:22:46 | INFO     line 14 in <module> | Movie ends

Patterns

Assume our extension only includes the following pattern, which provides the start and end time.

from logmole import LogContainer


class MovieLog(LogContainer):
    # Raw string, so the regex escapes are not treated as string escapes.
    pattern = r"(?P<start_time>.\d+\:\d+:\d+).*started|(?P<end_time>.\d+\:\d+:\d+).*ends"

>>> log = MovieLog("/tmp/some.log")
>>> print(log)

{
    "end_time": "19:22:46",
    "start_time": "19:22:40"
}

Printing a LogContainer shows it as a prettified dictionary. You can also use it as an object that holds one attribute per capturing group.

>>> print(log.start_time)
>>> print(log.end_time)

19:22:40
19:22:46
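
To see why both start_time and end_time show up, it can help to run the same pattern through Python's re module directly. This is a minimal sketch using only the standard library, not logmole itself; LOG_TEXT, PATTERN and collect_named_groups are names made up for this illustration.

```python
import re

LOG_TEXT = """\
19:22:40 | INFO     line 8 in <module> | Movie started
19:22:40 | WARNING  line 10 in <module> | Found 10000 ghosts
19:22:41 | DEBUG    line 12 in <module> | Scene contains 3 Monsters
19:22:43 | DEBUG    line 13 in <module> | Scene contains 1 Girl
19:22:46 | INFO     line 14 in <module> | Movie ends
"""

PATTERN = r"(?P<start_time>.\d+\:\d+:\d+).*started|(?P<end_time>.\d+\:\d+:\d+).*ends"


def collect_named_groups(pattern, text):
    """Return {group_name: value} for every named group that matched."""
    found = {}
    for match in re.finditer(pattern, text):
        # groupdict() maps the groups of the unused alternative to None.
        for name, value in match.groupdict().items():
            if value is not None:
                found[name] = value
    return found


print(collect_named_groups(PATTERN, LOG_TEXT))
# {'start_time': '19:22:40', 'end_time': '19:22:46'}
```

Each alternative of the pattern matches a different line, so the two named groups are filled by two separate matches.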

Grouping Containers

Instead of relying on naming conventions to categorize your matches, you can define a representative for them. This doesn't necessarily make sense if you are working with a small number of containers, but it helps when creating more complex nestings.

class TimesContainer(LogContainer):
    pattern = r"(?P<start>.\d+\:\d+:\d+).*started|(?P<end>.\d+\:\d+:\d+).*ends"
    representative = "times"


class MovieLog(LogContainer):
    sub_containers = [TimesContainer]

>>> log = MovieLog("/tmp/some.log")
>>> print(log)
>>> print("-"*10)
>>> print(log.times.start)
>>> print(log.times.end)

{
    "times": {
        "start": "19:22:40",
        "end": "19:22:46"
    }
}
----------
19:22:40
19:22:46

As you can see, it creates a parent representative and attaches the matches to it.


Several containers can also share one representative. Their matches are then stored under the same key.

class GhostsContainer(LogContainer):
    pattern = r"(?P<spooky_ghosts>\d+)\s+ghosts?"
    representative = "scene"


class EntitiesContainer(LogContainer):
    pattern = r"contains\s(?P<entities>\d+\s.*)"
    representative = "scene"


class TimesContainer(LogContainer):
    pattern = r"(?P<start>.\d+\:\d+:\d+).*started|(?P<end>.\d+\:\d+:\d+).*ends"
    representative = "times"


class MovieLog(LogContainer):
    sub_containers = [
        TimesContainer,
        GhostsContainer,
        EntitiesContainer
    ]

>>> log = MovieLog("/tmp/some.log")
>>> print(log)

{
    "scene": {
        "entities": [
            "3 Monsters",
            "1 Girl"
        ],
        "spooky_ghosts": 10000
    },
    "times": {
        "start": "19:22:40",
        "end": "19:22:46"
    }
}
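
The merging behaviour shown above can be approximated with plain dictionaries. The following is a rough sketch under stated assumptions, not logmole's implementation: CONTAINERS and parse are hypothetical names, a repeated match is assumed to turn a value into a list, and no type inference is applied, so spooky_ghosts stays a string here.

```python
import re

LOG_TEXT = """\
19:22:40 | INFO     line 8 in <module> | Movie started
19:22:40 | WARNING  line 10 in <module> | Found 10000 ghosts
19:22:41 | DEBUG    line 12 in <module> | Scene contains 3 Monsters
19:22:43 | DEBUG    line 13 in <module> | Scene contains 1 Girl
19:22:46 | INFO     line 14 in <module> | Movie ends
"""

# (representative, pattern) pairs mirroring the containers above.
CONTAINERS = [
    ("times", r"(?P<start>.\d+\:\d+:\d+).*started|(?P<end>.\d+\:\d+:\d+).*ends"),
    ("scene", r"(?P<spooky_ghosts>\d+)\s+ghosts?"),
    ("scene", r"contains\s(?P<entities>\d+\s.*)"),
]


def parse(text, containers):
    result = {}
    for representative, pattern in containers:
        # Containers sharing a representative write into the same dict.
        group = result.setdefault(representative, {})
        for match in re.finditer(pattern, text):
            for name, value in match.groupdict().items():
                if value is None:
                    continue
                if name not in group:
                    group[name] = value
                elif isinstance(group[name], list):
                    group[name].append(value)
                else:
                    # A second match turns the single value into a list.
                    group[name] = [group[name], value]
    return result


parsed = parse(LOG_TEXT, CONTAINERS)
print(parsed["scene"]["entities"])  # ['3 Monsters', '1 Girl']
```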

This doesn't mean that a sub-container can't have its own sub-containers. Rewriting the extension like this gives the same result. You are free to stack and layer your containers as you see fit.

class GhostsContainer(LogContainer):
    pattern = r"(?P<spooky_ghosts>\d+)\s+ghosts?"


class EntitiesContainer(LogContainer):
    pattern = r"contains\s(?P<entities>\d+\s.*)"


class SceneContainer(LogContainer):
    sub_containers = [
        GhostsContainer,
        EntitiesContainer
    ]
    representative = "scene"


class TimesContainer(LogContainer):
    pattern = r"(?P<start>.\d+\:\d+:\d+).*started|(?P<end>.\d+\:\d+:\d+).*ends"
    representative = "times"


class MovieLog(LogContainer):
    sub_containers = [
        TimesContainer,
        SceneContainer
    ]

Assumptions

An Assumptions object defines a set of regex patterns and associates them with actions that get called when there is a match.

Take a look back at the created output again:

{
    "scene": {
        "entities": [
            "3 Monsters",
            "1 Girl"
        ],
        "spooky_ghosts": 10000
    },
    "times": {
        "start": "19:22:40",
        "end": "19:22:46"
    }
}

Notice that the scene.spooky_ghosts entry is not a string anymore. This is because logmole.LogContainer.assumptions defaults to a logmole.TypeAssumptions object that handles simple conversions automatically.


Native Type Assumptions

As long as infer_type is set to True the LogContainer will always try to convert native types.

This includes support for:

  • int: ^(\-?\d+)$
  • float: ^(\-?\d+)$
  • None: ((N|n)one)$|^NONE$|^((N|n)ull)$|^NULL$|^((N|n)il)$|^NIL$
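
Pattern-driven inference of this kind can be sketched as an ordered list of (pattern, converter) pairs. This is only an illustration, not logmole's code: NATIVE_ASSUMPTIONS and infer are hypothetical names, the int and None patterns are copied from the list above, and the float pattern with a decimal part is an assumption of this sketch.

```python
import re

# Patterns are tried in order; the first one that matches decides the type.
# The float pattern (with a decimal part) is an assumption of this sketch.
NATIVE_ASSUMPTIONS = [
    (r"^(\-?\d+)$", int),
    (r"^(\-?\d+\.\d+)$", float),
    (r"((N|n)one)$|^NONE$|^((N|n)ull)$|^NULL$|^((N|n)il)$|^NIL$", lambda _: None),
]


def infer(value, infer_type=True):
    """Convert a matched string to a native type, or return it unchanged."""
    if not infer_type:
        return value
    for pattern, convert in NATIVE_ASSUMPTIONS:
        if re.match(pattern, value):
            return convert(value)
    return value


print(repr(infer("10000")))                    # 10000
print(repr(infer("3 Monsters")))               # '3 Monsters'
print(repr(infer("10000", infer_type=False)))  # '10000'
```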

You can disable type inference for a container by setting infer_type to False. This applies only to the container itself and is not inherited from parent containers.


Custom Type Assumptions

You can also extend existing assumptions or create an individual set of assumptions per container. Let's demonstrate this on our TimesContainer using the available custom TimeType object.

from logmole import (
    LogContainer,
    TimeType,
    TypeAssumptions
)


class TimesContainer(LogContainer):
    # Every match of this container is converted by TimeType.
    assumptions = TypeAssumptions({".*": TimeType()})
    pattern = r"(?P<start>.\d+\:\d+:\d+).*started|(?P<end>.\d+\:\d+:\d+).*ends"
    representative = "times"

>>> log = MovieLog("/tmp/some.log")
>>> print(type(log.times.start))
<type 'datetime.time'>

A TypeAssumptions object has to be initialized with a dictionary that maps patterns to their corresponding types. In our case we can expect that everything matched by TimesContainer.pattern is a string in a valid H:M:S format. So we don't need a more precise pattern within our TypeAssumptions and can expect those strings to always be convertible by our TimeType object.

The TypeAssumptions class also allows us to inherit existing assumptions from parent containers. This is the default. You can ignore parent assumptions by passing inherit=False when initializing the TypeAssumptions class. This way you can avoid potential match conflicts when using sloppier patterns.

Generally speaking, your patterns should be as precise as possible when you use them on containers that hold many sub-containers.
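
The relationship between patterns, converters and inherited assumptions can be sketched as follows. SketchAssumptions and to_time are hypothetical stand-ins and do not reflect how logmole implements TypeAssumptions; in particular, trying a container's own patterns before the parent's is an assumption of this sketch.

```python
import re
from datetime import datetime


class SketchAssumptions:
    """Hypothetical stand-in: maps regex patterns to converter callables."""

    def __init__(self, assumptions, parent=None, inherit=True):
        self.assumptions = dict(assumptions)
        # With inherit=False the parent's rules are ignored entirely.
        self.parent = parent if inherit else None

    def convert(self, value):
        # Own patterns are tried first (an assumption of this sketch).
        for pattern, converter in self.assumptions.items():
            if re.match(pattern, value):
                return converter(value)
        if self.parent is not None:
            return self.parent.convert(value)
        return value


def to_time(value):
    """Convert an H:M:S string to a datetime.time instance."""
    return datetime.strptime(value, "%H:%M:%S").time()


TIME_PATTERN = r"^\d+:\d+:\d+$"
native = SketchAssumptions({r"^(\-?\d+)$": int})
times = SketchAssumptions({TIME_PATTERN: to_time}, parent=native)
isolated = SketchAssumptions({TIME_PATTERN: to_time}, parent=native, inherit=False)

print(times.convert("19:22:40"))          # 19:22:40
print(repr(times.convert("42")))          # 42
print(repr(isolated.convert("42")))       # '42'
```

Replacing TIME_PATTERN with ".*" in this sketch would send every value to to_time, including "42", which is the kind of conflict a precise pattern avoids.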


Custom Types

Native type conversions might not be sufficient for you. There might be cases where you want to convert your extracted information to a more specific type. There are custom types that can help you do that, or you can write your own.

KeyValueType

TO BE CONTINUED

TimeType

This object doesn't need any extra information. It checks for a valid input string and returns a datetime.time instance.

TwoDimensionalNumberArray

An object that converts a string into an evenly sized two-dimensional array, with automatic float conversion for each item. It always expects a named match group called number within the pattern.

Example:

>>> array_type_1 = TwoDimensionalNumberArray(r"(?P<number>-?\d+)", item_array_size=1)
>>> array_type_2 = TwoDimensionalNumberArray(r"(?P<number>-?\d+)", item_array_size=2)
>>> array_type_3 = TwoDimensionalNumberArray(r"(?P<number>-?\d+)", item_array_size=3)

>>> text = "1, 2, 4 -4, -10, 1"
>>> print(array_type_1(text))
>>> print(array_type_2(text))
>>> print(array_type_3(text))

[[1.0], [2.0], [4.0], [-4.0], [-10.0], [1.0]]
[[1.0, 2.0], [4.0, -4.0], [-10.0, 1.0]]
[[1.0, 2.0, 4.0], [-4.0, -10.0, 1.0]]
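
The chunking idea behind this type can be reproduced with the standard library alone. The function below is a hypothetical sketch, not the class shipped with logmole; raising a ValueError when the matches do not divide evenly is an assumption of this sketch.

```python
import re


def two_dimensional_number_array(pattern, text, item_array_size):
    """Chunk every `number` match in `text` into rows of equal size."""
    numbers = [float(m.group("number")) for m in re.finditer(pattern, text)]
    if len(numbers) % item_array_size:
        # Assumption of this sketch: uneven input is treated as an error.
        raise ValueError("number of matches is not a multiple of item_array_size")
    return [
        numbers[i:i + item_array_size]
        for i in range(0, len(numbers), item_array_size)
    ]


text = "1, 2, 4 -4, -10, 1"
print(two_dimensional_number_array(r"(?P<number>-?\d+)", text, 2))
# [[1.0, 2.0], [4.0, -4.0], [-10.0, 1.0]]
```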

Versioning

Logmole follows semantic versioning.


Extensions

ArnoldMole - An Extension for the lovely Arnold Renderer