Logmole

An Extendable and Versatile Logparsing System

Logmole lets you chain regex patterns in a simple way to create extensions for different types of log files.

Project Goals
Installation
How to use
Extensions

What can it do for you?

provide a framework to create reusable and modular logparsers based on regular expressions
simplify the process of chaining multiple regex patterns
create objects and fields dynamically based on named capturing groups and representatives
help with automatic and robust type conversions
offer some pre-built extensions

Installation

Logmole can be installed via pip.

pip install logmole

How to use

The LogContainer

The LogContainer class is a component that represents the matches of its own regex pattern or of the patterns of its sub-containers.

Attribute	Type	Description
`pattern`	`str`	The regex pattern a container will use to parse the data. Be aware that you always have to provide a named capturing group. Each match on the named group will end up as its own attribute on the container or the declared container representative.
`representative`	`str`	A name that represents one or multiple containers and defines where to store a container's matched data.
`sub_containers`	`list`	Defines the association of a container with child containers.
`assumptions`	subclass of `BaseAssumptions`	An assumptions object to declare actions on matched data.
`infer_type`	`bool`	If True (default) it will use the declared assumptions to convert the type of a match automatically.

Methods	Returns	Description
`dump(filepath=str, **kwargs)`	`None`	Serialize the LogContainer representation as a JSON formatted stream to the given filepath. Accepts the same keyword arguments as json.dump().
`get_value(str)`	`str`	Get the value of an attribute using a dot-separated path such as `foo.bar.foobar`.
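
To illustrate what a dot-separated lookup and a JSON dump amount to, here is a minimal sketch that uses only the standard library on a plain nested dictionary. The helper name `get_by_path` and the sample data are hypothetical; this is not logmole's implementation of `get_value` or `dump`.

```python
import json
from functools import reduce


def get_by_path(data, path):
    """Resolve a dot-separated path such as 'times.start' in nested dicts."""
    return reduce(lambda node, key: node[key], path.split("."), data)


# Sample data shaped like the container output used in this README.
parsed = {"times": {"start": "19:22:40", "end": "19:22:46"}}

start = get_by_path(parsed, "times.start")
print(start)

# Keyword arguments such as indent or sort_keys control the JSON formatting.
print(json.dumps(parsed, indent=4, sort_keys=True))
```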

Understand By Example

Let's have a look at some examples to demonstrate the main concepts.

Input Log Content

19:22:40 | INFO     line 8 in <module> | Movie started
19:22:40 | WARNING  line 10 in <module> | Found 10000 ghosts
19:22:41 | DEBUG    line 12 in <module> | Scene contains 3 Monsters
19:22:43 | DEBUG    line 13 in <module> | Scene contains 1 Girl
19:22:46 | INFO     line 14 in <module> | Movie ends

Patterns

Assume our extension only includes a single pattern that provides the start and end time.

from logmole import LogContainer

class MovieLog(LogContainer):
    pattern = r"(?P<start_time>.\d+\:\d+:\d+).*started|(?P<end_time>.\d+\:\d+:\d+).*ends"

>>> log = MovieLog("/tmp/some.log")
>>> print(log)

{
    "end_time": "19:22:46",
    "start_time": "19:22:40"
}

The LogContainer is represented as a prettified dictionary when printed. In addition, you can use it as an object that holds an attribute for each capturing group.

>>> print(log.start_time)
>>> print(log.end_time)

19:22:40
19:22:46
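
The idea of turning named capturing groups into attributes can be sketched with the standard library alone. The function `collect_named_groups` below is a hypothetical illustration, not how logmole is implemented, and its pattern is a simplified variant of the one above.

```python
import re
from types import SimpleNamespace

LOG = """\
19:22:40 | INFO     line 8 in <module> | Movie started
19:22:46 | INFO     line 14 in <module> | Movie ends
"""

PATTERN = re.compile(
    r"(?P<start_time>\d+:\d+:\d+).*started|(?P<end_time>\d+:\d+:\d+).*ends"
)


def collect_named_groups(pattern, text):
    """Return an object with one attribute per named group that matched."""
    found = {}
    for match in pattern.finditer(text):
        for name, value in match.groupdict().items():
            if value is not None:  # groups of the unused alternative are None
                found[name] = value
    return SimpleNamespace(**found)


log = collect_named_groups(PATTERN, LOG)
print(log.start_time)
print(log.end_time)
```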

Grouping Containers

Instead of relying on naming conventions to categorize your matches, you can define a representative for them. This doesn't necessarily make sense if you are working with a small number of containers, but it helps when creating more complex nestings.

class TimesContainer(LogContainer):
    pattern = r"(?P<start>.\d+\:\d+:\d+).*started|(?P<end>.\d+\:\d+:\d+).*ends"
    representative = "times"


class MovieLog(LogContainer):
    sub_containers = [TimesContainer]

>>> log = MovieLog("/tmp/some.log")
>>> print(log)
>>> print("-"*10)
>>> print(log.times.start)
>>> print(log.times.end)

{
    "times": {
        "start": "19:22:40",
        "end": "19:22:46"
    }
}
----------
19:22:40
19:22:46

As you can see, it creates a parent representative and attaches the matches to it.

Grouping becomes most useful when several containers share the same representative:

class GhostsContainer(LogContainer):
    pattern = r"(?P<spooky_ghosts>\d+)\s+ghosts?"
    representative = "scene"


class EntitiesContainer(LogContainer):
    pattern = r"contains\s(?P<entities>\d+\s.*)"
    representative = "scene"


class TimesContainer(LogContainer):
    pattern = r"(?P<start>.\d+\:\d+:\d+).*started|(?P<end>.\d+\:\d+:\d+).*ends"
    representative = "times"


class MovieLog(LogContainer):
    sub_containers = [
        TimesContainer,
        GhostsContainer,
        EntitiesContainer
    ]

>>> log = MovieLog("/tmp/some.log")
>>> print(log)

{
    "scene": {
        "entities": [
            "3 Monsters",
            "1 Girl"
        ],
        "spooky_ghosts": 10000
    },
    "times": {
        "start": "19:22:40",
        "end": "19:22:46"
    }
}
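
The grouping behaviour shown above can be approximated with plain `re` calls. In this sketch the `(representative, pattern)` pairs stand in for container classes, and collecting repeated matches into a list is an assumption based on the output above, not a statement about logmole's internals. Values stay strings here because no type inference is applied.

```python
import re

LOG = """\
19:22:40 | WARNING  line 10 in <module> | Found 10000 ghosts
19:22:41 | DEBUG    line 12 in <module> | Scene contains 3 Monsters
19:22:43 | DEBUG    line 13 in <module> | Scene contains 1 Girl
"""

# Hypothetical stand-ins for containers that share the "scene" representative.
CONTAINERS = [
    ("scene", r"(?P<spooky_ghosts>\d+)\s+ghosts?"),
    ("scene", r"contains\s(?P<entities>\d+\s.*)"),
]


def group_matches(containers, text):
    """Store each container's matches under its representative key."""
    result = {}
    for representative, pattern in containers:
        bucket = result.setdefault(representative, {})
        for match in re.finditer(pattern, text):
            for name, value in match.groupdict().items():
                if value is None:
                    continue
                if name not in bucket:
                    bucket[name] = value
                elif isinstance(bucket[name], list):
                    bucket[name].append(value)
                else:
                    # A second match turns the single value into a list.
                    bucket[name] = [bucket[name], value]
    return result


grouped = group_matches(CONTAINERS, LOG)
print(grouped)
```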

But this doesn't mean that a sub-container can't have its own sub-containers. Rewriting the extension like this gives us the same result. You are free to stack and layer your containers as needed.

class GhostsContainer(LogContainer):
    pattern = r"(?P<spooky_ghosts>\d+)\s+ghosts?"


class EntitiesContainer(LogContainer):
    pattern = r"contains\s(?P<entities>\d+\s.*)"


class SceneContainer(LogContainer):
    sub_containers = [
        GhostsContainer,
        EntitiesContainer
    ]
    representative = "scene"


class TimesContainer(LogContainer):
    pattern = r"(?P<start>.\d+\:\d+:\d+).*started|(?P<end>.\d+\:\d+:\d+).*ends"
    representative = "times"


class MovieLog(LogContainer):
    sub_containers = [
        TimesContainer,
        SceneContainer
    ]

Assumptions

An Assumptions object defines a set of regex patterns and associates them with actions that get called when there is a match.

Take a look back at the created output again:

{
    "scene": {
        "entities": [
            "3 Monsters",
            "1 Girl"
        ],
        "spooky_ghosts": 10000
    },
    "times": {
        "start": "19:22:40",
        "end": "19:22:46"
    }
}

Notice that the scene.spooky_ghosts entry is not a string anymore. This is because logmole.LogContainer.assumptions defaults to a logmole.TypeAssumptions object that handles simple conversions automatically.

Native Type Assumptions

As long as infer_type is set to True the LogContainer will always try to convert native types.

This includes support for:

int: ^(\-?\d+)$
float: ^(\-?\d+)$
None: ((N|n)one)$|^NONE$|^((N|n)ull)$|^NULL$|^((N|n)il)$|^NIL$

You can disable type inference for a container by setting infer_type to False. This only applies to the container itself and is not inherited from parent containers.
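
Pattern-driven type inference of this kind can be sketched as an ordered list of pattern and converter pairs. The `int` and `None` patterns are copied from the list above; the `float` pattern, which requires a decimal part, and the function name `infer` are assumptions made for this sketch only.

```python
import re

# Ordered (pattern, converter) pairs; the first matching pattern wins.
CONVERSIONS = [
    (re.compile(r"^(\-?\d+)$"), int),
    # Assumed pattern for this sketch: digits, a dot, then more digits.
    (re.compile(r"^(\-?\d+\.\d+)$"), float),
    (
        re.compile(r"((N|n)one)$|^NONE$|^((N|n)ull)$|^NULL$|^((N|n)il)$|^NIL$"),
        lambda _: None,
    ),
]


def infer(value, infer_type=True):
    """Convert a matched string to a native type if one of the patterns fits."""
    if not infer_type:
        return value
    for pattern, convert in CONVERSIONS:
        if pattern.match(value):
            return convert(value)
    return value  # no pattern matched, keep the original string


for raw in ("10000", "-3.5", "null", "3 Monsters"):
    print(repr(raw), "->", repr(infer(raw)))
```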

Custom Type Assumptions

You can also extend existing assumptions or create an individual set of assumptions per container. Let's demonstrate this on our TimesContainer using the available custom TimeType object.

from logmole import (
    LogContainer,
    TypeAssumptions,
    TimeType
)

class TimesContainer(LogContainer):
    assumptions = TypeAssumptions({".*": TimeType()})
    pattern = r"(?P<start>.\d+\:\d+:\d+).*started|(?P<end>.\d+\:\d+:\d+).*ends"
    representative = "times"

>>> log = MovieLog("/tmp/some.log")
>>> print(type(log.times.start))
<type 'datetime.time'>

A TypeAssumptions object has to be initialized with a dictionary that maps patterns to their corresponding types. In our case, everything matched by TimesContainer.pattern is already a string in a valid H:M:S format. So we don't need a more precise pattern within our TypeAssumptions and can expect every such string to be convertible by our TimeType object.

By default, a TypeAssumptions object inherits the existing assumptions from parent containers. You can ignore parent assumptions by initializing it with inherit=False. This way you can avoid potential match conflicts when using looser patterns.

Generally speaking, your patterns should be as precise as possible when you use them on containers that hold many sub-containers.
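
To make the effect of inheriting or ignoring parent assumptions concrete, here is a small standalone sketch. The class `SketchAssumptions`, its `parent` argument, and the rule that own patterns are tried before inherited ones are all hypothetical; logmole's TypeAssumptions may resolve conflicts differently.

```python
import re


class SketchAssumptions:
    """Hypothetical stand-in: maps regex patterns to conversion callables."""

    def __init__(self, rules, parent=None, inherit=True):
        self.rules = dict(rules)
        # With inherit=False the parent rules are ignored entirely.
        self.parent = parent if inherit else None

    def apply(self, value):
        # Own rules are tried first, inherited rules afterwards (an assumption).
        for pattern, action in self.rules.items():
            if re.match(pattern, value):
                return action(value)
        if self.parent is not None:
            return self.parent.apply(value)
        return value


defaults = SketchAssumptions({r"^-?\d+$": int})
own_rules = {r"^[a-z]+$": str.upper}

inheriting = SketchAssumptions(own_rules, parent=defaults)
isolated = SketchAssumptions(own_rules, parent=defaults, inherit=False)

print(repr(inheriting.apply("42")))  # handled by the inherited rule
print(repr(isolated.apply("42")))    # no rule applies, value is unchanged
```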

Custom Types

Native type conversions might not be sufficient for you. There might be cases where you want to convert your extracted information to a more specific type. There are custom types that can help you do that, or you can write your own.

KeyValueType

TO BE CONTINUED

TimeType

This object doesn't need any extra information. It checks for a valid input string and returns a datetime.time instance.
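
A conversion of this kind can be written with the standard library as shown below. The function `to_time` and the fixed `%H:%M:%S` format are assumptions for this sketch and not TimeType's actual implementation.

```python
from datetime import datetime, time


def to_time(value: str) -> time:
    """Parse an H:M:S string into a datetime.time; raises ValueError if invalid."""
    return datetime.strptime(value.strip(), "%H:%M:%S").time()


start = to_time("19:22:40")
print(repr(start))
```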

TwoDimensionalNumberArray

An object that converts a string into an evenly sized two-dimensional array with automatic float conversion for each item. It always expects a named match group called number within the pattern.

Example:

>>> array_type_1 = TwoDimensionalNumberArray(r"(?P<number>-?\d+)", item_array_size=1)
>>> array_type_2 = TwoDimensionalNumberArray(r"(?P<number>-?\d+)", item_array_size=2)
>>> array_type_3 = TwoDimensionalNumberArray(r"(?P<number>-?\d+)", item_array_size=3)

>>> input = "1, 2, 4 -4, -10, 1"
>>> print(array_type_1(input))
>>> print(array_type_2(input))
>>> print(array_type_3(input))

[[1.0], [2.0], [4.0], [-4.0], [-10.0], [1.0]]
[[1.0, 2.0], [4.0, -4.0], [-10.0, 1.0]]
[[1.0, 2.0, 4.0], [-4.0, -10.0, 1.0]]
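
The chunking behind these results can be reproduced with a few lines of standard-library code. The function `to_number_rows` is a hypothetical sketch, and raising an error when the numbers cannot be split into equal rows is an assumption about what "evenly sized" means here.

```python
import re


def to_number_rows(text, pattern=r"(?P<number>-?\d+)", item_array_size=2):
    """Split all 'number' matches into rows of item_array_size floats."""
    numbers = [float(m.group("number")) for m in re.finditer(pattern, text)]
    if len(numbers) % item_array_size:
        raise ValueError("number count is not a multiple of item_array_size")
    return [
        numbers[i:i + item_array_size]
        for i in range(0, len(numbers), item_array_size)
    ]


rows = to_number_rows("1, 2, 4 -4, -10, 1", item_array_size=3)
print(rows)
```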

Versioning

Logmole follows semantic versioning.

Extensions

ArnoldMole - An Extension for the lovely Arnold Renderer
