MazeRL is a development framework for building applied reinforcement learning systems, addressing real-world decision problems. It supports the complete development life cycle of RL applications, ranging from simulation engineering up to agent development, training and deployment.

applied-reinforcement-learning, reinforcement-learning, machine-learning, deep-learning, distributed, optimization, applied-machine-learning, automation, data-science, decision-making, documentation, framework, monitoring, python, simulation
pip install maze-rl==0.1.8



Applied Reinforcement Learning with Python

MazeRL is an application-oriented Deep Reinforcement Learning (RL) framework that addresses real-world decision problems. Our vision is to cover the complete development life cycle of RL applications, ranging from simulation engineering up to agent development, training and deployment.

This is a preliminary, non-stable release of Maze. It is not yet complete and not all of our interfaces have settled yet. Hence, there might be some breaking changes on our way towards the first stable release.

Spotlight Features

Below we list a few selected Maze features.

  • Design and visualize your policy and value networks with the Perception Module. It is based on PyTorch and provides a large variety of neural network building blocks and model styles. Quickly compose powerful representation learners from building blocks such as dense, convolution, graph convolution and attention layers, recurrent architectures, and action and observation masking.
  • Create the conditions for efficient RL training without writing boilerplate code, e.g. by applying best practices such as pre-processing and normalizing your observations.
  • Maze supports advanced environment structures reflecting the requirements of real-world industrial decision problems such as multi-step and multi-agent scenarios. You can of course work with existing Gym-compatible environments.
  • Use the provided Maze trainers (A2C, PPO, Impala, SAC, Evolution Strategies), which support dictionary action and observation spaces as well as training of multi-step (auto-regressive) policies. Or stick to your favorite tools and trainers by combining Maze with other RL frameworks.
  • Out-of-the-box support for advanced training workflows such as imitation learning from teacher policies and policy fine-tuning.
  • Keep even complex application and experiment configuration manageable with the Hydra Config System.
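As a rough illustration of the observation normalization mentioned in the list above, here is a minimal sketch in plain Python. It does not use Maze's own pre-processing or normalization components; the `RunningNormalizer` class and the observation key `"position"` are hypothetical names chosen for this example. It only shows the general idea of keeping running statistics per key of a dictionary observation.

```python
import math
from typing import Dict, List


class RunningNormalizer:
    """Hypothetical helper: per-key running mean/variance (Welford's method)."""

    def __init__(self, eps: float = 1e-8) -> None:
        self.eps = eps  # keeps the division stable when the variance is zero
        self.count: Dict[str, int] = {}
        self.mean: Dict[str, List[float]] = {}
        self.m2: Dict[str, List[float]] = {}  # sum of squared deviations

    def update(self, obs: Dict[str, List[float]]) -> None:
        """Fold one dictionary observation into the running statistics."""
        for key, values in obs.items():
            if key not in self.count:
                self.count[key] = 0
                self.mean[key] = [0.0] * len(values)
                self.m2[key] = [0.0] * len(values)
            self.count[key] += 1
            n = self.count[key]
            for i, x in enumerate(values):
                delta = x - self.mean[key][i]
                self.mean[key][i] += delta / n
                self.m2[key][i] += delta * (x - self.mean[key][i])

    def normalize(self, obs: Dict[str, List[float]]) -> Dict[str, List[float]]:
        """Return a zero-mean, unit-variance copy (raises KeyError for unseen keys)."""
        out: Dict[str, List[float]] = {}
        for key, values in obs.items():
            n = self.count[key]
            out[key] = [
                (x - self.mean[key][i]) / math.sqrt(self.m2[key][i] / n + self.eps)
                for i, x in enumerate(values)
            ]
        return out


normalizer = RunningNormalizer()
normalizer.update({"position": [1.0, 10.0]})
normalizer.update({"position": [3.0, 30.0]})
normalized = normalizer.normalize({"position": [3.0, 30.0]})
```

In practice the statistics would be estimated from a batch of sampled observations before training starts and then kept fixed, so that the policy always sees inputs on a comparable scale.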

Get Started

  • You can try Maze without prior installation! We provide a series of Getting started notebooks to help you get familiar with Maze. These notebooks can be viewed and executed in Google Colab - just pick any of the included notebooks and click on the Colab button.

  • If you want to install Maze locally, make sure PyTorch is installed and then get the latest released version of Maze as follows:

    pip install -U maze-rl
    # optionally install RLlib if you want to use it in combination with Maze (currently pinned to version 1.4.1)
    pip install "ray[rllib]==1.4.1" tensorflow

    Read more about other options like the installation of the latest development version.

    We encourage you to start with Python 3.7, as many popular environments like Atari or Box2D cannot easily be installed in newer Python environments. Maze itself supports newer Python versions, but for Python 3.9 you might have to install additional binary dependencies manually.

  • Alternatively, you can work with Maze in a Docker container with Jupyter lab pre-installed: run docker run -p 8888:8888 enliteai/maze:playground and open localhost:8888 in your browser. This loads Jupyter.

  • To see Maze in action, check out a first example. Training and deploying your agent is as simple as can be:

    from maze.api.run_context import RunContext
    from maze.core.wrappers.maze_gym_env_wrapper import GymMazeEnv

    # Set up a PPO run on the CartPole environment.
    rc = RunContext(env=lambda: GymMazeEnv('CartPole-v0'), algorithm="ppo")
    # Train the agent (the number of epochs is an illustrative choice).
    rc.train(n_epochs=50)

    # Run trained policy.
    env = GymMazeEnv('CartPole-v0')
    obs = env.reset()
    done = False
    while not done:
        action = rc.compute_action(obs)
        obs, reward, done, info = env.step(action)

  • Try your own Gym env or visit our Maze step-by-step tutorial.
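The interaction loop in the example above follows the classic Gym step interface: `reset()` returns an observation and `step(action)` returns `obs, reward, done, info`. The sketch below reproduces that loop against a stand-in environment so that it runs without Maze or Gym installed. `CountdownEnv` and `run_episode` are hypothetical names introduced for illustration only and are not part of Maze.

```python
from typing import Callable, Dict, Tuple


class CountdownEnv:
    """Hypothetical toy environment: episodes end after a fixed number of steps."""

    def __init__(self, horizon: int = 5) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        # The observation is simply the current step index.
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool, Dict]:
        self.t += 1
        reward = 1.0 if action == 1 else 0.0  # action 1 is rewarded
        done = self.t >= self.horizon
        return self.t, reward, done, {}


def run_episode(env: CountdownEnv, policy: Callable[[int], int]) -> float:
    """Roll out one episode and return the accumulated reward."""
    obs = env.reset()
    done = False
    total_reward = 0.0
    while not done:
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward


episode_return = run_episode(CountdownEnv(horizon=5), policy=lambda obs: 1)
```

Any object exposing the same `reset` and `step` methods can be dropped into `run_episode`, which is why accumulating the return in a small helper like this is a convenient way to compare policies.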

Learn more about Maze

The documentation is the starting point for learning more about the underlying concepts and, most importantly, provides code snippets and minimal working examples to get you started quickly.


Maze is freely available for research and non-commercial use. A commercial license is available; if you are interested, please contact us via our company website or write us an email.

We believe in Open Source principles and aim at transitioning Maze to a commercial Open Source project, releasing larger parts of the framework under a permissive license in the near future.