pyRDDLGym: RDDL automatic generation tool for OpenAI Gym


Keywords
gym-environments, model-based, rddl, reinforcement-learning
License
MIT
Install
pip install pyRDDLGym==2.0

pyRDDLGym

Warning

As of Feb 9, 2024, the pyRDDLGym API has been updated to version 2.0 and is no longer backwards compatible with the previous stable version, 1.4.4. We strongly recommend updating to 2.0. If you require the old API, you can install the last stable version with pip (pip install pyRDDLGym==1.4.4) or directly from GitHub (pip install git+https://github.com/pyrddlgym-project/pyRDDLGym@version_1.4.4_stable).

A Python toolkit for auto-generation of OpenAI Gym environments from Relational Dynamic Influence Diagram Language (RDDL) description files. This is currently the official parser, simulator and evaluation system for RDDL in Python, with new features and enhancements to the RDDL language.

Purpose and Benefits

Installation

pyRDDLGym requires Python 3.8+ and the following packages: ply, pillow>=9.2.0, numpy>=1.22, matplotlib>=3.5.0, gymnasium, pygame, termcolor. You can install the package, along with all of its prerequisites, using pip:

pip install pyRDDLGym

Since pyRDDLGym does not come with any premade environments, you can either load RDDL documents from your local file system or install rddlrepository for easy access to preexisting domains:

pip install rddlrepository
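
Because the 2.0 API is not backwards compatible with 1.4.4, it can help to confirm which generation is installed before running older scripts. The following is a minimal sketch using only the standard library. The helper names uses_v2_api and installed_api_generation are hypothetical and not part of pyRDDLGym, and the check assumes the version string starts with a plain numeric major version.

```python
from importlib import metadata


def uses_v2_api(version: str) -> bool:
    """Return True if a version string such as "2.0" has major version >= 2.

    Assumes the major component is a plain integer (hypothetical helper).
    """
    major = version.split(".")[0]
    return int(major) >= 2


def installed_api_generation(package: str = "pyRDDLGym") -> str:
    """Describe which API generation of the package is installed, if any."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed"
    label = "2.x API" if uses_v2_api(version) else "legacy 1.x API"
    return f"{package} {version} ({label})"


print(installed_api_generation())
```

If the reported generation is the legacy one, scripts written against the 2.0 examples in this document should not be expected to run unchanged.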

Usage

Running the Example

pyRDDLGym comes with several run scripts that you can use as starting points for your own scripts. To simulate an environment, type the following from the install directory of pyRDDLGym into a shell that supports the python command (this requires rddlrepository):

python -m pyRDDLGym.examples.run_gym "Cartpole_Continuous" "0" 1

which loads instance "0" of the CartPole control problem with continuous actions from rddlrepository and simulates it with a random policy for one episode.

Loading an Environment

Instantiation of an existing environment by name is as easy as:

import pyRDDLGym
env = pyRDDLGym.make("Cartpole_Continuous", "0")

Loading your own domain files is just as straightforward:

import pyRDDLGym
env = pyRDDLGym.make("/path/to/domain.rddl", "/path/to/instance.rddl")

Both versions above instantiate env as an OpenAI Gym environment, so the usual reset() and step() calls work as intended.

You can also pass custom settings to the make command, e.g.:

import pyRDDLGym
env = pyRDDLGym.make("Cartpole_Continuous", "0", enforce_action_constraints=True, ...)

Creating your Own Visualizer

You can design your own visualizer by subclassing pyRDDLGym.core.visualizer.viz.BaseViz and overriding the render(state) method. Changing the visualizer of the environment is then easy:

viz_class = ...   # the class name of your custom viz
env.set_visualizer(viz_class)

Recording Movies

You can record an animated gif or movie of the agent's interaction with an environment (described below). To do this, pass a MovieGenerator object to the set_visualizer method:

from pyRDDLGym.core.visualizer.movie import MovieGenerator
movie_gen = MovieGenerator("/path/where/to/save", "env_name")
env.set_visualizer(viz_class, movie_gen=movie_gen)

Interacting with an Environment

Agents map states to actions through the sample_action(obs) function, and can be used to interact with an environment. For example, to initialize a random agent:

from pyRDDLGym.core.policy import RandomAgent
agent = RandomAgent(action_space=env.action_space, num_actions=env.max_allowed_actions)

All agent instances support one-line evaluation in a given environment:

stats = agent.evaluate(env, episodes=1, verbose=True, render=True)

which returns a dictionary of summary statistics (e.g. "mean", "std", etc.) and also visualizes the domain in real time. If you prefer, the standard OpenAI Gym interaction is still available:

total_reward = 0
state, _ = env.reset()
for step in range(env.horizon):
    env.render()
    action = agent.sample_action(state)
    next_state, reward, terminated, truncated, _ = env.step(action)
    print(f'state = {state}, action = {action}, reward = {reward}')
    total_reward += reward
    state = next_state
    done = terminated or truncated
    if done:
        break
print(f'episode ended with reward {total_reward}')

# release all viz resources, and finish logging if used
env.close()
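
The loop above only calls agent.sample_action(state), so any object with that method can be dropped in. The following is a minimal sketch of such an agent. The class ThresholdAgent and the fluent names "ang-pos" and "force" are hypothetical and chosen only for illustration. This duck-typed class does not provide evaluate(), and the rule is not claimed to solve any particular domain.

```python
class ThresholdAgent:
    """Pushes with a fixed magnitude, with the sign chosen from one state fluent.

    Hypothetical example: reads one entry of the state dict and returns
    an action dict keyed by the action fluent name.
    """

    def __init__(self, state_key, action_key, magnitude=1.0):
        self.state_key = state_key
        self.action_key = action_key
        self.magnitude = magnitude

    def sample_action(self, state):
        # States and actions are plain dicts keyed by fluent name.
        sign = 1.0 if state[self.state_key] > 0 else -1.0
        return {self.action_key: sign * self.magnitude}


# "ang-pos" and "force" are assumed fluent names, not taken from a real domain file.
agent = ThresholdAgent(state_key="ang-pos", action_key="force", magnitude=2.0)
print(agent.sample_action({"ang-pos": 0.05}))
print(agent.sample_action({"ang-pos": -0.05}))
```

The agent returns a fresh dict on every call, which matches the dict-based action format that env.step(action) receives in the loop.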

Note

All observations (for a POMDP), states (for an MDP) and actions are represented by dict objects, whose keys correspond to the appropriate fluents as defined in the RDDL description. Here, the syntax is pvar-name___o1__o2..., where pvar-name is the pvariable name, followed by 3 underscores, and object parameters o1, o2... are separated by 2 underscores.
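
The naming rule above can be sketched as a pair of small helpers. The function names ground_name and split_name, and the fluent and object names in the usage lines, are hypothetical. The sketch also assumes that a fluent without parameters is keyed by its bare name and that object names contain no underscores, since otherwise the split would be ambiguous.

```python
def ground_name(pvar, objects=()):
    """Build a dict key: pvar name, 3 underscores, objects joined by 2 underscores."""
    if not objects:
        # Assumption: parameter-free fluents are keyed by the bare name.
        return pvar
    return pvar + "___" + "__".join(objects)


def split_name(key):
    """Invert ground_name, returning (pvar, list_of_objects)."""
    pvar, sep, rest = key.partition("___")
    if not sep:
        return key, []
    return pvar, rest.split("__")


print(ground_name("flow", ["t1", "t2"]))
print(split_name("flow___t1__t2"))
```

Helpers like these are convenient when a policy needs to look up or group state entries by object, for example collecting every key that belongs to one pvariable.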

Warning

There are two known issues that are not documented in RDDL:

  1. the minus (-) arithmetic operation must have spaces on both sides, otherwise it is ambiguous whether it denotes a mathematical operation or is part of a variable name
  2. aggregation-union-precedence parsing requires enclosing parentheses around aggregations, e.g., (sum_{}[]).

Status

A complete archive of past and present RDDL problems, including all IPPC problems, is also available to clone or to install with pip.

Software for related simulators:

The parser used in this project is based on the parser from Thiago P. Bueno's pyrddl (used in rddlgym).

Citing pyRDDLGym

Please see our paper describing pyRDDLGym. If you found this useful, please consider citing us:

@article{taitler2022pyrddlgym,
      title={pyRDDLGym: From RDDL to Gym Environments},
      author={Taitler, Ayal and Gimelfarb, Michael and Gopalakrishnan, Sriram and Mladenov, Martin and Liu, Xiaotian and Sanner, Scott},
      journal={arXiv preprint arXiv:2211.05939},
      year={2022}}

License

This software is distributed under the MIT License.

Contributors

  • Michael Gimelfarb (University of Toronto, CA)
  • Jihwan Jeong (University of Toronto, CA)
  • Sriram Gopalakrishnan (Arizona State University/J.P. Morgan, USA)
  • Martin Mladenov (Google, BR)
  • Jack Liu (University of Toronto, CA)