gato-toolkit: An open-source toolkit for AI alignment

NOTE: This project is unstable. Consider all functionality experimental and subject to change without warning.

This project is intended to further research in AI alignment and the control problem. In particular, the approach adopted here is inspired by the GATO Framework, a comprehensive methodology for promoting positive intentions in AI systems worldwide.

As this is an ongoing effort, the GATO Toolkit will evolve along with the research. The current iteration focuses on dataset generation and model alignment.

Capabilities

Come up with new scenarios to test

You can generate all kinds of scenarios ranging from inconsequential personal problems to catastrophic global disasters. These scenarios serve as the basis for new investigations.
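As an illustration only, the sketch below shows one way a scenario-generation prompt could cover that range. The `Severity` levels and the `build_scenario_prompt` function are hypothetical and are not part of the gato-toolkit API. Each resulting prompt would be sent to whatever language model you use.

```python
from enum import Enum


# Hypothetical severity scale for illustration; not defined by gato-toolkit.
class Severity(Enum):
    PERSONAL = "an inconsequential personal problem"
    LOCAL = "a problem affecting a local community"
    NATIONAL = "a crisis affecting an entire country"
    GLOBAL = "a catastrophic global disaster"


def build_scenario_prompt(severity: Severity) -> str:
    """Return a prompt asking a language model to invent one test scenario."""
    return (
        "Write a realistic, self-contained scenario describing "
        f"{severity.value}. Describe the situation only; do not propose a solution."
    )


# One prompt per severity level, so the generated scenarios span the whole range.
prompts = [build_scenario_prompt(s) for s in Severity]
for p in prompts:
    print(p)
```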

Determine an appropriate action for any scenario

Once you've got a scenario, you can ask the model how it would attempt to handle the situation.
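A minimal sketch of this step, assuming the model is any callable that maps a prompt string to a reply string. `propose_action` and `fake_model` are hypothetical names used for illustration, not toolkit functions; `fake_model` stands in for a real model so the example runs offline.

```python
from typing import Callable

# Assumption: a "model" is any function from prompt text to reply text.
Model = Callable[[str], str]


def propose_action(model: Model, scenario: str) -> str:
    """Ask the model how it would handle a scenario and return its answer."""
    prompt = (
        f"SCENARIO:\n{scenario}\n\n"
        "How would you attempt to handle this situation? "
        "Describe the action you would take."
    )
    return model(prompt).strip()


def fake_model(prompt: str) -> str:
    # Stand-in for a real language model; always returns the same reply.
    return "  Call emergency services and evacuate the building.  "


action = propose_action(fake_model, "A fire breaks out in an apartment block.")
print(action)
```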

Compare different actions to see which is most aligned

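Given a particular scenario, you can provide a number of different possible actions to see which one the model believes is best aligned with the heuristic imperatives: reduce suffering in the universe, increase prosperity in the universe, and increase understanding in the universe.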
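One way to frame such a comparison is to number the candidate actions and ask the model to reply with the number of its choice. The sketch below assumes that prompt format; `build_comparison_prompt` and `parse_choice` are hypothetical helpers, not part of gato-toolkit.

```python
import re


def build_comparison_prompt(scenario: str, actions: list[str]) -> str:
    """Number the candidate actions and ask the model to pick the best one."""
    options = "\n".join(f"{i}. {a}" for i, a in enumerate(actions, start=1))
    return (
        f"SCENARIO:\n{scenario}\n\nCANDIDATE ACTIONS:\n{options}\n\n"
        "Which action is best aligned with the heuristic imperatives? "
        "Answer with the number of the action, then explain why."
    )


def parse_choice(reply: str, n_actions: int) -> int:
    """Return the 0-based index of the first in-range option number in the reply."""
    for match in re.finditer(r"\d+", reply):
        choice = int(match.group())
        if 1 <= choice <= n_actions:
            return choice - 1
    raise ValueError(f"no valid choice in reply: {reply!r}")


scenario = "A chemical spill contaminates a town's water supply."
actions = [
    "Issue a boil-water notice and wait for updates.",
    "Shut off the supply, distribute bottled water, and begin cleanup.",
    "Say nothing to avoid causing panic.",
]
prompt = build_comparison_prompt(scenario, actions)
# The reply string below is a made-up example of what a model might return.
best = actions[parse_choice("Action 2 is best because it protects residents.", len(actions))]
print(best)
```

Taking the first in-range number keeps the parser tolerant of replies that wrap the choice in prose; a stricter design would ask the model for structured output such as JSON.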

Evaluate the suitability of an action based on its consequences

Given a particular scenario, action, and result, you can ask the model to assess how effective the action was and to reflect on its repercussions.
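A scenario, action, and result naturally form one record. The sketch below bundles them in a hypothetical `Outcome` dataclass that renders an evaluation prompt, assuming a simple plain-text prompt layout. The class is illustrative and does not mirror any toolkit type.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Hypothetical record of one scenario, the action taken, and what happened."""

    scenario: str
    action: str
    result: str

    def to_evaluation_prompt(self) -> str:
        """Render a prompt asking the model to judge the action by its consequences."""
        return (
            f"SCENARIO:\n{self.scenario}\n\n"
            f"ACTION TAKEN:\n{self.action}\n\n"
            f"RESULT:\n{self.result}\n\n"
            "How effective was this action? Reflect on its repercussions, "
            "including any unintended consequences."
        )


outcome = Outcome(
    scenario="A drought threatens a region's harvest.",
    action="Ration irrigation water across all farms equally.",
    result="Most crops survived, but small farms went into debt.",
)
prompt = outcome.to_evaluation_prompt()
print(prompt)
```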

Break actions down into manageable tasks

Starting with a broad action plan, you can have the model break it down into the list of tasks needed to execute that plan.
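If the model returns its task list as numbered or bulleted lines, the tasks can be extracted with a regular expression. This sketch assumes that reply format; `parse_task_list` is a hypothetical helper, not a gato-toolkit function.

```python
import re

# Matches "1. task", "2) task", "- task", or "* task" (leading whitespace allowed).
TASK_LINE = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+(.*\S)")


def parse_task_list(reply: str) -> list[str]:
    """Extract tasks from a numbered or bulleted list in a model reply."""
    tasks = []
    for line in reply.splitlines():
        match = TASK_LINE.match(line)
        if match:
            tasks.append(match.group(1))
    return tasks


# Made-up example reply; lines that are not list items are ignored.
reply = """Here is the plan:
1. Assess the extent of the flooding.
2) Coordinate with local authorities.
- Distribute clean water.
"""
tasks = parse_task_list(reply)
print(tasks)
# → ['Assess the extent of the flooding.', 'Coordinate with local authorities.', 'Distribute clean water.']
```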

Additional Resources

Learn more about heuristic imperatives.

gato-toolkit
Release 0.2.0

Release 0.2.0

0.1.0

0.1.0rc1

0.1.0rc2

0.1.0rc3

0.2.0

0.3.0

0.3.1
