A tool for optimising systems parameterised by environment variables, based on Taguchi/orthogonal array systems of experiments.


License
MIT
Install
pip install taguchi==1.0.1

Documentation

taguchi

A tool for optimising systems parameterised by environment variables, based on Taguchi/orthogonal array systems of experiments.

PyPI - Version

Disclaimer

All that I know and understand about this technique is at best a lossy interpretation of this Youtube video by @Nighthawkinlight, and at worst a complete misunderstanding of the basic scientific method. The video does a better job at explaining than I can hope for, so go check it out to understand the background.

Installation

pip install taguchi

Usage Example

All that is needed to set up this type of experiment is to navigate to the directory where your tests can be executed, and create a file called taguchi.yaml. This file contains a command, and parameters that will vary the output of the command. For example:

In ./examples there is a taguchi.yaml file:

command: ipython test_function.py
PARAM_A:
  - 1
  - 2
  - 3
PARAM_B: 
  - -25
  - -20
  - -15
PARAM_C:
  - 8
  - 11
  - 14

This file tells taguchi to set PARAM_A, PARAM_B, and PARAM_C environment variables, with those values before running the command:

ipython test_function.py

taguchi will then run that command for various settings of those parameters, and collect the last numerical value in the printed output of the command. If we inspect ./examples/test_function.py we see:

import os

a = float(os.environ["PARAM_A"])
b = float(os.environ["PARAM_B"])
c = float(os.environ["PARAM_C"])

print("annoying print message")

f = (a-3.5)**2 + (b-(-20))**2 + (c-10)**2

print(f)

This script computes some number based on environment variables, and then prints the result to the standard output. Note that any printed numbers or strings before the last number are ignored.

Let's inspect the output of running taguchi from the command-line:

$ taguchi
PARAM_A     
           1 :    29.916667
           2 :    25.916667
           3 :    23.916667
PARAM_B     
         -25 :    34.916667
         -20 :     9.916667
         -15 :    34.916667
PARAM_C     
           8 :    23.583333
          11 :    20.583333
          14 :    35.583333

This print-out gives us an indication of the average performance for each parameter at each state level. If one were trying to minimise this function, setting PARAM_B to be close to -20 seems to be a good choice, since it seems to have the biggest impact from these tests.

The novelty of doing this using the taguchi method is that reasonably informative results can be obtained over far few experiment runs. For a full search of 3 parameters with 3 values each, one would need $27=3^3$ experiment runs, but using the taguchi method, it is done using only 9 experiments.

This library supports up to 20 parameters, with up to 5 states each. In that extreme case, the full search would contain $5^{20}\approx10^{14}$ experiments, whereas the taguchi method only requires 100 experiments. If each experiment only takes 1 second to run, then the former method would complete in about 3 million years, and the latter would take less than 2 minutes.

--dense

Sometimes efficiency is scary, so if you want to run the experiment using every possible combination of parameters and then collate the results, you can do so by running with the dense flag, e.g.,

taguchi --dense

which will run 27 experiments in the example case, rather than the efficient 9 experiments generated by the orthogonal array/Taguchi method.

Multiple .yaml files in the same directory

If you would like to have multiple taguchi.yaml files in the same directory, or if you would like to call a .yaml file from outside of the working directory, you can specify the config file in the taguchi call. By default, taguchi looks for ./taguchi.yaml, so the following two lines are equivalent:

taguchi
taguchi ./taguchi.yaml

or using your taguchi_file_with_a_different_name.yaml:

taguchi ./taguchi_file_with_a_different_name.yaml

Wishlist

This project is not without its shortcomings.

  • There is currently no capacity to have a different number of states in the taguchi.yaml file for each parameter. If you want 1 parameter to have 5 different state values, you must give all parameters 5 different state values (or 1, since it is not varied in that case).
  • The documentation is lacking (arguably non-existent).
  • There is no capacity for parallelisation of the experiments. This might be straightforward in some cases, but in my test cases I can only have one experiment per GPU and that complicates things.