sqfuffle

Quickly solves differentiable multi-objective optimization problems.

License: AGPL-3.0
Install: pip install sqfuffle==0.0.1

square-fuffle

Alpha Stage -- API will change. Quickly solves differentiable multi-objective optimization problems.

Installation

Coming soon.

Purpose

Multi-objective solvers exist to help find compromises in the presence of multiple competing goals -- for example, maximizing profits while also maximizing customer satisfaction. These two goals are usually at odds to some extent (what customer wouldn't be happier if a business dropped its prices down to its break-even point?), and an MO solver allows you to consciously weigh the tradeoffs in different solutions.

square-fuffle is one of many techniques for solving MO problems. As a rule of thumb, you would choose square-fuffle over other approaches when the objectives are differentiable and the problem is sufficiently complicated (many variables, many objectives, or both). High-dimensional input spaces are especially problematic for evolutionary MO solvers if any kind of precision is required, but they are straightforward for square-fuffle.

Examples

When minimizing two paraboloids, one centered at the origin and one at the all-ones point, the only locally non-dominated points lie on the line y=x, directly between the two parabolic minima. Our create_single_objective function has a group of minima precisely in that location.

"""
Plots our two objectives with inputs in the unit square and colors them
according to the output of create_single_objective().
"""

from sqfuffle.objectives import create_single_objective
import jax.numpy as np
from jax import jit
import matplotlib.pyplot as plt

def f(x):
    return np.array([np.sum(x**2), np.sum((x-1)**2)])

# Sample a 30 x 30 grid of inputs over the unit square.
points = [np.array([a, b]) for a in np.linspace(0, 1, 30) for b in np.linspace(0, 1, 30)]
# Evaluate both objectives and the scalarized score at every grid point.
x, y, c = (list(map(h, points)) for h in (lambda p: f(p)[0], lambda p: f(p)[1], jit(create_single_objective(f))))

plt.scatter(x, y, c=c)
plt.xlabel('$x^2 + y^2$')
plt.ylabel('$(x-1)^2 + (y-1)^2$')
plt.title('Pareto Front')
plt.xticks([])
plt.yticks([])
plt.axis('scaled')
plt.show()

Figure: Pareto Front (output of the script above).

Caveats

  1. square-fuffle depends on jax for its differentiation, and that dependency is exposed to the end user: if you supply a function jax cannot trace, square-fuffle will raise a number of jax exceptions. This isn't fundamentally necessary; in a future iteration we will probably use jax internally while still letting users supply their own differentiable functions, and we will also expose hooks where vjps, jvps, or Hessians can be provided directly (raw Jacobians won't suffice without numerical-approximation shenanigans).
  2. square-fuffle does NOT return only points from the Pareto front. In practice it returns high-precision points from the Pareto front along with points from other manifolds that are interesting in the context of an MO problem, so square-fuffle's results should be followed with a non-dominated filter (a minimal sketch of one follows this list). We might provide one in future API versions.
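
Until such a filter ships, a minimal one is easy to write yourself. The sketch below is ours (pareto_filter is a hypothetical helper, not part of the current API); it assumes candidate objective values are stacked as rows of an array, with smaller being better.

import numpy as np

def pareto_filter(values):
    """Boolean mask of the rows of `values` that no other row dominates.

    `values` has shape (n_points, n_objectives); smaller is better.
    """
    values = np.asarray(values)
    keep = np.ones(values.shape[0], dtype=bool)
    for i in range(values.shape[0]):
        # Row j dominates row i if it is <= everywhere and < somewhere.
        dominated_by = np.all(values <= values[i], axis=1) & np.any(values < values[i], axis=1)
        keep[i] = not dominated_by.any()
    return keep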

Theory

TL;DR: The main idea is that being locally non-dominated is a constraint on the directional gradients, and since every directional gradient is a Jacobian-vector product, this constraint implies that the Jacobian is singular. For problems where singular Jacobians are rare, this narrows the search space substantially.

Warning: Without some substantial re-working, the ideas here only work on differentiable MOO problems with at least two variables and two objectives. We rely heavily on the Jacobian having all non-zero singular values with probability 1, so duplicated information in the Jacobian (e.g. one objective being a constant multiple of another) will prevent meaningful solutions from being found.

Trying to minimize multiple functions at once isn't all that different from just minimizing a single function. Recall how in ordinary calculus it suffices to restrict the solution set to places where the derivative is 0 or undefined. In multi-objective optimization a point can have the property of being locally non-dominated, and the only possible solutions to an MOO problem are places which are locally non-dominated and places where that notion is not defined.

As a quick recap, a point is locally non-dominated if, no matter which direction you walk a little bit in, you can't find a point that is at least as good in every objective and strictly better in at least one.

What's interesting about this is that "walking a little bit in a direction" can be represented via the directional gradient in that direction -- and that gradient is just the Jacobian multiplied by an appropriate unit vector. If a point is locally non-dominated then some directional gradients simply aren't achievable (e.g. the all-ones vector: if it were achievable in one direction, its negation would be achievable in the opposite direction, improving every objective at once). A linear map that misses part of its codomain isn't surjective, so when the Jacobian is square this means it has a non-trivial null space.
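
In jax terms that product is exactly what jax.jvp computes. A small illustration (the helper f below just restates the two-paraboloid example; none of this is library API):

import jax.numpy as jnp
from jax import jvp

def f(p):
    return jnp.array([jnp.sum(p**2), jnp.sum((p - 1.0)**2)])

point = jnp.array([0.3, 0.3])       # a point on the y = x segment
direction = jnp.array([1.0, 0.0])   # a unit vector in input space
# jvp returns (f(point), J(point) @ direction) -- the directional gradient.
_, directional_grad = jvp(f, (point,), (direction,))
print(directional_grad)  # [0.6, -1.4]: one objective worsens, the other improves

No choice of direction makes both entries negative at this point, which is exactly the constraint on directional gradients that the theory exploits.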

There are all kinds of reasons a Jacobian might have a non-trivial null space. In particular, if the objectives and variables aren't equal in number we're guaranteed to have a null singular value, but a linear dependence among the objectives or other subtler artifacts can suffice just as well. However, if the objectives and variables are equinumerous, then the set of locally non-dominated points is a subset of the set of points yielding a singular Jacobian. Moreover, across the space of all differentiable functions this set will have measure 0 almost surely, so in some sense we drastically reduce the search space if we can find that set of points.
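
To make that concrete with the two-paraboloid example from above: the Jacobian there is [[2x, 2y], [2(x-1), 2(y-1)]], whose determinant is 4(y - x), so it vanishes exactly on the line y = x where the locally non-dominated points live. A quick numerical check (purely illustrative, not library code):

import jax.numpy as jnp
from jax import jacfwd

def f(p):
    return jnp.array([jnp.sum(p**2), jnp.sum((p - 1.0)**2)])

J = jacfwd(f)
print(jnp.linalg.det(J(jnp.array([0.3, 0.3]))))  # ~0: on the non-dominated segment
print(jnp.linalg.det(J(jnp.array([0.3, 0.7]))))  # nonzero: a dominated point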

In practice, one easy way to find a singular Jacobian is to minimize the least singular value of the Jacobian. The singular values are non-negative, and the Jacobian is singular iff one of those is zero, so our problem can really be re-stated as finding roots/minima of the least singular value.
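
A sketch of that idea, assuming equal numbers of inputs and objectives (this is our reading of the approach, not necessarily sqfuffle's internal implementation):

import jax.numpy as jnp
from jax import jacfwd, grad, jit

def f(p):
    # The two-paraboloid example again: two objectives, two variables.
    return jnp.array([jnp.sum(p**2), jnp.sum((p - 1.0)**2)])

def least_singular_value(p):
    # Smallest singular value of the Jacobian of f at p;
    # it is zero exactly where the Jacobian is singular.
    sigma = jnp.linalg.svd(jacfwd(f)(p), compute_uv=False)
    return sigma[-1]

step = jit(lambda p: p - 0.05 * grad(least_singular_value)(p))

p = jnp.array([0.8, 0.2])
for _ in range(200):
    p = step(p)
print(p)  # should drift toward the y = x line, where the Jacobian is singular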

There is also the issue of non-differentiability in the presence of repeated singular values or other degenerate behavior. That isn't always the deal-breaker it might seem. Non-differentiable eigenvalues behave, in many respects, like the absolute value function in a higher-dimensional coordinate system. Much like that standard 1D counterpart, functional composition can easily create situations where the components are not differentiable but the composition is computationally easy to work with; for example, squaring the absolute value just yields another quadratic, which is easily differentiable.

Even when our eigenvalues have many non-differentiable locations, those locations almost surely fall on a set of measure 0. In practice, many minimization algorithms whose underpinnings assume differentiability perform perfectly satisfactorily on exactly this kind of landscape. If poorly conditioned Hessians do throw off your computations, it may be prudent to switch to a less opinionated solver.

Extended Theory

For problems with differing numbers of inputs and outputs, the Jacobian is always singular (it can never be an invertible square matrix), and we need a way around that problem. The method we've chosen is to widen the search space to include some potentially extraneous results that only occur with probability 0.

In particular, a point is locally non-dominated iff it is non-dominated with respect to every directional derivative at that point, which certainly implies being non-dominated with respect to any restricted set of directional derivatives. So we fix all but two input coordinates and examine the Jacobian of the resulting partially applied function. This affords additional numerical precision, since the smallest eigenvalue of a 2x2 problem has a simple closed form, and we can increase our confidence by repeating the procedure for more pairs of input coordinates. We're searching a larger solution space than before, but we've opened the technique up to a much larger space of functions.
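
For reference, the closed form in question: for a 2x2 block [[a, b], [c, d]], the squared singular values are the two roots of a quadratic, so the smallest one is (S - sqrt(S^2 - 4D^2)) / 2 with S = a^2 + b^2 + c^2 + d^2 and D = ad - bc. A small helper along those lines (our own sketch, not necessarily the library's internals):

import jax.numpy as jnp

def smallest_singular_value_2x2(J):
    # J is a 2x2 Jacobian block. Closed form via the eigenvalues of J^T J:
    #   sigma_min^2 = (S - sqrt(S^2 - 4 D^2)) / 2,
    # where S is the sum of squared entries and D = det(J).
    S = jnp.sum(J**2)
    D = J[0, 0] * J[1, 1] - J[0, 1] * J[1, 0]
    return jnp.sqrt((S - jnp.sqrt(S**2 - 4.0 * D**2)) / 2.0)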

It does not suffice to consider only the input space (reducing from n inputs to 2). This theory still requires the number of inputs to match the number of outputs, and if every input slice has size 2 then we need corresponding 2D outputs to get square 2x2 Jacobians. What we've chosen to do is exploit the fact that if a point is non-dominated, then there must be 2 output dimensions for which the point is non-dominated. Hence, we can compute the minimum eigen-score across all pairs of distinct output dimensions to obtain a condition that is necessary for a point to be locally non-dominated.
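
Putting the two reductions together, the necessary condition can be sketched roughly as follows (again our reading of the method, not the library's exact internals): fix a pair of input coordinates, form the 2x2 Jacobian block against every pair of distinct output dimensions, and take the minimum of the resulting eigen-scores.

import itertools
import jax.numpy as jnp
from jax import jacfwd

def pairwise_eigen_score(f, p, input_pair=(0, 1)):
    # Full Jacobian at p, shape (n_outputs, n_inputs).
    J = jacfwd(f)(p)
    cols = jnp.array(input_pair)
    scores = []
    # For every pair of distinct output dimensions, take the 2x2 block against
    # the chosen pair of inputs and compute its smallest squared singular value.
    for a, b in itertools.combinations(range(J.shape[0]), 2):
        block = J[jnp.array([a, b])][:, cols]
        S = jnp.sum(block**2)
        D = block[0, 0] * block[1, 1] - block[0, 1] * block[1, 0]
        scores.append((S - jnp.sqrt(S**2 - 4.0 * D**2)) / 2.0)
    # A value near zero is necessary (not sufficient) for local non-dominance.
    return jnp.min(jnp.stack(scores))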

One thing to note is that the minimum behaves extremely poorly inside optimization algorithms and does not have a continuous derivative. It is reasonable to substitute similar functions with smoother, nicer characteristics to create more easily differentiable objectives. We supply a variety of those to choose from in sqfuffle.minimizers.
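
For example, a temperature-controlled soft minimum built on logsumexp is one such smoother stand-in (this particular helper is only illustrative; see sqfuffle.minimizers for the ones the library actually ships):

import jax.numpy as jnp
from jax.scipy.special import logsumexp

def soft_min(values, temperature=0.05):
    # Smooth approximation of min(values): approaches the true minimum as
    # temperature -> 0, but has a continuous derivative everywhere.
    return -temperature * logsumexp(-values / temperature)

print(soft_min(jnp.array([0.3, 0.1, 0.2])))  # ~0.093, close to the true minimum of 0.1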