*Figure 1: Offline Model-Based Optimization (MBO): The goal of offline MBO is to optimize an unknown objective function $f(x)$ with respect to $x$, given access only to a static, previously collected dataset of designs.*

Machine learning methods have shown tremendous promise on prediction problems: predicting the efficacy of a drug, predicting how a protein will fold, or predicting the strength of a composite material. But can we use machine learning for design? Conventionally, such problems have been tackled with black-box optimization procedures that repeatedly query an objective function. For instance, when designing a drug, the algorithm will iteratively modify the drug, test it, then modify it again. But when evaluating a candidate design requires a real-world experiment, this loop can quickly become prohibitively expensive. An appealing alternative is to create designs from data. Instead of requiring active synthesis and querying, can we devise a method that simply examines a large dataset of previously tested designs (e.g., drugs that have been evaluated before), and comes up with a new design that is better? We call this **offline model-based optimization (offline MBO)**, and in this post, we discuss offline MBO methods and some recent advances.

# Offline Model-Based Optimization (Offline MBO)

Formally, the goal in offline model-based optimization is to maximize a black-box objective function $f(x)$ with respect to its input $x$, without access to the true objective function itself. Instead, the algorithm is given a static dataset $\mathcal{D} = \{(x_i, y_i)\}$ of designs $x_i$ and corresponding objective values $y_i$. The algorithm consumes this dataset and produces an optimized candidate design, which is then evaluated against the true objective function. Abstractly, the objective for offline MBO can be written as $\arg\max_{x = \mathcal{A}(\mathcal{D})} f(x)$, where $x = \mathcal{A}(\mathcal{D})$ indicates that the design $x$ is a function of our dataset $\mathcal{D}$.
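
To make the protocol concrete, here is a minimal Python sketch of the offline MBO interface. The names `evaluate_offline_mbo` and `best_in_dataset`, and the 1D objective `f`, are hypothetical and chosen only for illustration. The point is that the algorithm sees only the static dataset, and the true objective is queried once, on the final design.

```python
def evaluate_offline_mbo(algorithm, dataset, true_objective):
    """Run A(D) on the static dataset, then score the single returned design."""
    design = algorithm(dataset)            # x = A(D): no access to f here
    return design, true_objective(design)  # f is queried only for evaluation


def best_in_dataset(dataset):
    """Trivial baseline: return the design with the highest observed value."""
    best_x, _ = max(dataset, key=lambda pair: pair[1])
    return best_x


def f(x):
    """Hypothetical ground-truth objective, hidden from the algorithm."""
    return -(x - 1.0) ** 2


# Static dataset D = {(x_i, y_i)} of previously evaluated designs.
dataset = [(x, f(x)) for x in (-2.0, -1.0, 0.0)]
design, score = evaluate_offline_mbo(best_in_dataset, dataset, f)
print(design, score)
```

Any useful offline MBO method should return a design that scores higher than this copy-the-best baseline.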

## What makes offline MBO challenging?

The offline nature of the problem prevents the algorithm from querying the ground truth objective, which makes the offline MBO problem much more difficult than its online counterpart. One obvious approach to an offline MBO problem is to learn a model $\hat{f}(x)$ of the objective function using the dataset, and then apply methods from the more standard online optimization setting, treating the learned model as if it were the true objective.

*Figure 2: Overestimation at unseen inputs in the naive objective model fools the optimizer. Our conservative model prevents overestimation, and keeps the optimizer from finding bad designs with erroneously high values.*

However, this generally does not work: optimizing the design against the learned proxy model will produce **out-of-distribution** designs that “fool” the learned objective model into outputting a high value, similar to adversarial examples (see Figure 2 for an illustration). This is because the learned model is trained on the dataset and is therefore only accurate for **in-distribution** designs. A naive strategy to address this out-of-distribution issue is to constrain the design to stay close to the data, but this is also problematic: to produce a design that is better than the best training point, it is usually necessary to deviate from the training data, at least somewhat. The conflict between the need to remain close to the data to avoid out-of-distribution inputs and the need to deviate from the data to produce better designs is therefore one of the core challenges of offline MBO. This challenge is often exacerbated in real-world settings by the high dimensionality of the design space and the sparsity of the available data. A good offline MBO method needs to carefully balance these two sides, producing optimized designs that are good, but not too far from the data distribution.
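
This failure mode can be reproduced in a few lines. The sketch below is a toy setup constructed for illustration, not an experiment from the paper: it fits a linear proxy by least squares to samples of a hypothetical objective $f(x) = -x^2$ observed only on $[-2, -1]$, then runs gradient ascent on the proxy. The helper names, step size, and number of steps are all assumptions.

```python
def true_objective(x):
    """Hypothetical ground truth; the optimizer never queries it."""
    return -x ** 2


def fit_linear(dataset):
    """Ordinary least squares for the 1D proxy model f_hat(x) = w * x + b."""
    n = len(dataset)
    mean_x = sum(x for x, _ in dataset) / n
    mean_y = sum(y for _, y in dataset) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in dataset)
    s_xx = sum((x - mean_x) ** 2 for x, _ in dataset)
    w = s_xy / s_xx
    b = mean_y - w * mean_x
    return lambda x: w * x + b


def gradient_ascent(model, x, steps=50, step_size=0.1, h=1e-4):
    """Maximize the proxy using a central finite-difference gradient."""
    for _ in range(steps):
        grad = (model(x + h) - model(x - h)) / (2 * h)
        x = x + step_size * grad
    return x


# All observed designs lie in [-2, -1], where f is increasing.
dataset = [(x, true_objective(x)) for x in (-2.0, -1.75, -1.5, -1.25, -1.0)]
f_hat = fit_linear(dataset)

x_start = max(dataset, key=lambda pair: pair[1])[0]  # best observed design
x_opt = gradient_ascent(f_hat, x_start)

print("proxy prediction:", f_hat(x_opt))
print("true value:      ", true_objective(x_opt))
print("best in dataset: ", max(y for _, y in dataset))
```

The proxy fits the observed data well, but its positive slope extrapolates indefinitely, so the optimizer walks far past the true optimum at $x = 0$. The predicted value keeps rising while the true value falls below that of the worst design in the dataset.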

## What prevents offline MBO from simply copying over the best design in the dataset?

One of the fundamental requirements for any effective offline MBO method is that it must improve over the best design observed in the training dataset. If this requirement is not met, one could simply return the best design from the dataset, without needing to run any kind of learning algorithm. When is such an improvement achievable in offline MBO problems? Offline MBO methods can improve over the best design in the dataset when the underlying design space exhibits “compositional structure”. To gain intuition, consider an example where the objective function can be represented as a sum of functions of independent partitions of the design variables, i.e., $f(x) = f_1(x[1]) + f_2(x[2]) + \cdots + f_N(x[N])$, where $x[1], \cdots, x[N]$ denote disjoint subsets of the design variables $x$. Suppose the dataset contains the optimal value of each partition, but never all of them combined in a single design. If an algorithm can identify the compositional structure of the problem, it can combine the optimal values for each partition to obtain the overall optimal design, thereby improving over the best design in the dataset. To better demonstrate this idea, we created a toy problem in 2 dimensions and applied a naive MBO method that learns a model of the objective function via supervised regression, and then optimizes the learned estimate, as shown in the figure below. We can clearly see that the algorithm obtains the combined optimal $x$ and $y$, outperforming the best design in the dataset.

*Figure 3: Offline MBO finds designs better than the best in the observed dataset by exploiting compositional structure of the objective function $f(x, y) = -x^2 - y^2$. Left: datapoints in a toy quadratic function MBO task over 2D space with optimum at $(0,0)$ in blue, MBO-found design in red. Right: Objective value for optimal design is much higher than that observed in the dataset.*
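
A discrete analogue of this toy problem can be sketched in pure Python. The code below is an illustrative construction rather than the experiment behind Figure 3: it restricts $x$ and $y$ to a small grid, removes the joint optimum $(0, 0)$ from the dataset, and fits an additive model $\hat{f}(x, y) = g(x) + h(y)$ by backfitting (alternating per-coordinate averages of the residuals). The grid, the additive model class, and the function names are all assumptions made for the example.

```python
def objective(x, y):
    """Toy objective from the figure: f(x, y) = -x^2 - y^2."""
    return -x ** 2 - y ** 2


GRID = [-2, -1, 0, 1, 2]

# Every grid point except the joint optimum (0, 0) has been evaluated.
dataset = [((x, y), objective(x, y))
           for x in GRID for y in GRID if (x, y) != (0, 0)]


def fit_additive(dataset, iterations=20):
    """Fit f_hat(x, y) = g[x] + h[y] by backfitting."""
    g = {v: 0.0 for v in GRID}
    h = {v: 0.0 for v in GRID}
    for _ in range(iterations):
        for v in GRID:  # update g with h held fixed
            residuals = [val - h[y] for (x, y), val in dataset if x == v]
            g[v] = sum(residuals) / len(residuals)
        for v in GRID:  # update h with g held fixed
            residuals = [val - g[x] for (x, y), val in dataset if y == v]
            h[v] = sum(residuals) / len(residuals)
    return g, h


g, h = fit_additive(dataset)

# Optimize the learned model over the whole grid, including unseen designs.
design = max(((x, y) for x in GRID for y in GRID),
             key=lambda d: g[d[0]] + h[d[1]])
best_seen = max(val for _, val in dataset)

print("design found:", design, "true value:", objective(*design))
print("best value in dataset:", best_seen)
```

The dataset contains designs with $x = 0$ and designs with $y = 0$, but never both at once. Because the model mirrors the additive structure of the objective, it can recombine the two and select the unseen design $(0, 0)$. This dataset is deliberately idealized (noise-free and nearly complete); sparser or noisier data would make the recombination less reliable.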

# Prior Algorithms for Offline MBO

Given an offline dataset, the obvious starting point is to learn a model $\hat{f}_\theta(x)$ of the objective function from the dataset. Most offline MBO methods do employ some form of learned model $\hat{f}_\theta(x)$, trained on the dataset to predict the objective value and guide the optimization process. As discussed previously, a very simple and naive baseline for offline MBO is to treat $\hat{f}_\theta(x)$ as a proxy for the true objective and use **gradient ascent** to optimize $\hat{f}_\theta(x)$ with respect to $x$. However, this method often fails in practice, as gradient ascent can easily find designs that “fool” the model into predicting a high objective value, similar to how adversarial examples are generated. Therefore, a successful approach using the learned model must prevent out-of-distribution designs that cause the model to overestimate the objective values, and prior works have adopted different strategies to accomplish this.

A straightforward idea for preventing out-of-distribution data is to explicitly model the data distribution and constrain our designs to be within the distribution. Often the data distribution modeling is done via a generative model. CbAS and Autofocusing CbAS use a variational auto-encoder to model the distribution of designs, and MINs use a conditional generative adversarial network to model the distribution of designs conditioned on the objective value. However, generative modeling is a difficult problem. Furthermore, in order to be effective, generative models must be accurate near the tail ends of the data distribution, since offline MBO must move away from the dataset to find improved designs. This imposes a strong feasibility requirement on such generative models.

# Conservative Objective Models

Can we devise an offline MBO method that does not utilize generative models, but also avoids the problems with the naive gradient-ascent based MBO method? To prevent this simple gradient ascent optimizer from getting “fooled” by the erroneously high values of $\hat{f}_\theta(x)$ at out-of-distribution inputs, our approach, conservative objective models (COMs), makes a simple modification to the naive approach of training a model of the objective function. Instead of training a model $\hat{f}_\theta(x)$ via standard supervised regression, COMs applies an additional regularizer that minimizes the value of the learned model $\hat{f}_\theta(x^-)$ on *adversarial* designs $x^-$ that are likely to attain erroneously overestimated values. Such adversarial designs are the ones that likely appear falsely optimistic under the learned model, and by minimizing their values $\hat{f}_\theta(x^-)$, COMs prevents the optimizer from finding poor designs. This procedure superficially resembles a form of adversarial training.

**How can we obtain such adversarial designs** $x^-$? A straightforward approach for finding such adversarial designs is to run the optimizer (the same one that will be used to obtain optimized designs after training) on a partially trained function $\hat{f}_\theta$. For example, in our experiments on continuous-dimensional design spaces, we utilize a gradient-ascent optimizer, and hence run a few iterations of gradient ascent on a given snapshot of the learned function to obtain $x^-$. Given these designs, the regularizer in COMs pushes down the learned value $\hat{f}_\theta(x^-)$. To counterbalance this push towards minimizing function values, COMs additionally maximizes the learned $\hat{f}_\theta(x)$ on the designs observed in the dataset, $x \sim \mathcal{D}$, for which the ground truth value of $f(x)$ is known. This idea is depicted below.

*Figure 4: A schematic procedure depicting training in COMs: COMs performs supervised regression on the training data, pushes down the value of adversarially generated designs, and counterbalances the effect by pushing up the value of the learned objective model on the observed datapoints.*

Denoting the samples found by running gradient-ascent in the inner loop as coming from a distribution $\mu(x)$, the training objective for COMs is given by:

\[\theta^* \leftarrow \arg\min_\theta \; \alpha \left(\mathbb{E}_{x^- \sim \mu(x)}[\hat{f}_\theta(x^-)] - \mathbb{E}_{x \sim \mathcal{D}}[\hat{f}_\theta(x)] \right) + \frac{1}{2} \mathbb{E}_{(x, y) \sim \mathcal{D}} [(\hat{f}_\theta(x) - y)^2].\]

This objective can be implemented as shown in the following Python code snippet:

```
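import jax
import jax.numpy as jnp

# T, step_size and alpha are placeholder values, not tuned settings.
def mine_adversarial(x_0, params, model, T=50, step_size=0.05):
    # Gradient of the summed predictions w.r.t. the batch of designs.
    grad_fn = jax.grad(lambda x: jnp.sum(model(params, x)))
    x_i = x_0
    for _ in range(T):
        x_i = x_i + step_size * grad_fn(x_i)  # one gradient-ascent step
    # Treat mined designs as fixed samples from mu(x) (a choice in this sketch).
    return jax.lax.stop_gradient(x_i)

def coms_training_loss(params, model, x, y, alpha=1.0):
    # model(params, x) maps a batch of designs to a batch of predicted values.
    mse_loss = jnp.mean((model(params, x) - y) ** 2)
    x_adv = mine_adversarial(x, params, model)
    regularizer = jnp.mean(model(params, x_adv)) - jnp.mean(model(params, x))
    return 0.5 * mse_loss + alpha * regularizer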
```
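
For intuition about what the regularizer penalizes, consider a special case that is not discussed above: a 1D linear model $\hat{f}_\theta(x) = wx + b$. Its gradient with respect to $x$ is $w$ everywhere, so $T$ gradient-ascent steps of size $\eta$ move every design to $x^- = x + T\eta w$, and the regularizer reduces to $\hat{f}_\theta(x^-) - \hat{f}_\theta(x) = T\eta w^2$, a penalty on the slope of the model. The pure-Python sketch below checks this numerically; the function name, data, and hyperparameter values are illustrative assumptions.

```python
def coms_loss_linear(w, b, xs, ys, alpha, T, step_size):
    """COMs loss for the 1D linear model f_hat(x) = w * x + b."""
    def f_hat(x):
        return w * x + b

    # T steps of gradient ascent: d f_hat / dx = w at every point.
    x_adv = list(xs)
    for _ in range(T):
        x_adv = [x + step_size * w for x in x_adv]

    n = len(xs)
    mse = sum((f_hat(x) - y) ** 2 for x, y in zip(xs, ys)) / n
    regularizer = (sum(f_hat(x) for x in x_adv) / n
                   - sum(f_hat(x) for x in xs) / n)
    return 0.5 * mse + alpha * regularizer, regularizer


xs, ys = [0.0, 1.0, 2.0], [0.0, 2.0, 4.0]
w, T, step_size = 2.0, 5, 0.1
loss, regularizer = coms_loss_linear(w, 0.0, xs, ys,
                                     alpha=1.0, T=T, step_size=step_size)

print("regularizer:", regularizer)
print("closed form T * step_size * w^2:", T * step_size * w ** 2)
```

In this special case the regularizer discourages steep slopes, and the regression term is what keeps the slope from collapsing to zero. Nonlinear models have no such closed form, but the same tension between the two terms applies.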

Non-generative offline MBO methods can also be designed in other ways. For example, instead of training a conservative model as in COMs, we can train a model to capture uncertainty in the predictions of a standard model. One example of this is NEMO, which uses a normalized maximum likelihood (NML) formulation to provide uncertainty estimates.

# How do COMs Perform in Practice?

We evaluated COMs on a number of design problems in biology (designing a GFP protein to maximize fluorescence, designing DNA sequences to maximize binding affinity to various transcription factors), materials design (designing a superconducting material with the highest critical temperature), robot morphology design (designing the morphology of D’Kitty and Ant robots to maximize performance) and robot controller design (optimizing the parameters of a neural network controller for the Hopper domain in OpenAI Gym). These tasks consist of domains with both discrete and continuous design spaces and span both low and high-dimensional tasks. We found that COMs outperform several prior approaches on these tasks, a subset of which is shown below. Observe that COMs consistently find a better design than the best in the dataset, and outperform prior MBO approaches based on generative modeling (MINs, CbAS, Autofocusing CbAS), which pay a price for modeling the manifold of the design space, especially in problems such as Hopper Controller ($\geq 5000$ dimensions).

*Table 1: Comparing the performance of COMs with prior offline MBO methods. Note that COMs generally outperform prior approaches, including those based on generative models, which especially struggle in high-dimensional problems such as Hopper Controller.*

Empirical results on other domains can be found in our paper. To conclude our discussion of empirical results, we note that a recent paper devises an offline MBO approach to optimize hardware accelerators in a real hardware-design workflow, building on COMs. As shown in Kumar et al. 2021 (Tables 3, 4), this COMs-inspired approach finds better designs than various prior state-of-the-art online MBO methods that access the simulator via time-consuming simulation. While, in principle, one can always design an online method that should perform better than any offline MBO method (for example, by wrapping an offline MBO method within an active data collection strategy), good performance of offline MBO methods inspired by COMs indicates the efficacy and the potential of offline MBO approaches in solving design problems.

# Discussion, Open Problems and Future Work

While COMs present a simple and effective approach for tackling offline MBO problems, there are several important open questions that need to be tackled. Perhaps the most straightforward open question is to devise better algorithms that combine the benefits of both generative approaches and COMs-style conservative approaches. Beyond algorithm design, perhaps one of the most important open problems is designing effective **cross-validation strategies:** in supervised *prediction* problems, a practitioner can adjust model capacity, add regularization, tune hyperparameters and make design decisions by simply looking at validation performance. Improving the validation performance will likely also improve the test performance because validation and test samples are distributed identically and generalization guarantees for ERM theoretically quantify this. However, such a workflow cannot be applied directly to offline MBO, because cross-validation in offline MBO requires assessing the accuracy of counterfactual predictions under distributional shift. Some recent work utilizes practical heuristics such as validation performance computed on a held-out dataset consisting of only “special” designs (e.g., only the top-k best designs) for cross-validation of COMs-inspired methods, which seems to perform reasonably well in practice. However, it is not clear that this is the optimal strategy one can use for cross-validation. We expect that much more effective strategies can be developed by understanding the effects of various factors (such as the capacity of the neural network representing $\hat{f}_\theta(x)$, the hyperparameter $\alpha$ in COMs, etc.) on the dynamics of optimization of COMs and other MBO methods.
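
One way to read the top-k heuristic is sketched below, as an illustration rather than the procedure used in the work mentioned: hold out the k designs with the highest observed values and measure how well a model trained on the rest predicts them. The function names, the toy data, and the two candidate models are hypothetical.

```python
def split_top_k(dataset, k):
    """Hold out the k designs with the highest observed objective values."""
    ranked = sorted(dataset, key=lambda pair: pair[1], reverse=True)
    return ranked[k:], ranked[:k]  # (training set, validation set)


def validation_mse(model, validation):
    """Mean squared error of a model on the held-out designs."""
    return sum((model(x) - y) ** 2 for x, y in validation) / len(validation)


# Toy dataset of (design, objective value) pairs.
dataset = [(-3.0, -9.0), (-2.0, -4.0), (-1.0, -1.0), (0.0, 0.0)]
train, validation = split_top_k(dataset, k=2)

# Two hypothetical candidates, standing in for models trained on `train`.
candidates = {
    "linear": lambda x: 3.0 * x + 2.0,
    "constant": lambda x: -4.0,
}
scores = {name: validation_mse(m, validation) for name, m in candidates.items()}
selected = min(scores, key=scores.get)
print(scores, selected)
```

Because the held-out designs sit at the high-value edge of the data, this score probes extrapolation toward better designs rather than i.i.d. generalization, which is closer to what offline MBO actually needs from the model.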

Another important open question is **characterizing properties of datasets and data distributions** that are amenable to effective offline MBO methods. The success of deep learning indicates that not just better methods and algorithms are required for good performance, but that the performance of deep learning methods heavily depends on the data distribution used for training. Analogously, we expect that the performance of offline MBO methods also depends on the quality of data used. For instance, in the didactic example in Figure 3, no improvement could have been possible via offline MBO if the data were localized along a thin line parallel to the x-axis. This means that understanding the relationship between offline MBO solutions and the data-distribution, and effective dataset design based on such principles is likely to have a large impact. We hope that research in these directions, combined with advances in offline MBO methods, would enable us to solve challenging design problems in various domains.

*We thank Sergey Levine for valuable feedback on this post. We thank Brandon Trabucco for making Figures 1 and 2 of this post. This blog post is based on the following paper:*

**Conservative Objective Models for Effective Offline Model-Based Optimization**

Brandon Trabucco*, Aviral Kumar*, Xinyang Geng, Sergey Levine.

*In International Conference on Machine Learning (ICML), 2021.* arXiv code website

Short descriptive video: https://youtu.be/bMIlHl3KIfU