A Practical Guide to Optimization under Uncertainty

by Mohsen Moarefdoost, Ph.D.

Optimization under uncertainty has many use cases in real-world problems in supply chain, transportation, retail, finance, and other areas. However, currently available resources provide limited help to industry practitioners: online material on Two-stage Stochastic Programming, Dynamic Programming, and Robust Optimization tends to get too technical too soon, which limits its applicability. In this guide, we provide a simple tutorial on how to solve optimization problems (mainly linear and mixed integer linear programs) in the presence of uncertainty. We also give a simple practical example along with sample code in Python for reference.

Prerequisites

We assume a reader of this tutorial is familiar with the basics of Linear and Mixed Integer Programming, and knows what Decision Variables, Constraints, Objective Functions, and Non-negativity restrictions are.

Modeling Uncertainty

Before explaining possible approaches one may take to solve an optimization problem under uncertainty, let us refresh our memory on deterministic LP/MIP problems. A typical LP/MIP problem has a mathematical form like this:

\[ \left. \begin{array}{lll} \text{min}& c_1x_1+c_2x_2+\cdots+c_nx_n&\\ \text{s.t.}\\ &a_{11}x_{1}+a_{21}x_{2}+\cdots+a_{n1}x_{n}&\leq b_1\\ &\vdots&\vdots\\ &a_{1m}x_{1}+a_{2m}x_{2}+\cdots+a_{nm}x_{n}&\leq b_m\\ &x_1, x_2, \cdots, x_n \geq 0&\\ &\text{Some}\ x_i\text{s are Integer} \end{array} \right. \]

Here, $c_1, c_2, \cdots, c_n$ are objective coefficient parameters, $a_{11}, a_{12}, \cdots, a_{nm}$ are constraint coefficient parameters, and $b_{1}, b_{2}, \cdots, b_{m}$ are right hand side parameters. Any or all of these parameters can be uncertain or random, and you may or may not know their probability distributions. Depending on which set is random, one might adopt different approaches to model and solve the underlying problem. In general, there are four common approaches for optimization under uncertainty:

  1. Robust Optimization Models
    In these models, one considers the worst possible outcome and optimizes decisions based on it. Finding the worst possible outcome is itself an optimization problem, called the subproblem, which must be solved. In robust optimization, the uncertainty is not stochastic, but rather deterministic and set-based. For example, assume that in the above optimization problem the constraint coefficients belong to an uncertainty set such as:
    \[ l_{ij}\leq a_{ij} \leq u_{ij}\quad i=1,2,\cdots,n,\quad j=1,2,\cdots,m \]
    then, for each $j=1,2,\cdots,m,$ we have the following subproblem:
    \[ \left. \begin{array}{lll} \text{max}& \sum_{i=1}^{n}a_{ij}x_i&\\ \text{s.t.}\\ &\ a_{ij}\leq u_{ij}& i=1,2,\cdots,n\\ &-a_{ij}\leq -l_{ij}& i=1,2,\cdots,n, \end{array} \right. \]
    and we want $\{\text{max}\ \sum_{i=1}^{n}a_{ij}x_i\}\leq b_j$ (since the constraint is a "$\leq$" constraint). Note that in this subproblem the $a_{ij}$s are the decision variables, not the $x_{i}$s. If we write the dual of this subproblem, we get:
    \[ \left. \begin{array}{lll} \text{min}& \sum_{i=1}^{n}\lambda_{ij}u_{ij}-\sum_{i=1}^{n}\mu_{ij}l_{ij}&\\ \text{s.t.}\\ &x_{i} = \lambda_{ij} - \mu_{ij} & i=1,2,\cdots,n\\ &\lambda_{ij}\geq 0,\ \mu_{ij}\geq0& i=1,2,\cdots,n, \end{array} \right. \]
    Notice that the objective function of the dual subproblem is independent of the $x_{i}$s. By strong duality, which holds here because the subproblem is feasible and bounded, the dual optimum equals the primal optimum, so we can use $\{\text{min}\ \sum_{i=1}^{n}\lambda_{ij}u_{ij}-\sum_{i=1}^{n}\mu_{ij}l_{ij}\}\leq b_j$ instead of $\{\text{max}\ \sum_{i=1}^{n}a_{ij}x_i\}\leq b_j$. In fact, it suffices that some feasible $(\lambda_{ij}, \mu_{ij})$ satisfies the bound, since by weak duality any feasible dual solution upper-bounds the primal maximum; this is why the minimization can be dropped in the combined model. Therefore, we can write the robust (worst-case) version of the original optimization problem as:
    \[ \left. \begin{array}{lll} \text{min}& \sum_{i=1}^{n}{c_ix_i}&\\ \text{s.t.}\\ &\sum_{i=1}^{n}{a_{ij}x_i}\leq b_j&j=1,2,\cdots, m\\ &\sum_{i=1}^{n}{\lambda_{ij}u_{ij}}-\sum_{i=1}^{n}{\mu_{ij}l_{ij}}\leq b_j&j=1,2,\cdots, m\\ &x_{i}-\lambda_{ij}+\mu_{ij} = 0&i=1,2,\cdots, n\quad j=1,2,\cdots, m\\ &x_{i}\geq 0 & i=1,2,\cdots, n\\ &\lambda_{ij}\geq 0,\ \mu_{ij}\geq0& i=1,2,\cdots, n\quad j=1,2,\cdots, m\\ &\text{Some}\ x_i\text{s are Integer} \end{array} \right. \]
    This problem is larger, and its size grows polynomially with the dimensions of the uncertainty set. We won't go into much more detail in this tutorial; there are plenty of good resources online that interested readers may consult. A small PuLP sketch of this box-uncertainty construction appears after this list.

  2. Deterministic Equivalent Models
    In these models, we replace the problem with uncertain parameters by a deterministic equivalent formulation. These models are generally easy to understand, and some of them are quite practical. There are three common methods for obtaining a deterministic equivalent model:
    1. Estimate method
      This is the simplest approach to dealing with uncertain parameters in a mathematical optimization problem: we set a deterministic value for each uncertain parameter. This deterministic value can be the mean, a quantile estimate, or any other statistical measure, depending on business requirements. For example, assume that a minimization objective function like $$\textbf{min}\ c_1x_1+\cdots+c_nx_n$$ has uncertain coefficients, and $c_1, c_2, \cdots, c_n$ are independent normal random variables with means $\mu_i, i=1,2,\cdots,n$, and standard deviations $\sigma_{i}, i=1,2,\cdots,n$. Here, one can ignore variability and replace the uncertain objective function with $$\textbf{min}\ \mu_1x_1+\cdots+\mu_nx_n.$$ Or, one might be interested in minimizing when the uncertain parameters are at their 95%-quantiles (maximizing at the 5%-quantiles when the objective is maximization), which leads to $$\textbf{min}\ (\mu_1+\mathcal{z}_{0.95}\sigma_1)x_1+ \cdots+(\mu_n+\mathcal{z}_{0.95}\sigma_n)x_n.$$ Here, $\mu_i+\mathcal{z}_{0.95}\sigma_i$ is the 95%-quantile of the uncertain parameter $c_i$. This 95%-quantile can be derived analytically as shown above, or be a non-parametric estimate from a Machine Learning algorithm such as Quantile Regression Forests.
      Note that in the above formulation, we consider 95%-quantile values for each coefficient individually. One may instead want to minimize the 95%-quantile of the objective function as a whole, which leads to a nonlinear objective. Assume that $c_1, c_2, \cdots, c_n$ have a multivariate normal distribution with mean vector $\mu=[\mu_i], i=1,2,\cdots,n$, and covariance matrix $\Sigma=[\sigma_{ij}], i,j=1,2,\cdots,n$. Then the objective function value is normally distributed with mean $\mu_1x_1+\cdots+\mu_nx_n$ and variance $\sum_{i=1}^{n}{\sigma_{ii}x^2_i}+2\sum_{i=1}^{n}\sum_{j=i+1}^{n}{\sigma_{ij}x_ix_j}$, and the objective for minimizing its 95%-quantile is $$\textbf{min}\ \mu_1x_1+\cdots+\mu_nx_n + \mathcal{z}_{0.95} \sqrt{\sum_{i=1}^{n}{\sigma_{ii}x^2_i}+2\sum_{i=1}^{n} \sum_{j=i+1}^{n}{\sigma_{ij}x_ix_j}},$$ which is not linear anymore. This reference provides more technical details on how one can solve this optimization problem.

      The reformulations above address uncertainty in the objective function. However, in many practical problems the uncertainty appears in the constraints. Here, we explain how to deal with inequality constraints; for equality constraints, this method does not produce a meaningful deterministic formulation, and the expected violation penalty method, which we explain later, is a better choice. Now, assume an inequality constraint like $$a_1x_1+\cdots+a_nx_n \leq b$$ has uncertain coefficients, and $a_1, a_2, \cdots, a_n$ are independent normal random variables with means $\mu_i, i=1,2,\cdots,n$, and standard deviations $\sigma_{i}, i=1,2,\cdots,n$. Again, one may ignore variability and replace the uncertain parameters with their means to get $$\mu_1x_1+\cdots+\mu_nx_n \leq b.$$ Or, one might require the constraint to be satisfied when the uncertain parameters are at their 95%-quantiles (5%-quantiles for a $\geq$ inequality), which leads to $$ (\mu_1+\mathcal{z}_{0.95}\sigma_1)x_1+ \cdots+(\mu_n+\mathcal{z}_{0.95}\sigma_n)x_n\leq b.$$ As with the uncertain objective function, if we are instead interested in the 95%-quantile of the whole constraint expression, the formulation is different; the next method (chance constraint optimization) explains how to deal with such cases. A short numeric sketch of the quantile substitution appears after this list.

    2. Chance constraint optimization
      In this approach, we require uncertain constraints to hold with high probability rather than with certainty. Our goal is to find an optimal solution while allowing a subset of the constraints to be violated at an acceptable confidence level, i.e., achieving an acceptable level of reliability. For an inequality constraint like $$a_1x_1+\cdots+a_nx_n \leq b$$ we require that $$Pr\{a_1x_1+\cdots+a_nx_n \leq b\}\geq 1-\epsilon.$$ We need to translate this probabilistic constraint into a deterministic equivalent. If we know the probability distribution of the parameter values ($a_1, a_2, \cdots, a_n$), we can write the deterministic formulation; however, this formulation is not always convex. For example, it is convex when the parameters are jointly normally distributed and $\epsilon \leq 0.5$. Let's assume that $a_1, a_2, \cdots, a_n$ are normal random variables with $E(a_i)=\mu_i, i\in \{1,2, \cdots, n\}$, $Var(a_i)=\sigma_{ii}, i\in \{1,2, \cdots, n\}$, and $Cov(a_i, a_j)=\sigma_{ij}, i \neq j\in \{1,2, \cdots, n\}$. Then, the deterministic equivalent of $$Pr\{a_1x_1+\cdots+a_nx_n \leq b\}\geq 1-\epsilon,$$ is $$\sum_{i=1}^{n}{\mu_ix_i}+\Phi^{-1}(1-\epsilon)\sqrt{\sum_{i=1}^{n}\sum_{j=1}^{n}{x_ix_j\sigma_{ij}}} \leq b.$$ This constraint is no longer linear (it is a Second-order Conic Constraint), so even under the simplest distributional assumption the resulting formulation is harder to solve. Another option is a MIP approximation of the probabilistic constraint. Assume that you don't know the probability distribution of the parameter values ($a_1, a_2, \cdots, a_n$), but you can sample a large enough set of scenarios. Say we have $K$ scenarios and $a_i = a^k_i, i\in \{1,2,\cdots,n\}, k\in\{1,2,\cdots,K\}$ with probability $\pi_k, k\in\{1,2,\cdots,K\}.$ The deterministic equivalent formulation is then
      \[ \left. \begin{array}{lll} \sum_{i=1}^{n}{a_i^kx_i} - \textbf{M} z_k\leq b & k=1,2,\cdots,K\\ \sum_{k=1}^{K}{\pi_kz_k}\leq \epsilon&\\ z_k\in\{0,1\}&k=1,2,\cdots,K \end{array} \right. \]
      Here, $\textbf{M}$ is a sufficiently large constant, and the binary variable $z_k$ takes the value 0 if the constraint must be satisfied in scenario $k$, and 1 if it is allowed to be violated. The knapsack constraint $\sum_{k=1}^{K}{\pi_kz_k}\leq \epsilon$ ensures that the total probability of the violated scenarios does not exceed $\epsilon$, so $Pr\{a_1x_1+\cdots+a_nx_n \leq b\}\geq 1-\epsilon$ is satisfied. A PuLP sketch of this MIP approximation appears after this list.

    3. Expected violation penalty method
      This method is well suited for problems where the uncertainty is in the constraints. With uncertain constraints, we are often interested in containing or minimizing their violation rather than enforcing them exactly. In other words, uncertain constraints are treated as soft constraints.
      For an inequality constraint like $a_1x_1+\cdots+a_nx_n \leq b$, the violation is defined as $\delta = \text{max}\{0, a_1x_1+\cdots+a_nx_n -b\},$
      and for an equality constraint like $a_1x_1+\cdots+a_nx_n = b$, the violation is defined as $\delta = \text{max}\{0, a_1x_1+\cdots+a_nx_n -b\} +\text{max}\{0, b - (a_1x_1+\cdots+a_nx_n)\}$.
      In the presence of uncertain parameters, either on $a_i$s or $b$, we may be interested in minimizing the expected value of violation or making sure the violation's expected value is bounded, i.e.,
      \[ \left. \begin{array}{l} \text{min} \quad E[\delta]\\ \text{or}\\ E[\delta] \leq \epsilon \end{array} \right. \]
      Here we need the analytical form of $E[\delta]$, which is usually difficult and sometimes impossible to obtain; even when it can be derived, there is no guarantee of linearity. Let's illustrate with a simple example where we are dealing with an inequality constraint with an uncertain right hand side, i.e., we have $a_1x_1+\cdots+a_nx_n \leq b$ where $b$ is uncertain. For simplicity, assume $b$ is normally distributed with mean $\mu$ and variance $\sigma^2$. Therefore,
      \[ \left. \begin{array}{ll} E[\delta]&=E[\text{max}\{0, a_1x_1+\cdots+a_nx_n -b\}]\\ &= \int_{-\infty}^{a_1x_1+\cdots+a_nx_n}{(a_1x_1+\cdots+a_nx_n-b)f(b)db}\\ &= \sigma [\mathcal L(z)+z]\\ \end{array} \right. \]
      where $z = \frac{a_1x_1+\cdots+a_nx_n-\mu}{\sigma}$ and $\mathcal L(z)$ is the Standard Normal Loss Function. Even in this simple case with a single normally distributed parameter, $E[\delta]$ is not linear in the decision variables. However, we can draw a large enough sample of $b$ values and approximate $E[\delta]$. Here is how:
      Let's assume that $b=b_k$ with $Pr(b=b_k)=\pi_k$ where $k=1,2,\cdots, K.$ Then, we have:
      \[ \min E[\text{max}\{0, a_1x_1+\cdots+a_nx_n -b\}]\approx \left\{ \begin{array}{ll} \min \sum_{k=1}^{K}{\pi_kv_k}&\\ \text{s.t.}&\\ v_k\geq 0&\forall k\\ v_k\geq a_1x_1+\cdots+a_nx_n -b_k&\forall k\\ \end{array} \right. \]
      This approximation increases the size of the problem but preserves linearity. A PuLP sketch of this sample approximation appears after this list.

  3. Recourse Models
    In these models, you can generally take corrective actions once the true values of the uncertain parameters are observed. They are also known as stochastic programming models and come in two main classes: Two-Stage Models and Multi-Stage Models.
    1. Two-Stage Models
      In two-stage models, we have two types of decision variables. First-stage decision variables, also known as here-and-now decisions, are those we make BEFORE a realization of the random parameters becomes known. Second-stage decisions are recourse decisions that we make AFTER a realization of the random parameters becomes known.

    2. Multi-Stage Models
      Multi-stage models are those in which we make recourse and corrective decisions at successive stages, because new information is revealed at each stage. A good example is planning over multiple time periods where, in each period, we only have access to past information. In these models we must satisfy nonanticipativity constraints, which restrict each decision to the information that has been revealed up to that point.

    In this tutorial, we don't go into the details of stochastic programming. Interested readers can consult Lectures on Stochastic Programming by Shapiro et al. and/or these course materials by Jeff Linderoth.

  4. Dynamic Programming
    In Dynamic Programming (DP), the assumption is that we are in some State and, based on the current state, we make a decision (Action); then randomness happens, and with some probability (Transition Probability) it takes us to another state and gives us a reward based on our earlier decision. We are then in a new state, make a new decision, and so on. Our decision is thus a function of the current state. This function that maps states to decisions is called a Policy, and we want a policy that maximizes the expected sum of rewards; such a policy is called the Optimal Policy.
    The optimal policy satisfies a set of equations known as the Bellman Optimality Equations, and DP algorithms compute the optimal policy from these equations in discrete state-action spaces. A minimal value-iteration sketch is given after this list.
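
To make the approaches above concrete, the following short, self-contained sketches use Python with PuLP and made-up data. They are illustrations of the constructions described in this list, not production code, and they are unrelated to the repository discussed later. First, the box-uncertainty robust counterpart from approach 1, for a single uncertain constraint: the dual variables and the dual-based constraint replace the uncertain constraint, exactly as derived above.

    import pulp

    # Made-up instance: min c'x  s.t.  a'x <= b, where l <= a <= u (box uncertainty) and x >= 0.
    n = 3
    c = [2.0, 3.0, 1.5]
    l = [1.0, 0.5, 2.0]   # lower bounds of the uncertainty set
    u = [1.5, 1.2, 2.8]   # upper bounds of the uncertainty set
    b = 10.0

    prob = pulp.LpProblem("robust_counterpart", pulp.LpMinimize)
    x = [pulp.LpVariable(f"x{i}", lowBound=0) for i in range(n)]
    lam = [pulp.LpVariable(f"lam{i}", lowBound=0) for i in range(n)]
    mu = [pulp.LpVariable(f"mu{i}", lowBound=0) for i in range(n)]

    prob += pulp.lpSum(c[i] * x[i] for i in range(n))
    prob += pulp.lpSum(x) >= 6  # illustrative requirement so the minimum is not trivially x = 0

    # Dual-based robust constraint: sum_i (lam_i*u_i - mu_i*l_i) <= b with x_i = lam_i - mu_i.
    prob += pulp.lpSum(lam[i] * u[i] - mu[i] * l[i] for i in range(n)) <= b
    for i in range(n):
        prob += x[i] - lam[i] + mu[i] == 0

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print(pulp.LpStatus[prob.status], [xi.value() for xi in x])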
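
Next, the estimate method (approach 2.1): a minimal sketch of replacing uncertain objective coefficients with their means or with their individual 95%-quantiles. The two-variable problem and the cost data are made up for illustration.

    import pulp

    # Hypothetical data: uncertain unit costs c_i ~ Normal(mu_i, sigma_i^2).
    mu = [4.0, 7.0]
    sigma = [0.5, 1.5]
    z95 = 1.645  # standard normal 95%-quantile

    def solve_with_costs(costs):
        # min c1*x1 + c2*x2  s.t.  x1 + x2 >= 10,  x >= 0
        prob = pulp.LpProblem("estimate_method", pulp.LpMinimize)
        x = [pulp.LpVariable(f"x{i}", lowBound=0) for i in range(2)]
        prob += pulp.lpSum(cost * xi for cost, xi in zip(costs, x))
        prob += pulp.lpSum(x) >= 10  # illustrative demand constraint
        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        return [xi.value() for xi in x], pulp.value(prob.objective)

    # Mean estimate: ignore variability.
    print(solve_with_costs(mu))
    # Per-coefficient 95%-quantile estimate: mu_i + z_0.95 * sigma_i.
    print(solve_with_costs([m + z95 * s for m, s in zip(mu, sigma)]))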
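
The scenario-based MIP approximation of a chance constraint (approach 2.2) can be written in a few lines. This sketch assumes we have already sampled $K$ equally likely scenarios for the coefficients; the data and the objective are made up, and the big-M value is chosen loosely.

    import pulp
    import random

    random.seed(0)
    n, K = 3, 50
    eps, M, b = 0.1, 1e4, 100.0
    # Hypothetical sampled scenarios for the uncertain coefficients a_i.
    a = [[random.gauss(5.0, 1.0) for _ in range(n)] for _ in range(K)]
    pi = [1.0 / K] * K  # equally likely scenarios

    prob = pulp.LpProblem("chance_constraint_mip", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x{i}", lowBound=0, upBound=20) for i in range(n)]
    z = [pulp.LpVariable(f"z{k}", cat="Binary") for k in range(K)]

    prob += pulp.lpSum(x)  # illustrative objective: maximize total activity

    # Scenario constraints: each one is either enforced or switched off by z_k = 1.
    for k in range(K):
        prob += pulp.lpSum(a[k][i] * x[i] for i in range(n)) - M * z[k] <= b

    # Knapsack constraint: total probability of relaxed scenarios is at most eps.
    prob += pulp.lpSum(pi[k] * z[k] for k in range(K)) <= eps

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print(pulp.LpStatus[prob.status], [xi.value() for xi in x])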
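
The sample approximation of the expected violation penalty (approach 2.3) needs one auxiliary variable per scenario. In this made-up sketch the right-hand side $b$ is uncertain and we trade off a simple linear reward against the expected violation; the auxiliary variables $v_k$ linearize $\text{max}\{0, a'x - b_k\}$ because they are penalized in the objective.

    import pulp
    import random

    random.seed(1)
    n, K = 2, 100
    a = [3.0, 2.0]                                           # known constraint coefficients
    b_samples = [random.gauss(60.0, 8.0) for _ in range(K)]  # sampled uncertain right-hand side
    pi = [1.0 / K] * K
    penalty = 5.0                                            # cost per unit of expected violation

    prob = pulp.LpProblem("expected_violation_penalty", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x{i}", lowBound=0, upBound=30) for i in range(n)]
    v = [pulp.LpVariable(f"v{k}", lowBound=0) for k in range(K)]  # violation in scenario k

    # Illustrative objective: reward total activity, penalize expected constraint violation.
    expected_violation = pulp.lpSum(pi[k] * v[k] for k in range(K))
    prob += pulp.lpSum(x) - penalty * expected_violation

    # v_k >= a'x - b_k (together with v_k >= 0) models the scenario-k violation.
    for k in range(K):
        prob += v[k] >= pulp.lpSum(a[i] * x[i] for i in range(n)) - b_samples[k]

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print([xi.value() for xi in x], expected_violation.value())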
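
Finally, for dynamic programming (approach 4), here is a minimal value-iteration sketch on a tiny made-up MDP with two states and two actions; it computes a policy satisfying the Bellman optimality equations.

    # Minimal value iteration on a made-up 2-state, 2-action MDP.
    # P[s][a] is a list of (next_state, probability) pairs, R[s][a] is the immediate reward.
    P = {0: {0: [(0, 0.7), (1, 0.3)], 1: [(0, 0.2), (1, 0.8)]},
         1: {0: [(0, 0.5), (1, 0.5)], 1: [(0, 0.9), (1, 0.1)]}}
    R = {0: {0: 1.0, 1: 0.0}, 1: {0: 2.0, 1: 5.0}}
    gamma = 0.9  # discount factor

    V = {s: 0.0 for s in P}
    for _ in range(1000):
        # Bellman optimality update: V(s) = max_a [ R(s,a) + gamma * sum_s' p(s'|s,a) * V(s') ]
        V_new = {s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]) for a in P[s])
                 for s in P}
        if max(abs(V_new[s] - V[s]) for s in P) < 1e-8:
            V = V_new
            break
        V = V_new

    # The optimal policy is greedy with respect to the converged value function.
    policy = {s: max(P[s], key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]))
              for s in P}
    print(V, policy)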

A Simple Example

Here, we provide a simple numerical example to demonstrate what we have discussed so far. However, for the sake of simplicity, we only implement the expected violation penalty method. All input data can be found in the ./codes/data directory of this GitHub repository.
Consider a production plant that produces $N$ products ($n=1,2,\cdots,N$). The plant can either produce or source products to satisfy customers' monthly demand ($D_{nt}, \ t=1,2,\cdots,12$) over a year, and demand is not known at the time decisions are made. The production cost for product $n$ is $c_n,$ and the plant sells each product at price $p_{n}\geq c_{n}.$ In the case of a shortage, the plant must source at a higher price $p^H_{n},$ and in the case of overage it has to sell at a lower price $p^L_{n}\leq c_n.$ Product $n$ is produced with an efficiency rate of $r_{n}\leq 1$. Total production capacity in each month depends on the availability of raw material, and $L_{t}$ is the total monthly capacity in units. We are looking for the set of sourcing and production decisions that maximizes total profit. Let $X_{nt}$, $U_{nt}$, and $O_{nt}$ be the production, shortage, and overage amounts for product $n$ in month $t$, respectively. Here is the mathematical formulation:

\[ \left. \begin{array}{lll} \text{max}& \sum_{n=1}^{N}\sum_{t=1}^{12}{(p_{n}-c_n)X_{nt}+(p^L_n-c_n)O_{nt}-p^H_nU_{nt}}&\\ \text{s.t.}\\ &\sum_{n=1}^{N}{r_{n}X_{nt}}\leq L_{t}&\forall t\\ &X_{nt}+U_{nt} - O_{nt}= D_{nt}&\forall t\quad \forall n\\ &O_{nt}, X_{nt}, U_{nt} \geq 0&\forall t\quad \forall n\\ \end{array} \right. \]

This problem is simple, and it is worth mentioning that it is a version of the newsvendor problem. If we want to use stochastic programming to solve it, we need a two-stage model in which the $X_{nt}$s are first-stage (here-and-now) decision variables, and the $O_{nt}$s and $U_{nt}$s are second-stage (recourse) decision variables. Also, note that if we could hold inventory and use it to buffer against uncertainty, we would have to use either dynamic programming or multi-stage stochastic programming. However, assume that we are given a set of possible scenarios, with their respective probabilities, for the demand of product $n$ in month $t$. That is, under scenario $k=1,2,\cdots, K$, the demand for product $n$ in month $t$ is $d^k_{nt}$ with probability $\pi_k$. Therefore, we can write the deterministic equivalent of the above optimization problem as:

\[ \left. \begin{array}{lll} \text{max}& \sum_{n=1}^{N}\sum_{t=1}^{12}{[(p_{n}-c_n)X_{nt}+\sum_{k=1}^{K}{\pi_k((p^L_n-c_n)O^k_{nt}-p^H_nU^k_{nt}})]}&\\ \text{s.t.}\\ &\sum_{n=1}^{N}{r_{n}X_{nt}}\leq L_{t}&\forall t\\ &O^k_{nt}\geq X_{nt} - d^k_{nt}&\forall t\quad \forall n \quad \forall k\\ &U^k_{nt}\geq d^k_{nt} - X_{nt}&\forall t\quad \forall n \quad \forall k\\ &O^k_{nt}, X_{nt}, U^k_{nt} \geq 0&\forall t\quad \forall n \quad \forall k\\ \end{array} \right. \]
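
Before walking through the repository code, here is a compact, self-contained PuLP sketch of this deterministic equivalent. The data below is randomly generated for illustration only and is unrelated to the CSV inputs in the ./codes/data directory.

    import random
    import pulp

    random.seed(42)
    N, T, K = 2, 12, 30  # products, months, demand scenarios

    p = [10.0, 12.0]      # selling prices
    c = [6.0, 7.0]        # production costs
    pH = [14.0, 16.0]     # sourcing (shortage) prices
    pL = [4.0, 5.0]       # salvage (overage) prices
    r = [0.9, 0.8]        # efficiency rates
    L = [100.0] * T       # monthly capacities
    pi = [1.0 / K] * K    # scenario probabilities
    # Hypothetical demand scenarios d[k][n][t].
    d = [[[max(0.0, random.gauss(40.0, 10.0)) for _ in range(T)] for _ in range(N)]
         for _ in range(K)]

    prob = pulp.LpProblem("production_two_stage", pulp.LpMaximize)
    X = pulp.LpVariable.dicts("X", (range(N), range(T)), lowBound=0)
    O = pulp.LpVariable.dicts("O", (range(K), range(N), range(T)), lowBound=0)
    U = pulp.LpVariable.dicts("U", (range(K), range(N), range(T)), lowBound=0)

    # Expected profit: certain production margin plus expected overage/shortage terms.
    prob += pulp.lpSum(
        (p[n] - c[n]) * X[n][t]
        + pulp.lpSum(pi[k] * ((pL[n] - c[n]) * O[k][n][t] - pH[n] * U[k][n][t]) for k in range(K))
        for n in range(N) for t in range(T))

    for t in range(T):
        prob += pulp.lpSum(r[n] * X[n][t] for n in range(N)) <= L[t]  # monthly capacity
    for k in range(K):
        for n in range(N):
            for t in range(T):
                prob += O[k][n][t] >= X[n][t] - d[k][n][t]  # overage in scenario k
                prob += U[k][n][t] >= d[k][n][t] - X[n][t]  # shortage in scenario k

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print(pulp.LpStatus[prob.status], pulp.value(prob.objective))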

We use PuLP to model and solve a numerical instance of this problem. Please refer to the /codes directory here for more details. In this directory, there are two Python files: optimization.py and run_analysis.py. The former contains three classes for loading data, preparing data for optimization, and building an optimization model for analysis. The latter acts as the main script, i.e., running python ./run_analysis.py will run the whole optimization given the right input data in the ./codes/data/ directory.

In run_analysis.py, you first need to load the data. The DataLoader class in optimization.py takes care of this: you just instantiate a DataLoader object with a list of all paths pointing to the input CSVs. Next, we instantiate a Data object with the DataLoader object as input; a Data object holds information such as parameters, sets, etc., for the optimization. Finally, we instantiate an Optimizer object to set up and solve the optimization problem. It has a method called optimize which calls a solver (CBC by default) to solve the optimization model. If the solve is successful, we write the optimal solutions out as CSV files. Here is a code snippet of what we have outlined:

      
    import glob

    import optimization

    # Collect all input CSV files for the DataLoader.
    list_all_input_csvs = glob.glob("./data/*.csv")

    # Load raw inputs, build the optimization data, and set up the model.
    dl = optimization.DataLoader(input_files=list_all_input_csvs)
    data = optimization.Data(dl)
    optimizer = optimization.Optimizer(data)

    # Solve with the default CBC solver; status == 1 means an optimal solution was found.
    status = optimizer.optimize(WriteLpFlag=True)

    if status == 1:
        # Write each output report to a CSV file prefixed with 'optimizer_'.
        output_df_dict = data.get_output_reports(optimizer)
        for key, df in output_df_dict.items():
            out_name = "optimizer_" + key
            df.to_csv("".join(["./data/", out_name, ".csv"]), index=False)