A Practical Guide to Optimization under Uncertainty

by Mohsen Moarefdoost, Ph.D.

Optimization under uncertainty has many use cases in real-world problems in supply chain, transportation, retail, finance, and other areas. However, currently available resources provide limited help to industry practitioners: online material on Two-stage Stochastic Programming, Dynamic Programming, and Robust Optimization tends to get too technical too soon, which limits its applicability. In this guide, we provide a simple tutorial on how to solve optimization problems (mainly linear and mixed integer linear programs) in the presence of uncertainty. We also give a simple practical example along with sample code in Python for reference.

Prerequisites

We assume a reader of this tutorial is familiar with the basics of Linear and Mixed Integer Programming, and knows what Decision Variables, Constraints, Objective Functions, and Non-negativity restrictions are.

Modeling Uncertainty

Before explaining possible approaches one may take to solve an optimization problem under uncertainty, let us refresh our memory on deterministic LP/MIP problems. A typical LP/MIP problem has a mathematical form like this:

\[ \left. \begin{array}{lll} \text{min}& c_1x_1+c_2x_2+\cdots+c_nx_n&\\ \text{s.t.}\\ &a_{11}x_{1}+a_{21}x_{2}+\cdots+a_{n1}x_{n}&\leq b_1\\ &\vdots&\vdots\\ &a_{1m}x_{1}+a_{2m}x_{2}+\cdots+a_{nm}x_{n}&\leq b_m\\ &x_1, x_2, \cdots, x_n \geq 0&\\ &\text{Some}\ x_i\text{s are Integer} \end{array} \right. \]

Here, $c_1, c_2, \cdots, c_n$ are objective coefficient parameters, $a_{11}, a_{12}, \cdots, a_{nm}$ are constraint coefficient parameters, and $b_{1}, b_{2}, \cdots, b_{m}$ are right hand side parameters. Any or all of these parameters can be uncertain or random, and you may or may not know their probability distributions. Depending on which set is random, one might adopt different approaches to model and solve the underlying problem. In general, there are four common approaches for optimization under uncertainty:

  1. Robust Optimization Models
    In these models, one considers the worst possible outcome and optimizes decisions based on it. Finding the worst possible outcome is itself an optimization problem, called the subproblem, which must be solved. In robust optimization, the uncertainty is not stochastic, but rather deterministic and set-based. For example, assume that in the above optimization problem the constraint coefficients belong to an uncertainty set such as:
    \[ l_{ij}\leq a_{ij} \leq u_{ij}\quad i=1,2,\cdots,n,\quad j=1,2,\cdots,m \]
    then, for each $j=1,2,\cdots,m,$ we have the following subproblem:
    \[ \left. \begin{array}{lll} \text{max}& \sum_{i=1}^{n}a_{ij}x_i&\\ \text{s.t.}\\ &\ a_{ij}\leq u_{ij}& i=1,2,\cdots,n\\ &-a_{ij}\leq -l_{ij}& i=1,2,\cdots,n, \end{array} \right. \]
    and we want $\{\text{max}\ \sum_{i=1}^{n}a_{ij}x_i\}\leq b_j$ (since the constraint is a "$\leq$" constraint). Note that in this subproblem the $a_{ij}$s are the decision variables, not the $x_{i}$s. If we write the dual of this subproblem, we get:
    \[ \left. \begin{array}{lll} \text{min}& \sum_{i=1}^{n}\lambda_{ij}u_{ij}-\sum_{i=1}^{n}\mu_{ij}l_{ij}&\\ \text{s.t.}\\ &x_{i} = \lambda_{ij} - \mu_{ij} & i=1,2,\cdots,n\\ &\lambda_{ij}\geq 0,\ \mu_{ij}\geq0& i=1,2,\cdots,n, \end{array} \right. \]
    Notice that the objective function of the dual subproblem is independent of the $x_{i}$s. By strong duality, which holds here because the subproblem is feasible and bounded, the dual optimum equals the primal optimum, so we can use $\{\text{min}\ \sum_{i=1}^{n}\lambda_{ij}u_{ij}-\sum_{i=1}^{n}\mu_{ij}l_{ij}\}\leq b_j$ instead of $\{\text{max}\ \sum_{i=1}^{n}a_{ij}x_i\}\leq b_j$. In fact, it suffices that some feasible $(\lambda_{ij}, \mu_{ij})$ satisfies the bound, since by weak duality any feasible dual solution upper-bounds the primal maximum; this is why the minimization can be dropped in the combined model. Therefore, we can write the robust (worst-case) version of the original optimization problem as:
    \[ \left. \begin{array}{lll} \text{min}& \sum_{i=1}^{n}{c_ix_i}&\\ \text{s.t.}\\ &\sum_{i=1}^{n}{a_{ij}x_i}\leq b_j&j=1,2,\cdots, m\\ &\sum_{i=1}^{n}{\lambda_{ij}u_{ij}}-\sum_{i=1}^{n}{\mu_{ij}l_{ij}}\leq b_j&j=1,2,\cdots, m\\ &x_{i}-\lambda_{ij}+\mu_{ij} = 0&i=1,2,\cdots, n\quad j=1,2,\cdots, m\\ &x_{i}\geq 0 & i=1,2,\cdots, n\\ &\lambda_{ij}\geq 0,\ \mu_{ij}\geq0& i=1,2,\cdots, n\quad j=1,2,\cdots, m\\ &\text{Some}\ x_i\text{s are Integer} \end{array} \right. \]
    This problem is larger, and its size grows polynomially with the dimensions of the uncertainty set. We won't go into much more detail in this tutorial; there are plenty of good resources online that interested readers may consult. A small PuLP sketch of this box-uncertainty construction appears after this list.

  2. Deterministic Equivalent Models
    In these models, we replace the problem with uncertain parameters by a deterministic equivalent formulation. These models are generally easy to understand, and some of them are quite practical. There are three common methods for obtaining a deterministic equivalent model:
    1. Estimate method
      This is the simplest approach to dealing with uncertain parameters in a mathematical optimization problem: we set a deterministic value for each uncertain parameter. This deterministic value can be the mean, a quantile estimate, or any other statistical measure, depending on business requirements. For example, assume that a minimization objective function like $$\textbf{min}\ c_1x_1+\cdots+c_nx_n$$ has uncertain coefficients, and $c_1, c_2, \cdots, c_n$ are independent normal random variables with means $\mu_i, i=1,2,\cdots,n$, and standard deviations $\sigma_{i}, i=1,2,\cdots,n$. Here, one can ignore variability and replace the uncertain objective function with $$\textbf{min}\ \mu_1x_1+\cdots+\mu_nx_n.$$ Or, one might be interested in minimizing when the uncertain parameters are at their 95%-quantiles (maximizing at the 5%-quantiles when the objective is maximization), which leads to $$\textbf{min}\ (\mu_1+\mathcal{z}_{0.95}\sigma_1)x_1+ \cdots+(\mu_n+\mathcal{z}_{0.95}\sigma_n)x_n.$$ Here, $\mu_i+\mathcal{z}_{0.95}\sigma_i$ is the 95%-quantile of the uncertain parameter $c_i$. This 95%-quantile can be derived analytically as shown above, or be a non-parametric estimate from a Machine Learning algorithm such as Quantile Regression Forests.
      Note that in the above formulation, we consider 95%-quantile values for each coefficient individually. One may instead want to minimize the 95%-quantile of the objective function as a whole, which leads to a nonlinear objective. Assume that $c_1, c_2, \cdots, c_n$ have a multivariate normal distribution with mean vector $\mu=[\mu_i], i=1,2,\cdots,n$, and covariance matrix $\Sigma=[\sigma_{ij}], i,j=1,2,\cdots,n$. Then the objective function value is normally distributed with mean $\mu_1x_1+\cdots+\mu_nx_n$ and variance $\sum_{i=1}^{n}{\sigma_{ii}x^2_i}+2\sum_{i=1}^{n}\sum_{j=i+1}^{n}{\sigma_{ij}x_ix_j}$, and the objective for minimizing its 95%-quantile is $$\textbf{min}\ \mu_1x_1+\cdots+\mu_nx_n + \mathcal{z}_{0.95} \sqrt{\sum_{i=1}^{n}{\sigma_{ii}x^2_i}+2\sum_{i=1}^{n} \sum_{j=i+1}^{n}{\sigma_{ij}x_ix_j}},$$ which is not linear anymore. This reference provides more technical details on how one can solve this optimization problem.

      The reformulations above address uncertainty in the objective function. However, in many practical problems the uncertainty appears in the constraints. Here, we explain how to deal with inequality constraints; for equality constraints, this method does not produce a meaningful deterministic formulation, and the expected violation penalty method, which we explain later, is a better choice. Now, assume an inequality constraint like $$a_1x_1+\cdots+a_nx_n \leq b$$ has uncertain coefficients, and $a_1, a_2, \cdots, a_n$ are independent normal random variables with means $\mu_i, i=1,2,\cdots,n$, and standard deviations $\sigma_{i}, i=1,2,\cdots,n$. Again, one may ignore variability and replace the uncertain parameters with their means to get $$\mu_1x_1+\cdots+\mu_nx_n \leq b.$$ Or, one might require the constraint to be satisfied when the uncertain parameters are at their 95%-quantiles (5%-quantiles for a $\geq$ inequality), which leads to $$ (\mu_1+\mathcal{z}_{0.95}\sigma_1)x_1+ \cdots+(\mu_n+\mathcal{z}_{0.95}\sigma_n)x_n\leq b.$$ As with the uncertain objective function, if we are instead interested in the 95%-quantile of the whole constraint expression, the formulation is different; the next method (chance constraint optimization) explains how to deal with such cases. A short numeric sketch of the quantile substitution appears after this list.

    2. Chance constraint optimization
      In this approach, we require uncertain constraints to hold with high probability rather than with certainty. Our goal is to find an optimal solution while allowing a subset of the constraints to be violated at an acceptable confidence level, i.e., achieving an acceptable level of reliability. For an inequality constraint like $$a_1x_1+\cdots+a_nx_n \leq b$$ we require that $$Pr\{a_1x_1+\cdots+a_nx_n \leq b\}\geq 1-\epsilon.$$ We need to translate this probabilistic constraint into a deterministic equivalent. If we know the probability distribution of the parameter values ($a_1, a_2, \cdots, a_n$), we can write the deterministic formulation; however, this formulation is not always convex. For example, it is convex when the parameters are jointly normally distributed and $\epsilon \leq 0.5$. Let's assume that $a_1, a_2, \cdots, a_n$ are normal random variables with $E(a_i)=\mu_i, i\in \{1,2, \cdots, n\}$, $Var(a_i)=\sigma_{ii}, i\in \{1,2, \cdots, n\}$, and $Cov(a_i, a_j)=\sigma_{ij}, i \neq j\in \{1,2, \cdots, n\}$. Then, the deterministic equivalent of $$Pr\{a_1x_1+\cdots+a_nx_n \leq b\}\geq 1-\epsilon,$$ is $$\sum_{i=1}^{n}{\mu_ix_i}+\Phi^{-1}(1-\epsilon)\sqrt{\sum_{i=1}^{n}\sum_{j=1}^{n}{x_ix_j\sigma_{ij}}} \leq b.$$ This constraint is no longer linear (it is a Second-order Conic Constraint), so even under the simplest distributional assumption the resulting formulation is harder to solve. Another option is a MIP approximation of the probabilistic constraint. Assume that you don't know the probability distribution of the parameter values ($a_1, a_2, \cdots, a_n$), but you can sample a large enough set of scenarios. Say we have $K$ scenarios and $a_i = a^k_i, i\in \{1,2,\cdots,n\}, k\in\{1,2,\cdots,K\}$ with probability $\pi_k, k\in\{1,2,\cdots,K\}.$ The deterministic equivalent formulation is then
      \[ \left. \begin{array}{lll} \sum_{i=1}^{n}{a_i^kx_i} - \textbf{M} z_k\leq b & k=1,2,\cdots,K\\ \sum_{k=1}^{K}{\pi_kz_k}\leq \epsilon&\\ z_k\in\{0,1\}&k=1,2,\cdots,K \end{array} \right. \]
      Here, $\textbf{M}$ is a sufficiently large constant, and the binary variable $z_k$ takes the value 0 if the constraint must be satisfied in scenario $k$, and 1 if it is allowed to be violated. The knapsack constraint $\sum_{k=1}^{K}{\pi_kz_k}\leq \epsilon$ ensures that the total probability of the violated scenarios does not exceed $\epsilon$, so $Pr\{a_1x_1+\cdots+a_nx_n \leq b\}\geq 1-\epsilon$ is satisfied. A PuLP sketch of this MIP approximation appears after this list.

    3. Expected violation penalty method
      This method is well suited for problems where the uncertainty is in the constraints. With uncertain constraints, we are often interested in containing or minimizing their violation rather than enforcing them exactly. In other words, uncertain constraints are treated as soft constraints.
      For an inequality constraint like $a_1x_1+\cdots+a_nx_n \leq b$, the violation is defined as $\delta = \text{max}\{0, a_1x_1+\cdots+a_nx_n -b\},$
      and for an equality constraint like $a_1x_1+\cdots+a_nx_n = b$, the violation is defined as $\delta = \text{max}\{0, a_1x_1+\cdots+a_nx_n -b\} +\text{max}\{0, b - (a_1x_1+\cdots+a_nx_n)\}$.
      In the presence of uncertain parameters, either on $a_i$s or $b$, we may be interested in minimizing the expected value of violation or making sure the violation's expected value is bounded, i.e.,
      \[ \left. \begin{array}{l} \text{min} \quad E[\delta]\\ \text{or}\\ E[\delta] \leq \epsilon \end{array} \right. \]
      Here we need the analytical form of $E[\delta]$, which is usually difficult and sometimes impossible to obtain; even when it can be derived, there is no guarantee of linearity. Let's illustrate with a simple example where we are dealing with an inequality constraint with an uncertain right hand side, i.e., we have $a_1x_1+\cdots+a_nx_n \leq b$ where $b$ is uncertain. For simplicity, assume $b$ is normally distributed with mean $\mu$ and variance $\sigma^2$. Therefore,
      \[ \left. \begin{array}{ll} E[\delta]&=E[\text{max}\{0, a_1x_1+\cdots+a_nx_n -b\}]\\ &= \int_{-\infty}^{a_1x_1+\cdots+a_nx_n}{(a_1x_1+\cdots+a_nx_n-b)f(b)db}\\ &= \sigma [\mathcal L(z)+z]\\ \end{array} \right. \]
      where $z = \frac{a_1x_1+\cdots+a_nx_n-\mu}{\sigma}$ and $\mathcal L(z)$ is the Standard Normal Loss Function. Even in this simple case with a single normally distributed parameter, $E[\delta]$ is not linear in the decision variables. However, we can draw a large enough sample of $b$ values and approximate $E[\delta]$. Here is how:
      Let's assume that $b=b_k$ with $Pr(b=b_k)=\pi_k$ where $k=1,2,\cdots, K.$ Then, we have:
      \[ \min E[\text{max}\{0, a_1x_1+\cdots+a_nx_n -b\}]\approx \left\{ \begin{array}{ll} \min \sum_{k=1}^{K}{\pi_kv_k}&\\ \text{s.t.}&\\ v_k\geq 0&\forall k\\ v_k\geq a_1x_1+\cdots+a_nx_n -b_k&\forall k\\ \end{array} \right. \]
      This approximation increases the size of the problem but preserves linearity. A PuLP sketch of this sample approximation appears after this list.

  3. Recourse Models
    In these models, you can generally take corrective actions once the true values of the uncertain parameters are observed. They are also known as stochastic programming models and come in two main classes: Two-Stage Models and Multi-Stage Models.
    1. Two-Stage Models
      In two-stage models, we have two types of decision variables. First-stage decision variables, also known as here-and-now decisions, are those we make BEFORE a realization of the random parameters becomes known. Second-stage decisions are recourse decisions that we make AFTER a realization of the random parameters becomes known.

    2. Multi-Stage Models
      Multi-stage models are those in which we make recourse and corrective decisions at successive stages, because new information is revealed at each stage. A good example is planning over multiple time periods where, in each period, we only have access to past information. In these models we must satisfy nonanticipativity constraints, which restrict each decision to the information that has been revealed up to that point.

    In this tutorial, we don't go into the details of stochastic programming. Interested readers can consult Lectures on Stochastic Programming by Shapiro et al. and/or these course materials by Jeff Linderoth.

  4. Dynamic Programming
    In Dynamic Programming (DP), the assumption is that we are in some State and, based on the current state, we make a decision (Action); then randomness happens, and with some probability (Transition Probability) it takes us to another state and gives us a reward based on our earlier decision. We are then in a new state, make a new decision, and so on. Our decision is thus a function of the current state. This function that maps states to decisions is called a Policy, and we want a policy that maximizes the expected sum of rewards; such a policy is called the Optimal Policy.
    The optimal policy satisfies a set of equations known as the Bellman Optimality Equations, and DP algorithms compute the optimal policy from these equations in discrete state-action spaces. A minimal value-iteration sketch is given after this list.
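
To make the approaches above concrete, the following short, self-contained sketches use Python with PuLP and made-up data. They are illustrations of the constructions described in this list, not production code, and they are unrelated to the repository discussed later. First, the box-uncertainty robust counterpart from approach 1, for a single uncertain constraint: the dual variables and the dual-based constraint replace the uncertain constraint, exactly as derived above.

    import pulp

    # Made-up instance: min c'x  s.t.  a'x <= b, where l <= a <= u (box uncertainty) and x >= 0.
    n = 3
    c = [2.0, 3.0, 1.5]
    l = [1.0, 0.5, 2.0]   # lower bounds of the uncertainty set
    u = [1.5, 1.2, 2.8]   # upper bounds of the uncertainty set
    b = 10.0

    prob = pulp.LpProblem("robust_counterpart", pulp.LpMinimize)
    x = [pulp.LpVariable(f"x{i}", lowBound=0) for i in range(n)]
    lam = [pulp.LpVariable(f"lam{i}", lowBound=0) for i in range(n)]
    mu = [pulp.LpVariable(f"mu{i}", lowBound=0) for i in range(n)]

    prob += pulp.lpSum(c[i] * x[i] for i in range(n))
    prob += pulp.lpSum(x) >= 6  # illustrative requirement so the minimum is not trivially x = 0

    # Dual-based robust constraint: sum_i (lam_i*u_i - mu_i*l_i) <= b with x_i = lam_i - mu_i.
    prob += pulp.lpSum(lam[i] * u[i] - mu[i] * l[i] for i in range(n)) <= b
    for i in range(n):
        prob += x[i] - lam[i] + mu[i] == 0

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print(pulp.LpStatus[prob.status], [xi.value() for xi in x])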
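
Next, the estimate method (approach 2.1): a minimal sketch of replacing uncertain objective coefficients with their means or with their individual 95%-quantiles. The two-variable problem and the cost data are made up for illustration.

    import pulp

    # Hypothetical data: uncertain unit costs c_i ~ Normal(mu_i, sigma_i^2).
    mu = [4.0, 7.0]
    sigma = [0.5, 1.5]
    z95 = 1.645  # standard normal 95%-quantile

    def solve_with_costs(costs):
        # min c1*x1 + c2*x2  s.t.  x1 + x2 >= 10,  x >= 0
        prob = pulp.LpProblem("estimate_method", pulp.LpMinimize)
        x = [pulp.LpVariable(f"x{i}", lowBound=0) for i in range(2)]
        prob += pulp.lpSum(cost * xi for cost, xi in zip(costs, x))
        prob += pulp.lpSum(x) >= 10  # illustrative demand constraint
        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        return [xi.value() for xi in x], pulp.value(prob.objective)

    # Mean estimate: ignore variability.
    print(solve_with_costs(mu))
    # Per-coefficient 95%-quantile estimate: mu_i + z_0.95 * sigma_i.
    print(solve_with_costs([m + z95 * s for m, s in zip(mu, sigma)]))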
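
The scenario-based MIP approximation of a chance constraint (approach 2.2) can be written in a few lines. This sketch assumes we have already sampled $K$ equally likely scenarios for the coefficients; the data and the objective are made up, and the big-M value is chosen loosely.

    import pulp
    import random

    random.seed(0)
    n, K = 3, 50
    eps, M, b = 0.1, 1e4, 100.0
    # Hypothetical sampled scenarios for the uncertain coefficients a_i.
    a = [[random.gauss(5.0, 1.0) for _ in range(n)] for _ in range(K)]
    pi = [1.0 / K] * K  # equally likely scenarios

    prob = pulp.LpProblem("chance_constraint_mip", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x{i}", lowBound=0, upBound=20) for i in range(n)]
    z = [pulp.LpVariable(f"z{k}", cat="Binary") for k in range(K)]

    prob += pulp.lpSum(x)  # illustrative objective: maximize total activity

    # Scenario constraints: each one is either enforced or switched off by z_k = 1.
    for k in range(K):
        prob += pulp.lpSum(a[k][i] * x[i] for i in range(n)) - M * z[k] <= b

    # Knapsack constraint: total probability of relaxed scenarios is at most eps.
    prob += pulp.lpSum(pi[k] * z[k] for k in range(K)) <= eps

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print(pulp.LpStatus[prob.status], [xi.value() for xi in x])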
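
The sample approximation of the expected violation penalty (approach 2.3) needs one auxiliary variable per scenario. In this made-up sketch the right-hand side $b$ is uncertain and we trade off a simple linear reward against the expected violation; the auxiliary variables $v_k$ linearize $\text{max}\{0, a'x - b_k\}$ because they are penalized in the objective.

    import pulp
    import random

    random.seed(1)
    n, K = 2, 100
    a = [3.0, 2.0]                                           # known constraint coefficients
    b_samples = [random.gauss(60.0, 8.0) for _ in range(K)]  # sampled uncertain right-hand side
    pi = [1.0 / K] * K
    penalty = 5.0                                            # cost per unit of expected violation

    prob = pulp.LpProblem("expected_violation_penalty", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x{i}", lowBound=0, upBound=30) for i in range(n)]
    v = [pulp.LpVariable(f"v{k}", lowBound=0) for k in range(K)]  # violation in scenario k

    # Illustrative objective: reward total activity, penalize expected constraint violation.
    expected_violation = pulp.lpSum(pi[k] * v[k] for k in range(K))
    prob += pulp.lpSum(x) - penalty * expected_violation

    # v_k >= a'x - b_k (together with v_k >= 0) models the scenario-k violation.
    for k in range(K):
        prob += v[k] >= pulp.lpSum(a[i] * x[i] for i in range(n)) - b_samples[k]

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print([xi.value() for xi in x], expected_violation.value())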
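
Finally, for dynamic programming (approach 4), here is a minimal value-iteration sketch on a tiny made-up MDP with two states and two actions; it computes a policy satisfying the Bellman optimality equations.

    # Minimal value iteration on a made-up 2-state, 2-action MDP.
    # P[s][a] is a list of (next_state, probability) pairs, R[s][a] is the immediate reward.
    P = {0: {0: [(0, 0.7), (1, 0.3)], 1: [(0, 0.2), (1, 0.8)]},
         1: {0: [(0, 0.5), (1, 0.5)], 1: [(0, 0.9), (1, 0.1)]}}
    R = {0: {0: 1.0, 1: 0.0}, 1: {0: 2.0, 1: 5.0}}
    gamma = 0.9  # discount factor

    V = {s: 0.0 for s in P}
    for _ in range(1000):
        # Bellman optimality update: V(s) = max_a [ R(s,a) + gamma * sum_s' p(s'|s,a) * V(s') ]
        V_new = {s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]) for a in P[s])
                 for s in P}
        if max(abs(V_new[s] - V[s]) for s in P) < 1e-8:
            V = V_new
            break
        V = V_new

    # The optimal policy is greedy with respect to the converged value function.
    policy = {s: max(P[s], key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]))
              for s in P}
    print(V, policy)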

A Simple Example

Here, we provide a simple numerical example to demonstrate what we have discussed so far. However, for the sake of simplicity, we only implement the expected violation penalty method. All input data can be found in the ./codes/data directory of this GitHub repository.
Consider a production plant that produces $N$ products ($n=1,2,\cdots,N$). The plant can either produce or source products to satisfy customers' monthly demand ($D_{nt}, \ t=1,2,\cdots,12$) over a year, and demand is not known at the time decisions are made. The production cost for product $n$ is $c_n,$ and the plant sells each product at price $p_{n}\geq c_{n}.$ In the case of a shortage, the plant must source at a higher price $p^H_{n},$ and in the case of overage it has to sell at a lower price $p^L_{n}\leq c_n.$ Product $n$ is produced with an efficiency rate of $r_{n}\leq 1$. Total production capacity in each month depends on the availability of raw material, and $L_{t}$ is the total monthly capacity in units. We are looking for the set of sourcing and production decisions that maximizes total profit. Let $X_{nt}$, $U_{nt}$, and $O_{nt}$ be the production, shortage, and overage amounts for product $n$ in month $t$, respectively. Here is the mathematical formulation:

\[ \left. \begin{array}{lll} \text{max}& \sum_{n=1}^{N}\sum_{t=1}^{12}{(p_{n}-c_n)X_{nt}+(p^L_n-c_n)O_{nt}-p^H_nU_{nt}}&\\ \text{s.t.}\\ &\sum_{n=1}^{N}{r_{n}X_{nt}}\leq L_{t}&\forall t\\ &X_{nt}+U_{nt} - O_{nt}= D_{nt}&\forall t\quad \forall n\\ &O_{nt}, X_{nt}, U_{nt} \geq 0&\forall t\quad \forall n\\ \end{array} \right. \]

This problem is simple, and it is worth mentioning that it is a version of the newsvendor problem. If we want to use stochastic programming to solve it, we need a two-stage model in which the $X_{nt}$s are first-stage (here-and-now) decision variables, and the $O_{nt}$s and $U_{nt}$s are second-stage (recourse) decision variables. Also, note that if we could hold inventory and use it to buffer against uncertainty, we would have to use either dynamic programming or multi-stage stochastic programming. However, assume that we are given a set of possible scenarios, with their respective probabilities, for the demand of product $n$ in month $t$. That is, under scenario $k=1,2,\cdots, K$, the demand for product $n$ in month $t$ is $d^k_{nt}$ with probability $\pi_k$. Therefore, we can write the deterministic equivalent of the above optimization problem as:

\[ \left. \begin{array}{lll} \text{max}& \sum_{n=1}^{N}\sum_{t=1}^{12}{[(p_{n}-c_n)X_{nt}+\sum_{k=1}^{K}{\pi_k((p^L_n-c_n)O^k_{nt}-p^H_nU^k_{nt}})]}&\\ \text{s.t.}\\ &\sum_{n=1}^{N}{r_{n}X_{nt}}\leq L_{t}&\forall t\\ &O^k_{nt}\geq X_{nt} - d^k_{nt}&\forall t\quad \forall n \quad \forall k\\ &U^k_{nt}\geq d^k_{nt} - X_{nt}&\forall t\quad \forall n \quad \forall k\\ &O^k_{nt}, X_{nt}, U^k_{nt} \geq 0&\forall t\quad \forall n \quad \forall k\\ \end{array} \right. \]
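
Before walking through the repository code, here is a compact, self-contained PuLP sketch of this deterministic equivalent. The data below is randomly generated for illustration only and is unrelated to the CSV inputs in the ./codes/data directory.

    import random
    import pulp

    random.seed(42)
    N, T, K = 2, 12, 30  # products, months, demand scenarios

    p = [10.0, 12.0]      # selling prices
    c = [6.0, 7.0]        # production costs
    pH = [14.0, 16.0]     # sourcing (shortage) prices
    pL = [4.0, 5.0]       # salvage (overage) prices
    r = [0.9, 0.8]        # efficiency rates
    L = [100.0] * T       # monthly capacities
    pi = [1.0 / K] * K    # scenario probabilities
    # Hypothetical demand scenarios d[k][n][t].
    d = [[[max(0.0, random.gauss(40.0, 10.0)) for _ in range(T)] for _ in range(N)]
         for _ in range(K)]

    prob = pulp.LpProblem("production_two_stage", pulp.LpMaximize)
    X = pulp.LpVariable.dicts("X", (range(N), range(T)), lowBound=0)
    O = pulp.LpVariable.dicts("O", (range(K), range(N), range(T)), lowBound=0)
    U = pulp.LpVariable.dicts("U", (range(K), range(N), range(T)), lowBound=0)

    # Expected profit: certain production margin plus expected overage/shortage terms.
    prob += pulp.lpSum(
        (p[n] - c[n]) * X[n][t]
        + pulp.lpSum(pi[k] * ((pL[n] - c[n]) * O[k][n][t] - pH[n] * U[k][n][t]) for k in range(K))
        for n in range(N) for t in range(T))

    for t in range(T):
        prob += pulp.lpSum(r[n] * X[n][t] for n in range(N)) <= L[t]  # monthly capacity
    for k in range(K):
        for n in range(N):
            for t in range(T):
                prob += O[k][n][t] >= X[n][t] - d[k][n][t]  # overage in scenario k
                prob += U[k][n][t] >= d[k][n][t] - X[n][t]  # shortage in scenario k

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print(pulp.LpStatus[prob.status], pulp.value(prob.objective))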

We use PuLP to model and solve a numerical instance of this problem. Please refer to the /codes directory here for more details. In this directory, there are two Python files: optimization.py and run_analysis.py. The former contains three classes for loading data, preparing data for optimization, and building an optimization model for analysis. The latter acts as the main script, i.e., running python ./run_analysis.py will run the whole optimization given the right input data in the ./codes/data/ directory.

In run_analysis.py, you first need to load the data. The DataLoader class in optimization.py takes care of this: you just instantiate a DataLoader object with a list of all paths pointing to the input CSVs. Next, we instantiate a Data object with the DataLoader object as input; a Data object holds information such as parameters, sets, etc., for the optimization. Finally, we instantiate an Optimizer object to set up and solve the optimization problem. It has a method called optimize which calls a solver (CBC by default) to solve the optimization model. If the solve is successful, we write the optimal solutions out as CSV files. Here is a code snippet of what we have outlined:

      
    import glob

    import optimization

    # Collect all input CSV files for the DataLoader.
    list_all_input_csvs = glob.glob("./data/*.csv")

    # Load raw inputs, build the optimization data, and set up the model.
    dl = optimization.DataLoader(input_files=list_all_input_csvs)
    data = optimization.Data(dl)
    optimizer = optimization.Optimizer(data)

    # Solve with the default CBC solver; status == 1 means an optimal solution was found.
    status = optimizer.optimize(WriteLpFlag=True)

    if status == 1:
        # Write each output report to a CSV file prefixed with 'optimizer_'.
        output_df_dict = data.get_output_reports(optimizer)
        for key, df in output_df_dict.items():
            out_name = "optimizer_" + key
            df.to_csv("".join(["./data/", out_name, ".csv"]), index=False)