Back

Benders decomposition

Benders decomposition is a framework that has found success in solving certain difficult optimization problems, most notably in job scheduling. It started as a method for solving structured linear programming problems which are sometimes large-scale. It has been extended to mixed-integer programming and to even more general settings. For an introduction to such extensions, see the book Integrated Methods for Optimization by John N. Hooker. In these notes, we will not consider the full generality of Benders decomposition.

We consider optimization problems of the form \begin{align*} \min \ & \mathbf{c}^\mathsf{T} \mathbf{x} + \mathbf{d}^\mathsf{T} \mathbf{y} \\ (P)~~~~\text{s.t.} \ & \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{y} \geq \mathbf{b} \\ & \mathbf{x} \in \mathcal{X}, \mathbf{y} \geq \mathbf{0} \end{align*} where \(\mathbf{A} \in \mathbb{R}^{m\times n_x}\), \(\mathbf{B} \in \mathbb{R}^{m\times n_y}\), \(\mathbf{c} \in \mathbb{R}^{n_x}\), \(\mathbf{d} \in \mathbb{R}^{n_y}\), and \(\mathcal{X} \subseteq \mathbb{R}^{n_x}\).

In many applications, \(\mathcal{X}\) is a mixed-integer set. If \(\mathcal{X}\) is a polyhedron, the problem is a linear programming problem.

To simplify discussion, we make the assumption that \(\mathcal{X}\) is bounded and that \((P)\) has at least one optimal solution.

Using the convention that an infeasible minimization problem has optimal value \(\infty\), we can write \((P)\) as \begin{align*} \min \ & \mathbf{c}^\mathsf{T} \mathbf{x} + z(\mathbf{x}) \\ (P')~~~~\text{s.t.} \ & \mathbf{x} \in \mathcal{X} \end{align*} where \[z(\mathbf{x}) = \min \{ \mathbf{d}^\mathsf{T} \mathbf{y} : \mathbf{B}\mathbf{y} \geq \mathbf{b} - \mathbf{A}\mathbf{x},\, \mathbf{y} \geq \mathbf{0}\}.\]

To see this, let \(\nu\) be the optimal value of \((P)\) and let \(\mathbf{x}^*,\mathbf{y}^*\) be an optimal solution to \((P)\). Then \(\mathbf{x}^*\) is a feasible solution to \((P')\) with objective function value \(\mathbf{c}^\mathsf{T}\mathbf{x}^*+ z(\mathbf{x}^*) \leq \mathbf{c}^\mathsf{T}\mathbf{x}^*+\mathbf{d}^\mathsf{T}\mathbf{y}^* = \nu\). Hence, the optimal value of \((P')\) is bounded above by \(\nu\).

Now, consider any \(\mathbf{x}^*\) feasible to \((P')\) having finite objective function value \(\nu'\). Let \(\mathbf{y}^*\) be an optimal solution to \[\min \{ \mathbf{d}^\mathsf{T} \mathbf{y} : \mathbf{B}\mathbf{y} \geq \mathbf{b} - \mathbf{A}\mathbf{x}^*,\, \mathbf{y} \geq \mathbf{0}\}.\] Then, \(\mathbf{x}^*,\mathbf{y}^*\) is a feasible solution to \((P)\). Hence, \(\nu' \geq \nu\). It follows that the optimal value of \((P')\) must equal \(\nu\).

Computing \(z(\mathbf{x})\) for a given \(\mathbf{x}\) involves solving a linear programming problem with dual \begin{align*} \max \ & \mathbf{u}^\mathsf{T}\left(\mathbf{b}-\mathbf{A}\mathbf{x}\right) \\ \text{s.t.} \ & \mathbf{u}^\mathsf{T} \mathbf{B} \leq \mathbf{d}^\mathsf{T} \\ & \mathbf{u} \geq \mathbf{0}. \end{align*}

Let \(P\) denote the polyhedron \(\left\{ \mathbf{u} : \mathbf{u}^\mathsf{T} \mathbf{B} \leq \mathbf{d}^\mathsf{T},~\mathbf{u} \geq \mathbf{0} \right\}\) and let \(C= \operatorname{recc}(P)\). By Proposition 10.4, \(C = \left\{ \mathbf{u} : \mathbf{u}^\mathsf{T} \mathbf{B} \leq \mathbf{0},~\mathbf{u} \geq \mathbf{0} \right\}\). Since by assumption, \(\mathcal{X}\) is bounded and \((P)\) has at least one optimal solution, we must have that \(z(\mathbf{x}) \gt -\infty\) and that \(z(\mathbf{x}')\) is finite for some \(\mathbf{x}' \in \mathcal{X}\). Hence, \(P\) must be nonempty. Since \(P\) does not contain a line, by Theorem 8.6, \(P\) is pointed and so is \(C\). Thus, if \(z(\mathbf{x}) \lt \infty\), the dual problem must have an optimal solution that is an extreme point of \(P\) (Theorem 8.7).

By a version of the Farkas' Lemma, \(z(\mathbf{x}) \lt \infty\) if and only if \(\mathbf{u}^\mathsf{T}\left(\mathbf{b} - \mathbf{A}\mathbf{x}\right)\leq 0\) for all \(\mathbf{u} \in C\). Since \(C\) is a polyhedral cone, it follows from Corollary 10.6 that \(z(\mathbf{x}) \lt \infty\) if and only if \(\mathbf{u}^\mathsf{T}\left(\mathbf{b} - \mathbf{A}\mathbf{x}\right)\leq 0\) for all \(\mathbf{u} \in \{ \mathbf{r}^{1},\ldots,\mathbf{r}^{k}\}\) where \(\mathbf{r}^{1},\ldots,\mathbf{r}^{k}\) form a complete set of extreme rays of \(C\). Hence, we may add the constraints \[{\mathbf{r}^{i}}^\mathsf{T}\left(\mathbf{b} - \mathbf{A}\mathbf{x}\right) \leq 0,\quad \forall i \in \{1,\ldots,k\}.\] to \((P')\) to obtain: \begin{align*} \min \ & \mathbf{c}^\mathsf{T} \mathbf{x} + z(\mathbf{x}) \\ (P'')~~~~\text{s.t.} \ & {\mathbf{r}^{i}}^\mathsf{T}\left(\mathbf{b} - \mathbf{A}\mathbf{x}\right) \leq 0\quad \forall i \in \{1,\ldots,k\},\\ & \mathbf{x} \in \mathcal{X}. \end{align*} With \(\mathbf{x} \in \mathcal{X}\) such that \({\mathbf{r}^{i}}^\mathsf{T}\left(\mathbf{b} - \mathbf{A}\mathbf{x}\right) \leq 0,~\forall i \in \{1,\ldots,k\}\), we have from Theorem 3.1 that \(z(\mathbf{x}) = \max \{ \mathbf{u}^\mathsf{T}\left(\mathbf{b}-\mathbf{A}\mathbf{x}\right): \mathbf{u} \in P\}\). As mentioned above, there exists an optimal solution \(\mathbf{p}\) that is an extreme point of \(P\). Hence, if \(\mathbf{p}^{1},\ldots,\mathbf{p}^{\ell}\) are the extreme points of \(P\), then \[z(\mathbf{x}) = \max \left\{ {\mathbf{p}^{i}}^\mathsf{T} \left(\mathbf{b}-\mathbf{A}\mathbf{x}\right) : i \in \{1,\ldots, \ell\}\right\}.\]

Hence, \((P'')\) is equivalent to \begin{align*} \min \ & \mathbf{c}^\mathsf{T} \mathbf{x} + z \\ (BP)~~~~\text{s.t.} \ & {\mathbf{r}^{i}}^\mathsf{T}\left(\mathbf{b} - \mathbf{A}\mathbf{x}\right) \leq 0, & \forall i \in \{1,\ldots,k\},\\ & {\mathbf{p}^{i}}^\mathsf{T} \left(\mathbf{b}-\mathbf{A}\mathbf{x}\right) \leq z & \forall i \in \{1,\ldots,\ell\}, \\ & \mathbf{x} \in \mathcal{X}. \end{align*}

Notice how we have completely removed the variables \(\mathbf{y}\) at the expense of obtaining the extreme rays of \(C\) and the extreme points of \(P\). In general, there can be an exponential number of these. We will see later that there is no need to compute all of them ahead of time; we will be generating them “lazily”.

Worked examples

  1. Explain what happens if \(P = \left\{ \mathbf{u} : \mathbf{u}^\mathsf{T} \mathbf{B} \leq \mathbf{d}^\mathsf{T},~\mathbf{u} \geq \mathbf{0}\right\}\) is empty.