
Dantzig-Wolfe decomposition

We consider linear programming problems of the form \[\begin{array}{rrcrcrcrcl} \min \ & {\mathbf{c}^{(1)}}^\mathsf{T}\mathbf{x}^{(1)} & + & {\mathbf{c}^{(2)}}^\mathsf{T}\mathbf{x}^{(2)} & + & \cdots & + & {\mathbf{c}^{(k)}}^\mathsf{T}\mathbf{x}^{(k)} \\ (P)\quad\text{s.t.} & \mathbf{A}^{(1)}\mathbf{x}^{(1)} & + & \mathbf{A}^{(2)}\mathbf{x}^{(2)} & + & \cdots & + & \mathbf{A}^{(k)}\mathbf{x}^{(k)} & = & \mathbf{b} \\ & \mathbf{x}^{(1)} & & & & & & & \in & P_1 \\ & & & \mathbf{x}^{(2)} & & & & & \in& P_2 \\ & & & & & \ddots & & & & \vdots \\ & & & & & & & \mathbf{x}^{(k)} & \in & P_k. \end{array}\] where \(\mathbf{b} \in \mathbb{R}^m\) for some positive integer \(m\), and for \(i = 1,\ldots,k\), \(n_i\) is a positive integer, \(\mathbf{A}^{(i)} \in \mathbb{R}^{m \times n_i}\), \(\mathbf{c}^{(i)} \in \mathbb{R}^{n_i}\), and \(P_i\) is a pointed polyhedron.

For each \(i = 1,\ldots,k\), let \(\mathbf{x}^{(i,j)}\), \(j = 1,\ldots,p_i\), be the extreme points of \(P_i\) and let \(\mathbf{d}^{(i,j)}\), \(j = 1,\ldots,r_i\), give a complete set of extreme rays of \(P_i\), where \(p_i\) and \(r_i\) are nonnegative integers. (Since \(P_i\) is pointed, \(p_i \geq 1\) whenever \(P_i\) is nonempty, and \(r_i = 0\) exactly when \(P_i\) is bounded.)

By Theorem 12.1, every \(\mathbf{x}^{(i)} \in P_i\) can be written as \[\sum_{j = 1}^{p_i} \lambda_{(i,j)} \mathbf{x}^{(i,j)} + \sum_{j = 1}^{r_i} \gamma_{(i,j)} \mathbf{d}^{(i,j)}\] for some \(\lambda_{(i,j)} \geq 0\), \(j = 1,\ldots, p_i\) and \(\gamma_{(i,j)} \geq 0\), \(j = 1,\ldots, r_i\) where \(\sum_{j=1}^{p_i} \lambda_{(i,j)} = 1\).

Then \((P)\) can be reformulated as \begin{align*} \min \ & \sum_{i=1}^k \left(\sum_{j = 1}^{p_i} \hat{c}_{(i,j)} \lambda_{(i,j)} + \sum_{j = 1}^{r_i} \tilde{c}_{(i,j)} \gamma_{(i,j)}\right) \\ (DW)\quad\text{s.t.} \ & \sum_{i=1}^k \left(\sum_{j = 1}^{p_i} \lambda_{(i,j)} \hat{\mathbf{a}}^{(i,j)} + \sum_{j = 1}^{r_i} \gamma_{(i,j)} \tilde{\mathbf{a}}^{(i,j)} \right) = \mathbf{b} \\ & \sum_{j=1}^{p_i} \lambda_{(i,j)} = 1 & i = 1,\ldots, k, \\ & \lambda_{(i,j)} \geq 0 & i = 1,\ldots,k,\ j = 1,\ldots, p_i,\\ & \gamma_{(i,j)} \geq 0 & i = 1,\ldots,k,\ j = 1,\ldots, r_i. \end{align*} where for \(i = 1,\ldots, k\) we have \begin{align*} \hat{c}_{(i,j)} & = {\mathbf{c}^{(i)}}^\mathsf{T} \mathbf{x}^{(i,j)} \\ \hat{\mathbf{a}}^{(i,j)} & = \mathbf{A}^{(i)} \mathbf{x}^{(i,j)} \end{align*} for \(j = 1,\ldots,p_i\), and \begin{align*} \tilde{c}_{(i,j)} & = {\mathbf{c}^{(i)}}^\mathsf{T} \mathbf{d}^{(i,j)} \\ \tilde{\mathbf{a}}^{(i,j)} & = \mathbf{A}^{(i)} \mathbf{d}^{(i,j)} \end{align*} for \(j = 1,\ldots,r_i\).
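The map from the generators of \(P_i\) to the data of \((DW)\) is purely mechanical. The following sketch (in Python with numpy; `dw_column` is a hypothetical helper, and the data is taken from block \(i = 1\) of the example below) computes the cost and linking-constraint column contributed by a single extreme point or extreme ray:

```python
import numpy as np

# Cost and master column generated by v, an extreme point or extreme ray of
# P_i: hat-c (or tilde-c) is c^(i)^T v, and hat-a (or tilde-a) is A^(i) v.
def dw_column(c_i, A_i, v):
    return c_i @ v, A_i @ v

# Block i = 1 of the example below: c^(1) = (1, 2), A^(1) = [-1  5].
c1 = np.array([1.0, 2.0])
A1 = np.array([[-1.0, 5.0]])
print(dw_column(c1, A1, np.array([0.0, 1.0])))   # extreme point (0,1): cost 2, column [5]
print(dw_column(c1, A1, np.array([-1.0, 1.0])))  # extreme ray (-1,1): cost 1, column [6]
```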

Example. Consider \begin{align*} \min \ & x_1 + 2x_2 + 3y_1 + 4y_2 \\ (P)~~~~\text{s.t.} \ & -x_1 + 5 x_2 +7y_1 - 6y_2 = 1 \\ & \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in P \\ & \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \in Q \end{align*} where \(P = \left\{ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} : \begin{array}{r} x_1 + x_2 \geq 1 \\ -x_1 + 2x_2 \geq 0 \\ x_2 \geq 1 \end{array} \right\}\) and \(Q = \left\{ \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} : \begin{array}{r} 3y_1 - y_2 \geq 2 \\ y_1 + 2y_2 \geq 3 \end{array} \right\}.\) Solving this directly gives an optimal solution \(\begin{bmatrix} x_1 \\x_2\\y_1\\y_2\end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ \frac{14}{11} \\ \frac{20}{11} \end{bmatrix}\) with optimal value \(\frac{166}{11}\).
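As a sanity check, the direct solve is easy to reproduce with an off-the-shelf LP solver. A minimal sketch using scipy's linprog (each \(\geq\) constraint of \(P\) and \(Q\) is negated into \(\leq\) form, and all four variables are left free):

```python
from scipy.optimize import linprog

c = [1, 2, 3, 4]                      # objective on (x1, x2, y1, y2)
A_eq = [[-1, 5, 7, -6]]; b_eq = [1]   # linking constraint
A_ub = [[-1, -1,  0,  0],             # -(x1 + x2)    <= -1
        [ 1, -2,  0,  0],             # -(-x1 + 2x2)  <=  0
        [ 0, -1,  0,  0],             # -(x2)         <= -1
        [ 0,  0, -3,  1],             # -(3y1 - y2)   <= -2
        [ 0,  0, -1, -2]]             # -(y1 + 2y2)   <= -3
b_ub = [-1, 0, -1, -2, -3]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(None, None)] * 4, method="highs")
print(res.x, res.fun)   # approx. (2, 1, 14/11, 20/11) and 166/11 = 15.0909...
```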

Now, \(P\) has extreme points \(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\) and \(\begin{bmatrix} 2 \\ 1 \end{bmatrix}\) and a complete set of extreme rays \(\left\{ \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right\}\), whereas \(Q\) has one extreme point \(\begin{bmatrix} 1 \\ 1 \end{bmatrix}\) and a complete set of extreme rays \(\left\{ \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ -1 \end{bmatrix} \right\}\). Hence, \((P)\) can be reformulated as \begin{align*} \min \ & 2\lambda_{(1,1)} + 4\lambda_{(1,2)} + \gamma_{(1,1)} + 4\gamma_{(1,2)} + 7\lambda_{(2,1)} + 15\gamma_{(2,1)} + 2\gamma_{(2,2)} \\ \text{s.t.} \ & 5\lambda_{(1,1)} + 3\lambda_{(1,2)} + 6\gamma_{(1,1)} + 3\gamma_{(1,2)} + \lambda_{(2,1)} - 11\gamma_{(2,1)} + 20\gamma_{(2,2)} = 1 \\ & \lambda_{(1,1)}+\lambda_{(1,2)} = 1 \\ & \lambda_{(2,1)} = 1 \\ & \lambda_{(1,1)}, \lambda_{(1,2)}, \gamma_{(1,1)},\gamma_{(1,2)}, \lambda_{(2,1)}, \gamma_{(2,1)}, \gamma_{(2,2)} \geq 0. \end{align*} Now, solving this directly gives an optimal solution \(\lambda_{(1,2)} = \lambda_{(2,1)} = 1\), \(\gamma_{(2,1)} = \frac{3}{11}\), and the remainder of the variables equal to 0. The optimal value is again \(\frac{166}{11}\) and the solution gives the point \(\lambda_{(1,2)} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}\) in \(P\) and \(\lambda_{(2,1)} \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \gamma_{(2,1)} \begin{bmatrix} 1 \\ 3\end{bmatrix} = \begin{bmatrix} 1+ \frac{3}{11} \\ 1+\frac{9}{11}\end{bmatrix} = \begin{bmatrix} \frac{14}{11} \\ \frac{20}{11}\end{bmatrix}\) in \(Q\).
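The reformulation is just as easy to hand to a solver. A sketch with scipy's linprog (its default bounds already impose nonnegativity on all seven variables):

```python
from scipy.optimize import linprog

# Variables: lam11, lam12, gam11, gam12, lam21, gam21, gam22 (all >= 0).
c = [2, 4, 1, 4, 7, 15, 2]
A_eq = [[5, 3, 6, 3, 1, -11, 20],   # linking constraint
        [1, 1, 0, 0, 0,   0,  0],   # convexity row for P
        [0, 0, 0, 0, 1,   0,  0]]   # convexity row for Q
b_eq = [1, 1, 1]
res = linprog(c, A_eq=A_eq, b_eq=b_eq, method="highs")
print(res.x, res.fun)   # lam12 = lam21 = 1, gam21 = 3/11, value 166/11
```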

Note that \((DW)\) is a linear programming problem in standard form. However, it can have far more variables than \((P)\): the number of extreme points and extreme rays of a polyhedron can grow exponentially with its dimension. To solve \((DW)\), we will use the simplex method with column generation. We illustrate the concepts on the problem \begin{align*} \min \ & \mathbf{c}^\mathsf{T}\mathbf{x} \\ (SP)~~~~~\text{s.t.}\ & \mathbf{A}\mathbf{x} = \mathbf{b} \\ & \mathbf{x} \geq \mathbf{0}. \end{align*}

Suppose that \(B\) is a basis determining a basic feasible solution. Recall that the only part of \(\mathbf{A}\) needed to compute the associated basic feasible solution is the submatrix \(\mathbf{A}_B\). To determine if the basic feasible solution is optimal, we solve for \(\mathbf{y}\) in \(\mathbf{y}^\mathsf{T}\mathbf{A}_B = \mathbf{c}_B^\mathsf{T}\) and check whether \(\mathbf{y}\) is feasible for the dual; i.e., whether \(\mathbf{y}^\mathsf{T}\mathbf{A}_j \leq c_j\) for all \(j \notin B\). If not, there is a \(k \notin B\) such that \(\mathbf{y}^\mathsf{T}\mathbf{A}_k \gt c_k\), and \(x_k\) can serve as the entering variable. The leaving variable can be obtained knowing only \(\mathbf{A}_{B\cup\{k\}}\), \(\mathbf{c}_{B\cup\{k\}}\), and \(\mathbf{b}\). Therefore, the only step that requires knowledge of the entire \(\mathbf{A}\) is finding \(k\).
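In code, one such pricing step looks roughly as follows (a minimal numpy sketch, assuming the basis \(B\) is given as a list of column indices with \(\mathbf{A}_B\) nonsingular; a real revised simplex implementation would maintain a factorization of \(\mathbf{A}_B\) rather than re-solving from scratch):

```python
import numpy as np

# One pricing step of the revised simplex method on min c^T x, Ax = b, x >= 0.
# Only the basis matrix A_B is factored; nonbasic columns are touched one at a time.
def price(A, c, B, tol=1e-9):
    y = np.linalg.solve(A[:, B].T, c[B])      # solve y^T A_B = c_B^T
    for j in range(A.shape[1]):
        if j not in B and y @ A[:, j] > c[j] + tol:
            return j                          # x_j can enter the basis
    return None                               # y is dual feasible: optimal
```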

In the case of \((DW)\), this check can be performed without explicit knowledge of the entire constraint matrix. To see this, partition the vector \(\mathbf{y}\) of dual variables for \((DW)\) as \(\begin{bmatrix} \mathbf{u} \\ \zeta_1 \\ \vdots \\ \zeta_k\end{bmatrix}\), where \(\mathbf{u}\) corresponds to the linking constraints and \(\zeta_i\) to the \(i\)th convexity constraint. We then need to check for each \(i \in \{1,\ldots,k\}\) if there exists \(j \in \{1,\ldots, p_i\}\) such that \[ \mathbf{u}^\mathsf{T}\hat{\mathbf{a}}^{(i,j)} +\zeta_i \gt \hat{c}_{(i,j)} \] or if there exists \(j \in \{1,\ldots, r_i\}\) such that \[ \mathbf{u}^\mathsf{T}\tilde{\mathbf{a}}^{(i,j)} \gt \tilde{c}_{(i,j)}. \]
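For reference, the conditions being checked are exactly those of the dual of \((DW)\): by LP duality, the dual is \begin{align*} \max \ & \mathbf{u}^\mathsf{T}\mathbf{b} + \sum_{i=1}^k \zeta_i \\ \text{s.t.} \ & \mathbf{u}^\mathsf{T}\hat{\mathbf{a}}^{(i,j)} + \zeta_i \leq \hat{c}_{(i,j)} & i = 1,\ldots,k,\ j = 1,\ldots,p_i, \\ & \mathbf{u}^\mathsf{T}\tilde{\mathbf{a}}^{(i,j)} \leq \tilde{c}_{(i,j)} & i = 1,\ldots,k,\ j = 1,\ldots,r_i, \end{align*} so we are simply searching for a violated dual constraint.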

These inequalities can be rewritten as \[\zeta_i \gt \left({\mathbf{c}^{(i)}}^\mathsf{T}- \mathbf{u}^\mathsf{T}\mathbf{A}^{(i)}\right)\mathbf{x}^{(i,j)}\] and \[0 \gt \left({\mathbf{c}^{(i)}}^\mathsf{T}- \mathbf{u}^\mathsf{T}\mathbf{A}^{(i)}\right)\mathbf{d}^{(i,j)},\] respectively. We can handle both sets of inequalities for a particular \(i \in \{1,\ldots,k\}\) by solving the subproblem \begin{align*} \min \ & \left({\mathbf{c}^{(i)}}^\mathsf{T}- \mathbf{u}^\mathsf{T}\mathbf{A}^{(i)}\right)\mathbf{x} \\ \text{s.t.} \ & \mathbf{x} \in P_i. \end{align*} Suppose that it has an optimal solution. By Theorem 8.7 there is one that is an extreme point \(\mathbf{x}^{(i)}\) of \(P_i\). If the optimal value is less than \(\zeta_i\), then \[\zeta_i \gt \left({\mathbf{c}^{(i)}}^\mathsf{T}- \mathbf{u}^\mathsf{T}\mathbf{A}^{(i)}\right)\mathbf{x}^{(i)}\] and so we have found an entering variable for \((DW)\): the variable \(\lambda\) associated with the extreme point \(\mathbf{x}^{(i)}\).

Suppose instead that the subproblem is unbounded. By Theorem 10.5 there is an extreme ray \(\mathbf{d}^{(i)}\) of \(P_i\) such that \[0 \gt \left({\mathbf{c}^{(i)}}^\mathsf{T}- \mathbf{u}^\mathsf{T}\mathbf{A}^{(i)}\right)\mathbf{d}^{(i)}.\] Again, we have found an entering variable for \((DW)\): the variable \(\gamma\) associated with the extreme ray \(\mathbf{d}^{(i)}\).
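Both cases can be detected by handing the subproblem to any LP solver. A sketch using scipy's linprog, assuming \(P_i\) is described as \(\{\mathbf{x} : \mathbf{G}\mathbf{x} \geq \mathbf{h}\}\) (the names `G`, `h`, and `price_block` are illustrative); note that scipy's HiGHS interface reports unboundedness (status code 3) but does not return the certifying extreme ray, which would have to be recovered separately:

```python
import numpy as np
from scipy.optimize import linprog

# Pricing for block i: minimize (c_i - A_i^T u)^T x over P_i = {x : G x >= h}.
# Returns ("point", x) if an extreme point prices out, ("ray", None) if the
# subproblem is unbounded (some extreme ray prices out), or None otherwise.
def price_block(c_i, A_i, G, h, u, zeta_i, tol=1e-9):
    w = c_i - A_i.T @ u                       # subproblem objective
    res = linprog(w, A_ub=-G, b_ub=-h,        # G x >= h  <=>  -G x <= -h
                  bounds=[(None, None)] * len(c_i), method="highs")
    if res.status == 3:                       # unbounded below over P_i
        return "ray", None
    if res.status == 0 and res.fun < zeta_i - tol:
        return "point", res.x                 # entering lambda-column
    return None                               # block i has no entering variable
```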

If neither case occurs for any \(i = 1,\ldots,k\), then \(\mathbf{y}\) is dual feasible and \((DW)\) has been solved to optimality.
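Putting the pieces together, here is a minimal end-to-end sketch of column generation on the example above (Python with numpy and scipy; the starting columns are chosen so that the restricted master is feasible, the duals \(\mathbf{u}, \zeta_1, \zeta_2\) are read from the master's marginals, and, since the subproblems happen to stay bounded on this data, the ray-generating branch is omitted):

```python
import numpy as np
from scipy.optimize import linprog

c_blk = [np.array([1., 2.]), np.array([3., 4.])]        # c^(1), c^(2)
A_blk = [np.array([[-1., 5.]]), np.array([[7., -6.]])]  # A^(1), A^(2)
G_blk = [np.array([[1., 1.], [-1., 2.], [0., 1.]]),     # P = {x : G x >= h}
         np.array([[3., -1.], [1., 2.]])]               # Q likewise
h_blk = [np.array([1., 0., 1.]), np.array([2., 3.])]

# Restricted master columns: (block i, cost, linking coefficient, is_point).
# Start from extreme points (0,1) of P and (1,1) of Q plus the ray (1,3) of Q.
cols = [(0, 2.0, 5.0, True), (1, 7.0, 1.0, True), (1, 15.0, -11.0, False)]

while True:
    # Solve the restricted master: one linking row, one convexity row per block.
    c = [cost for (_, cost, _, _) in cols]
    A_eq = [[a for (_, _, a, _) in cols],
            [1.0 if i == 0 and pt else 0.0 for (i, _, _, pt) in cols],
            [1.0 if i == 1 and pt else 0.0 for (i, _, _, pt) in cols]]
    res = linprog(c, A_eq=A_eq, b_eq=[1.0, 1.0, 1.0], method="highs")
    u, zeta = res.eqlin.marginals[:1], res.eqlin.marginals[1:]

    # Price both blocks; collect every extreme point that prices out.
    new = []
    for i in range(2):
        w = c_blk[i] - A_blk[i].T @ u
        sub = linprog(w, A_ub=-G_blk[i], b_ub=-h_blk[i],
                      bounds=[(None, None)] * 2, method="highs")
        if sub.status == 0 and sub.fun < zeta[i] - 1e-9:
            new.append((i, (c_blk[i] @ sub.x).item(),
                        (A_blk[i] @ sub.x).item(), True))
    if not new:
        break          # no column prices out: the master solution is optimal
    cols += new

print(res.fun)   # 166/11 = 15.0909..., reached after two master solves
```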

The computational advantage of solving \((DW)\) by this method instead of solving \((P)\) directly is that when \(k\) is large, \((P)\) has many constraints, whereas the master problem for \((DW)\) has only \(m + k\) rows and its columns are generated only as needed. So when \(m\) is small and each of the subproblems is easy to solve, each iteration of the simplex method has only modest memory requirements, making it possible to solve very large instances of \((P)\).