PIPS-IPM++ Solver and Tools
A parallel interior-point method for doubly bordered block-diagonal linear programs
PIPS-IPM++ Algorithm

In this chapter, we describe extensions that have been made to the PIPS-IPM solver to improve performance and robustness. The initial content for this chapter has been taken from Chapter 4.1 of [11].

While commercial generic LP solvers are often extremely powerful, they usually cannot run (efficiently) on distributed systems. Moreover, for special problems (such as, perhaps most famously, the traveling salesman problem), hand-tailored solvers may significantly outperform general-purpose commercial software.

Moreover, several large energy system model instances have proven to be intractable for commercial solvers within a time frame of several hours, so effort has been spent on developing specialized parallel algorithms that exploit the structure of such instances. One such specialized implementation is the PIPS-IPM++ solver, which tailors the interior-point method to exploit the block-diagonal structure for a distributed parallel solution approach.

Exploiting the problem structure within an interior-point algorithm

The approach to solving block-diagonal LPs of the form (BlockLP) is based on the parallel interior-point solver PIPS-IPM [8], which was developed for solving stochastic linear programs. Its designated problem class also exhibits a block-diagonal structure, although only with linking variables and without linking constraints. Consequently, PIPS-IPM in its original form cannot handle problems of the form shown above.

To overcome this limitation, PIPS-IPM has been extended to handle LPs with both linking constraints and linking variables. The resulting solver is called PIPS-IPM++.

The pivotal ingredient of the algorithm is the parallel solution of the linear system that has to be solved in each iteration of an interior-point algorithm. Below, a short overview of the most impactful extensions and a schematic mathematical description are given.

PIPS-IPM++ makes use of the Message Passing Interface (MPI) for communication between its (parallel) processes, which in the following will be referred to as MPI-processes. Without going into too much technical detail, an important feature is that the whole LP can be distributed among the MPI-processes, with no process needing to store the entire problem. This allows tackling problems that are too large to even be stored in the main memory of a single desktop machine. The main principle is that for each index \(i \in \{0,1,\ldots,N \}\), all \(x_i\), \(h_i\), \(T_i\), and \(W_i\) (for \(i > 0\)) need to be directly available to only one MPI-process; \(h_{N+1}\) needs to be assigned to the MPI-process handling \(i=0\). Moreover, each MPI-process needs access to the current value of \(x_0\). The maximum number of MPI-processes that can be used is \(N\).
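
To make this distribution principle concrete, the following is a minimal sketch in Python using mpi4py. It only mirrors the idea described above; the variable names (`my_blocks`, `local_data`, etc.) are illustrative placeholders and do not correspond to the actual PIPS-IPM++ (C++/MPI) implementation.

```python
# Minimal sketch of the data distribution principle (illustrative, not PIPS-IPM++ code).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, n_procs = comm.Get_rank(), comm.Get_size()

N = 8  # number of diagonal blocks i = 1, ..., N
# Each MPI-process owns a subset of the block indices; at most N processes are useful.
my_blocks = [i for i in range(1, N + 1) if (i - 1) % n_procs == rank]

# Only the owning process stores x_i, h_i, T_i, W_i for its blocks
# (placeholders of arbitrary size here).
local_data = {i: {"x": np.zeros(100), "h": np.zeros(10),
                  "T": np.zeros((10, 100)), "W": np.zeros((10, 100))}
              for i in my_blocks}

# The data for i = 0 (and h_{N+1}) lives on a single designated process, here rank 0.
x0 = np.zeros(50) if rank == 0 else None
# Every process needs the current value of x_0, so it is broadcast in each iteration.
x0 = comm.bcast(x0, root=0)
```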

Improving performance and scalability

The first prototype of PIPS-IPM++ showed deficits in both convergence and running time: while it was possible to solve some energy system models to optimality, for others the algorithm could not find an optimal solution. Moreover, it could not outperform the interior-point algorithm of the commercial solver CPLEX, which was used as the reference. Several steps have been taken to improve performance and convergence.

Solving the Schur complement

To begin with, interior-point algorithms rely on the successive solution of large systems of linear equations. In the extended solver, these systems have the form

\begin{equation}\begin{bmatrix} K_1 & & & B_1 \\ & \ddots & & \vdots \\ & & K_N & B_N \\ B_1^T & \cdots & B_N^T & K_0 \end{bmatrix} \begin{bmatrix} \Delta z_1 \\ \vdots \\ \Delta z_N \\ \Delta z_0 \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_N \\b_0 \end{bmatrix}, \tag{BlockLPCompact} \end{equation}

with matrices \(K_i\), \(B_i\), and some right-hand side vector. These matrices are each composed of the sub-matrices shown in (BlockLP); we do not go into detail about their structure here (see [9] instead).

The solver applies a Schur complement approach to the system (BlockLPCompact), which decomposes into the following procedure:

  1. Multiply each row \(i = 1,\ldots,N\) of (BlockLPCompact) by \(-B_i^T K_i^{-1}\).
  2. Sum up all rows.
  3. Solve \( \big(K_0 - \sum_{i = 1}^{N} B_i^TK_i^{-1} B_i \big) \Delta z_0 = b_0 -\sum_{i = 1}^{N}B_i^TK_i^{-1}b_i\).
  4. For each row \(i=1,\ldots,N\) insert \(\Delta z_0\) and compute \(\Delta z_i\).

The matrix

\[C = K_0 - \sum_{i = 1}^{N} B_i^TK_i^{-1} B_i, \tag{SchurComp} \]

is called the Schur complement of (BlockLPCompact), and its computation, or rather the solution of the associated linear system, is the most crucial part of the interior-point algorithm.
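
The following toy example illustrates steps 1 to 4 on small dense matrices with NumPy. It is only meant to make the decomposition tangible; in the actual solver the \(K_i\) are sparse, the per-block operations run on different MPI-processes, and the matrices have the structure described in [9]. All dimensions and data below are made up.

```python
# Dense toy illustration of the four-step Schur complement procedure for (BlockLPCompact).
import numpy as np

rng = np.random.default_rng(0)
N, n, m = 3, 4, 2                                   # blocks, block size, linking size
K = [rng.random((n, n)) + n * np.eye(n) for _ in range(N)]   # K_1, ..., K_N
B = [rng.random((n, m)) for _ in range(N)]                   # border blocks B_i
K0 = rng.random((m, m)) + m * np.eye(m)
b = [rng.random(n) for _ in range(N)]
b0 = rng.random(m)

# Steps 1-2: form the Schur complement C = K_0 - sum_i B_i^T K_i^{-1} B_i and the
# corresponding right-hand side (each summand could be computed locally per block).
C  = K0 - sum(Bi.T @ np.linalg.solve(Ki, Bi) for Ki, Bi in zip(K, B))
r0 = b0 - sum(Bi.T @ np.linalg.solve(Ki, bi) for Ki, Bi, bi in zip(K, B, b))

# Step 3: solve the (global) Schur complement system for the linking unknowns.
dz0 = np.linalg.solve(C, r0)

# Step 4: back-substitute block by block (again a purely local operation).
dz = [np.linalg.solve(Ki, bi - Bi @ dz0) for Ki, Bi, bi in zip(K, B, b)]
```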

While most of the steps can be done locally on each MPI-process, forming the actual Schur complement and solving the resulting linear system would require storing the matrix on a single process. This poses a problem, since many energy system model instances have more than 300 000 linking constraints and variables, making it intractable to compute and store the Schur complement as a dense matrix, which is the common procedure for the depicted algorithm. However, often the majority of linking constraints and variables are local, meaning that they only connect two different building blocks \(i, j\) of the original problem. These structures lead to an increased sparsity of the resulting global Schur complement.

One of the major improvements implemented is the exploitation of these local linking structures. Knowing beforehand which linking constraints and variables are local allows the non-zero pattern of the resulting Schur complement to be predicted. This reduces the amount of storage needed for the matrix and also enables the use of sparse linear solvers for the solution of the Schur complement system. Both advantages greatly increase performance compared to the original version of PIPS, allowing systems with larger linking parts to be solved efficiently. For a detailed description of the mechanisms implemented in the extended solver for exploiting the locality of linking constraints and variables, refer to [9].
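
As a rough illustration of this effect, the sketch below assembles a Schur complement in which every block only contributes to a small window of linking rows, so the result stays sparse and can be handed to a sparse direct factorization. The dimensions and the window layout are made up for illustration and do not reflect the actual mechanisms of [9].

```python
# Sketch: with local linking, each term B_i^T K_i^{-1} B_i only touches a small
# window of the Schur complement, so C stays sparse and a sparse factorization suffices.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(1)
N, n, w = 20, 5, 4                      # blocks, block size, linking rows per block
m_total = 44                            # total number of linking rows/columns

rows, cols, vals = [], [], []
rows += list(range(m_total)); cols += list(range(m_total)); vals += [10.0] * m_total  # stand-in for K_0
for i in range(N):
    idx = np.arange(2 * i, 2 * i + w)   # the few "local" linking rows block i couples to
    Ki = rng.random((n, n)) + n * np.eye(n)
    Bi = rng.random((n, w))             # nonzeros confined to these rows
    contrib = Bi.T @ np.linalg.solve(Ki, Bi)
    r, c = np.meshgrid(idx, idx, indexing="ij")
    rows += r.ravel().tolist(); cols += c.ravel().tolist(); vals += (-contrib).ravel().tolist()

C = sp.coo_matrix((vals, (rows, cols)), shape=(m_total, m_total)).tocsc()  # duplicates are summed
print(f"Schur complement density: {C.nnz / m_total**2:.1%}")
lu = spla.splu(C)                        # sparse LU instead of a dense factorization
dz0 = lu.solve(np.ones(m_total))
```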

Despite the previously mentioned improvements, a high number of linking constraints and variables can still slow down the solution process significantly, because the factorization and solution of the global Schur complement can remain prohibitively time-consuming. One possible remedy is to not compute the Schur complement explicitly but to use an iterative approach, e.g., a Krylov subspace method [10]. These methods only require matrix-vector products with the system matrix, but their convergence depends heavily on effective preconditioning of the linear system. For this purpose, a distributed preconditioner has been added to PIPS-IPM++, details of which can be found in [9]. The novel implementation of such a preconditioner in the context of interior-point methods led to significant performance improvements.
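
The matrix-free idea can be sketched as follows: only the action of \(C\) on a vector is needed, and that action decomposes into local solves with the \(K_i\). The snippet below uses SciPy's generic GMRES as a stand-in Krylov method; the actual solver uses a distributed implementation and the tailored preconditioner from [9].

```python
# Sketch of a matrix-free Krylov solve of the Schur complement system:
# C is never formed; only C @ v is evaluated, which decomposes into local solves.
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(2)
N, n, m = 4, 6, 3
K  = [rng.random((n, n)) + n * np.eye(n) for _ in range(N)]
B  = [rng.random((n, m)) for _ in range(N)]
K0 = rng.random((m, m)) + m * np.eye(m)

def schur_matvec(v):
    # C v = K_0 v - sum_i B_i^T K_i^{-1} (B_i v); each summand can be computed locally.
    return K0 @ v - sum(Bi.T @ np.linalg.solve(Ki, Bi @ v) for Ki, Bi in zip(K, B))

C_op = LinearOperator((m, m), matvec=schur_matvec)
rhs = rng.random(m)
dz0, info = gmres(C_op, rhs, atol=1e-10)   # a preconditioner would be passed via the M argument
assert info == 0                           # 0 signals successful convergence
```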

Changes to the interior point algorithm

Most of the algorithmic efforts described above have concentrated on efficiently solving the linear system arising in an interior-point algorithm. However, the interior-point algorithm itself is also of high importance for achieving fast and robust convergence. To improve robustness, the originally implemented predictor-corrector algorithm of Mehrotra has been replaced by the multiple centrality correctors scheme of Colombo and Gondzio [3].

Moreover, since PIPS-IPM was developed to solve quadratic problems, it originally used the same step length (a central component of an interior-point algorithm) for the primal and dual variables. As PIPS-IPM++ is currently only concerned with LPs, different primal and dual step lengths [12] have been implemented to achieve faster convergence.
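
The following fragment sketches what distinct primal and dual step lengths mean in practice: each step length is the largest damped step that keeps the respective nonnegative variables strictly positive. The function and variable names are illustrative and do not mirror the PIPS-IPM++ code.

```python
# Sketch: separate primal and dual step lengths for an LP interior-point step.
# With distinct alpha_p and alpha_d, the primal and dual iterates can each move
# further than a single common step length would allow.
import numpy as np

def max_step(v, dv, damping=0.999):
    """Largest alpha in (0, 1] keeping v + alpha * dv strictly positive."""
    neg = dv < 0
    if not np.any(neg):
        return 1.0
    return min(1.0, damping * np.min(-v[neg] / dv[neg]))

x, dx = np.array([1.0, 0.5, 2.0]), np.array([-0.8, 0.3, -3.0])   # primal vars / direction
s, ds = np.array([0.2, 1.5, 0.7]), np.array([-0.1, -2.0, 0.4])   # dual slacks / direction

alpha_p = max_step(x, dx)   # primal step length
alpha_d = max_step(s, ds)   # dual step length (would equal alpha_p in the QP setting)
x_new, s_new = x + alpha_p * dx, s + alpha_d * ds
```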

Scaling

Two additional ingredients of robust interior-point implementations that are not directly tied to the algorithm itself are scaling and presolving. Both preconditioning techniques modify the original system matrix in a way that makes the application of an interior-point method more reliable.

For the next paragraphs we consider the LP in a slightly different and simplified form than (BlockLP), namely with explicit variable bounds:

\[\min\{ c^T x : Ax = b,\; \ell \leq x \leq u \}. \]

Scaling can be described by means of two diagonal matrices \(R = (r_{i,j})\) and \(C = (c_{i,j})\) with positive diagonal elements. The diagonals correspond to the row and column scaling factors respectively. Defining

\[\tilde{A} = R A C,\quad \tilde{b} = R b, \quad \tilde{c} = C c, \quad \tilde{\ell} = C^{-1}\ell, \text{ and } \tilde{u} = C^{-1} u, \]

one obtains the scaled linear program

\[\min\{ \tilde{c}^T x : \tilde{A}x = \tilde{b},\; \tilde{\ell} \leq x \leq \tilde{u} \}. \]

Each solution \(\tilde{x}\) to the scaled LP corresponds to a solution \(x = C \tilde{x}\) to the unscaled LP with the same objective value.

PIPS-IPM++ has been extended by geometric scaling, equilibrium scaling, and a combination of both [4]. While equilibrium scaling divides all coefficients in each nonzero row and column of the constraint matrix by the entry of largest absolute value within this vector, geometric scaling uses a simplified geometric mean of the absolute vector entries as divisor: for each column \(A_j\) of the constraint matrix \(A\) the divisor is \(\sqrt{ \max_{i:a_{ij} \neq 0}|a_{ij}| \cdot \min_{i:a_{ij} \neq 0} |a_{ij}|}\), and for each row \(a_{i \cdot}\) it is \(\sqrt{ \max_{j:a_{ij} \neq 0} |a_{ij}| \cdot \min_{j:a_{ij} \neq 0} |a_{ij}|}\). Geometric scaling is computationally more expensive than equilibrium scaling, since it is applied iteratively (up to 10 times in the extended solver), whereas equilibrium scaling always converges in one step. However, the arrowhead structure allows all scaling methods to be performed efficiently in parallel. Therefore, the scaling run times are negligible, and geometric scaling is used as the default.
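
A compact NumPy sketch of the iterative geometric scaling described above is given below; the pass limit and the termination criterion are illustrative choices, not the exact rules of [4].

```python
# Sketch of iterative geometric scaling: each nonzero row/column is divided by the
# geometric mean of its largest and smallest absolute entry, for up to 10 passes.
# (Assumes A has no all-zero rows or columns.)
import numpy as np

def geometric_scaling(A, max_passes=10, tol=0.01):
    m, n = A.shape
    R, C = np.ones(m), np.ones(n)               # accumulated row / column factors
    As = A.astype(float).copy()
    for _ in range(max_passes):
        masked = np.where(As != 0, np.abs(As), np.nan)
        r = np.sqrt(np.nanmax(masked, axis=1) * np.nanmin(masked, axis=1))  # row divisors
        As /= r[:, None]; R /= r
        masked = np.where(As != 0, np.abs(As), np.nan)
        c = np.sqrt(np.nanmax(masked, axis=0) * np.nanmin(masked, axis=0))  # column divisors
        As /= c[None, :]; C /= c
        if max(np.abs(np.log(r)).max(), np.abs(np.log(c)).max()) < tol:
            break                                # divisors close to 1: already well scaled
    return As, R, C                              # As = diag(R) @ A @ diag(C)

A = np.array([[1e4, 0.0, 2.0],
              [0.0, 3e-3, 5.0],
              [7.0, 1e2, 0.0]])
A_scaled, R, C = geometric_scaling(A)
# The remaining data is scaled accordingly: b -> R*b, c -> C*c, l -> l/C, u -> u/C.
```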

Presolve

As mentioned, another important ingredient of state-of-the-art linear programming solvers is presolving. LPs resulting from a modeled application often contain redundant information, which can be harmful to the performance of an interior-point algorithm. Thus, it is advisable to eliminate redundant parts before the actual solution process is started. Presolving techniques aim to reduce the number of variables, constraints, and non-zeroes while keeping the presolved and original problem equivalent.

PIPS-IPM++ implements many different presolving methods [7], incorporating one or more of the techniques described in [1], [2], and [6], e.g., singleton row elimination, bound tightening, parallel and nearly parallel row detection, and a few methods summarized under the term model cleanup. The latter includes the detection of redundant rows as well as the elimination of negligibly small entries from the constraint matrix.

A presolving routine can apply certain reductions to the LP: deletion of a row or column, deletion of a system entry, modification of variable bounds and of the left- and right-hand side, and modification of objective function coefficients. These reductions are applied as equivalence operations that do not change the set of feasible solutions of the linear program. This allows a solution of the original problem to be recovered from the solution computed by the solver in a so-called postsolve step.
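
To illustrate the interplay of a reduction and its postsolve counterpart, the toy example below eliminates a singleton equality row and later re-inserts the fixed variable into the solution. The data structures (e.g. the `postsolve_stack` list) are purely illustrative and not the actual PIPS-IPM++ internals [7].

```python
# Toy example of one presolve reduction with its postsolve counterpart:
# a singleton equality row a_ij * x_j = b_i fixes x_j, so row i and column j
# can be removed; postsolve re-inserts the fixed value into the solution.
import numpy as np

A = np.array([[2.0, 0.0, 0.0],     # row 0 is a singleton: 2*x_0 = 4  =>  x_0 = 2
              [1.0, 1.0, 3.0]])
b = np.array([4.0, 10.0])
c = np.array([1.0, 2.0, 1.0])

# --- presolve: eliminate the singleton row and the fixed column --------------
j, x_fixed = 0, b[0] / A[0, 0]
postsolve_stack = [("fixed_var", j, x_fixed)]         # record for postsolve
A_red = np.delete(np.delete(A, 0, axis=0), j, axis=1)
b_red = np.delete(b, 0) - A[1:, j] * x_fixed          # move the fixed term to the rhs
c_red = np.delete(c, j)

# ... the reduced LP (A_red, b_red, c_red) is handed to the solver ...
x_red = np.array([2.0, 2.0])                          # pretend solver output

# --- postsolve: rebuild a solution of the original LP ------------------------
x = x_red
for kind, idx, val in reversed(postsolve_stack):
    if kind == "fixed_var":
        x = np.insert(x, idx, val)
assert np.allclose(A @ x, b)                          # x solves the original system
```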

It is important to note that none of the presolving techniques described above are allowed to alter the general block structure of the system (BlockLP), since this would render the structure-specific solution method inapplicable. Thus, the set of implemented methods is highly tailored to the general structure of the problem [5], [7].

Hierarchical Approach

With the hierarchical approach, PIPS-IPM++ splits the Schur complement decomposition (and thus also the Schur complement) into several layers to handle energy system models with even stronger linkage. TODO extend this