Graph partitioning with general purpose solvers

I gave a talk at ROADEF2018 about using general-purpose solvers for large-scale partitioning problems. Although much slower than ad-hoc algorithms, they can still converge to a good solution with some help… and they can tackle much more complex models.


Minipart 0.3: getting there

I just added a “v0.3” tag to my latest Minipart commit. It is shaping quite well, and is giving promising results at last. Compared to hMetis2, the results are very similar on average (2% better), although it’s still much much slower (5x).

A 2% quality improvement would be nothing to sneer at if I could improve the runtime, but this average hides a few interesting datapoints: some benchmarks show a 20% difference between hMetis and Minipart. I guess partitioning is not a solved problem yet.

Next steps

There are a lot of things I want to try. Nowadays, you cannot just hand-tune your algorithm: it takes too much brain time, and our brain is bad at it anyway. I probably have a lot to gain by just trying all kinds of parameters automatically – what people have been doing with SAT solvers for a while. Or give the parameters to a good blackbox optimizer, like people now do for neural networks. Or maybe try machine learning, why not.

Less fun but equally important would be to implement and benchmark all the “good” algorithms I left aside: all other partitioners implement Fiduccia-Mattheyse and heavy-edge coarsening, but I didn’t try them yet. A lot of ideas from MLPart and hMetis have proven useful, and I expect to find some room for improvement here.

Minipart partitioner

I am now writing a graph and hypergraph partitioner, Minipart. It’s starting to yield good solutions on the ISPD benchmarks, similar to MLPart and hMetis.

In a way, it is very similar to existing tools: it interleaves local search passes with problem simplifications (coarsening). On the other hand, I started with very different choices for coarsening, and my benchmarks are slowly leading me to a different kind of local search as well.

More details when I publish version 1.0!

Incremental evaluation is often overlooked

As a community, we focus a lot on the heuristics available to solve a problem. But my experience in the field is that we don’t spend much time on the heuristics themselves: the cost function will get most of the work.


When solving a problem using a metaheuristic, it is important to quickly evaluate incumbent solutions. When the number of variables is small, it is possible to compute the cost of a solution from scratch, but, if the problem has millions of variables, even a linear time-complexity will become limiting.

This is where incremental evaluation comes into play: for many cost functions, we can compute the result quickly for a small change of inputs. This is perfect for local search, which can now scale to very large problems.

Rolling your own

If you need incremental evaluation for your problem, you are going to roll your own. It is quite easy at first: evaluating sums, maxs and products incrementally is not very difficult.

But in practice requirements change: you’ll need to implement a slightly different cost function and add new features to it. Before you know it, this incremental cost function prevents you from experimenting as much as you’d like to. It may even have become a maintenance issue! Moreover, we have more important stuff to do, and it’s rarely on par with what you could obtain with more optimizations and preprocessing.

Any framework?

I sometimes feel the need for a library that would make incremental evaluation smoother. I know of one good framework for the metaheuristics part (ParadisEO), but I dont know of any framework to implement incremental evaluation of the cost function. When tackling some large scale problems, this is where the work is spent (both for the humans and the computer).

If you know one such framework, I’d like to hear from you!

Full disclosure: I am going to work at LocalSolver starting in November. A powerful modeling language makes modeling much easier, but generic solvers do not allow you to plug your own search engine: sometimes I’d like the best of both worlds.

My optimization toolbox

The principle of optimization is simple: get the best possible solution on a problem, subject to a given cost and problem-specific constraints. It is important in many areas, from economics and planning to electronics design. Although the theory is useful and very important for tool writers, it is more of an art for the user. However, the more optimization problems I see, the more I learn to reuse the same old tricks. I wanted to look back on what I learned and why.

When I tackled my first optimization problems, I had a strong tendency to write custom algorithms, whether basic local-search algorithms, brute-force or problem-specific clever algorithms. Now I see this approach as a big loss of time: we have a huge toolbox of powerful tools and modeling approaches that usually work better, and are much faster to try in any case. Even for the rare cases where these tools wouldn’t work in practice, they are very helpful to get a better idea of your problem. The big question is “which tool and which model?”.

Most of the work I did could be tackled with integer programming or continuous optimization, and I think all programmers who need to optimize something — in particular people working on automation of electronic design — should at least know when to try them.

Integer programming

Integer linear programming is probably the most important tool you will ever use: it can model almost anything if you know a few tricks, while being efficient on most problems. It optimizes a linear cost function, subject to linear inequalities on integer or real variables. Even on non-linear problems, an approximate linear programming model can be much better and faster than anything else.

There are some good free tools (I use GLPK and its modeling language, GMPL). On most problems it is my first modelling attempt. I discovered recently that it was able to use Minisat for pseudo-boolean problems (when all the variables are boolean): you get a completely different optimization algorithm for free, that can be much more efficient on some problems.

Huge problems

On the other hand, I had to deal with some huge uncontrained problems, in the millions of variables. It may not be something you need to use every day, but the conjugate gradient method and Nesterov’s method are interesting to learn about. There is a bunch or libraries you can use for convex optimization in C++, R, Python, Julia…

For complex and huge problems, such as a big travelling salesman or vehicle routing problem, you can’t do much: either there is a library for this specific problem, or you write some ad-hoc heuristics from scratch.

Ad-hoc algorithms

Now I  don’t think it is a good idea to ever consider this as a first approach if you want to solve a problem optimally, but for big or badly defined problems a heuristic algorithm is generally the only way. Simple local search is generally a good starting point.

Depending on the problem, even greedy algorithms can give good result. I often try simple permutations first, like the Lin-Kerningham heuristic, and maybe simulated annealing. I find genetic algorithms are more of a buzzword, and quite overrated.

Back to exact techniques

Even if an exact technique will not work for a difficult problem, it is often useful to try it anyway. It constrains you to model your problem, simplify it, and make it fit in a mathematical framework. Moreover, it can give you a hint of how bad your current algorithms are on small instances, and provide a subroutine for local search on your big problem.

In design automation, integer programming and the like aren’t used very often. In other fields, in particular planning, integer programming and the likes have proven invaluable long ago: give it a try.


Optimal is not always better

I stumbled on a difficult realization this week: a carefully engineered algorithm is not necessarily better than a very simple code.

One of the critical steps of my program used to rely on a quick and dirty heuristic – a hack relying on a very simple local search. Since it is an important optimization pass, I tried to adapt a mathematically clean and efficient algorithm, (based on minimum cost flows, which appear everywhere). It can optimize on a larger scale, and I hoped to get much better results.

Unsurprisingly, it turns out that it was a lost of time: for my datasets, the runtime of the quick heuristic is much lower to obtain the same results. Each pass is so much faster that its suboptimality doesn’t make a difference.

What is more disturbing is that same-quality results are not equivalent for the other optimization passes: when evaluating related cost functions, the clean algorithm tends to do much worse. I still don’t know if it is due to some kind of overfitting or to a specific feature of the heuristic, but I’d very much like to know what it is about.

On scheduling algorithms applied to VLSI placement

There is a striking similarity between VLSI placement and scheduling. Digital electronics circuits are built from “standard cells”, logic gates with a unique height. They are organized in rows, so that the power wires run between two rows of standard cells.

Tasks must be scheduled on machines so that they do not conflict, standard cells must be placed in rows so that they do not overlap. The only difference between scheduling tasks and placing logic gates is the cost function, generally more complex for placement than for scheduling.

Placement problems have never been formulated as scheduling problems: it is no surprise that there has been few communication between the fields, the journals and laboratories being separate and the goals completely different. But it seems to me that we can solve some placement subproblems using scheduling techniques.

Problem formulation

Scheduling is the problem of assigning tasks to machines. They are constrained by their execution times, with independent costs for each task.

In standard cell placement, the cost is usually much more complicated: the wirelength involves distances between tasks, and cannot be expressed a a sum of independent task costs. However, for a given ordering of the cells in the rows, it is possible to find the corresponding optimal positions by using linear programming (more precisely, network flow). This gives us a simple optimization pass to use during placement but it is still a bit expensive.

Therefore, for standard cell placement as well as scheduling, researchers have considered the simpler problem where there is only one row or one machine, still with fixed ordering. The cost function is a sum of independent piecewise linear functions, and hopefully there are faster algorithms to solve the problem.

A standard cell with one pin on the same row and two on other rows, with the associated cost function
A standard cell with one pin on the same row and two on other rows (a), an equivalent placement of the pins (b), and the associated cost function (c)

Fast algorithms for the ordered single row problem

Indeed, this problem is much easier to solve in practice. The clumping algorithm solves this problem in worst case quadratic time, and is usually much faster. The idea behind it is to add the cells at the end of the row and push it to the next pin until its placement is optimal: the final placement is optimal due to the convexity of the cost function.

A group of cells is pushed while it decreases wirelength. Each time a pin is passed, the derivative of the wirelength is updated and the algorithm may stop and add the next cell to the row.
A group of cells is pushed while it decreases wirelength. Each time a pin is passed, the derivative of the wirelength is updated and the algorithm stops and adds the next cell to the row.

This quadratic worst case complexity is annoying: several publications managed to improve this runtime up to m log² m where m is the number of pins. However, the datastructures involved are a special type of balanced trees, both slow and difficult to code.

I discovered a simpler m log n algorithm based on a priority queue, but by chance I found out that it is already known for scheduling,  as the specialized cascading descent algorithm. The cascading descent is equivalent to the clumping algorithm. The basic idea is to index the pins with a value that does not depend on the cell: this way, merging groups of cells takes constant time. The algorithm uses a single priority queue to hold the pins, hence its better complexity.

The absolute position of a cell is the position of the first cell if all cells were clustered. The position of a pin is defined relative to the cell it connects to.
The absolute position of a cell is the corresponding position of the first cell if all cells were clustered. The position of a pin is defined relative to the cell it connects to.


Applications to VLSI placement

It isn’t a breakthrough: even naive linear programming is fast enough to optimize a large part of a circuit, so that this improvement might be barely noticeable compared with approximate or  quadratic algorithms.

However, there are two extensions that may prove particularly useful: local search and non-convex optimization.

Local search is the process of modifying the placement through simple modifications, like cell swapping. One of my goals is to modify the algorithm in order to support online placement modification with logarithmic complexity. That is, keep an optimal placement through small ordering modifications.

Non-convex optimization is trickier. There can be several local optimas, which makes most such problems extremely difficult. For the special structure of the ordered single row problem, however, there are quadratic algorithms. It means that we can integrate more complex cost functions directly in the ordered single row optimization, for example cell orientation.

Both ideas are relatively new: cell swapping, orientation and position optimization are almost always separate steps (although it is possible, even integer programming models in the litterature do not integrate all of them). I wonder how these methods will play compared to independent optimization passes.