Incremental evaluation is often overlooked

As a community, we focus a lot on the heuristics available to solve a problem. But my experience in the field is that the heuristics themselves are not where the work goes: evaluating the cost function gets most of it.

Incrementality

When solving a problem with a metaheuristic, it is important to evaluate candidate solutions quickly. When the number of variables is small, it is possible to compute the cost of a solution from scratch, but if the problem has millions of variables, even a linear-time evaluation becomes limiting.

This is where incremental evaluation comes into play: for many cost functions, we can compute the result quickly for a small change of inputs. This is perfect for local search, which can now scale to very large problems.

Rolling your own

If you need incremental evaluation for your problem, you are going to have to roll your own. It is quite easy at first: evaluating sums, maxes and products incrementally is not very difficult.
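For example, when the cost is a plain sum of terms, a single change can be re-evaluated in constant time instead of rescanning every term. A minimal sketch (the class and its names are mine, purely for illustration):

#include <cstddef>
#include <utility>
#include <vector>

// Toy incremental evaluator for a cost that is a plain sum of terms.
// Recomputing from scratch is O(n); evaluating or applying a
// single-term change is O(1).
class IncrementalSum {
public:
    explicit IncrementalSum(std::vector<double> terms)
        : terms_(std::move(terms)), total_(0.0) {
        for (double t : terms_) total_ += t;
    }

    // Cost of the current solution, maintained incrementally
    double cost() const { return total_; }

    // Evaluate a move without committing it (what a local search needs)
    double cost_if_changed(std::size_t i, double new_value) const {
        return total_ - terms_[i] + new_value;
    }

    // Commit the move
    void change(std::size_t i, double new_value) {
        total_ += new_value - terms_[i];
        terms_[i] = new_value;
    }

private:
    std::vector<double> terms_;
    double total_;
};

A max is only slightly harder: increasing a term is O(1), but decreasing the current maximum needs extra bookkeeping (a sorted structure, or an occasional recount).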

But in practice requirements change: you’ll need to implement a slightly different cost function and add new features to it. Before you know it, this incremental cost function prevents you from experimenting as much as you’d like to. It may even have become a maintenance issue! Moreover, there is usually more important work to do, and the payoff is rarely on par with what you could obtain from more optimization and preprocessing.

Any framework?

I sometimes feel the need for a library that would make incremental evaluation smoother. I know of one good framework for the metaheuristics part (ParadisEO), but I don’t know of any framework to implement incremental evaluation of the cost function. When tackling large-scale problems, this is where the work is spent (both for the humans and the computer).

If you know one such framework, I’d like to hear from you!

Full disclosure: I am going to work at LocalSolver starting in November. A powerful modeling language makes modeling much easier, but generic solvers do not allow you to plug your own search engine: sometimes I’d like the best of both worlds.

C++ reminder #3

A surprise with template type deduction and implicit conversions, when summing a vector of float:

vector<float> to_sum = {0.5, 0.5, 0.5, 0.5};
cout << accumulate(to_sum.begin(), to_sum.end(), 0) << endl;
cout << accumulate(to_sum.begin(), to_sum.end(), 0.0) << endl;
cout << accumulate(to_sum.begin(), to_sum.end(), 0.0f) << endl;

This will print:

0
2
2

The accumulator has the type of the initial value passed as the last argument: 0, 0.0 or 0.0f (the first two arguments are iterators), and every partial sum is converted back to that type. So we get:

  1. int: obviously not what was intended. Each partial sum 0 + 0.5 is truncated back to 0.
  2. double: correct, but more precise and slower than float.
  3. float: probably the intent, but it requires the 0.0f or an explicit cast.


Note: this is just an example; if you care about precision or performance this is not the right way of summing floating point numbers.

C++ reminder

C++ is hard. We often see code written in the hope of avoiding a specific class of errors. For example, if we reset our pointers to nullptr once they are deleted, they can never be deleted twice, right?

struct foo {
  char *p;
  ~foo();
};
foo::~foo() {
  delete p;
  p = nullptr; // "defensive" reset, hoping to make a double delete harmless
}

Check that

Right, the reset gets optimized away. Since it is undefined behaviour to access a member of an object whose lifetime has ended, the compiler assumes that p can never be read again and removes the write. Only with optimizations disabled do we get the expected behaviour.
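For what it’s worth, if the goal is only to make double deletes impossible, the idiomatic route is to not manage the pointer by hand at all. A minimal sketch with a smart pointer (my rewrite, not a claim about any particular code base):

#include <memory>

// Ownership is expressed in the type: the memory is freed exactly once,
// and there is no destructor to write (or to get optimized away).
struct foo {
  std::unique_ptr<char> p;
};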

Moral of the story: C++ is hard, and workarounds rarely work as you would expect them to.

An introduction to netlists

In the electronics industry, there is one common denominator to all the tools and all the teams. Whether we are doing ASIC design, programming FPGAs or formally verifying chips, there is one thing we all use in our software: the representation of the design, called the netlist.

The netlist is the representation of the connections of the design. Although it probably originated from physical design representations, it is now a common representation throughout the compiler, from the high-level code to the final physical layout.

This post is the first in a series on the netlist, presenting the basic concepts. The next ones will go into more depth on the operations performed on a netlist, the APIs that are provided to access it, and the actual representation in a programming language.

What’s in a netlist

A netlist represents the connections between the components of a design. Here is what it looks like:

A simple module: two cells (an and gate and an or gate) and four nets; net names are not shown

Let’s call the components “cells” and the connections “nets”. Each cell has an interface: each element of this interface is called a “port”. Note that everything has a name, and that ports generally have a direction as well. For the computer scientists among us, this is a lot like a graph.

Now, imagine we reuse mod in another netlist: it might just be a cell among others. Some of those cells may actually be complex components, containing other, smaller cells: this creates a hierarchy of nested cells, some of which may be reused.

A toy CPU netlist, with cell reuse for the cores and ALUs

For example, a toy CPU: core is defined once, but used twice, creating two “instances”, core1 and core2. Like in software design, reuse is encouraged: a “module” (core, ALU) will be instantiated several times (core1, core2, core1.ALU1, core1.ALU2…). This reuse is there to make the designers’ and tools’ jobs easier: on the manufactured chip, there will be one physical copy of core for each of core1, core2…
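To make the vocabulary concrete, here is a bare-bones C++ sketch of these concepts (the names and layout are mine, not those of any of the tools mentioned below):

#include <string>
#include <utility>
#include <vector>

// A bare-bones hierarchical netlist: a module is defined once and may be
// instantiated many times; nets connect the ports of its instances.
struct Module;

struct Port {
    std::string name;
    enum class Direction { Input, Output, InOut } direction;
};

struct Instance {
    std::string name;  // e.g. "core1"
    Module* master;    // the module being instantiated, e.g. "core"
};

struct Net {
    std::string name;
    // Which port of which instance this net connects to
    std::vector<std::pair<Instance*, Port*>> pins;
};

struct Module {
    std::string name;                 // e.g. "core" or "ALU"
    std::vector<Port> ports;          // its interface
    std::vector<Instance> instances;  // the cells it contains
    std::vector<Net> nets;            // the connections between them
};

Real netlist data structures differ a lot from this naive version, in particular in how they store and expose connectivity; that is precisely what the next posts will discuss.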

Netlists everywhere

We have seen that the netlist came from the physical design world, but at the same time its quirks are present in “high-level” languages like Verilog and VHDL.

The consequence is that a netlist is the go-to format for everyone working in EDA… but there are many different netlist representations. For example:

  • Netlists targeting physical design with representations of the actual wiring as polygons. Coriolis’ Hurricane is a good example
  • Netlists targeting logical optimization, with specialized bit-level representations. This is the case in ABC
  • Higher-level netlists for synthesis and verification, with high-level operators, like Yosys
  • Generic netlists, that provide a generic API without application-specific utilities, like GBL

Since they are central to all those tools, their design has a lot of impact. First, on the overall development of the tools: an easy-to-use API is a must. Second, on the performance, to process millions to billions of gates efficiently. But let’s keep those for the next post.

My optimization toolbox

The principle of optimization is simple: find the best possible solution to a problem, as measured by a given cost function, subject to problem-specific constraints. It is important in many areas, from economics and planning to electronics design. Although the theory is useful and very important for tool writers, it is more of an art for the user. However, the more optimization problems I see, the more I learn to reuse the same old tricks. I wanted to look back on what I learned and why.

When I tackled my first optimization problems, I had a strong tendency to write custom algorithms, whether basic local-search algorithms, brute force or problem-specific clever algorithms. Now I see this approach as a big waste of time: we have a huge toolbox of powerful tools and modeling approaches that usually work better, and are much faster to try in any case. Even for the rare cases where these tools wouldn’t work in practice, they are very helpful to get a better idea of your problem. The big question is “which tool and which model?”.

Most of the work I did could be tackled with integer programming or continuous optimization, and I think all programmers who need to optimize something — in particular people working on automation of electronic design — should at least know when to try them.

Integer programming

Integer linear programming is probably the most important tool you will ever use: it can model almost anything if you know a few tricks, while being efficient on most problems. It optimizes a linear cost function, subject to linear inequalities on integer or real variables. Even on non-linear problems, an approximate linear programming model can be much better and faster than anything else.

There are some good free tools (I use GLPK and its modeling language, GMPL), and an integer program is usually my first modeling attempt. I discovered recently that GLPK can use Minisat for pseudo-boolean problems (when all the variables are boolean): you get a completely different optimization algorithm for free, which can be much more efficient on some problems.
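To give an idea of the modeling effort, here is a toy model written against GLPK’s C API (a minimal sketch; in practice I would write this in GMPL, and the problem and numbers here are made up): maximize 3x + 2y subject to 2x + 3y <= 14, with x and y non-negative integers.

#include <cstdio>
#include <glpk.h>

int main() {
    glp_prob* lp = glp_create_prob();
    glp_set_obj_dir(lp, GLP_MAX);

    // Two integer variables x, y >= 0 with objective 3x + 2y
    glp_add_cols(lp, 2);
    glp_set_col_kind(lp, 1, GLP_IV);
    glp_set_col_bnds(lp, 1, GLP_LO, 0.0, 0.0);
    glp_set_obj_coef(lp, 1, 3.0);
    glp_set_col_kind(lp, 2, GLP_IV);
    glp_set_col_bnds(lp, 2, GLP_LO, 0.0, 0.0);
    glp_set_obj_coef(lp, 2, 2.0);

    // One constraint: 2x + 3y <= 14
    glp_add_rows(lp, 1);
    glp_set_row_bnds(lp, 1, GLP_UP, 0.0, 14.0);
    int ia[3], ja[3];
    double ar[3];
    ia[1] = 1; ja[1] = 1; ar[1] = 2.0;
    ia[2] = 1; ja[2] = 2; ar[2] = 3.0;
    glp_load_matrix(lp, 2, ia, ja, ar);  // entries are 1-indexed

    // Solve as a MIP (branch-and-cut on top of the LP relaxation)
    glp_iocp parm;
    glp_init_iocp(&parm);
    parm.presolve = GLP_ON;
    glp_intopt(lp, &parm);

    std::printf("obj = %g, x = %g, y = %g\n",
                glp_mip_obj_val(lp),
                glp_mip_col_val(lp, 1),
                glp_mip_col_val(lp, 2));
    glp_delete_prob(lp);
    return 0;
}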

Huge problems

On the other hand, I had to deal with some huge unconstrained problems, with millions of variables. It may not be something you need every day, but the conjugate gradient method and Nesterov’s method are interesting to learn about. There are a bunch of libraries you can use for convex optimization in C++, R, Python, Julia…
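To show how simple the unconstrained building blocks are, here is a textbook conjugate gradient solver for Ax = b with A symmetric positive definite, i.e. the minimizer of the corresponding quadratic (dense matrix for readability only; problems with millions of variables obviously need sparse storage):

#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense, for readability only

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

static Vec matvec(const Mat& A, const Vec& x) {
    Vec y(x.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            y[i] += A[i][j] * x[j];
    return y;
}

// Solves Ax = b for symmetric positive definite A, which is the same as
// minimizing the quadratic 0.5 x'Ax - b'x (the kind of system quadratic
// placement solves, for example).
Vec conjugate_gradient(const Mat& A, const Vec& b, double tol = 1e-8) {
    Vec x(b.size(), 0.0);
    Vec r = b;   // residual b - Ax for x = 0
    Vec p = r;   // search direction
    double rs_old = dot(r, r);
    for (std::size_t it = 0; it < b.size(); ++it) {
        Vec Ap = matvec(A, p);
        double alpha = rs_old / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        double rs_new = dot(r, r);
        if (std::sqrt(rs_new) < tol) break;
        for (std::size_t i = 0; i < p.size(); ++i)
            p[i] = r[i] + (rs_new / rs_old) * p[i];
        rs_old = rs_new;
    }
    return x;
}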

For complex and huge problems, such as a big travelling salesman or vehicle routing problem, you can’t do much: either there is a library for this specific problem, or you write some ad-hoc heuristics from scratch.

Ad-hoc algorithms

Now, I don’t think it is ever a good idea to consider this as a first approach if you want to solve a problem optimally, but for big or badly defined problems a heuristic algorithm is often the only way. Simple local search is generally a good starting point.

Depending on the problem, even greedy algorithms can give good results. I often try simple permutations first, like the Lin-Kernighan heuristic, and maybe simulated annealing. I find genetic algorithms are more of a buzzword, and quite overrated.
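As an illustration of how little code a first heuristic needs, here is a 2-opt pass for the travelling salesman problem, the simplest of the moves that Lin-Kernighan generalizes (toy code: no neighbour lists, no don’t-look bits):

#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Point { double x, y; };

static double dist(const Point& a, const Point& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// One 2-opt pass: reverse a section of the tour whenever the two edges
// it removes are longer than the two it creates. Stops at a local
// optimum for this move type.
void two_opt(const std::vector<Point>& pts, std::vector<std::size_t>& tour) {
    const std::size_t n = tour.size();
    bool improved = true;
    while (improved) {
        improved = false;
        for (std::size_t i = 0; i + 1 < n; ++i) {
            for (std::size_t j = i + 2; j < n; ++j) {
                std::size_t a = tour[i], b = tour[i + 1];
                std::size_t c = tour[j], d = tour[(j + 1) % n];
                if (a == d) continue;  // the two edges share a city
                double delta = dist(pts[a], pts[c]) + dist(pts[b], pts[d])
                             - dist(pts[a], pts[b]) - dist(pts[c], pts[d]);
                if (delta < -1e-12) {  // strictly improving move
                    // Reverse tour[i+1 .. j]: edges (a,b) and (c,d) become
                    // (a,c) and (b,d)
                    for (std::size_t l = i + 1, r = j; l < r; ++l, --r)
                        std::swap(tour[l], tour[r]);
                    improved = true;
                }
            }
        }
    }
}

Note that evaluating a move is incremental: only the four endpoints involved are looked at, never the whole tour.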

Back to exact techniques

Even if an exact technique will not work for a difficult problem, it is often useful to try it anyway. It constrains you to model your problem, simplify it, and make it fit in a mathematical framework. Moreover, it can give you a hint of how bad your current algorithms are on small instances, and provide a subroutine for local search on your big problem.

In design automation, integer programming and the like aren’t used very often. In other fields, in particular planning, integer programming has proven invaluable long ago: give it a try.


Gridless routing?

For a few weeks, I have been working on the router. I abandoned the project for the moment – there is still a lot of low-hanging fruit in placement research – but it is nonetheless an interesting project.

Routing is the step where the wires are drawn. On ASIC circuits (I don’t know about FPGAs) it consists of two steps: a global routing step that balances congestion and makes choices regarding the overall shapes, and a detailed routing step that actually places the wires, hopefully without overlaps or design rule violations.

Router limitations

Routers tend to be limited in what they can do, for physical and algorithmic reasons. For example, all practical digital routers use Manhattan wiring, with only horizontal and vertical wires. This is actually worth it: it greatly simplifies the algorithms and probably even the manufacturing processes. In fact, every attempt to generalize to non-Manhattan routing layers has failed.

Another sensible limitation is the definition of a preferred direction for each routing layer: it is friendlier to sequential routing methods, where wires would tend to obstruct each other if no constraint is introduced.

The limitation I am writing about is the use of a routing grid during detailed routing. That is, the wires are placed using a constant pitch in each layer, generally wide enough to guarantee design rule conformance. This scheme is amenable to efficient representations: as a 2D array or, with a preferred direction, as a list of segments for each track. All usual routing algorithms (maze routing, channel or switchbox routing) work on such a data structure.

The grid is amenable to efficient representations: maze routing algorithms generally use a 2D grid (a), while Kite is track-oriented (b)
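A rough sketch of what “a list of segments for each track” can look like (my own simplification, not Kite’s actual data structure):

#include <cstdint>
#include <vector>

// Track-based view of one routing layer with a preferred direction:
// each track is a sorted list of occupied intervals along its axis.
struct TrackSegment {
    std::int64_t min, max;  // extent along the preferred direction
    std::uint32_t net;      // which net occupies this interval
};

struct Track {
    std::int64_t axis;                   // position of the track on the layer
    std::vector<TrackSegment> segments;  // kept sorted by min
};

struct RoutingLayer {
    bool horizontal;            // preferred direction
    std::int64_t pitch;         // constant spacing between tracks
    std::vector<Track> tracks;  // one entry per grid line
};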

Is the grid a problem?

It would be better if the router was not limited by the grid, but the grid isn’t necessarily a bad thing: if every wire is drawn at minimum width with a uniform spacing anyway, there is no gain to expect from a gridless router.

However, this isn’t necessarily true on newer design processes. Drawing wires of various widths can improve the RC of critical wires. On the other hand, non-uniform spacing can potentially help with crosstalk and is a less pessimistic approach to design rule conformance when the minimum spacing varies with the wire’s length and shape. Research on gridless routing can make the router much more flexible on wire sizing and spacing.

Data structure bloat in gridless routers

The difficulty of gridless routing is the definition of an efficient data structure. Detailed routing needs to find paths for each net, and typically requests neighbouring segments or whitespace, or the segments in a given area. Such requests consume most of the detailed router’s runtime. Switching to a gridless architecture makes these accesses slower and can have a huge effect on runtime.

Tiling and corner stitching

The most versatile algorithm, maze routing, works on non-uniform grids as well, and a straightforward approach to gridless routing is to define such a grid based on the boundaries of already placed segments.

The same routing pattern and its corresponding tiling

In memory, this tiling is generally represented with pointers to the neighbours. In “corner stitching”, used by the Magic layout tool, there are only 4 pointers, at the corners. The problem with such structures is that they involve a lot of pointer chasing and are heavy on memory. In the case of corner stitching, the algorithms to access and modify the data structure tend to be complicated.

Spatial indexing

Another approach is to apply a coarse-grained grid or a quadtree to the routing area: this makes area queries much faster and the data structure smaller. However, maze routing, which relies on an explicit tiling, cannot be performed with this structure.

I think that structures based on such a coarse-grained grid are more efficient than tiling: they are smaller and should yield simpler and faster queries… but they are not suitable for current routing algorithms.
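A sketch of the coarse-grained grid idea: each object is registered in every bucket its bounding box overlaps, and an area query only visits the overlapped buckets (illustrative only; a real router also needs removal, deduplication and careful sizing of the buckets):

#include <cstddef>
#include <cstdint>
#include <vector>

struct Rect { std::int64_t xmin, ymin, xmax, ymax; };

// Coarse-grained spatial index: the routing area is cut into big square
// buckets and every object is stored in each bucket its bounding box
// overlaps. An area query then only visits the overlapped buckets.
// Coordinates are assumed to stay inside the indexed area.
class BucketGrid {
public:
    BucketGrid(Rect area, std::int64_t bucket_size)
        : area_(area), size_(bucket_size),
          nx_((area.xmax - area.xmin) / bucket_size + 1),
          ny_((area.ymax - area.ymin) / bucket_size + 1),
          buckets_(nx_ * ny_) {}

    void insert(std::size_t id, Rect r) {
        for (std::size_t x = bx(r.xmin); x <= bx(r.xmax); ++x)
            for (std::size_t y = by(r.ymin); y <= by(r.ymax); ++y)
                buckets_[y * nx_ + x].push_back(id);
    }

    // Ids of the objects that may intersect the query area; callers still
    // need an exact intersection test, and duplicates are possible.
    std::vector<std::size_t> query(Rect r) const {
        std::vector<std::size_t> result;
        for (std::size_t x = bx(r.xmin); x <= bx(r.xmax); ++x)
            for (std::size_t y = by(r.ymin); y <= by(r.ymax); ++y)
                for (std::size_t id : buckets_[y * nx_ + x])
                    result.push_back(id);
        return result;
    }

private:
    std::size_t bx(std::int64_t x) const { return (x - area_.xmin) / size_; }
    std::size_t by(std::int64_t y) const { return (y - area_.ymin) / size_; }

    Rect area_;
    std::int64_t size_;
    std::size_t nx_, ny_;
    std::vector<std::vector<std::size_t>> buckets_;
};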

Simpler algorithms make fast gridless routing possible

The most common routing algorithm, maze routing, is almost unusable for gridless routing: it is painfully slow on complex tiled structures, and not amenable to spatial indexing. There is a need to reconsider routing algorithms for gridless routing, and luckily it turns out that this work has already been done by the maintainer of Coriolis.

Wires are mostly straight

Even when the wiring is quite dense, there is no need to introduce many bends.

When we look at a routed circuit, a striking result is that most wires are straight. Although they may use a few doglegs, or make detours in some extreme cases, complex patterns are uncommon even on highly congested instances. Coriolis Kite’s approach has been to focus on handling this common case efficiently.

While Kite uses a grid, it doesn’t use a classical maze, channel or switchbox router and tries to place the segments without breaking them. The main loop finds free space for a straight segment and doesn’t need a fully fledged maze algorithm. This makes it very promising for gridless routing.

Kite’s algorithm to get rid of tiling

If there is no need for maze routing except for a few difficult nets, the main representation can be much simpler, such as a coarse-grained grid. Finding a segment embedding is more expensive than on a track-based structure, but can still be relatively quick.

I will try to make it work before the end of my internship. It could be straightforward, but I expect it to be relatively painful: I didn’t design Kite’s algorithms, and they took a lot of experience and trial-and-error to get right. However, they seem to be just the right approach for gridless routing, and I can’t wait to see how the data structure will perform in practice.

Side note: there is no such thing as an independent optimization pass

During my work on the router, my first attempt was not a gridless router but a simple improvement of the global router. It was apparently promising… but although it seemed better than the old global router on routing metrics, it failed spectacularly during detailed routing.

It turns out that all tools are intertwined, and in some cases you can’t compare two good tools based on simple metrics: you need to take all the surrounding tools into account. This is one of the reasons why this discussion remains mostly theoretical: writing a new router is not a simple plug-and-play operation, especially when the existing tool is already pretty optimized.

Coriolis’ legalizer is unexpectedly good

I spent part of the weekend trying to improve the legalization algorithm in Coloquinte – now integrated into Coriolis. It turns out that I didn’t manage to obtain any improvement: every change I tried gave worse results than the current version.

The current algorithm doesn’t even legalize the true placement: it works on a modified placement obtained by heuristically solving a flow problem. Although this placement is supposed to be close to the best legal placement, I didn’t expect it to stand against more sophisticated heuristics that use it only as a hint.

I think that what makes it so good is that this first pass is inherently imprecise and optimistic: within the finer grained window, it doesn’t move the cells as much as it should. They are moved very close to their final position, but still a bit off in the direction of their target position. Therefore, the second pass receives a good hint placement which is still a bit biased toward the target.

Whatever the reason is, I’m finished with the legalizer. I don’t think there is much more work to do in this area, and I’m convinced that it is better than published works like Abacus and HiBinLegalizer. Sadly, benchmarking against them involves finding and installing them and MPL, which is anything but trivial. Our legalizer should be good enough for us now (at least for standard cells; big macros are still a pain).

Benchmark results

The ISPD conference organizes a design automation contest each year. It provides a lot of public benchmarks and results to compare to. Although Coloquinte does not yield the best results yet, it compares favorably against most published results. Here are some screenshots of wirelength-driven placements on some ISPD05 benchmarks.

Bigblue1 is an “easy” placement: there are few fixed macros in the center
Adaptec2 is more difficult: the fixed macros make it difficult to spread the placement