Coloquinte report and presentation

I just realized that the report and presentation I wrote about Coloquinte were never made public. Here they are, under CC-BY-SA. Please use it as you see fit: to learn about placement algorithms, as a bibliography, or maybe to improve Coloquinte or other tools!

Coloquinte VLSI placer report

Coloquinte VLSI placer slides

Coriolis’ legalizer is unexpectedly good

I spent part of the weekend trying to improve the legalization algorithm in Coloquinte – now integrated in Coriolis. It turns out that I didn’t manage to obtain any improvement: every change I tried obtained worse results than the current version.

The current algorithm doesn’t even legalize the true placement: it works on a modified placement obtained by heuristically solving a flow problem. Although this placement is supposed to be close to the best legal placement, I didn’t expect it to stand against more sophisticated heuristics that use it only as a hint.

I think that what makes it so good is that this first pass is inherently imprecise and optimistic: within the finer grained window, it doesn’t move the cells as much as it should. They are moved very close to their final position, but still a bit off in the direction of their target position. Therefore, the second pass receives a good hint placement which is still a bit biased toward the target.

Whatever the reason is, I’m finished with the legalizer. I don’t think there is much more work to do in this area, and I’m convinced that it is better than published works like Abacus and HiBinLegalizer. Sadly, benchmarking against them involves finding and installing them and MPL, which is all but trivial. Our legalizer should be good enough for us now (at least for standard cells, big macros are still a pain).

Benchmark results

The ISPD conference organizes a design automation contest each year. It provides a lot of public benchmarks and results to compare to. Although Coloquinte does not yield the best results yet, it compares favorably against most published results. Here are some screenshots of wirelength-driven placements on some ISPD05 benchmarks.

Bigblue1 is an "easy" placement, where there are few fixed macros
Bigblue1 is an “easy” placement, where there are few fixed macros in the center
adaptec2
Adaptec2 is more difficult: the fixed macros make it difficult to spread the placement

 

Low density placements

Variable densities are placements are now possible in Coloquinte. I am quite happy with the way it is done, by changing the placement region density rather than the cells’ areas like in other tools.

A uniform density placement after the rough legalization pass
A uniform density placement after the rough legalization pass
The legalized placement: this placement is 50% whitespace
The legalized placement: this placement is 50% whitespace
The same circuit after a few detailed placement passes. Since the detailed placement is not routing-aware, we already see some artifacts
The same circuit after a few detailed placement passes. There are some artifacts since the detailed placement is not density-aware yet

I implemented it using a line sweep algorithm, which makes it flexible: it can be fed overlapping regions and handles both macros and limited density regions in the same way. It remains to be seen if I can make a satisfying routing-driven placer.

 

First routed circuits

Coloquinte is now used by the Coriolis toolchain. This means that you can now route the circuits obtained – and with an open-source tool.

Close view of the routed circuit

Here are some screenshots of a circuit obtained with Coloquinte and Kite (the router). The next big step is to make congestion-aware placements: although Coloquinte and Kite optimize well enough that there is no routing failure on small circuits, this feature is mandatory in an industrial tool.

A legal placement of the circuit

I find it very motivating to finally see what the algorithms do. Running the algorithm on a true circuit rather than ISPD benchmarks has caught some silly bugs, but now I’m adding features again.

The circuit after routing

The road so far

It has been three months since I began this project, and it is time for a checkpoint. I just implemented the last mandatory algorithm, and Coloquinte’s algorithms are now usable to build a circuit.

It does not support standard formats, and since I still didn’t program the backend the results are still ~15% from the best state of the art. But all the core algorithms have been implemented.

The optimizer

The core wirelength optimizer has been the first part I implemented. I chosed to write the sparse linear algebra code from scratch rather than using a library. The whole thing is in fact closer to a true non-linear optimizer, since it never completely solves the linear system, but overall it is very similar to other placement tools.

Look-ahead legalization

This tool modifies a solution so that it meets cell density constraints: it drives the optimizer toward an almost “legal” solution, where there are no more cells in a region than allowed by its area. It is although used as a pre-processing step of a true legalization.

It is very simple but extremely different from other algorithms. It distributes the cells in the regions and improves the distribution between nearby regions. There are powerful algorithms to do it, but the simplest heuristics seem to work best.

 Legalizer

The legalizer is the final step to obtain a correct placement. It places the cells so that they do not overlap, hopefully as close to the optimized solution as possible.

The naive way to do it is to sort the cells by position and chose the best position for each cell sequentially. Coloquinte’s legalizer uses a single row problem, which allows it to push already legalized cells to obtain a better solution (in Coloquinte it optimizes the displacement, but other legalizers did it for the quadratic displacement). Compared with the other solution, it is extremely stable.

Next steps

Obviously, a good global placement doesn’t replace a detailed placement optimization: it is the next big step. There are still a lot of things to be done, and a lot of naive heuristics where we could do better, but detailed placement should finally close the gap with state of the art tools.

 

A floorplanning algorithm for analog circuits

Floorplanning is the process of placing a set of rectangular blocks on a chip. Huge blocks, like a RAM or a whole circuit module. As such, it is a very specific domain in placement: floorplanning usually handles less than 100 cells (to be compared with the ~100000 handled by other methods) but with entirely different algorithms.

Since the laboratory where I work has an important analog toolchain – for the design of circuits that are not purely digital, typically for wireless transmissions or sensors – I gave a try to algorithms to place the resulting transistors.

The constraints for analog circuit placement

I am not an analog designer myself, so I tried to gather informations about the needs in the analog world. The recurring answer was “they want to chose” between different possibilities, because the designers’ knowledge is not easily mimicked by an algorithm yet.

Providing this freedom is probably the hard part, but there were more satisfying answers for me too, that could lead directly to a toy implementation. Symmetry constraints, in order to mitigate process variations. Proximity constraints. Area minimization. Routing corridors. Those are easier to translate into an algorithm.

Moreover, analog circuits have some freedom during placement: the transistors are big, and it is possible to change their aspect ratio.

Integer programming for floorplanning

Most tools for floorplanning work on some kind of topological representation: block a is above block b, which is on the left of block c. This is usually limited to area optimization only, but linear programming can handle more complex situations.

In those situations, including deformable blocks and wirelength minimization, linear programming can yield a solution for the given topology. Rather than writing complex data structures to represent the topology, I included it in an integer programming model: it is simpler, requires less code, and ideally it would even prove the solution’s optimality.

Results

With the tool I use (GLPK), I obtain what I consider to be good results. On a small real benchmark with 7 cells, it proves optimality relatively quickly. It is generally not able to prove optimality as soon as there are more than 10 cells, although local search heuristics are extremely efficient.

It remains to be seen whether such a floorplanner would be useful to an analog designer. It seems to me that obtaining a good enough placement quickly is important, but they would surely want to fine-tune it. For this trial-and-error process, tuning the model may be the way to go, giving block deformation and shifting for free whenever a cost or a constraint is modified.

This work gave me some insight in linear programming for digital detailed placement: I am experimenting with similar models to place standard cells in the main tool.

Coloquinte’s architecture

As I said in the previous post, Coloquinte will be built around an analytical placement algorithm alternating between convex placement optimization and cell spreading: it is one of the most efficient methods published so far.

My early prototypes worked, but the spreading algorithm was a bit too simple to cope with the benchmarks. More importantly, they lacked a clean architecture.

The new goal is to work well on a large class of circuits and the tool should be more modular. It is meant to interface loosely with Paris 6 university’s tools: the Coriolis toolchain from the Lip6 laboratory, where I work part-time from next week, and its database Hurricane.

Overall architecture

It should be splitted between 3 independent blocks that transform the Hurricane representation: the global placer, the legalizer and the detailed placer. In order to keep control over the optimization passes applied, each tool is scriptable in Python. This is particularly important for the global placer, where everything is tunable from the spreading schedule to the net models.

The tools interface through Hurricane. Each tool is a collection of C++ modules controlled by a Python script
The tools interface through Hurricane. Each tool is a collection of C++ modules controlled by a Python script

The global placer

The global placer uses a technique known as quadratic placement where the wires are modelled as elastic springs: this yields a sparse linear system, which can be solved approximately in linearithmic time. It is not as artificial as it seems: it is in fact very close to Newton’s method for optimization.
This quadratic placement alternates with a spreading phase, whose result is fed back to “anchor” the next placement iteration. Everything is tunable: I intend to let the user chose the net model, the legalization algorithm and how strong the cells are pulled toward the legal position. Each of these components gets a Python interface so that the whole optimization is scriptable.

You may note that I include “Companion tools” that could interface with the global placer: such tools are necessary when optimizing for complex objectives. I build Coloquinte so that it can cope with other tools with minor modifications in case we want to extend it.

Post-processing

The detailed placer and legalizer need not as much controllability, but I am going to use the same architecture nonetheless: the more the designer can control, the better the result. Ideally, we would try multiple combinations of the optimization passes to obtain the best tradeoffs: they can be distributed as several Python scripts, possibly completely different.

Today, I begin to program the legalizer. Since this algorithm is completely new, I am probably not going to talk about it anytime soon, but I have a (partially) new one for detailed placement that I am going to present in the next post.

Why I chosed analytical placement

The choice between analytical placement and partitioning is a crucial and early one. For Coloquinte, however, I made this choice a long time ago.

There are two obvious reasons: analytical placement generally performs a little better and I know it better.

However, there are deeper reasons why I want to commit to using analytical placement. In my opinion, analytical placement yields better flexibility and modularity.

Flexibility

First, it is tunable with simple but numerous parameters. By changing the cost function, you can in theory optimize for timing and power. The spreading forces control the tradeoff between speed of convergence and solution quality.
With partitioning-based placement, you need to design whole new algorithms for power, timing and congestion optimization (which many papers did). With a complete analytical placer, you can test parameter tuning within minutes: change the cost function and the spreading schedule, then execute. And there are a lot of cost functions to chose from.

Examples of wirelength-only net models for analytical placement: Half-perimeter wirelength, star, rectilinear minimum spanning tree and Steiner tree
Examples of net models for analytical placement: Half-perimeter wirelength, star, rectilinear minimum spanning tree and Steiner tree

Modularity

Another important property of some analytical placement algorithms is that they modify an existing placement at each step, not build it from scratch. For this reason, it is convenient to perform netlist or placement changes. Whether to interface it with other programs (performing cell resizing and buffering for example) or to enable engineering change orders (ECOs), it is a useful feature. I suppose it makes it easier to adapt the placer to new problems, with all the performance-critical code factored between the solver and the legalizer.

Therefore, I am going to use an analytical placement that continuously spreads the cells: if region constraints are used, circuit modifications are not as easy. Additionally, this method is the leading one on benchmarks: using it should improve both flexibility and efficiency.

That doesn’t mean writing a complete placement tool is easy: external tools are needed to analyze timing and congestion, but with analytical placement the interface may be very loose and quite modular. I am going to show Coloquinte’s interface in the next post.