I just realized that the report and presentation I wrote about Coloquinte were never made public. Here they are, under CC-BY-SA. Please use them as you see fit: to learn about placement algorithms, as a bibliography, or maybe to improve Coloquinte or other tools!
I spent part of the weekend trying to improve the legalization algorithm in Coloquinte – now integrated in Coriolis. It turns out that I didn’t manage to obtain any improvement: every change I tried obtained worse results than the current version.
The current algorithm doesn’t even legalize the true placement: it works on a modified placement obtained by heuristically solving a flow problem. Although this placement is supposed to be close to the best legal placement, I didn’t expect it to stand against more sophisticated heuristics that use it only as a hint.
I think that what makes it so good is that this first pass is inherently imprecise and optimistic: within the finer grained window, it doesn’t move the cells as much as it should. They are moved very close to their final position, but still a bit off in the direction of their target position. Therefore, the second pass receives a good hint placement which is still a bit biased toward the target.
Whatever the reason is, I’m finished with the legalizer. I don’t think there is much more work to do in this area, and I’m convinced that it is better than published works like Abacus and HiBinLegalizer. Sadly, benchmarking against them involves finding and installing them and MPL, which is anything but trivial. Our legalizer should be good enough for us now (at least for standard cells; big macros are still a pain).
Variable-density placements are now possible in Coloquinte. I am quite happy with the way it is done, by changing the placement region density rather than the cells’ areas like in other tools.
I implemented it using a line sweep algorithm, which makes it flexible: it can be fed overlapping regions and handles both macros and limited density regions in the same way. It remains to be seen if I can make a satisfying routing-driven placer.
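To make the line sweep idea concrete, here is a minimal one-dimensional sketch. It is not Coloquinte's code: the function name `sweep_densities`, the `(start, end, density)` region format, and the rule that overlapping regions combine by taking the lowest density are all assumptions made for illustration. A macro is simply a region with density 0.

```python
from collections import Counter

def sweep_densities(regions, default=1.0):
    """Split overlapping 1D regions into disjoint (start, end, density) segments.

    Assumption: where regions overlap, the most restrictive (lowest)
    density wins; gaps between regions get the default density.
    """
    events = []
    for start, end, density in regions:
        events.append((start, 0, density))  # 0: the region opens
        events.append((end, 1, density))    # 1: the region closes
    events.sort()

    active = Counter()  # densities of the regions currently open
    segments = []
    previous = None
    for x, kind, density in events:
        if previous is not None and x > previous:
            current = min(active) if active else default
            segments.append((previous, x, current))
        if kind == 0:
            active[density] += 1
        else:
            active[density] -= 1
            if active[density] == 0:
                del active[density]
        previous = x
    return segments

# A limited-density region, a macro inside it, and a second overlapping region.
regions = [(0, 10, 0.8), (4, 6, 0.0), (5, 12, 0.5)]
for segment in sweep_densities(regions):
    print(segment)
```

The same event-driven structure extends to two dimensions by sweeping along one axis and maintaining the active intervals along the other.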
It has been three months since I began this project, and it is time for a checkpoint. I just implemented the last mandatory algorithm, and Coloquinte’s algorithms are now usable to build a circuit.
It does not support standard formats, and since I still haven’t written the backend, the results are still ~15% away from the state of the art. But all the core algorithms have been implemented.
The core wirelength optimizer was the first part I implemented. I chose to write the sparse linear algebra code from scratch rather than using a library. The whole thing is in fact closer to a true non-linear optimizer, since it never completely solves the linear system, but overall it is very similar to other placement tools.
This tool modifies a solution so that it meets cell density constraints: it drives the optimizer toward an almost “legal” solution, where there are no more cells in a region than allowed by its area. It is also used as a pre-processing step for the true legalization.
It is very simple but extremely different from other algorithms. It distributes the cells in the regions and improves the distribution between nearby regions. There are powerful algorithms to do it, but the simplest heuristics seem to work best.
The legalizer is the final step to obtain a correct placement. It places the cells so that they do not overlap, hopefully as close to the optimized solution as possible.
The naive way to do it is to sort the cells by position and choose the best position for each cell sequentially. Coloquinte’s legalizer uses a single row problem, which allows it to push already legalized cells to obtain a better solution (in Coloquinte it optimizes the displacement, but other legalizers do it for the quadratic displacement). Compared with the naive approach, it is extremely stable.
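As an illustration of the naive approach only (not of Coloquinte's legalizer), here is a minimal sketch for a single row. The function name `greedy_legalize` and the `(target_x, width)` cell format are made up for this example.

```python
def greedy_legalize(cells, row_start=0):
    """Naive single-row legalization.

    cells: list of (target_x, width) pairs.
    Returns one x position per cell, in input order, without overlaps.
    Cells already placed are never moved again.
    """
    order = sorted(range(len(cells)), key=lambda i: cells[i][0])
    positions = [0] * len(cells)
    free = row_start  # leftmost coordinate that is still free
    for i in order:
        target, width = cells[i]
        positions[i] = max(target, free)
        free = positions[i] + width
    return positions

# Three cells of width 2 that all want to sit at x = 3.
print(greedy_legalize([(3, 2), (3, 2), (3, 2)]))  # [3, 5, 7]
```

On this input the greedy pass moves the cells by 0 + 2 + 4 = 6 in total, while placing them at 1, 3 and 5 costs only 2 + 0 + 2 = 4. Reaching that second solution requires pushing the first cell after it has been placed, which is what the single row formulation allows.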
Obviously, a good global placement doesn’t replace a detailed placement optimization: it is the next big step. There are still a lot of things to be done, and a lot of naive heuristics where we could do better, but detailed placement should finally close the gap with state of the art tools.
There is a striking similarity between VLSI placement and scheduling. Digital electronics circuits are built from “standard cells”, logic gates with a unique height. They are organized in rows, so that the power wires run between two rows of standard cells.
Tasks must be scheduled on machines so that they do not conflict, standard cells must be placed in rows so that they do not overlap. The only difference between scheduling tasks and placing logic gates is the cost function, generally more complex for placement than for scheduling.
Placement problems have never been formulated as scheduling problems: it is no surprise that there has been little communication between the fields, the journals and laboratories being separate and the goals completely different. But it seems to me that we can solve some placement subproblems using scheduling techniques.
Scheduling is the problem of assigning tasks to machines. They are constrained by their execution times, with independent costs for each task.
In standard cell placement, the cost is usually much more complicated: the wirelength involves distances between tasks, and cannot be expressed as a sum of independent task costs. However, for a given ordering of the cells in the rows, it is possible to find the corresponding optimal positions by using linear programming (more precisely, network flow). This gives us a simple optimization pass to use during placement, but it is still a bit expensive.
Therefore, for standard cell placement as well as scheduling, researchers have considered the simpler problem where there is only one row or one machine, still with fixed ordering. The cost function is a sum of independent piecewise linear functions, and hopefully there are faster algorithms to solve the problem.
Fast algorithms for the ordered single row problem
Indeed, this problem is much easier to solve in practice. The clumping algorithm solves this problem in worst case quadratic time, and is usually much faster. The idea behind it is to add each cell at the end of the row and push it to the next pin until its placement is optimal: the final placement is optimal due to the convexity of the cost function.
This quadratic worst case complexity is annoying: several publications managed to improve this runtime down to m log² m where m is the number of pins. However, the data structures involved are a special type of balanced tree, both slow and difficult to code.
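Here is a minimal sketch of clumping under a strong simplification: each cell has a single target position and its cost is the absolute displacement, whereas the real problem has several pins per cell. The names `clump` and `lower_median` are made up for this example, and re-sorting on every merge keeps the code short rather than fast.

```python
def lower_median(sorted_values):
    """Lower median of a non-empty sorted list."""
    return sorted_values[(len(sorted_values) - 1) // 2]

def clump(targets, widths):
    """Positions minimizing sum |x[i] - targets[i]| for cells kept in the
    given left-to-right order without overlap (no row boundaries)."""
    # Subtract from each target the total width of the cells before it:
    # x[i] + w[i] <= x[i+1] then becomes y[i] <= y[i+1].
    offsets, shifted, offset = [], [], 0
    for target, width in zip(targets, widths):
        offsets.append(offset)
        shifted.append(target - offset)
        offset += width

    clusters = []  # each cluster is a sorted list of shifted targets
    for value in shifted:
        clusters.append([value])
        # Merge while the last cluster wants to sit left of the previous one.
        while (len(clusters) > 1
               and lower_median(clusters[-1]) < lower_median(clusters[-2])):
            last = clusters.pop()
            clusters[-1] = sorted(clusters[-1] + last)

    positions = []
    for cluster in clusters:
        y = lower_median(cluster)  # the whole cluster moves as one block
        for _ in cluster:
            positions.append(y + offsets[len(positions)])
    return positions

print(clump([3, 3, 3], [2, 2, 2]))  # [1, 3, 5]
```

In this simplified setting a cluster's best position is a median of its shifted targets, and merging two clusters only happens when they would otherwise overlap.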
I discovered a simpler m log n algorithm based on a priority queue, but by chance I found out that it is already known for scheduling, as the specialized cascading descent algorithm. The cascading descent is equivalent to the clumping algorithm. The basic idea is to index the pins with a value that does not depend on the cell: this way, merging groups of cells takes constant time. The algorithm uses a single priority queue to hold the pins, hence its better complexity.
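A minimal sketch of the priority queue idea, under the same simplification of one target per cell with absolute displacement cost. This is not the author's implementation: `place_ordered_row` and its arguments are hypothetical names. The shifted target (target minus the width of all preceding cells) plays the role of the cell-independent index, so merging groups needs no bookkeeping.

```python
import heapq

def place_ordered_row(targets, widths):
    """Positions minimizing sum |x[i] - targets[i]| for cells kept in the
    given order without overlap, using a single max-heap."""
    heap = []     # max-heap of shifted targets, stored negated for heapq
    bounds = []   # bounds[i]: best shifted position of cell i ignoring later cells
    offsets = []
    offset = 0
    for target, width in zip(targets, widths):
        shifted = target - offset
        offsets.append(offset)
        offset += width
        heapq.heappush(heap, -shifted)
        if -heap[0] > shifted:
            # The new cell pushes the current group to the left:
            # replace the largest breakpoint by the new one.
            heapq.heapreplace(heap, -shifted)
        bounds.append(-heap[0])

    # Backward pass: a cell can never sit right of its successor.
    positions = [0] * len(bounds)
    y = float("inf")
    for i in range(len(bounds) - 1, -1, -1):
        y = min(y, bounds[i])
        positions[i] = y + offsets[i]
    return positions

print(place_ordered_row([3, 3, 3], [2, 2, 2]))  # [1, 3, 5]
```

Each cell costs a constant number of heap operations, hence the logarithmic factor per cell instead of a potentially linear number of pushes.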
Applications to VLSI placement
It isn’t a breakthrough: even naive linear programming is fast enough to optimize a large part of a circuit, so that this improvement might be barely noticeable compared with approximate or quadratic algorithms.
However, there are two extensions that may prove particularly useful: local search and non-convex optimization.
Local search is the process of modifying the placement through simple modifications, like cell swapping. One of my goals is to modify the algorithm in order to support online placement modification with logarithmic complexity. That is, keep an optimal placement through small ordering modifications.
Non-convex optimization is trickier. There can be several local optima, which makes most such problems extremely difficult. For the special structure of the ordered single row problem, however, there are quadratic algorithms. It means that we can integrate more complex cost functions directly in the ordered single row optimization, for example cell orientation.
Both ideas are relatively new: cell swapping, orientation and position optimization are almost always separate steps (although it is possible, even integer programming models in the literature do not integrate all of them). I wonder how these methods will perform compared to independent optimization passes.
Floorplanning is the process of placing a set of rectangular blocks on a chip: huge blocks, like a RAM or a whole circuit module. As such, it is a very specific domain in placement: floorplanning usually handles fewer than 100 cells (to be compared with the ~100000 handled by other methods) but with entirely different algorithms.
Since the laboratory where I work has an important analog toolchain (for the design of circuits that are not purely digital, typically for wireless transmissions or sensors), I tried out algorithms to place the resulting transistors.
The constraints for analog circuit placement
I am not an analog designer myself, so I tried to gather information about the needs in the analog world. The recurring answer was “they want to choose” between different possibilities, because the designers’ knowledge is not easily mimicked by an algorithm yet.
Providing this freedom is probably the hard part, but there were more satisfying answers for me too, that could lead directly to a toy implementation. Symmetry constraints, in order to mitigate process variations. Proximity constraints. Area minimization. Routing corridors. Those are easier to translate into an algorithm.
Moreover, analog circuits have some freedom during placement: the transistors are big, and it is possible to change their aspect ratio.
Integer programming for floorplanning
Most tools for floorplanning work on some kind of topological representation: block a is above block b, which is on the left of block c. This is usually limited to area optimization only, but linear programming can handle more complex situations.
In those situations, including deformable blocks and wirelength minimization, linear programming can yield a solution for the given topology. Rather than writing complex data structures to represent the topology, I included it in an integer programming model: it is simpler, requires less code, and ideally it would even prove the solution’s optimality.
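For illustration, one classical way to encode the topology in an integer program is a big-M disjunction with two binary variables per pair of blocks. This is a textbook formulation, not necessarily the model used here; the symbols $x_i, y_i$ (lower-left corner), $w_i, h_i$ (width and height), $p_{ij}, q_{ij}$ (binaries) and $M$ (a constant larger than the chip dimensions) are introduced for this sketch only.

```latex
% For every pair of blocks i < j, exactly one constraint is active:
\begin{aligned}
x_i + w_i &\le x_j + M\,(p_{ij} + q_{ij})     && \text{($i$ left of $j$, when } p_{ij}=0,\ q_{ij}=0) \\
x_j + w_j &\le x_i + M\,(1 - p_{ij} + q_{ij}) && \text{($j$ left of $i$, when } p_{ij}=1,\ q_{ij}=0) \\
y_i + h_i &\le y_j + M\,(1 + p_{ij} - q_{ij}) && \text{($i$ below $j$, when } p_{ij}=0,\ q_{ij}=1) \\
y_j + h_j &\le y_i + M\,(2 - p_{ij} - q_{ij}) && \text{($j$ below $i$, when } p_{ij}=1,\ q_{ij}=1) \\
p_{ij},\ q_{ij} &\in \{0, 1\}
\end{aligned}
```

Once the binaries are fixed, the remaining problem is a plain linear program in the coordinates, which is the "solution for the given topology" described above. Making $w_i$ and $h_i$ variables is one way such a model could account for deformable blocks.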
With the tool I use (GLPK), I obtain what I consider to be good results. On a small real benchmark with 7 cells, it proves optimality relatively quickly. It is generally not able to prove optimality as soon as there are more than 10 cells, although local search heuristics are extremely efficient.
It remains to be seen whether such a floorplanner would be useful to an analog designer. It seems to me that obtaining a good enough placement quickly is important, but they would surely want to fine-tune it. For this trial-and-error process, tuning the model may be the way to go, giving block deformation and shifting for free whenever a cost or a constraint is modified.
This work gave me some insight into linear programming for digital detailed placement: I am experimenting with similar models to place standard cells in the main tool.
The choice between analytical placement and partitioning is a crucial and early one. For Coloquinte, however, I made this choice a long time ago.
There are two obvious reasons: analytical placement generally performs a little better and I know it better.
However, there are deeper reasons why I want to commit to using analytical placement. In my opinion, analytical placement yields better flexibility and modularity.
First, it is tunable with simple but numerous parameters. By changing the cost function, you can in theory optimize for timing and power. The spreading forces control the tradeoff between speed of convergence and solution quality.
With partitioning-based placement, you need to design whole new algorithms for power, timing and congestion optimization (which many papers did). With a complete analytical placer, you can test parameter tuning within minutes: change the cost function and the spreading schedule, then execute. And there are a lot of cost functions to choose from.
Another important property of some analytical placement algorithms is that they modify an existing placement at each step, not build it from scratch. For this reason, it is convenient to perform netlist or placement changes. Whether to interface it with other programs (performing cell resizing and buffering for example) or to enable engineering change orders (ECOs), it is a useful feature. I suppose it makes it easier to adapt the placer to new problems, with all the performance-critical code factored between the solver and the legalizer.
Therefore, I am going to use an analytical placement that continuously spreads the cells: if region constraints are used, circuit modifications are not as easy. Additionally, this method is the leading one on benchmarks: using it should improve both flexibility and efficiency.
That doesn’t mean writing a complete placement tool is easy: external tools are needed to analyze timing and congestion, but with analytical placement the interface may be very loose and quite modular. I am going to show Coloquinte’s interface in the next post.
The last post gave a quick introduction to the synthesis flow. Since it is the purpose of the tool, I am going to focus on placement algorithms here. Let’s leave all the other topics for now, even if high-level optimization, boolean optimization and technology mapping are equally interesting.
Placement may optimize the integrated circuit for several metrics: power, timing, area and, more importantly, congestion. That is, it must pass a feasible problem to the subsequent routing stage.
The most basic objective is wirelength: it correlates well with all the other metrics and is the primary one for most benchmarks.
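The text does not say which wirelength model is meant. The usual one in placement benchmarks is the half-perimeter wirelength (HPWL), the half-perimeter of each net's bounding box, and it is assumed in this short sketch. The function name and data layout are made up for the example.

```python
def hpwl(nets, positions):
    """Half-perimeter wirelength.

    nets: list of nets, each a list of cell names.
    positions: dict mapping a cell name to its (x, y) position.
    """
    total = 0
    for net in nets:
        xs = [positions[cell][0] for cell in net]
        ys = [positions[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

positions = {"a": (0, 0), "b": (3, 4), "c": (1, 1)}
nets = [["a", "b"], ["a", "b", "c"]]
print(hpwl(nets, positions))  # 14
```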
So, how does a placer work? Early placers used meta-heuristics, such as simulated annealing. Even with clustering techniques, it becomes difficult to deal with large circuits. Both the industry and the academic world stopped using this technique during the 80s, and they began using two steps: a simplified global placement problem is solved to obtain approximate positions for the cells. It is followed by a legalization to obtain a correct circuit and a detailed placement phase then corrects local suboptimalities.
The global placement phase may be the most important one since it captures the problem’s complexity, while detailed placement is mostly about refinement. Two classes of algorithms have been developed for global placement: partitioning-based placement and analytical placement.
Placement through graph partitioning
You can see the netlist to place as a (hyper)graph: the cells are the vertices and the nets are the (hyper)edges. Placing the circuit with minimal wirelength is the problem of cutting the graph with minimal cost: you assign parts of the circuit to placement regions with as few wires between them as possible. By partitioning into gradually smaller regions, you can obtain a circuit with little global wiring. This weighted min-cut problem is complex but can be solved efficiently with heuristics: it is at the heart of partitioning-based placement.
Today, the partitioning-based tools are much more complex, in order to capture timing and congestion: all tools, from Capo from the University of Michigan to Dragon and Fengshui, have further heuristics to drive partitioning.
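The cost that a partitioner tries to minimize can be sketched in a few lines. This counts each cut net once (unit weights), which is an assumption: real tools use weighted variants. The names are hypothetical.

```python
def cut_size(nets, side):
    """Number of nets with cells on both sides of a bipartition.

    nets: list of nets, each a list of cell names (hyperedges).
    side: dict mapping a cell name to 0 or 1.
    """
    return sum(1 for net in nets if len({side[cell] for cell in net}) > 1)

nets = [["a", "b"], ["b", "c", "d"], ["c", "d"]]
print(cut_size(nets, {"a": 0, "b": 0, "c": 1, "d": 1}))  # 1
print(cut_size(nets, {"a": 0, "b": 1, "c": 0, "d": 1}))  # 3
```

The first assignment keeps connected cells together and cuts a single net; the second one cuts all three.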
On the other hand, you may model the wirelength as a continuous function of the cells’ positions: this is the idea behind analytical placement. With a convex model, you may use efficient optimization algorithms. However, the obvious optimal solution is to cluster all cells together, whereas you want to return an almost overlap-free placement. In order to spread the cells, analytical placement needs a companion algorithm that modifies the cost function or constrains the cells to separate regions. The algorithms alternate between optimization and spreading until a sufficiently refined solution is reached.
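A tiny one-dimensional sketch shows why spreading is needed. It assumes a quadratic wirelength model on a chain of cells between two fixed pads, and uses plain Gauss-Seidel sweeps for brevity; neither choice is claimed to match any particular tool, and the function name is made up.

```python
def quadratic_place_1d(num_cells, left_pad, right_pad, sweeps=200):
    """Minimize the sum of squared wire lengths on the chain
    left_pad - cell 0 - cell 1 - ... - right_pad.

    At the optimum each cell sits at the average of its two neighbours,
    so repeated averaging (Gauss-Seidel) converges to it.
    """
    x = [0.0] * num_cells
    for _ in range(sweeps):
        for i in range(num_cells):
            left = x[i - 1] if i > 0 else left_pad
            right = x[i + 1] if i < num_cells - 1 else right_pad
            x[i] = (left + right) / 2
    return x

# Three cells between pads at 0 and 8 end up near 2, 4 and 6.
print([round(v, 3) for v in quadratic_place_1d(3, 0.0, 8.0)])
```

The optimizer knows nothing about cell sizes: if each of these cells were 4 units wide, the result would overlap heavily, and without fixed pads everything would collapse to a single point. The spreading step exists to correct this.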
The choice of these spreading heuristics varies considerably between placement tools. Early tools constrained the cells to regions: it is still the method used by the commercial tool BonnPlace. More recent academic tools pull the cells toward a legal position, the calculation of which is generally purely heuristic. Kraftwerk simulated repulsion with an electrostatic potential while tools like Fastplace, SimPL or MPL just try to spread the cells with partitioning-like heuristics. Notable exceptions are tools that model the legalization problem as a flow problem, which have more mathematical justifications.
Note that no tool that I know of is available for more than benchmarking: they all provide binary-only versions. This is a sufficient reason to create an open tool, and I hope that Coloquinte will fill this gap.
I will not publish code before I have something clean enough for the global placer, but I hope to open the Git within two (one?) months and to gradually publish the algorithms.
In the next post, I will discuss the choice of a specific analytical placement algorithm.