97% density and a clock tree

Further testing and debugging during the last few days. Coloquinte hit some bugs in Coriolis that we (well, my advisor) solved, and we uncovered minor bugs in both tools in the process.

We tested Coloquinte with the clocktree generator, and our small test design is easily placed and routed. Since the clocktree generator yields a lot of preplaced repeaters and used to crash during post-processing, I expected to find bugs in Coloquinte, but the Python interface turned out to be guilty.

A placement at 97% density with ~5000 cells, and a routed clocktree

While my advisor makes Coriolis’ routing and clocktree synthesis work with Coloquinte, I am making Coloquinte routing-aware: it must be able to balance its target density with the routing demand.

It doesn’t account for congestion yet, and just packs the cells as densely as it can. For this small design, routing is not a problem even at full density, but a feedback loop will be needed for bigger ones.

First routed circuits

Coloquinte is now used by the Coriolis toolchain. This means that you can now route the circuits obtained – and with an open-source tool.

Close view of the routed circuit

Here are some screenshots of a circuit obtained with Coloquinte and Kite (the router). The next big step is to make congestion-aware placements: although Coloquinte and Kite optimize well enough that there is no routing failure on small circuits, this feature is mandatory in an industrial tool.

A legal placement of the circuit

I find it very motivating to finally see what the algorithms do. Running the algorithm on a real circuit rather than ISPD benchmarks has caught some silly bugs, but now I’m adding features again.

The circuit after routing

On scheduling algorithms applied to VLSI placement

There is a striking similarity between VLSI placement and scheduling. Digital electronics circuits are built from “standard cells”, logic gates that all share the same height. They are organized in rows, so that the power wires run between two rows of standard cells.

Tasks must be scheduled on machines so that they do not conflict; standard cells must be placed in rows so that they do not overlap. The only difference between scheduling tasks and placing logic gates is the cost function, generally more complex for placement than for scheduling.

Placement problems have never been formulated as scheduling problems: it is no surprise that there has been little communication between the fields, the journals and laboratories being separate and the goals completely different. But it seems to me that we can solve some placement subproblems using scheduling techniques.

Problem formulation

Scheduling is the problem of assigning tasks to machines. The tasks are constrained by their execution times, and each task has its own independent cost.

In standard cell placement, the cost is usually much more complicated: the wirelength involves distances between cells, and cannot be expressed as a sum of independent per-cell costs. However, for a given ordering of the cells in the rows, it is possible to find the corresponding optimal positions by using linear programming (more precisely, network flow). This gives us a simple optimization pass to use during placement, but it is still a bit expensive.
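One common way to write such a linear program is sketched below, for horizontal wirelength only. The notation is introduced here purely for illustration: x_i is the left edge of cell i, w_i its width, d_p the offset of pin p inside its cell c(p), and l_e, u_e the left and right ends of the interval spanned by net e. Pins on fixed cells simply have a constant x.

```latex
\begin{aligned}
\min_{x,\,l,\,u}\quad & \sum_{e \in \text{nets}} (u_e - l_e) \\
\text{s.t.}\quad & l_e \le x_{c(p)} + d_p \le u_e
  && \text{for every pin } p \text{ of net } e, \\
& x_i + w_i \le x_j
  && \text{for every cell } j \text{ placed right after cell } i \text{ in its row.}
\end{aligned}
```

Every constraint bounds the difference of two variables, which is the structure that makes the dual a network flow problem.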

Therefore, for standard cell placement as well as scheduling, researchers have considered the simpler problem where there is only one row or one machine, still with fixed ordering. The cost function is then a sum of independent piecewise linear functions, and fortunately there are faster algorithms to solve the problem.

A standard cell with one pin on the same row and two on other rows (a), an equivalent placement of the pins (b), and the associated cost function (c)

Fast algorithms for the ordered single row problem

Indeed, this problem is much easier to solve in practice. The clumping algorithm solves it in worst-case quadratic time, and is usually much faster. The idea behind it is to add each cell at the end of the row and push it toward the next pin until its placement is optimal, merging it with the preceding cells whenever they get in the way. The final placement is optimal thanks to the convexity of the cost function.
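Here is a minimal sketch of the clumping idea under simplifying assumptions: each pin pulls its cell toward a target x with cost |x - p|, the row is unbounded, and every cell has at least one pin. Instead of pushing a cluster pin by pin, this version recomputes the optimum of a merged cluster as a median, which is simpler to read but slower. The names `clump_row` and `lower_median` are made up for this example.

```python
def lower_median(values):
    """Smallest minimizer of sum(|y - v| for v in values)."""
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]


def clump_row(widths, pins):
    """Place ordered cells in one row, minimizing sum |x_i - p| over all pins.

    widths[i] is the width of cell i, pins[i] the target x values of its pins.
    Returns the left edge of each cell, with x[i+1] >= x[i] + widths[i].
    """
    clusters = []  # each cluster: index of its first cell, shifted targets, position
    offsets = []   # offsets[i] = total width of the cells before cell i
    offset = 0.0
    for i, (width, cell_pins) in enumerate(zip(widths, pins)):
        offsets.append(offset)
        # Shift targets so that all cells of a cluster share one coordinate.
        targets = [p - offset for p in cell_pins]
        cluster = {"first": i, "targets": targets, "pos": lower_median(targets)}
        # The new cell wants to sit left of the previous cluster: clump them.
        while clusters and clusters[-1]["pos"] > cluster["pos"]:
            prev = clusters.pop()
            merged = prev["targets"] + cluster["targets"]
            cluster = {"first": prev["first"], "targets": merged,
                       "pos": lower_median(merged)}
        clusters.append(cluster)
        offset += width

    xs = [0.0] * len(widths)
    for k, cluster in enumerate(clusters):
        end = clusters[k + 1]["first"] if k + 1 < len(clusters) else len(widths)
        for i in range(cluster["first"], end):
            xs[i] = cluster["pos"] + offsets[i]
    return xs


# Three unit-width cells all attracted to x = 4 end up packed around it.
print(clump_row([1, 1, 1], [[4], [4], [4]]))
```
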

A group of cells is pushed while it decreases wirelength. Each time a pin is passed, the derivative of the wirelength is updated and the algorithm may stop and add the next cell to the row.

This quadratic worst-case complexity is annoying: several publications managed to bring the runtime down to O(m log² m), where m is the number of pins. However, the data structures involved are a special type of balanced trees, both slow and difficult to code.

I discovered a simpler O(m log n) algorithm based on a priority queue, but by chance I found out that it is already known in scheduling as the specialized cascading descent algorithm. The cascading descent is equivalent to the clumping algorithm. The basic idea is to index the pins with a value that does not depend on the cell: this way, merging groups of cells takes constant time. The algorithm uses a single priority queue to hold the pins, hence its better complexity.
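To illustrate the priority-queue idea, here is a sketch using the well-known heap technique for order-constrained L1 problems. It is not necessarily the exact cascading descent algorithm, and it relies on the same simplifying assumptions as before: each pin contributes a cost |x - p|, the row is unbounded, and every cell has at least one pin. Pins are stored in a coordinate shifted by the total width of the preceding cells, so a single heap can be shared by all cells. The function name `place_ordered_row` is hypothetical.

```python
import heapq


def place_ordered_row(widths, pins):
    """Optimal left edges for ordered cells in one row, cost sum |x_i - p|.

    widths[i] is the width of cell i, pins[i] the target x values of its pins.
    """
    heap = []     # max-heap (values negated) of slope breakpoints
    offsets = []  # total width of the cells before each cell
    best = []     # leftmost optimum of each prefix, in shifted coordinates
    offset = 0.0
    for width, cell_pins in zip(widths, pins):
        assert cell_pins, "this sketch assumes at least one pin per cell"
        offsets.append(offset)
        for p in cell_pins:
            shifted = p - offset
            # |y - shifted| changes the slope by 2 at its breakpoint.
            heapq.heappush(heap, -shifted)
            heapq.heappush(heap, -shifted)
        # Drop the increasing part of the cost: one breakpoint per pin.
        for _ in cell_pins:
            heapq.heappop(heap)
        best.append(-heap[0])
        offset += width

    # Backward pass: a cell can never sit right of the cell that follows it.
    xs = [0.0] * len(widths)
    limit = float("inf")
    for i in reversed(range(len(widths))):
        limit = min(limit, best[i])
        xs[i] = limit + offsets[i]
    return xs


# Two cells of width 2 competing for the same pin at x = 5.
print(place_ordered_row([2, 2], [[5], [5]]))
```

Each pin is pushed twice and popped at most twice, so the total work is a logarithmic factor per pin.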

The absolute position of a cell is the corresponding position of the first cell if all cells were clustered. The position of a pin is defined relative to the cell it connects to.

 

Applications to VLSI placement

It isn’t a breakthrough: even naive linear programming is fast enough to optimize a large part of a circuit, so this improvement might be barely noticeable compared with approximate or quadratic algorithms.

However, there are two extensions that may prove particularly useful: local search and non-convex optimization.

Local search is the process of improving the placement through simple moves, like cell swapping. One of my goals is to modify the algorithm in order to support online placement modification with logarithmic complexity. That is, to keep an optimal placement through small ordering modifications.

Non-convex optimization is trickier. There can be several local optima, which makes most such problems extremely difficult. For the special structure of the ordered single row problem, however, there are quadratic algorithms. It means that we can integrate more complex cost functions directly in the ordered single row optimization, for example cell orientation.

Both ideas are relatively new: cell swapping, orientation and position optimization are almost always separate steps (although it is possible, even integer programming models in the literature do not integrate all of them). I wonder how these methods will fare compared to independent optimization passes.

A floorplanning algorithm for analog circuits

Floorplanning is the process of placing a set of rectangular blocks on a chip: huge blocks, like a RAM or a whole circuit module. As such, it is a very specific domain in placement: floorplanning usually handles fewer than 100 cells (compared with the ~100000 handled by other methods), and uses entirely different algorithms.

Since the laboratory where I work has a substantial analog toolchain (for the design of circuits that are not purely digital, typically for wireless transmissions or sensors), I tried out algorithms to place the resulting transistors.

The constraints for analog circuit placement

I am not an analog designer myself, so I tried to gather information about the needs in the analog world. The recurring answer was “they want to choose” between different possibilities, because the designers’ knowledge is not easily mimicked by an algorithm yet.

Providing this freedom is probably the hard part, but there were more satisfying answers for me too, that could lead directly to a toy implementation. Symmetry constraints, in order to mitigate process variations. Proximity constraints. Area minimization. Routing corridors. Those are easier to translate into an algorithm.

Moreover, analog circuits have some freedom during placement: the transistors are big, and it is possible to change their aspect ratio.

Integer programming for floorplanning

Most tools for floorplanning work on some kind of topological representation: block a is above block b, which is on the left of block c. This is usually limited to area optimization only, but linear programming can handle more complex situations.

In those situations, including deformable blocks and wirelength minimization, linear programming can yield a solution for the given topology. Rather than writing complex data structures to represent the topology, I included it in an integer programming model: it is simpler, requires less code, and ideally it would even prove the solution’s optimality.
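As an illustration of how a topology can live inside an integer program, here is the classical big-M way of forbidding overlap between two blocks i and j. This is a textbook formulation, not necessarily the model behind the results below, and the notation is introduced for this example: (x_i, y_i) is the lower-left corner of block i, w_i and h_i its width and height, M a constant larger than the chip dimensions, and p_ij, q_ij two binary variables that select the relative position.

```latex
\begin{aligned}
x_i + w_i &\le x_j + M\,(p_{ij} + q_{ij})       && \text{($i$ left of $j$ when } p_{ij}=0,\ q_{ij}=0) \\
x_j + w_j &\le x_i + M\,(1 - p_{ij} + q_{ij})   && \text{($j$ left of $i$ when } p_{ij}=1,\ q_{ij}=0) \\
y_i + h_i &\le y_j + M\,(1 + p_{ij} - q_{ij})   && \text{($i$ below $j$ when } p_{ij}=0,\ q_{ij}=1) \\
y_j + h_j &\le y_i + M\,(2 - p_{ij} - q_{ij})   && \text{($j$ below $i$ when } p_{ij}=1,\ q_{ij}=1) \\
p_{ij},\ q_{ij} &\in \{0, 1\}
\end{aligned}
```

For each value of (p_ij, q_ij) exactly one inequality is active and the three others are relaxed by at least M. Once the binaries are fixed, what remains is an ordinary linear program over the positions.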

Results

With the tool I use (GLPK), I obtain what I consider to be good results. On a small real benchmark with 7 cells, it proves optimality relatively quickly. It generally cannot prove optimality once there are more than 10 cells, although local search heuristics remain extremely efficient.

It remains to be seen whether such a floorplanner would be useful to an analog designer. It seems to me that obtaining a good enough placement quickly is important, but they would surely want to fine-tune it. For this trial-and-error process, tuning the model may be the way to go, giving block deformation and shifting for free whenever a cost or a constraint is modified.

This work gave me some insight into linear programming for digital detailed placement: I am experimenting with similar models to place standard cells in the main tool.

VLSI global placement algorithms

The last post gave a quick introduction to the synthesis flow. Since it is the purpose of the tool, I am going to focus on placement algorithms here. Let’s leave all the other topics for now, even if high-level optimization, boolean optimization and technology mapping are equally interesting.

Placement may optimize the integrated circuit for several metrics: power, timing, area and, more importantly, congestion. That is, it must pass a feasible problem to the subsequent routing stage.
The most basic objective is wirelength: it correlates well with all the other metrics and is the primary one for most benchmarks.
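For reference, wirelength in benchmarks is usually estimated with the half-perimeter of each net's bounding box (HPWL). The text above does not name a specific estimate, so taking HPWL here is an assumption, and the function and variable names are made up for this sketch.

```python
def hpwl(nets, positions):
    """Half-perimeter wirelength: sum over nets of bounding-box width + height.

    nets is a list of lists of cell names, positions maps a name to (x, y).
    """
    total = 0.0
    for net in nets:
        xs = [positions[cell][0] for cell in net]
        ys = [positions[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


positions = {"a": (0, 0), "b": (3, 4), "c": (1, 1)}
nets = [["a", "b"], ["a", "b", "c"]]
print(hpwl(nets, positions))
```
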

So, how does a placer work? Early placers used meta-heuristics, such as simulated annealing. Even with clustering techniques, it becomes difficult to deal with large circuits. Both the industry and the academic world stopped using this technique during the 80s, and they began using two steps: a simplified global placement problem is solved to obtain approximate positions for the cells. It is followed by a legalization to obtain a correct circuit and a detailed placement phase then corrects local suboptimalities.

The global placement phase may be the most important one since it captures the problem’s complexity, while detailed placement is mostly about refinement. Two classes of algorithms have been developed for global placement: partitioning-based placement and analytical placement.

Placement through graph partitioning

You can see the netlist to place as a (hyper)graph: the cells are the vertices and the nets are the (hyper)edges. Placing the circuit with minimal wirelength is the problem of cutting the graph with minimal cost: you assign parts of the circuit to placement regions with as few wires between them as possible. By partitioning into gradually smaller regions, you can obtain a circuit with little global wiring. This weighted min-cut problem is complex but can be solved efficiently with heuristics: it is at the heart of partitioning-based placement.

Today, partitioning-based tools are much more complex in order to capture timing and congestion: all of them, from Capo from the University of Michigan to Dragon and Fengshui, have further heuristics to drive partitioning.
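To make the objective concrete, here is a small sketch that counts the nets cut by a bipartition and finds the best balanced split by exhaustive search. The exhaustive search is only there to show what is being minimized on a toy netlist; real partitioners rely on heuristics such as Fiduccia-Mattheyses. The names `cut_size` and `best_bisection` are hypothetical, and nets are unweighted here.

```python
from itertools import combinations


def cut_size(nets, left):
    """Number of nets with at least one cell on each side of the partition."""
    return sum(
        1
        for net in nets
        if any(cell in left for cell in net) and any(cell not in left for cell in net)
    )


def best_bisection(cells, nets):
    """Try every balanced split (exponential cost: toy sizes only)."""
    half = len(cells) // 2
    best = min(combinations(cells, half), key=lambda part: cut_size(nets, set(part)))
    return set(best), cut_size(nets, set(best))


cells = ["a", "b", "c", "d"]
nets = [["a", "b"], ["c", "d"], ["b", "c"]]
left, cut = best_bisection(cells, nets)
print(sorted(left), cut)
```

Applying the same split recursively to each half, and mapping the halves to sub-regions of the chip, gives the basic recursive bisection scheme.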

Continuous optimization

On the other hand, you may model the wirelength as a continuous function of the cells’ positions: this is the idea behind analytical placement. With a convex model, you may use efficient optimization algorithms. However, the obvious optimal solution is to cluster all cells together, whereas you want to return an almost overlap-free placement. In order to spread the cells, analytical placement needs a companion algorithm that modifies the cost function or constrains the cells to separate regions. The algorithms alternate between optimization and spreading until a sufficiently refined solution is reached.
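The clustering effect is easy to reproduce on a toy example. The sketch below assumes a quadratic wirelength model in one dimension with two-pin connections only (the text just says "convex", so the quadratic choice is an assumption). Under that model the optimum puts every movable cell at the average of its neighbors, which a few Gauss-Seidel sweeps approximate. The names are made up for the example.

```python
def quadratic_place_1d(cells, pads, edges, sweeps=200):
    """Minimize the sum of (x_u - x_v)^2 over edges, with pads held fixed.

    cells: names of movable cells; pads: dict name -> fixed x;
    edges: pairs of names. Each connected group is assumed to reach a pad.
    """
    x = {cell: 0.0 for cell in cells}
    x.update(pads)
    neighbors = {name: [] for name in x}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    for _ in range(sweeps):
        for cell in cells:
            if neighbors[cell]:
                # Stationarity condition: a cell sits at the mean of its neighbors.
                x[cell] = sum(x[n] for n in neighbors[cell]) / len(neighbors[cell])
    return {cell: x[cell] for cell in cells}


# Three connected cells and a single pad at x = 5: everything collapses on the pad.
placed = quadratic_place_1d(
    cells=["a", "b", "c"],
    pads={"pad": 5.0},
    edges=[("pad", "a"), ("a", "b"), ("b", "c"), ("a", "c")],
)
print(placed)
```

All three cells converge to the pad position and overlap completely, which is why a spreading step has to be interleaved with the optimization.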

The choice of these spreading heuristics varies considerably between placement tools. Early tools constrained the cells to regions: it is still the method used by the commercial tool BonnPlace. More recent academic tools pull the cells toward a legal position, the calculation of which is generally purely heuristic. Kraftwerk simulated repulsion with an electrostatic potential, while tools like FastPlace, SimPL or mPL just try to spread the cells with partitioning-like heuristics. Notable exceptions are tools that model the legalization problem as a flow problem, which have more mathematical justifications.

Note that no tool that I know of is available for more than benchmarking: they all provide binary-only versions. This is a sufficient reason to create an open tool, and I hope that Coloquinte will fill this gap.
I will not publish code before I have something clean enough for the global placer, but I hope to open the Git repository within two (one?) months and to gradually publish the algorithms.

In the next post, I will discuss the choice of a specific analytical placement algorithm.