
Coloquinte’s architecture

As I said in the previous post, Coloquinte will be built around an analytical placement algorithm alternating between convex placement optimization and cell spreading: it is one of the most efficient methods published so far.

My early prototypes worked, but the spreading algorithm was a bit too simple to cope with the benchmarks. More importantly, they lacked a clean architecture.

The new goal is for the tool to work well on a large class of circuits and to be more modular. It is meant to interface loosely with Paris 6 university's tools: the Coriolis toolchain from the Lip6 laboratory, where I will work part-time starting next week, and its database Hurricane.

Overall architecture

Coloquinte is split into three independent blocks that transform the Hurricane representation: the global placer, the legalizer and the detailed placer. To keep control over the optimization passes applied, each tool is scriptable in Python. This is particularly important for the global placer, where everything is tunable, from the spreading schedule to the net models.

The tools interface through Hurricane. Each tool is a collection of C++ modules controlled by a Python script.
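
To make the scripting idea more concrete, here is a rough sketch of what a global placement script could look like. Every name in it is a hypothetical placeholder, not an existing Coloquinte or Hurricane API.

    # Hypothetical driver script for the global placer. None of the names below
    # (the coloquinte module, GlobalPlacer, its attributes and methods) exist yet;
    # they only illustrate the intended level of control.
    import coloquinte

    placer = coloquinte.GlobalPlacer("my_design")   # design loaded through Hurricane

    # Everything is chosen from the script: the net model, the spreading method
    # and the schedule of anchor weights.
    placer.net_model = "star"
    placer.spreader = "partitioning"
    anchor_weights = [0.01, 0.03, 0.1, 0.3, 1.0]

    for w in anchor_weights:
        placer.optimize_wirelength()      # convex (quadratic) optimization
        placer.spread(anchor_weight=w)    # spreading, fed back as anchors

    placer.save()                         # write the result back to Hurricane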

The global placer

The global placer uses a technique known as quadratic placement where the wires are modelled as elastic springs: this yields a sparse linear system, which can be solved approximately in linearithmic time. It is not as artificial as it seems: it is in fact very close to Newton’s method for optimization.
This quadratic placement alternates with a spreading phase, whose result is fed back to “anchor” the next placement iteration. Everything is tunable: I intend to let the user choose the net model, the legalization algorithm and how strongly the cells are pulled toward their legal positions. Each of these components gets a Python interface so that the whole optimization is scriptable.
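
To illustrate the quadratic step itself, here is a toy sketch (not Coloquinte's code): it builds the spring system for the x coordinate, assuming the nets have already been decomposed into two-pin springs, adds the anchor forces, and hands the sparse system to a generic solver.

    # Toy sketch of one quadratic placement pass (x coordinate only), assuming
    # the nets have already been decomposed into two-pin "springs".
    import numpy as np
    from scipy.sparse import lil_matrix
    from scipy.sparse.linalg import spsolve

    def solve_quadratic_x(n_cells, springs, anchors, anchor_weight):
        """springs: list of (cell_u, cell_v, weight) connections.
        anchors: x positions from the previous spreading pass, one per cell."""
        A = lil_matrix((n_cells, n_cells))
        b = np.zeros(n_cells)
        for u, v, w in springs:        # each spring adds w * (x_u - x_v)^2 to the cost
            A[u, u] += w
            A[v, v] += w
            A[u, v] -= w
            A[v, u] -= w
        for i in range(n_cells):       # anchors pull each cell toward its spread position
            A[i, i] += anchor_weight
            b[i] += anchor_weight * anchors[i]
        return spsolve(A.tocsr(), b)   # new x coordinates

    # Two cells joined by a unit spring, anchored at x = 0 and x = 10:
    print(solve_quadratic_x(2, [(0, 1, 1.0)], np.array([0.0, 10.0]), 0.5))  # -> [4. 6.]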

You may note that I include “Companion tools” that could interface with the global placer: such tools are necessary when optimizing for complex objectives. I am building Coloquinte so that it can work with other tools with only minor modifications, in case we want to extend it.

Post-processing

The detailed placer and the legalizer do not need as much controllability, but I am going to use the same architecture nonetheless: the more the designer can control, the better the result. Ideally, we would try multiple combinations of the optimization passes to obtain the best tradeoffs: they can be distributed as several Python scripts, possibly completely different ones.

Today, I am starting to program the legalizer. Since this algorithm is completely new, I am probably not going to talk about it anytime soon, but I have a (partially) new algorithm for detailed placement that I will present in the next post.

Why I chose analytical placement

The choice between analytical placement and partitioning is a crucial and early one. For Coloquinte, however, I made this choice a long time ago.

There are two obvious reasons: analytical placement generally performs a little better and I know it better.

However, there are deeper reasons why I want to commit to using analytical placement. In my opinion, analytical placement yields better flexibility and modularity.

Flexibility

First, it is tunable with simple but numerous parameters. By changing the cost function, you can in theory optimize for timing and power. The spreading forces control the tradeoff between speed of convergence and solution quality.
With partitioning-based placement, you need to design whole new algorithms for power, timing and congestion optimization (as many papers did). With a complete analytical placer, you can test parameter tuning within minutes: change the cost function and the spreading schedule, then execute. And there are a lot of cost functions to choose from.

Examples of wirelength-only net models for analytical placement: half-perimeter wirelength, star, rectilinear minimum spanning tree and Steiner tree.
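
As an illustration, the simplest of these models, half-perimeter wirelength, just sums the bounding-box half-perimeters of all nets; a minimal sketch:

    # Minimal sketch: half-perimeter wirelength (HPWL) of a placement.
    # nets: list of nets, each net being a list of (x, y) pin positions.
    def half_perimeter_wirelength(nets):
        total = 0.0
        for pins in nets:
            xs = [x for x, _ in pins]
            ys = [y for _, y in pins]
            total += (max(xs) - min(xs)) + (max(ys) - min(ys))
        return total

    # A single three-pin net spanning a 3 x 2 bounding box contributes 3 + 2 = 5:
    print(half_perimeter_wirelength([[(0, 0), (3, 1), (1, 2)]]))  # -> 5.0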

Modularity

Another important property of some analytical placement algorithms is that they modify an existing placement at each step rather than building it from scratch. This makes netlist or placement changes convenient, whether to interface with other programs (performing cell resizing and buffering, for example) or to enable engineering change orders (ECOs). I suppose it also makes it easier to adapt the placer to new problems, with all the performance-critical code factored between the solver and the legalizer.

Therefore, I am going to use an analytical placement algorithm that continuously spreads the cells: with region constraints, circuit modifications would not be as easy. Additionally, this method is the leading one on benchmarks: using it should improve both flexibility and efficiency.

That doesn’t mean writing a complete placement tool is easy: external tools are needed to analyze timing and congestion, but with analytical placement the interface may be very loose and quite modular. I am going to show Coloquinte’s interface in the next post.

VLSI global placement algorithms

The last post gave a quick introduction to the synthesis flow. Since placement is the purpose of the tool, I am going to focus on placement algorithms here. Let's leave all the other topics aside for now, even if high-level optimization, boolean optimization and technology mapping are equally interesting.

Placement may optimize the integrated circuit for several metrics: power, timing, area and, more importantly, congestion. That is, it must pass a feasible problem to the subsequent routing stage.
The most basic objective is wirelength: it correlates well with all the other metrics and is the primary one for most benchmarks.

So, how does a placer work? Early placers used meta-heuristics such as simulated annealing, but even with clustering techniques it becomes difficult to handle large circuits. Both industry and academia stopped using this technique during the 80s and split the problem into successive steps: a simplified global placement problem is solved first to obtain approximate positions for the cells; legalization then yields a correct circuit, and a detailed placement phase corrects local suboptimalities.

The global placement phase may be the most important one since it captures the problem's complexity, while detailed placement is mostly about refinement. Two classes of algorithms have been developed for global placement: partitioning-based placement and analytical placement.

Placement through graph partitioning

You can see the netlist to place as a (hyper)graph: the cells are the vertices and the nets are the (hyper)edges. Placing the circuit with minimal wirelength then amounts to cutting the graph with minimal cost: you assign parts of the circuit to placement regions with as few wires between them as possible. By partitioning into gradually smaller regions, you obtain a circuit with little global wiring. This weighted min-cut problem is hard but can be solved efficiently with heuristics: it is at the heart of partitioning-based placement.
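
As a rough sketch of the idea (with the min-cut partitioner itself, typically a Fiduccia-Mattheyses-style heuristic, treated as a black box passed in by the caller), the recursion looks like this:

    # Simplified sketch of recursive min-cut bisection placement.
    # `min_cut_bisect` stands for a real hypergraph partitioner and is assumed
    # to split a set of cells into two halves while minimizing the cut nets.
    def place_by_bisection(cells, region, min_cut_bisect, min_size=4):
        """Assign every cell a position inside region = (x0, y0, x1, y1)."""
        x0, y0, x1, y1 = region
        if len(cells) <= min_size:
            # Small enough: drop the remaining cells at the region's center.
            cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
            return {c: (cx, cy) for c in cells}
        left, right = min_cut_bisect(cells)      # cut with as few crossing nets as possible
        if x1 - x0 >= y1 - y0:                   # split the region along its longest side
            mid = (x0 + x1) / 2
            sub = ((x0, y0, mid, y1), (mid, y0, x1, y1))
        else:
            mid = (y0 + y1) / 2
            sub = ((x0, y0, x1, mid), (x0, mid, x1, y1))
        placement = place_by_bisection(left, sub[0], min_cut_bisect, min_size)
        placement.update(place_by_bisection(right, sub[1], min_cut_bisect, min_size))
        return placement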

Today, partitioning-based tools are much more complex in order to capture timing and congestion: all of them, from the University of Michigan's Capo to Dragon and Fengshui, rely on further heuristics to drive the partitioning.

Continuous optimization

On the other hand, you may model the wirelength as a continuous function of the cells’ positions: this is the idea behind analytical placement. With a convex model, you may use efficient optimization algorithms. However, the obvious optimal solution is to cluster all cells together, whereas you want to return an almost overlap-free placement. In order to spread the cells, analytical placement needs a companion algorithm that modifies the cost function or constrains the cells to separate regions. The algorithms alternate between optimization and spreading until a sufficiently refined solution is reached.

The choice of these spreading heuristics varies considerably between placement tools. Early tools constrained the cells to regions: it is still the method used by the commercial tool BonnPlace. More recent academic tools pull the cells toward a legal position, the calculation of which is generally purely heuristic: Kraftwerk simulates repulsion with an electrostatic potential, while tools like FastPlace, SimPL or mPL just try to spread the cells with partitioning-like heuristics. Notable exceptions are the tools that model the legalization problem as a flow problem, which have stronger mathematical justification.

Note that no tool that I know of is available for more than benchmarking: they all provide binary-only versions. This is a sufficient reason to create an open tool, and I hope that Coloquinte will fill this gap.
I will not publish code before I have something clean enough for the global placer, but I hope to open the Git repository within one or two months and to gradually publish the algorithms.

In the next post, I will discuss the choice of a specific analytical placement algorithm.

A quick introduction to electronic design automation

I am currently writing a tool for electronic design automation. I prototyped it last year in my spare time, and it is now an official research project targeting integrated circuit placement. I called it Coloquinte, hence the name of this blog: its purpose is mostly to present and discuss algorithms.

Electronic design automation (EDA) is a necessity when integrated circuits feature blocks with millions of logic gates, and when designs and processes are as complex as they are now. There are therefore many fields in EDA, targeting both circuit synthesis and simulation/verification. Synthesis is the whole process of transforming a high-level specification of a circuit into a bare-metal chip.

The high-level specification it receives is usually written in a dedicated hardware description language (HDL) like Verilog, which provides a description that is still close to the final circuit. Nowadays, this complexity is a burden and higher-level languages are called for: it is now possible to transform C or OpenCL into hardware.
The synthesis flow works as a multipass compiler: it performs optimizations on various representations to output a manufacturable chip. It transforms the specification into boolean logic, places this logic on a chip, then routes the wires between the gates, and finally verifies that no error was introduced. Since an integrated circuit is more constrained than a computer program, these steps often fail and still require a lot of designer input: compiling a circuit is difficult, expensive and time-consuming.

Since the synthesis has such a tremendous influence on cost (area, manufacturing yield and time-to-market) and performance (frequency and power), it has been an area of active research during the past decades. Coloquinte targets part of this synthesis flow, the placement of cells in a block.

Current placement tools are known to be at least 50% away from optimal on some benchmarks. A lot of algorithms have been published but few tools are available in the open: Coloquinte is both a research project and an attempt to make a good placement tool available in a complete toolchain. It is meant to integrate with Coriolis, the toolchain from Paris VI university, and its database Hurricane.

My next post will be a short presentation of placement algorithms, before I explain the architecture I chose for Coloquinte and begin posting algorithms and ideas.