There is a striking similarity between VLSI placement and scheduling. Digital electronics circuits are built from “standard cells”, logic gates with a unique height. They are organized in rows, so that the power wires run between two rows of standard cells.
Tasks must be scheduled on machines so that they do not conflict, standard cells must be placed in rows so that they do not overlap. The only difference between scheduling tasks and placing logic gates is the cost function, generally more complex for placement than for scheduling.
Placement problems have never been formulated as scheduling problems: it is no surprise that there has been few communication between the fields, the journals and laboratories being separate and the goals completely different. But it seems to me that we can solve some placement subproblems using scheduling techniques.
Scheduling is the problem of assigning tasks to machines. They are constrained by their execution times, with independent costs for each task.
In standard cell placement, the cost is usually much more complicated: the wirelength involves distances between tasks, and cannot be expressed a a sum of independent task costs. However, for a given ordering of the cells in the rows, it is possible to find the corresponding optimal positions by using linear programming (more precisely, network flow). This gives us a simple optimization pass to use during placement but it is still a bit expensive.
Therefore, for standard cell placement as well as scheduling, researchers have considered the simpler problem where there is only one row or one machine, still with fixed ordering. The cost function is a sum of independent piecewise linear functions, and hopefully there are faster algorithms to solve the problem.
Fast algorithms for the ordered single row problem
Indeed, this problem is much easier to solve in practice. The clumping algorithm solves this problem in worst case quadratic time, and is usually much faster. The idea behind it is to add the cells at the end of the row and push it to the next pin until its placement is optimal: the final placement is optimal due to the convexity of the cost function.
This quadratic worst case complexity is annoying: several publications managed to improve this runtime up to m log² m where m is the number of pins. However, the datastructures involved are a special type of balanced trees, both slow and difficult to code.
I discovered a simpler m log n algorithm based on a priority queue, but by chance I found out that it is already known for scheduling, as the specialized cascading descent algorithm. The cascading descent is equivalent to the clumping algorithm. The basic idea is to index the pins with a value that does not depend on the cell: this way, merging groups of cells takes constant time. The algorithm uses a single priority queue to hold the pins, hence its better complexity.
Applications to VLSI placement
It isn’t a breakthrough: even naive linear programming is fast enough to optimize a large part of a circuit, so that this improvement might be barely noticeable compared with approximate or quadratic algorithms.
However, there are two extensions that may prove particularly useful: local search and non-convex optimization.
Local search is the process of modifying the placement through simple modifications, like cell swapping. One of my goals is to modify the algorithm in order to support online placement modification with logarithmic complexity. That is, keep an optimal placement through small ordering modifications.
Non-convex optimization is trickier. There can be several local optimas, which makes most such problems extremely difficult. For the special structure of the ordered single row problem, however, there are quadratic algorithms. It means that we can integrate more complex cost functions directly in the ordered single row optimization, for example cell orientation.
Both ideas are relatively new: cell swapping, orientation and position optimization are almost always separate steps (although it is possible, even integer programming models in the litterature do not integrate all of them). I wonder how these methods will play compared to independent optimization passes.