Minipart 0.3: getting there

I just added a “v0.3” tag to my latest Minipart commit. It is shaping up quite well, and is finally giving promising results. Compared to hMetis2, the results are very similar on average (2% better), although Minipart is still much slower (about 5x).

A 2% quality improvement would be nothing to sneer at if I could improve the runtime, but this average hides a few interesting datapoints: some benchmarks show a 20% difference between hMetis and Minipart. I guess partitioning is not a solved problem yet.

Next steps

There are a lot of things I want to try. Nowadays, you cannot just hand-tune your algorithm: it takes too much brain time, and our brains are bad at it anyway. I probably have a lot to gain by just trying all kinds of parameters automatically – something people have been doing with SAT solvers for a while. Or I could hand the parameters to a good blackbox optimizer, like people now do for neural networks. Or maybe try machine learning, why not.

Less fun but equally important would be to implement and benchmark all the “good” algorithms I left aside: all other partitioners implement Fiduccia-Mattheyses and heavy-edge coarsening, but I haven’t tried them yet. A lot of ideas from MLPart and hMetis have proven useful, and I expect to find some room for improvement here.

Minipart partitioner

I am now writing a graph and hypergraph partitioner, Minipart. It’s starting to yield good solutions on the ISPD benchmarks, similar to MLPart and hMetis.

In a way, it is very similar to existing tools: it interleaves local search passes with problem simplifications (coarsening). On the other hand, I started with very different choices for coarsening, and my benchmarks are slowly leading me to a different kind of local search as well.

More details when I publish version 1.0!

Dynamic random choice datastructure

One of the basic uses of random numbers is choosing an item at random. Most of the time we want a uniform distribution, but sometimes we want a weighted one: the probability of selecting an element is proportional to its weight. We would use NumPy’s random.choice(), or C++’s std::discrete_distribution.

The usual algorithm picks a random number and performs a binary search over the cumulative weights (this is what the libstdc++ implementation does) to find the associated item. But some randomized algorithms need to update the weights during execution: for those, we have to recompute all the cumulative weights each time we make a change.
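For reference, here is roughly what the static approach looks like with std::discrete_distribution (a minimal sketch): the distribution precomputes its internal tables from the weights at construction, so changing a single weight means building a whole new distribution.

#include <iostream>
#include <random>
#include <vector>

int main() {
  std::mt19937 rng(42);
  std::vector<double> weights = {1.0, 2.0, 3.0};

  // The distribution precomputes its tables from the weights at construction
  std::discrete_distribution<int> dist(weights.begin(), weights.end());
  std::cout << dist(rng) << std::endl;  // item i has probability weights[i] / total

  // After a weight update, the only option is to rebuild it from scratch: O(n)
  weights[1] = 10.0;
  dist = std::discrete_distribution<int>(weights.begin(), weights.end());
  std::cout << dist(rng) << std::endl;
}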

It’s actually possible to do it in O(log n) time, by maintaining a binary tree: each node contains the sum of the weights of its children. This allows us to perform a similar binary search (O(log n)), and to touch only a few nodes when a weight is updated (O(log n)). I read about an implementation based on red-black trees on HN.

I wrote a similar data structure based on an implicit binary tree – getting rid of the pointers and the memory allocations, and using about a third of the memory with double weights. It is available on Github.
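To make the idea concrete, here is a minimal sketch of such a structure (illustrative code assuming a non-empty set of items, not the actual implementation on Github): the weights live in the leaves of an array-stored binary tree, internal nodes store the sum of their children, and both sampling and updating only walk one root-to-leaf path.

#include <cstddef>
#include <random>
#include <vector>

class WeightedChoice {
public:
  explicit WeightedChoice(const std::vector<double> &weights)
      : n_(weights.size()), tree_(2 * weights.size(), 0.0) {
    for (std::size_t i = 0; i < n_; ++i)
      tree_[n_ + i] = weights[i];                  // leaves hold the weights
    for (std::size_t i = n_ - 1; i > 0; --i)
      tree_[i] = tree_[2 * i] + tree_[2 * i + 1];  // internal nodes hold sums
  }

  // O(log n): change one weight and fix the sums on the path to the root
  void update(std::size_t item, double weight) {
    std::size_t i = n_ + item;
    tree_[i] = weight;
    for (i /= 2; i >= 1; i /= 2)
      tree_[i] = tree_[2 * i] + tree_[2 * i + 1];
  }

  // O(log n): walk down from the root to find the item matching a random draw
  std::size_t sample(std::mt19937 &rng) const {
    std::uniform_real_distribution<double> dist(0.0, tree_[1]);
    double r = dist(rng);
    std::size_t i = 1;
    while (i < n_) {
      if (r < tree_[2 * i]) {
        i = 2 * i;           // descend into the left child
      } else {
        r -= tree_[2 * i];   // skip the left subtree, descend into the right one
        i = 2 * i + 1;
      }
    }
    return i - n_;
  }

private:
  std::size_t n_;             // number of items
  std::vector<double> tree_;  // implicit tree: item i is the leaf at index n_ + i
};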

Incremental evaluation is often overlooked

As a community, we focus a lot on the heuristics available to solve a problem. But my experience in the field is that, in practice, little of the time is spent on the heuristics themselves: the cost function gets most of the work.

Incrementality

When solving a problem with a metaheuristic, it is important to evaluate candidate solutions quickly. When the number of variables is small, it is possible to compute the cost of a solution from scratch, but if the problem has millions of variables, even a linear time complexity becomes limiting.

This is where incremental evaluation comes into play: for many cost functions, we can update the result quickly after a small change to the inputs. This is perfect for local search, which can then scale to very large problems.

Rolling your own

If you need incremental evaluation for your problem, you are going to roll your own. It is quite easy at first: evaluating sums, maxes and products incrementally is not very difficult.
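As a toy illustration (not code from any particular solver), here is how a plain sum can be kept up to date when a local-search move changes a single term:

#include <cstddef>
#include <vector>

// Toy incremental cost: the total is a sum of per-item contributions,
// and a single change is applied in O(1) instead of recomputing in O(n).
struct IncrementalSum {
  std::vector<double> contribution;  // current contribution of each item
  double cost = 0.0;                 // maintained total

  explicit IncrementalSum(const std::vector<double> &initial)
      : contribution(initial) {
    for (double c : initial) cost += c;  // computed from scratch only once
  }

  // A local-search move changes one item: update the total in O(1)
  void change(std::size_t item, double newContribution) {
    cost += newContribution - contribution[item];
    contribution[item] = newContribution;
  }
};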

But in practice requirements change: you’ll need to implement a slightly different cost function and add new features to it. Before you know it, this incremental cost function prevents you from experimenting as much as you’d like to. It may even have become a maintenance issue! Moreover, we have more important stuff to do, and the hand-rolled code is rarely on par with what you could obtain by spending the same effort on optimization and preprocessing.

Any framework?

I sometimes feel the need for a library that would make incremental evaluation smoother. I know of one good framework for the metaheuristics part (ParadisEO), but I don’t know of any framework for implementing incremental evaluation of the cost function. When tackling large-scale problems, this is where the work is spent (both for the humans and the computer).

If you know one such framework, I’d like to hear from you!

Full disclosure: I am going to work at LocalSolver starting in November. A powerful modeling language makes modeling much easier, but generic solvers do not allow you to plug your own search engine: sometimes I’d like the best of both worlds.

C++ reminder #3

A surprise with template type deduction and implicit conversion, when summing an array of float:

vector<float> to_sum = {0.5, 0.5, 0.5, 0.5};
cout << accumulate(to_sum.begin(), to_sum.end(), 0) << endl;
cout << accumulate(to_sum.begin(), to_sum.end(), 0.0) << endl;
cout << accumulate(to_sum.begin(), to_sum.end(), 0.0f) << endl;

This will print:

0
2
2

The template parameter (the type of the accumulator) is deduced from the initial value passed as the last argument: 0, 0.0 or 0.0f (the first two arguments are iterators). The elements are implicitly converted to this type before summation.

  1. int: obviously not what was intended.
  2. double: OK, but more precise and slower than float.
  3. float: probably the intent, but we need the 0.0f or an explicit cast.


Note: this is just an example; if you care about precision or performance this is not the right way of summing floating point numbers.

C++ reminder

C++ is hard. We often see code written in the hope of avoiding a specific class of errors. For example, if we reset our pointers to nullptr once they are deleted, they can never be deleted twice, right?

struct foo {
  char *p;
  ~foo();
};
foo::~foo() {
  delete p;
  p = nullptr;
}

Check that

Right, the reset gets optimized away. Since it’s forbidden to read a member from a deleted object, the compiler assumes that the member p can never be read again, and removes the write. Only with optimization disabled do we have the expected behaviour.

The moral: C++ is hard, and workarounds rarely work the way you would expect them to.

An introduction to netlists

In the electronics industry, there is one common denominator to all the tools and all the teams. Whether we are doing ASIC design, programming FPGAs or formally verifying chips, there is one thing we all use in our software: the representation of the design, called the netlist.

The netlist is the representation of the connections of the design. Although it probably originated from physical design representations, it is now a common representation throughout the compilation flow, from the high-level code to the final physical layout.

This post is the first in a series on the netlist, presenting the basic concepts. The next ones will go into more depth on the operations performed on a netlist, the APIs provided to access it, and the actual representation in a programming language.

What’s in a netlist

A netlist represents the connections between the components of a design. Here is what it looks like:

[Figure: a simple module with two cells (an and gate and an or gate) and four nets; net names are not shown]

Let’s call the components “cells” and the connections “nets”. Each cell has an interface: each element of this interface is called a “port”. Note that everything has a name, and that the ports generally have a direction as well. For the computer scientists among us, this is a lot like a graph.

Now, imagine we reuse mod in another netlist. It might just be a cell among others. Some of those cells may actually be complex components containing other, smaller cells: this creates a hierarchy of nested cells, some of which may be reused.

[Figure: a toy CPU netlist, with cell reuse for the cores and ALUs]

For example, in a toy CPU, core is defined once but used twice, creating two “instances”, core1 and core2. As in software design, reuse is encouraged: a “module” (core, ALU) will be instantiated several times (core1, core2, core1.ALU1, core1.ALU2…). This reuse makes the designers’ and tools’ jobs easier: once the chip is manufactured, there will be one physical copy of core for each of core1, core2…
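To make the vocabulary concrete, here is a very rough sketch of these concepts in C++ (illustrative only; the actual representations used by real tools are the topic of a later post):

#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A port is an element of a module's interface
struct Port {
  std::string name;
  enum class Direction { Input, Output, Inout } direction;
};

struct Module;  // forward declaration: cells instantiate modules

// A cell is an instance of a module inside another module (core1, core1.ALU1, ...)
struct Cell {
  std::string name;
  const Module *model;  // the module being instantiated (core, ALU, or a gate)
};

// A net connects ports of the cells it touches
struct Net {
  std::string name;
  std::vector<std::pair<std::size_t, std::size_t>> pins;  // (cell index, port index)
};

// A module (core, ALU) is defined once and may be instantiated many times
struct Module {
  std::string name;
  std::vector<Port> ports;
  std::vector<Cell> cells;
  std::vector<Net> nets;
};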

Netlists everywhere

We have seen that the netlist came from the physical design world, but at the same time its quirks are present in “high-level” languages like Verilog and VHDL.

The consequence is that a netlist is the go-to format for everyone working in EDA… but there are many different netlist representations. For example:

  • Netlists targeting physical design, with representations of the actual wiring as polygons; Coriolis’ Hurricane is a good example
  • Netlists targeting logic optimization, with specialized bit-level representations, as in ABC
  • Higher-level netlists for synthesis and verification, with high-level operators, like Yosys
  • Generic netlists that provide a generic API without application-specific utilities, like GBL

Since they are central to all those tools, their design has a lot of impact. First, on the overall development of the tools: an easy-to-use API is a must. Second, on performance: they must handle millions to billions of gates efficiently. But let’s keep those for the next post.

Coloquinte report and presentation

I just realized that the report and presentation I wrote about Coloquinte were never made public. Here they are, under CC-BY-SA. Please use them as you see fit: to learn about placement algorithms, as a bibliography, or maybe to improve Coloquinte or other tools!

Coloquinte VLSI placer report

Coloquinte VLSI placer slides