I just added a “v0.3” tag to my latest Minipart commit. It is shaping up nicely, and is finally giving promising results. Compared to hMetis2, the results are very similar on average (2% better), although it’s still much, much slower (5x).
A 2% quality improvement would be nothing to sneer at if I could improve the runtime, but this average hides a few interesting datapoints: some benchmarks show a 20% difference between hMetis and Minipart. I guess partitioning is not a solved problem yet.
There are a lot of things I want to try. Nowadays, you cannot just hand-tune your algorithm: it takes too much brain time, and our brain is bad at it anyway. I probably have a lot to gain by just trying all kinds of parameters automatically – what people have been doing with SAT solvers for a while. Or give the parameters to a good blackbox optimizer, like people now do for neural networks. Or maybe try machine learning, why not.
Less fun but equally important would be to implement and benchmark all the “good” algorithms I left aside: all other partitioners implement Fiduccia-Mattheyses and heavy-edge coarsening, but I haven’t tried them yet. A lot of ideas from MLPart and hMetis have proven useful, and I expect to find some room for improvement here.
I am now writing a graph and hypergraph partitioner, Minipart. It’s starting to yield good solutions on the ISPD benchmarks, similar to MLPart and hMetis.
In a way, it is very similar to existing tools: it interleaves local search passes with problem simplifications (coarsening). On the other hand, I started with very different choices for coarsening, and my benchmarks are slowly leading me to a different kind of local search as well.
I had some fun migrating the server to Debian 9 (from 6…) and setting up HTTPS. Of course, I broke Apache in the process and had to reinstall from scratch, but overall it went fine. Let’s Encrypt makes it incredibly easy to set up a TLS certificate.
One of the basic usages of random numbers is choosing an item at random. Most of the time, we want a uniform distribution, but sometimes we want a weighted one: the probability of selecting an element is proportional to its weight. We would use NumPy’s random.choice(), or C++’s std::discrete_distribution.
The algorithm picks a random number and performs a binary search (in the libstdc++ implementation) to find the associated item. But some randomized algorithms need to update the weights during execution: with those algorithms, we need to recompute all the cumulative weights each time we make a change, which takes linear time.
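To make this concrete, here is a small sketch using std::discrete_distribution (the function name sample_counts is mine, for illustration). Sampling is fast, but note that changing a single weight forces constructing a whole new distribution, which rebuilds the cumulative table in linear time.

```cpp
#include <random>
#include <vector>

// Draw n samples from a weighted distribution and count how often
// each item is picked.
std::vector<int> sample_counts(const std::vector<double>& weights, int n) {
    std::mt19937 gen(42); // fixed seed for reproducibility
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    std::vector<int> counts(weights.size(), 0);
    for (int i = 0; i < n; ++i)
        ++counts[dist(gen)];
    // To update a weight, we would have to build a brand new
    // discrete_distribution from the modified weight vector.
    return counts;
}
```

With weights 1:3:6, the counts come out roughly in those proportions.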
It’s actually possible to do it in O(log n) time, by maintaining a binary tree: each node contains the sum of the weights of its children. This allows us to do a similar binary search (log n), and to access only a few nodes when a weight is updated (log n). I read about an implementation based on red-black trees on HN.
I wrote a similar datastructure based on an implicit binary tree – getting rid of the pointers and the memory allocations, using about 3 times less memory when using doubles. It is available on Github.
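A minimal sketch of the idea (the class name WeightTree is mine, and the actual implementation on Github differs): the tree lives in a flat array, node i has children 2i and 2i+1, the weights sit in the leaves, and each internal node stores the sum of its children. Both sampling and updating touch only one root-to-leaf path.

```cpp
#include <cstddef>
#include <vector>

// Implicit binary tree for weighted sampling: no pointers, one array.
class WeightTree {
public:
    explicit WeightTree(const std::vector<double>& weights) {
        size_ = 1;
        while (size_ < weights.size()) size_ *= 2;
        tree_.assign(2 * size_, 0.0);
        for (std::size_t i = 0; i < weights.size(); ++i)
            tree_[size_ + i] = weights[i];          // leaves
        for (std::size_t i = size_ - 1; i >= 1; --i)
            tree_[i] = tree_[2 * i] + tree_[2 * i + 1]; // internal sums
    }

    // O(log n): change one weight, fix the sums on the path to the root.
    void update(std::size_t i, double w) {
        i += size_;
        tree_[i] = w;
        for (i /= 2; i >= 1; i /= 2)
            tree_[i] = tree_[2 * i] + tree_[2 * i + 1];
    }

    double total() const { return tree_[1]; }

    // O(log n): find the item whose cumulative weight interval contains x,
    // with 0 <= x < total(). x would come from a uniform random draw.
    std::size_t find(double x) const {
        std::size_t i = 1;
        while (i < size_) {
            if (x < tree_[2 * i]) {
                i = 2 * i;            // descend left
            } else {
                x -= tree_[2 * i];    // skip the left subtree's mass
                i = 2 * i + 1;
            }
        }
        return i - size_;
    }

private:
    std::size_t size_;
    std::vector<double> tree_;
};
```

The memory saving comes from dropping per-node pointers and colors: for doubles, the whole structure is just two weights' worth of storage per item.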
As a community, we focus a lot on the heuristics available to solve a problem. But my experience in the field is that we don’t spend much time on the heuristics themselves: the cost function gets most of the work.
When solving a problem using a metaheuristic, it is important to evaluate incumbent solutions quickly. When the number of variables is small, it is possible to compute the cost of a solution from scratch, but if the problem has millions of variables, even a linear time complexity becomes limiting.
This is where incremental evaluation comes into play: for many cost functions, we can compute the result quickly for a small change of inputs. This is perfect for local search, which can now scale to very large problems.
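As a toy illustration (names are mine), consider a cost that is a plain sum of terms: a full evaluation is O(n), but a local-search move that changes one term can be applied in O(1) by adding only the delta.

```cpp
#include <cstddef>
#include <vector>

// Incremental evaluation of a sum-of-terms cost function.
struct SumCost {
    std::vector<double> terms;
    double total = 0.0;

    explicit SumCost(const std::vector<double>& t) : terms(t) {
        for (double x : terms)
            total += x; // full O(n) evaluation, done once at the start
    }

    // A local-search move: only the delta of the changed term is applied,
    // so re-evaluating the cost is O(1) instead of O(n).
    void set_term(std::size_t i, double value) {
        total += value - terms[i];
        terms[i] = value;
    }
};
```

Real cost functions (maxes, nested expressions) need more bookkeeping than this, which is exactly where rolling your own gets painful.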
Rolling your own
If you need incremental evaluation for your problem, you are going to roll your own. It is quite easy at first: evaluating sums, maxes and products incrementally is not very difficult.
But in practice, requirements change: you’ll need to implement a slightly different cost function and add new features to it. Before you know it, this incremental cost function prevents you from experimenting as much as you’d like to. It may even become a maintenance issue! And since we have more important things to do, a hand-rolled implementation is rarely on par with what you could obtain with more optimization and preprocessing.
I sometimes feel the need for a library that would make incremental evaluation smoother. I know of one good framework for the metaheuristics part (ParadisEO), but I don’t know of any framework to implement incremental evaluation of the cost function. When tackling some large-scale problems, this is where the work is spent (both for the humans and the computer).
If you know one such framework, I’d like to hear from you!
Full disclosure: I am going to work at LocalSolver starting in November. A powerful modeling language makes modeling much easier, but generic solvers do not allow you to plug your own search engine: sometimes I’d like the best of both worlds.
One of my favourites in the standard library: numeric_limits. Sometimes, you want to initialize a number to something really big. Some code just uses magic numbers, but the proper way is to use numeric_limits<T>::max(), which does exactly what you expect.
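A small sketch of the habit in question (the function name is mine): initializing a running minimum with numeric_limits instead of a magic number like 1e9, which silently breaks when values exceed it.

```cpp
#include <limits>
#include <vector>

// Find the smallest value, starting from the largest representable T
// rather than a magic number.
template <typename T>
T smallest(const std::vector<T>& values) {
    T best = std::numeric_limits<T>::max();
    for (T v : values)
        if (v < best)
            best = v;
    return best;
}
```

For floating-point types, note that numeric_limits<T>::min() is the smallest *positive* value; the counterpart of max() is lowest().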
Right, the reset gets optimized away. Since it’s forbidden to read a member from a deleted object, the compiler assumes that the member p can never be read again, and removes the write. Only with optimization disabled do we have the expected behaviour.
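To reconstruct the kind of code being discussed (this is my sketch, assuming the original example wiped a pointer member in the destructor): the store below is a dead store, and an optimizing compiler is allowed to delete it.

```cpp
// An attempt to "reset" a member when the object dies.
struct Secret {
    int* p;

    // Reading a member of a destroyed object is undefined behaviour,
    // so the compiler may assume p is never read after this point and
    // remove the store entirely at -O2. The "wipe" never happens.
    ~Secret() { p = nullptr; }
};
```

The same reasoning is why memset-based scrubbing of sensitive buffers needs special functions (such as C11's memset_s) that the optimizer is not allowed to elide.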
Moral: C++ is hard, and workarounds rarely work as you would expect them to.
In the electronic industry, there is one common denominator to all the tools and all the teams. Whether we are doing ASIC design, programming FPGAs or formally verifying chips, there is one thing we all use in our software: it is the representation of the design, called the netlist.
The netlist is the representation of the connections of the design. Although it probably originated from physical design representations, it is now a common representation throughout the compiler, from the high-level code to the final physical layout.
This post is the first in a series on the netlist, to present the basic concepts. Next ones will go more in depth on the operations performed on a netlist, the APIs that are provided to access it, and the actual representation in a programming language.
What’s in a netlist
A netlist represents the connections between the components of a design. Here is what it looks like:
Let’s call the components “cells” and the connections “nets”. Each cell has an interface: each element of this interface is called a “port”. Note that everything has a name, and that the ports generally have a direction as well. For the computer scientists among us, this is a lot like a graph.
Now, imagine we reuse mod in another netlist. It might just be a cell among others. Some of those cells may actually be complex components, containing other, smaller cells: this creates a hierarchy of nested cells, some of which may be reused.
For example, a toy CPU: core is defined once, but used twice, creating two “instances”, core1 and core2. Like in software design, reuse is encouraged: a “module” (core, ALU) will be instantiated several times (core1, core2, core1.ALU1, core1.ALU2…). This reuse is there to make the designers’ and tools’ jobs easier: once we create the chip, there will be one copy of core for each core1, core2….
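The concepts above can be sketched as plain data structures (the names here are illustrative, not any particular tool’s API): a module owns cells and nets, and each cell is an instance of another module, which is how core can be defined once and instantiated twice.

```cpp
#include <string>
#include <utility>
#include <vector>

// A port is an element of a module's interface, with a direction.
struct Port {
    std::string name;
    bool is_input;
};

// A cell is an instance of a module inside another module.
struct Cell {
    std::string name;        // instance name, e.g. "core1"
    std::string module_name; // the module it instantiates, e.g. "core"
};

// A net connects pins: (cell index, port name) pairs.
// By convention here, cell index -1 means one of the module's own ports.
struct Net {
    std::string name;
    std::vector<std::pair<int, std::string>> pins;
};

// A module is one level of the hierarchy: interface, contents, wiring.
struct Module {
    std::string name;
    std::vector<Port> ports;
    std::vector<Cell> cells;
    std::vector<Net> nets;
};
```

In this model, reuse is just two cells pointing at the same module name; flattening for manufacturing would replace each cell by a copy of the module it instantiates.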
We have seen that the netlist came from the physical design world, but at the same time its quirks are present in “high-level” languages like Verilog and VHDL.
The consequence is that a netlist is the go-to format for everyone working in EDA… but there are many different netlist representations. For example:
Netlists targeting physical design with representations of the actual wiring as polygons. Coriolis’ Hurricane is a good example
Netlists targeting logical optimization, with specialized bit-level representations. This is the case in ABC
Higher-level netlists for synthesis and verification, with high-level operators, like Yosys
Generic netlists, that provide a generic API without application-specific utilities, like GBL
Since they are central to all those tools, their design has a lot of impact. First, on the overall development of the tools: an easy-to-use API is a must. Second, on performance: they must allow processing millions to billions of gates efficiently. But let’s keep those for the next post.
I just realized that the report and presentation I wrote about Coloquinte were never made public. Here they are, under CC-BY-SA. Please use them as you see fit: to learn about placement algorithms, as a bibliography, or maybe to improve Coloquinte or other tools!