Dynamic random choice datastructure

One of the basic usages of random numbers is chosing an item at random. Most of the time, we want a uniform distribution, but sometimes we want a weighted one: the probability of selecting an element is proportional to its weight. We would use Numpy’s random.choice(), or C++’s discrete_distribution.

The algorithm would pick a random number, and perform a binary search (libstdc++ implementation) to find the associated item. But some randomized algorithms need to update the weights during execution: with those algorithms, we need to precompute all the weights each time we do a change.

It’s actually possible to do it in O(log n) time, by maintaining a binary tree: each node contains the sum of the weights of its child. This allows us to do a similar binary search (log n), and to access only a few nodes when a weight is updated (log n). I read about an implementation based on red-black trees on HN.

I wrote a similar datastructure based on an implicit binary tree – getting read of the pointers and the memory allocations, using about 3 times less memory when using doubles. It is available on Github.

Incremental evaluation is often overlooked

As a community, we focus a lot on the heuristics available to solve a problem. But my experience in the field is that we don’t spend much time on the heuristics themselves: the cost function will get most of the work.

Incrementality

When solving a problem using a metaheuristic, it is important to quickly evaluate incumbent solutions. When the number of variables is small, it is possible to compute the cost of a solution from scratch, but, if the problem has millions of variables, even a linear time-complexity will become limiting.

This is where incremental evaluation comes into play: for many cost functions, we can compute the result quickly for a small change of inputs. This is perfect for local search, which can now scale to very large problems.

Rolling your own

If you need incremental evaluation for your problem, you are going to roll your own. It is quite easy at first: evaluating sums, maxs and products incrementally is not very difficult.

But in practice requirements change: you’ll need to implement a slightly different cost function and add new features to it. Before you know it, this incremental cost function prevents you from experimenting as much as you’d like to. It may even have become a maintenance issue! Moreover, we have more important stuff to do, and it’s rarely on par with what you could obtain with more optimizations and preprocessing.

Any framework?

I sometimes feel the need for a library that would make incremental evaluation smoother. I know of one good framework for the metaheuristics part (ParadisEO), but I dont know of any framework to implement incremental evaluation of the cost function. When tackling some large scale problems, this is where the work is spent (both for the humans and the computer).

If you know one such framework, I’d like to hear from you!

Full disclosure: I am going to work at LocalSolver starting in November. A powerful modeling language makes modeling much easier, but generic solvers do not allow you to plug your own search engine: sometimes I’d like the best of both worlds.