The most efficient algorithm to scan a bitmap

Yesterday in class, we were learning about the linux scheduler. How you can have several different algorithms for scheduling, etc. All very nice. The actual scheduler used in linux 2.6 has 140 FIFO queues; one for each possible priority. The scheduler takes the first job from the highest priority queue. The queues have a 140 bit bitmap. Because the bitmap gets scanned every single time there is a new quantum, the algorithm for scanning the bitmap needs to be very fast.

Our professor asked how we would go about scanning the bitmap. As an example, he put up an algorithm that would compare a single bit, then shift the bits left and keep track of the number of shifts. Obviously room for improvement there. I suggested that the algorithm should be just an integer base 2 log function. To which the professor asked me how I would implement it. I wanted to say: "in hardware", but he was already moving forward. But really, the question of "what is the most significant bit?" is exactly the same question as "what is the integer base 2 log?", one of these two problems has been solved already, in hardware. And actually, I'd solved a problem similar to this years ago, again using a little bit of assembler.

Then he presented how linux does it: by using a binary search algorithm on the highest integer, then on the second integer, etc, until all 140 bits have been found. To me this looks like a O(log n) algorithm because that's what a binary search does.

However, big O notation assumes that we are dealing with large numbers, and 140 is just not that large. A bit scan function is a 0(n) algorithm. However, if your lower order values are small enough, and the problem small enough, then you can consistently beat an 0(log n) algorithm.

Here is my approach:

I am going to specialize on the processor. I am working with a 64-bit Intel processor. I would expect any kernel code to have libraries that are platform specific. I am just implementing one for my platform. In this case, it is not that big of a deal, because I just have to implement a intLog2 function that takes a 64-bit int and returns one as well. Second, I am going to use an assembly code operation that performs a reverse bit scan to do my log function (thank you Intel). On a 64-bit machine, that means I only have 3 operations to perform on that 140-bit bitmap. I am also going to use the operations registers to perform the function. Finally, I am going to wrap that assembly operation in a #define so that it is executed inline.

Performing the scan on all three elements of the bitmap should simply take the most significant long (I'm calling it b[0], and verify that it has any values at all, if it does, return the scanned value, same for b[1] and b[2]. If none of them have any values, then simply return -1 (for no values in the queue).

Ok, here is the code:

 Name        : ScanBitmap.c
 Author      : Aaron Wells
 Copyright   : Freely Distributable
 Description : C, Ansi-style


 * Integer Base 2 Log Function
 * The integer base 2 log is simply the number of the high bit that is set
 * This is accomplished by using the Intel bsr instruction (Bit Scan Reverse)
 * Command is available on 386 or better CPU
 * The values should be loaded and retrieved from registers
#define intlog2(x)       \
		({ long __value, __arg = (x);   \
		__asm__ ("bsr %1,%0": "=r" (__value): "r" (__arg));  \
		__value; })

// 3 longs is 3 * 64 bits = 192 bits (52 extra)
#define BITMAP_SIZE 3

 * This is my "algorithm", not much to it, is there??
static long scanHiBit(unsigned long b[BITMAP_SIZE]){
	if(b[0]) return intlog2(b[0]) + 128;
	if(b[1]) return intlog2(b[1]) + 64;
	if(b[2]) return intlog2(b[2]);
	return -1;

static void printBitmap(unsigned long b[BITMAP_SIZE]){
	printf("%lx %lx %lx", b[0], b[1], b[2]);

int main(void) {
	unsigned long b1[BITMAP_SIZE] = {-1, -1, -1};
	unsigned long b2[BITMAP_SIZE] = { 0, -1, -1};
	unsigned long b3[BITMAP_SIZE] = { 0,  0, -1};
	unsigned long b4[BITMAP_SIZE] = { 0,  0,  1};	// almost worst case
	unsigned long b5[BITMAP_SIZE] = { 0,  0,  0};	// worst case

	printf(" :%ld\n",scanHiBit(b1));

	printf(" :%ld\n",scanHiBit(b2));

	printf(" :%ld\n",scanHiBit(b3));

	printf(" :%ld\n",scanHiBit(b4));

	printf(" :%ld\n",scanHiBit(b5));


There is always room for improvement, and this code is no different. To really do this right, you would pass in the pointer to the start of the bitmap and do the whole thing in assembler. Not very portable, but very fast.

By the way, there would be no reason that you couldn't do a binary search as a fall-back algorithm for CPUs that don't have a reverse scan instruction.

As a follow-up, I may compare my approach (I can't take credit for the algorithm because it is basically a built-in instruction) to the scheduler scanner in linux... another day.


Popular posts from this blog

KnockoutJS, WebAPI, and TypeScript

Database Projects, SQL Unit Tests, and TeamCity