Wednesday, 8 July 2009

Finding the Most Significant Bit (MSB) for a number in Perl

The following is a script to print a number's Most Significant Bit value:

user@host:~$ echo "44" | perl -MPOSIX -ne 'print 2**floor(log($_)/log(2))."\n"'

Why you ask? Well, it can be handy. For example, I had a CSV file which had over 2000 columns! The rows were in dates and each column was a number of event occurrences. So a data element was the number of times that occurrence occurred on that day. Make sense?

Something like "On 08/07/2009 10 people had 4 apples, 8 people had 5 apples, 2 people had 6 apples" etc.

Because I had so many columns, I needed to put the occurrences in to ranges. Also, because the data was an inverse exponential (i.e. very few people had 1000 apples on a particular day and 90% of the people for that day had 1 apple) it makes sense to have those ranges exponentially larger. An obvious exponential to use was the power of two. Getting the Most Significant Bit allows you to determine the correct range dynamically without doing comparisons.

So, in the above example, 44 is in the range 32 (MSB) to 63. You can then use the MSB as a key in a hash and total up the values for all the numbers in that range.

Simple huh?

Basically, the POSIX module is needed for the floor function. log(n)/log(2) gives you log2(n), and flooring that result gives you the index for the MSB. So, 2 to the power of the index gives you the MSB itself.

No comments:

Post a Comment