Maybe it's just my own neuroses around being a self-taught developer and not having much of any mathematical background, but it always bothered me that articles like this constantly use phrases like "it's clear that...", "obviously...", "it's trivial to...", etc.

> Clearly, floor of base 2 log of x is (WORDBITS-lzc(x)).

Uh, is it?

syphilis2 250 days ago [-]

It's poor form to describe items as clear, obvious, trivial, 'of course', easy to see, and so on. To look too much into it: if the item is clearly understood then there's no need to identify it as clear. Clearly it's clear. It bothers me too.

taneq 250 days ago [-]

It's a bit of a trope that mathematicians consider all knowledge to fall into two groups: 'unknown' and 'obvious/trivial'.

250 days ago [-]

dahart 250 days ago [-]

>> Clearly, floor of base 2 log of x is (WORDBITS-lzc(x)).

> Uh, is it?

Of course, that is intuitively obvious to even the most casual observer. ;)

From the article How to Read Mathematics [0], that kind of statement "is a signal to the reader that what's involved here is perhaps tedious and even difficult, but involves no deep insights."

These "filler" expressions are typical in mathematical demonstrations (as an indication that step won't be explained). But sometimes it's not obvious

> Clearly, floor of base 2 log of x is (WORDBITS-lzc(x)).

Yes. Because the 1st bit indicates the MSB of the number and the way binary works

If you have only one bit set, then WORDBITS-lzc(x) is log2(x). If you have less significant bits enabled that won't be enough to bump the floor to the next number

zaptheimpaler 250 days ago [-]

I think anyone who has tried to teach or explain something will see why this happens. You have to make some assumptions about what readers know/can figure out otherwise you would literally be explaining everything starting from binary numbers every time.

The "clearly" stuff is just shorthand for "I assume you know this because its foundational to understanding the thing were talking about now / or I'm too lazy to explain this now / or this is so far off-topic that this isnt the place / etc. etc" . Nothing more - it can be read as arrogance on the authors part but I think thats an uncharitable interpretation.

moomin 250 days ago [-]

It is... if you're the kind of person who does bit twiddling.
And for everyone else it's gobbledygook.

Basically, you're using a binary representation, so everything is in powers of two. Let's just talk integers. 2^n is just the nth bit set and a bunch of zeros. Every number between 2^n and 2^n+1 has the nth digit set and some lower digits. The log base 2 is between n and n+1. So the floor is n. So, biggest digit set=floor of log base 2.

Once you've grasped that principle, the exact details are, in fact, clear. :)

mikeash 250 days ago [-]

To bring it into less computery terms, it's basically the binary equivalent of the statement that the floor of base 10 log of x is the number of decimal digits in x. The floor of log10 9823 is 4, the floor of log10 874323 is 6, etc. This is, I hope, clear. At least if you understand what log base 10 means.

moomin 250 days ago [-]

I may spend too much time visualising binary representations. :)

mikeash 250 days ago [-]

Well, your explanation is better overall, since it's describing what's actually going on. Mine is just an analogy in case that gives someone trouble.

andrepd 250 days ago [-]

I dunno, I guess it's alright to use these phrases for things that are truly trivial and direct. But plenty of people use these phrases to introduce things that are definitely not obvious. It's quite annoying and I suspect it's either ingrained in habit, or it's just the author wanting to "show off", sort of like "wow, this is so obvious to me, isn't it to you?", nevermind that they spent a great deal of time to understand it when they first came across it.

Grrr x)

pkaye 250 days ago [-]

I come from a firmware development background so this kind of bit manipulation tricks are familiar to me. That equation makes sense after a few seconds looking at it but I can't always run a proof in my mind to be 100% certain.

250 days ago [-]

Palomides 250 days ago [-]

if you like these, definitely check out hacker's delight by warren, it's pretty readable and thorough. imo a big advantage of at least skimming it is that you can develop an intuition of what sorts of problems can be solved in a few bitwise ops.

Veedrac 250 days ago [-]

These tricks are cool on their own, but they're really neat when you tie them together to solve larger (but still toy) puzzles. Two fun ones I can remember:

> Extra-extra credit can be earned for finding an algorithm that takes no more than O(log N) space and no more than
O(log N) time, where N is the number you are seeking for.

and the bit magic made it O(1) space and time. The second was on similar lines.

Unfortunately the bit fiddling at my day job tends to be a bit tamer ;).

Veedrac 250 days ago [-]

A clarification: this isn't to say such tricks are never used in a non-toy environment. There is, for example, this search algorithm I made for a serious purpose:

The bit math is milder, but it still comes in handy.

glangdale 250 days ago [-]

There are a few good pages/books on this ("Bit Twiddling Hacks", and Henry Warren's Hacker's Delight).

On my wishlist: updated versions of these for modern architectures and SIMD. Full disclosure: am Intel employee. That said, modern chips of all strips (x86, ARM64) all can do some amazing things with low-cost SIMD and bit-manipulation instructions. It would be interesting to see what smart folks come up with using these. Obviously you can make the 'bit extract' function really boring with PEXT, but what cool tricks can you do when you start with AVX2 and BMI1/2 as a baseline?

zzo38computer 250 days ago [-]

I have found many thing like this, and have thought of how some of these things might be done with MMIX too. With MMIX, you could for example to use 2ADDU and 4ADDU to multiply by fifteen or by twelve, etc. Reversing bits (in this case, of a 64-bit number) is also simple in MMIX:

MOR $0,#0102040810204080,$0
MOR $0,$0,#0102040810204080

(Although that constant will be placed in a register, but you can use the same constant twice.)

wsgeek 250 days ago [-]

Do these actually run faster than the code produced by a good compiler? Benchmarks?

usefulcat 249 days ago [-]

In some cases I suspect it depends a lot on the hardware. I did a bit of research into the "Integer minimum" algorithm. Given the following straightforward implementation:

int min(int x, int y) { return y < x ? y : x; }

..most compilers will produce something very much like the following for x86-64 (from Compiler Explorer https://gcc.godbolt.org using -Ofast -march=haswell):

min(int, int):
cmp esi, edi
mov eax, edi
cmovle eax, esi
ret

Compare that to the generated instructions for the algorithm listed in the "Integer minimum or maximum" section (x+(((y-x)>>(WORDBITS-1))&(y-x))):

min(int, int):
sub esi, edi
mov eax, esi
sar eax, 31
and eax, esi
add eax, edi
ret

It's not immediately obvious to me which one is likely to be faster (I'm not an expert on x86-64 assembly).

andrepd 250 days ago [-]

Probably not. These low level tricks are supposed to be hidden away inside functions, and any good compiler ought to produce similar machine code in the majority of these examples.

deathanatos 250 days ago [-]

> Do these actually run faster than the code produced by a good compiler? Benchmarks?

Some of these are absolutely suspect. Now, these are all architecture specific: my results below are for gcc, targeting the Intel chip in the MBP in my lap. (Of course, some of the article's code samples assume certain things about an architecture, and are not portable.)

For example,

> Absolute Value of a Float

> IEEE floating point uses an explicit sign bit, so the absolute value can be taken by a bitwise AND with the complement of the sign bit. For IA32 32-bit, the sign bit is an int value of 0x80000000, for IA32 64-bit, the sign bit is the long long int value 0x8000000000000000. Of course, if you prefer to use just int values, the IA32 64-bit sign bit is 0x80000000 at an int address offset of +1. For example:

double x;
/* make x = abs(x) */
*(((int *) &x) + 1) &= 0x7fffffff;

First, this is undefined behavior. The portable way to do this is fabs(), which is part of the standard library. For me, calling fabs() compiles to a single instruction:

fabs() compiled down to essentially the same bitwise &, in assembly, that the article is doing. The article's "advice" also gets compiled to exactly the same instruction & set of constants, which I actually find way more interesting.

> Integer Constant Multiply

> Given an integer value x and an integer or floating point value y, the value of xy can be computed efficiently using a sequence derived from the binary value of x. For example, if x is 5 (4 + 1):*

y2 = y + y;
y4 = y2 + y2;
result = y + y4;

> In the special case that y is an integer, this can be done with shifts:

y4 = (y << 2);
result = y + y4;

Compilers will do this for you, today, automatically. In the example case of multiplying by 5, gcc will do better than the article's code, by using lea cleverly:

leal (%rdi,%rdi,4), %eax

gcc also compiles the article's shift-then-add strategy to exactly the same leal.

The article mentions this optimization again, later,

> Shift-and-Add Optimization

> Rather obviously, if an integer multiply can be implemented by a shift-and-add sequence, then a shift-and-add sequence can be implemented by multiplying by the appropriate constant... with some speedup on processors like the AMD Athlon. Unfortunately, GCC seems to believe constant multiplies should always be converted into shift-and-add sequences, so there is a problem in using this optimization in C source code.

The above leal instruction was the output of GCC, so this is obviously wrong. One can maybe say the article is old, but I'm pretty sure GCC has had the "use leal to do constant multiplies" optimization for over a decade now, but then, perhaps the article is that old. If so, the advice would appear to no longer apply.

Write it the obvious, readable way first. If you decide to optimize at the level this article is discussing, you should be reading the assembly to see if your optimizations will have any effect.

But not all of them are bad. The "is this unsigned integer a power of 2" code is correct, I believe, and if you understand binary, not that hard to understand.

Veedrac 250 days ago [-]

That's not to say these optimisations are no longer needed; you just need to be smarter about applying them. If you're multiplying by 2^n - 1, for example, keeping n around and writing a pair of shifts is not an optimisation a compiler will be able to make, and even when doing the power inline (x * ((1 << n) - 1)) most compilers aren't able to optimise it properly.

This is one of my favorite web pages. There isn't much call to use these techniques in most code, but understanding why they work helps expand one's thinking.

I have found many thing like this, and have thought of how some of these things might be done with MMIX too. With MMIX, you could for example to use 2ADDU and 4ADDU to multiply by fifteen or by twelve, etc. Reversing bits (in this case, of a 64-bit number) is also simple in MMIX:

MOR $0,#0102040810204080,$0
MOR $0,$0,#0102040810204080

(Although that constant will be placed in a register, but you can use the same constant twice.)

>

Clearly, floor of base 2 log of x is (WORDBITS-lzc(x)).Uh, is it?

> Uh, is it?

Of course, that is intuitively obvious to even the most casual observer. ;)

http://www.urbandictionary.com/define.php?term=intuitively%2...

[0] http://web.stonehill.edu/compsci/History_Math/math-read.htm

HN discussion: https://news.ycombinator.com/item?id=14232977

> Clearly, floor of base 2 log of x is (WORDBITS-lzc(x)).

Yes. Because the 1st bit indicates the MSB of the number and the way binary works

If you have only one bit set, then WORDBITS-lzc(x)

islog2(x). If you have less significant bits enabled that won't be enough to bump the floor to the next numberThe "clearly" stuff is just shorthand for "I assume you know this because its foundational to understanding the thing were talking about now / or I'm too lazy to explain this now / or this is so far off-topic that this isnt the place / etc. etc" . Nothing more - it can be read as arrogance on the authors part but I think thats an uncharitable interpretation.

Basically, you're using a binary representation, so everything is in powers of two. Let's just talk integers. 2^n is just the nth bit set and a bunch of zeros. Every number between 2^n and 2^n+1 has the nth digit set and some lower digits. The log base 2 is between n and n+1. So the floor is n. So, biggest digit set=floor of log base 2.

Once you've grasped that principle, the exact details are, in fact, clear. :)

Grrr x)

https://www.reddit.com/r/programming/comments/4vctor/.../d5x...

https://cs.stackexchange.com/a/74888/16408

The first asked

> Extra-extra credit can be earned for finding an algorithm that takes no more than O(log N) space and no more than O(log N) time, where N is the number you are seeking for.

and the bit magic made it O(1) space and time. The second was on similar lines.

Unfortunately the bit fiddling at my day job tends to be a bit tamer ;).

neverused in a non-toy environment. There is, for example, this search algorithm I made for a serious purpose:https://stackoverflow.com/q/40141793/1763356

The bit math is milder, but it still comes in handy.

On my wishlist: updated versions of these for modern architectures and SIMD. Full disclosure: am Intel employee. That said, modern chips of all strips (x86, ARM64) all can do some amazing things with low-cost SIMD and bit-manipulation instructions. It would be interesting to see what smart folks come up with using these. Obviously you can make the 'bit extract' function

really boringwith PEXT, but what cool tricks can you do when you start with AVX2 and BMI1/2 as a baseline?MOR $0,#0102040810204080,$0 MOR $0,$0,#0102040810204080

(Although that constant will be placed in a register, but you can use the same constant twice.)

Do these actually run faster than the code produced by a good compiler? Benchmarks?Some of these are absolutely suspect. Now, these are all architecture specific: my results below are for gcc, targeting the Intel chip in the MBP in my lap. (Of course, some of the article's code samples assume certain things about an architecture, and are not portable.)

For example,

>

Absolute Value of a Float>

First, this is undefined behavior. The portable way to do this is fabs(), which is part of the standard library. For me, calling fabs() compiles to a single instruction: where LCPI1_0 is fabs() compiled down to essentially the same bitwise &, in assembly, that the article is doing. The article's "advice"IEEE floating point uses an explicit sign bit, so the absolute value can be taken by a bitwise AND with the complement of the sign bit. For IA32 32-bit, the sign bit is an int value of 0x80000000, for IA32 64-bit, the sign bit is the long long int value 0x8000000000000000. Of course, if you prefer to use just int values, the IA32 64-bit sign bit is 0x80000000 at an int address offset of +1. For example:alsogets compiled to exactly the same instruction & set of constants, which I actually find way more interesting.>

Integer Constant Multiply>

>Given an integer value x and an integer or floating point value y, the value of xy can be computed efficiently using a sequence derived from the binary value of x. For example, if x is 5 (4 + 1):*In the special case that y is an integer, this can be done with shifts:Compilers will do this for you, today, automatically. In the example case of multiplying by 5, gcc will do better than the article's code, by using lea cleverly: gccalsocompiles the article's shift-then-add strategy toexactlythe same leal.The article mentions this optimization again, later,

>

Shift-and-Add Optimization>

Rather obviously, if an integer multiply can be implemented by a shift-and-add sequence, then a shift-and-add sequence can be implemented by multiplying by the appropriate constant... with some speedup on processors like the AMD Athlon. Unfortunately, GCC seems to believe constant multiplies should always be converted into shift-and-add sequences, so there is a problem in using this optimization in C source code.The above leal instruction was the output of GCC, so this is obviously wrong. One can maybe say the article is old, but I'm pretty sure GCC has had the "use leal to do constant multiplies" optimization for over a decade now, but then, perhaps the article is that old. If so, the advice would appear to no longer apply.

Write it the obvious, readable way first. If you decide to optimize at the level this article is discussing, you should be reading the assembly to see if your optimizations will have any effect.

But not all of them are bad. The "is this unsigned integer a power of 2" code is correct, I believe, and if you understand binary, not that hard to understand.

https://godbolt.org/g/aHmMDL

Hacker's Delight: https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...MOR $0,#0102040810204080,$0 MOR $0,$0,#0102040810204080

(Although that constant will be placed in a register, but you can use the same constant twice.)