Hi,
Gaurav Dhiman wrote:
It would be great if somebody can tell how compiler internally benifits from it
to arrange the code in optimized manner.
Let's take the following C code:
===================
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
int main(char *argv[], int argc)
{
int a;
/* Get the value from somewhere GCC can't optimize */
a = atoi (argv[1]);
if (unlikely (a == 2)) {
a++;
}
else {
a--;
}
printf ("%d\n", a);
return 0;
}
====================
Compile it with -O2, disassemble using objdump -S (comments added by me):
====================
080483b0 <main>:
// Prologue
80483b0: 55 push %ebp
80483b1: 89 e5 mov %esp,%ebp
80483b3: 50 push %eax
80483b4: 50 push %eax
80483b5: 83 e4 f0 and $0xfffffff0,%esp
// Call atoi()
80483b8: 8b 45 08 mov 0x8(%ebp),%eax
80483bb: 83 ec 1c sub $0x1c,%esp
80483be: 8b 48 04 mov 0x4(%eax),%ecx
80483c1: 51 push %ecx
80483c2: e8 1d ff ff ff call 80482e4 <atoi@plt>
80483c7: 83 c4 10 add $0x10,%esp
// Test the value
80483ca: 83 f8 02 cmp $0x2,%eax
// --------------------------------------------------------
// If 'a' equal to 2 (which is unlikely), then jump,
// otherwise continue directly, without jump, so that it
// doesn't flush the pipeline.
// --------------------------------------------------------
80483cd: 74 12 je 80483e1 <main+0x31>
80483cf: 48 dec %eax
// Call printf
80483d0: 52 push %edx
80483d1: 52 push %edx
80483d2: 50 push %eax
80483d3: 68 c8 84 04 08 push $0x80484c8
80483d8: e8 f7 fe ff ff call 80482d4 <printf@plt>
// Return 0 and go out.
80483dd: 31 c0 xor %eax,%eax
80483df: c9 leave
80483e0: c3 ret
====================
Now, replace 'unlikely' with 'likely'. Compile with -O2, disassemble
with objdump -S (comments added by me):
=====================
080483b0 <main>:
// Prologue
80483b0: 55 push %ebp
80483b1: 89 e5 mov %esp,%ebp
80483b3: 50 push %eax
80483b4: 50 push %eax
80483b5: 83 e4 f0 and $0xfffffff0,%esp
// Call atoi()
80483b8: 8b 45 08 mov 0x8(%ebp),%eax
80483bb: 83 ec 1c sub $0x1c,%esp
80483be: 8b 48 04 mov 0x4(%eax),%ecx
80483c1: 51 push %ecx
80483c2: e8 1d ff ff ff call 80482e4 <atoi@plt>
80483c7: 83 c4 10 add $0x10,%esp
// --------------------------------------------------
// If 'a' equal 2 (which is likely), we will continue
// without branching, so without flusing the pipeline. The
// jump only occurs when a != 2, which is unlikely.
// ---------------------------------------------------
80483ca: 83 f8 02 cmp $0x2,%eax
80483cd: 75 13 jne 80483e2 <main+0x32>
// Here the a++ incrementation has been optimized by gcc
80483cf: b0 03 mov $0x3,%al
// Call printf()
80483d1: 52 push %edx
80483d2: 52 push %edx
80483d3: 50 push %eax
80483d4: 68 c8 84 04 08 push $0x80484c8
80483d9: e8 f6 fe ff ff call 80482d4 <printf@plt>
// Return 0 and go out.
80483de: 31 c0 xor %eax,%eax
80483e0: c9 leave
80483e1: c3 ret
=======================
I hope it makes the thing clearer.
Sincerly,
Thomas