Sure, you can force a function to be inlined, but that doesn’t always generate efficient code. Macro substitution happens in the preprocessing phase, whereas inline substitution takes place in the compilation phase. This can make a real difference, as the two cases below show.
Case 1: Inline function
#include <stdio.h>

inline int __attribute__((always_inline)) SUM(int a, int b)
{
    return (a + b);
}

int main(void)
{
    printf("%d", SUM(10, 20));
    return 0;
}
This compiles to the following assembly (tested on https://godbolt.org/ using ARM gcc 5.4, with no optimization flags):
.LC0:
        .ascii  "%d\000"
main:
        stmfd   sp!, {fp, lr}
        add     fp, sp, #4
        sub     sp, sp, #8
        mov     r3, #10
        str     r3, [fp, #-8]
        mov     r3, #20
        str     r3, [fp, #-12]
        ldr     r2, [fp, #-8]
        ldr     r3, [fp, #-12]
        add     r3, r2, r3
        ldr     r0, .L2
        mov     r1, r3
        bl      printf
        mov     r3, #0
        mov     r0, r3
        sub     sp, fp, #4
        ldmfd   sp!, {fp, pc}
.L2:
        .word   .LC0
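The SUM function has indeed been inlined, but notice that the addition itself still happens at run time: add r3, r2, r3. Before that, the arguments 10 and 20 are stored to stack slots in main's frame and loaded back into registers. In contrast, consider the following code, which uses a macro: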
Case 2: Macro
#include <stdio.h>

#define SUM(a,b) ((a) + (b))

int main(void)
{
    printf("%d", SUM(10, 20));
    return 0;
}
With the same compiler and settings, this compiles to:
.LC0:
        .ascii  "%d\000"
main:
        stmfd   sp!, {fp, lr}
        add     fp, sp, #4
        ldr     r0, .L2
        mov     r1, #30
        bl      printf
        mov     r3, #0
        mov     r0, r3
        ldmfd   sp!, {fp, pc}
.L2:
        .word   .LC0
There’s no addition in this assembly: the compiler folded the constant expression into 30 and passes it straight to printf (mov r1, #30).
Now you may argue that if we had compiled the first example with optimizations enabled (e.g. -O2), we would have got similar results, but that’s not the point of this discussion.
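The key idea is that an inline function is still a function. Its parameters, and any local variables it declares, are objects in their own right: in Case 1 you can see space being reserved for a and b on the stack of main, the function it was inlined into (sub sp, sp, #8), and any locals would likewise be given space there.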
If we use a macro instead, the compiler never sees two separate functions. The preprocessor performs a textual substitution, so all the compiler sees is a single block of code containing the expression ((10) + (20)), which it can fold into a constant even without optimization.