Function calls in the x86 architecture have everything to do with two registers: EBP and ESP. I'll discuss what happens on the call stack, and how EBP and ESP change, when a function is called.
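In the x86 architecture, EBP is the base pointer (also called the frame pointer), which always points to the base of the current activation record (stack frame). ESP is the stack pointer, which always points to the top of the stack, i.e. the top of the current activation record. EIP is the instruction pointer (program counter): it holds the address of the next instruction to be executed. The x86 stack grows toward lower addresses, so in 32-bit mode push decrements ESP by 4 and pop increments it by 4.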
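Four x86 instructions take part in a function call: call, ret, enter and leave (all examples here use AT&T syntax). EIP cannot be pushed, popped or moved directly, so the equivalents below are pseudocode.
call and ret are equivalent to the following logic, respectively: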
CALL f: push %eip #push the return address (EIP already holds the address of the instruction following call)
jmp f #jump to the subroutine f
RET: pop %eip #pop the return address into EIP; execution resumes in the caller
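To make these equivalences concrete, here is a minimal C sketch that models a 32-bit stack as an array of words and implements call and ret on top of it. It is a toy model, not how the CPU works internally: the Cpu struct, push, pop and STACK_TOP are hypothetical names, and call takes the return address as a parameter because real hardware derives it from the length of the call instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit stack (hypothetical model): word-sized cells, byte addresses
   in the registers. STACK_TOP is a made-up initial ESP, not a real address. */
enum { STACK_WORDS = 256, STACK_TOP = 4 * STACK_WORDS };

typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t esp, eip;
} Cpu;

/* push: the stack grows downward, so ESP is decremented before the store. */
static void push(Cpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

/* pop: load the word at ESP, then move ESP back up by 4. */
static uint32_t pop(Cpu *c) {
    assert(c->esp < STACK_TOP);
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* call: save the return address on the stack, then jump to the target. */
static void call(Cpu *c, uint32_t target, uint32_t return_addr) {
    push(c, return_addr);
    c->eip = target;
}

/* ret: pop the return address into EIP; no separate jump is needed. */
static void ret(Cpu *c) {
    c->eip = pop(c);
}
```

Starting from ESP = STACK_TOP, call(&c, 0x200, 0x105) leaves EIP at 0x200 with 0x105 on top of the stack, and a following ret(&c) sets EIP back to 0x105 and ESP back to STACK_TOP.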
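enter (in its simplest form, enter $0, $0) and leave are equivalent to the following instructions, respectively. With a nonzero first operand, enter $n, $0 also reserves n bytes for local variables, like sub $n, %esp.
ENTER: push %ebp #save the caller's EBP on the stack
mov %esp, %ebp #set the new EBP to the current ESP, which points at the saved EBP
LEAVE: mov %ebp, %esp #move ESP back to EBP, discarding the callee's local variables
pop %ebp #restore the caller's EBP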
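When the caller calls a function, it executes call, which pushes the return address (the address of the instruction following call) onto the stack and jumps to the function. To establish the subroutine's activation record, the callee then runs the enter sequence: it pushes the caller's EBP and sets its own EBP. (GCC actually emits the equivalent push %ebp; mov %esp,%ebp pair rather than enter itself, as the disassembly below shows.)
When the subroutine finishes, the callee executes leave and ret, which restore the caller's EBP and ESP and pop the return address into EIP. After that, the callee's activation record no longer exists.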
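The whole sequence can be replayed on a toy stack. This sketch uses hypothetical helpers (Cpu, push, pop, call, enter, leave, ret) and models enter with a nonzero size as enter $n, $0; it only aims to show that leave and ret undo exactly what call and enter did.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit stack (hypothetical model): word-sized cells, byte addresses
   in the registers. STACK_TOP is a made-up initial ESP. */
enum { STACK_WORDS = 256, STACK_TOP = 4 * STACK_WORDS };

typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t esp, ebp, eip;
} Cpu;

static void push(Cpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;                 /* stack grows downward */
    c->mem[c->esp / 4] = v;
}

static uint32_t pop(Cpu *c) {
    assert(c->esp < STACK_TOP);
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* call: push the return address, then jump to the target. */
static void call(Cpu *c, uint32_t target, uint32_t return_addr) {
    push(c, return_addr);
    c->eip = target;
}

/* enter $frame_size, $0: save the caller's EBP, set the new EBP,
   then reserve frame_size bytes for locals. */
static void enter(Cpu *c, uint32_t frame_size) {
    push(c, c->ebp);
    c->ebp = c->esp;
    c->esp -= frame_size;
}

/* leave: drop the locals, then restore the caller's EBP. */
static void leave(Cpu *c) {
    c->esp = c->ebp;
    c->ebp = pop(c);
}

/* ret: pop the return address into EIP. */
static void ret(Cpu *c) {
    c->eip = pop(c);
}
```

Running call, enter(&c, 16), leave and ret in that order on a Cpu whose ESP is STACK_TOP brings ESP and EBP back to the caller's values and leaves EIP at the return address. While the callee runs, EBP points at the saved caller EBP, and the return address sits 4 bytes above it.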
Here's how the stack changes in the caller when it calls the subroutine:
Here's how the stack changes when the callee returns from the subroutine:
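One thing to point out: if the callee never changes ESP after its prologue, ESP still equals EBP when the function ends, so the mov %ebp, %esp half of leave is unnecessary and the callee just executes pop %ebp instead of leave. Now, let's take a look at an example.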
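Before turning to real compiler output, the claim above can be checked on a toy stack model. In this hypothetical sketch (Cpu, push, pop and the other names are illustrative only), pop_ebp stands for the short epilogue and leave for the full one; the two agree only when ESP equals EBP.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit stack (hypothetical model); STACK_TOP is a made-up ESP. */
enum { STACK_WORDS = 256, STACK_TOP = 4 * STACK_WORDS };

typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t esp, ebp;
} Cpu;

static void push(Cpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;                 /* stack grows downward */
    c->mem[c->esp / 4] = v;
}

static uint32_t pop(Cpu *c) {
    assert(c->esp < STACK_TOP);
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* Prologue: push %ebp; mov %esp,%ebp */
static void prologue(Cpu *c) {
    push(c, c->ebp);
    c->ebp = c->esp;
}

/* Full epilogue: leave = mov %ebp,%esp; pop %ebp */
static void leave(Cpu *c) {
    c->esp = c->ebp;
    c->ebp = pop(c);
}

/* Short epilogue: only correct if ESP already equals EBP. */
static void pop_ebp(Cpu *c) {
    c->ebp = pop(c);
}
```

If the callee had run sub $0x10,%esp after the prologue, pop_ebp would instead pop a word from the local area into EBP, which is why a function that moves ESP needs leave (or an explicit mov %ebp,%esp) before pop %ebp.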
/* test.c */
#include <stdio.h>
int foo1(int a, int b) {
    return a + b;
}
int foo2(int a, int b) {
    int g = a + b;
    return g;
}
int main() {
    foo1(1, 2);
    foo2(1, 2);
    return 0;
}
Let's compile this program with GCC:
gcc -g -m32 -o test test.c
Now we use GDB to disassemble the target program (on x86_64 Linux, the -m32 option is what makes GCC generate 32-bit code):
(gdb) disassemble foo1
Dump of assembler code for function foo1:
0x080483db <+0>: push %ebp
0x080483dc <+1>: mov %esp,%ebp
0x080483de <+3>: mov 0x8(%ebp),%edx
0x080483e1 <+6>: mov 0xc(%ebp),%eax
0x080483e4 <+9>: add %edx,%eax
0x080483e6 <+11>: pop %ebp
0x080483e7 <+12>: ret
End of assembler dump.
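The 0x8(%ebp) and 0xc(%ebp) offsets in foo1 follow from the frame layout: after the prologue, EBP points at the saved EBP, the return address sits at 4(%ebp), and the arguments start at 8(%ebp). The toy model below checks this under the assumption of the usual i386 cdecl convention, where the caller places arguments on the stack right to left (GCC may store them with mov into pre-reserved space instead of push, which yields the same layout). Cpu, push, load_ebp, enter_foo1, foo1_body and FAKE_RET are hypothetical illustration names.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit stack (hypothetical model). STACK_TOP and FAKE_RET are
   made-up values, not addresses from the real program. */
enum { STACK_WORDS = 256, STACK_TOP = 4 * STACK_WORDS,
       FAKE_RET = 0x8048400 /* hypothetical return address in main */ };

typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t esp, ebp;
} Cpu;

static void push(Cpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;                 /* stack grows downward */
    c->mem[c->esp / 4] = v;
}

/* Word at EBP + off, like the AT&T operand off(%ebp). */
static uint32_t load_ebp(const Cpu *c, int32_t off) {
    uint32_t addr = (uint32_t)((int32_t)c->ebp + off);
    assert(addr / 4 < STACK_WORDS);
    return c->mem[addr / 4];
}

/* Caller side (assumed cdecl), the call, and foo1's prologue. */
static void enter_foo1(Cpu *c, uint32_t a, uint32_t b) {
    push(c, b);                  /* second argument first */
    push(c, a);
    push(c, FAKE_RET);           /* pushed by call */
    push(c, c->ebp);             /* push %ebp */
    c->ebp = c->esp;             /* mov %esp,%ebp */
}

/* foo1's body: mov 0x8(%ebp),%edx; mov 0xc(%ebp),%eax; add %edx,%eax */
static uint32_t foo1_body(const Cpu *c) {
    uint32_t edx = load_ebp(c, 0x8);
    uint32_t eax = load_ebp(c, 0xc);
    return eax + edx;
}
```

With the caller's EBP at 0xABC, after enter_foo1(&c, 1, 2) the words at 0(%ebp), 4(%ebp), 8(%ebp) and 0xc(%ebp) are the saved EBP, the return address, a and b, and foo1_body returns 3.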
(gdb) disassemble foo2
Dump of assembler code for function foo2:
0x080483e8 <+0>: push %ebp
0x080483e9 <+1>: mov %esp,%ebp
0x080483eb <+3>: sub $0x10,%esp
0x080483ee <+6>: mov 0x8(%ebp),%edx
0x080483f1 <+9>: mov 0xc(%ebp),%eax
0x080483f4 <+12>: add %edx,%eax
0x080483f6 <+14>: mov %eax,-0x4(%ebp)
0x080483f9 <+17>: mov -0x4(%ebp),%eax
0x080483fc <+20>: leave
0x080483fd <+21>: ret
End of assembler dump.
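foo2's extra steps can be traced the same way: sub $0x10,%esp reserves a 16-byte local area, g is stored at -0x4(%ebp) inside it, and leave discards the whole area with a single move of ESP. The sketch below replays foo2's instructions one by one on a toy stack; all names are hypothetical, and the caller-side argument pushes and cleanup assume cdecl.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit stack (hypothetical model); STACK_TOP and FAKE_RET are made up. */
enum { STACK_WORDS = 256, STACK_TOP = 4 * STACK_WORDS,
       FAKE_RET = 0x8048400 /* hypothetical return address in main */ };

typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t esp, ebp, eip;
} Cpu;

static void push(Cpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;                 /* stack grows downward */
    c->mem[c->esp / 4] = v;
}

static uint32_t pop(Cpu *c) {
    assert(c->esp < STACK_TOP);
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* Pointer to the word at EBP + off, like the AT&T operand off(%ebp). */
static uint32_t *slot(Cpu *c, int32_t off) {
    uint32_t addr = (uint32_t)((int32_t)c->ebp + off);
    assert(addr / 4 < STACK_WORDS);
    return &c->mem[addr / 4];
}

/* Replays a call to foo2(a, b) instruction by instruction. */
static uint32_t call_foo2(Cpu *c, uint32_t a, uint32_t b) {
    push(c, b);                      /* caller: arguments right to left */
    push(c, a);
    push(c, FAKE_RET);               /* call foo2 */
    push(c, c->ebp);                 /* push %ebp */
    c->ebp = c->esp;                 /* mov %esp,%ebp */
    c->esp -= 0x10;                  /* sub $0x10,%esp: 16-byte local area */
    uint32_t edx = *slot(c, 0x8);    /* mov 0x8(%ebp),%edx */
    uint32_t eax = *slot(c, 0xc);    /* mov 0xc(%ebp),%eax */
    eax += edx;                      /* add %edx,%eax */
    *slot(c, -0x4) = eax;            /* mov %eax,-0x4(%ebp): g = a + b */
    eax = *slot(c, -0x4);            /* mov -0x4(%ebp),%eax: return g */
    c->esp = c->ebp;                 /* leave, part 1: mov %ebp,%esp */
    c->ebp = pop(c);                 /* leave, part 2: pop %ebp */
    c->eip = pop(c);                 /* ret */
    c->esp += 8;                     /* caller discards the two arguments */
    return eax;
}
```

Note that leave only moves ESP: it does not clear the local area, so the old value of g stays in memory below the stack top until something overwrites it.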
Here are the stack frames of foo1 and foo2:
As we can see, although foo1 and foo2 compute the same thing, foo1 ends with just pop %ebp while foo2 ends with leave. foo1 has no local variables in its activation record, so it needs no extra stack space and never changes ESP after its prologue. As the figure above shows, foo2 has a local variable g, so the compiler reserved 16 bytes for foo2's stack frame (sub $0x10,%esp), stored g at -0x4(%ebp), and must move ESP back with leave before returning.
foo2 has only one 4-byte int local variable, so why did GCC reserve 16 bytes for it? GCC tries to keep the stack aligned to its preferred stack boundary, which on x86 defaults to 16 bytes (-mpreferred-stack-boundary=4, i.e. 2^4 bytes), so it rounds the size of the local area up to a multiple of 16.
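The rounding described above is the usual power-of-two round-up. This is a simplified sketch that assumes only the local-variable area is rounded to the boundary; GCC's real frame-layout logic takes more into account, and align_up is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to the next multiple of align, which must be a power of
   two (16 for GCC's default -mpreferred-stack-boundary=4 on i386). */
static uint32_t align_up(uint32_t size, uint32_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

For foo2, align_up(4, 16) gives 16, matching the sub $0x10,%esp in the disassembly; a frame that needed 17 bytes of locals would be rounded up to 32.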