REN

Ph.D. in Computer Science at Rutgers University

Function Call with register RBP and RSP in x64

In the last article, Function Call with register EBP and ESP in x86, we covered function calls in x86. Function calls in x64 differ from x86 in more ways than just extending everything to 64 bits. In this article, I'm gonna show the layout of the stack frame (activation record) in x64.

What happens when we go to 64 bit?

First of all, let's recall the example from the last article, Function Call with register EBP and ESP in x86. Last time we used the -m32 option to compile the program in 32-bit mode. This time we want to see what happens in 64-bit mode.

/* test.c */
#include <stdio.h>

int foo1(int a, int b) {
	return a + b;
}

int foo2(int a, int b) {
	int g = a + b;
	return g;
}

int main() {
	foo1(1, 2);
	foo2(1, 2);
	return 0;
}

Let's compile this program with GCC:

gcc -g -o test test.c

Now let's use GDB to disassemble the program (on x86_64 Linux):

(gdb) disassemble foo1
Dump of assembler code for function foo1:
   0x00000000004004d6 <+0>:	push   %rbp
   0x00000000004004d7 <+1>:	mov    %rsp,%rbp
   0x00000000004004da <+4>:	mov    %edi,-0x4(%rbp)
   0x00000000004004dd <+7>:	mov    %esi,-0x8(%rbp)
   0x00000000004004e0 <+10>:	mov    -0x4(%rbp),%edx
   0x00000000004004e3 <+13>:	mov    -0x8(%rbp),%eax
   0x00000000004004e6 <+16>:	add    %edx,%eax
   0x00000000004004e8 <+18>:	pop    %rbp
   0x00000000004004e9 <+19>:	retq   
End of assembler dump.
(gdb) disassemble foo2
Dump of assembler code for function foo2:
   0x00000000004004ea <+0>:	push   %rbp
   0x00000000004004eb <+1>:	mov    %rsp,%rbp
   0x00000000004004ee <+4>:	mov    %edi,-0x14(%rbp)
   0x00000000004004f1 <+7>:	mov    %esi,-0x18(%rbp)
   0x00000000004004f4 <+10>:	mov    -0x14(%rbp),%edx
   0x00000000004004f7 <+13>:	mov    -0x18(%rbp),%eax
   0x00000000004004fa <+16>:	add    %edx,%eax
   0x00000000004004fc <+18>:	mov    %eax,-0x4(%rbp)
   0x00000000004004ff <+21>:	mov    -0x4(%rbp),%eax
   0x0000000000400502 <+24>:	pop    %rbp
   0x0000000000400503 <+25>:	retq   
End of assembler dump.

As the output above shows, foo2 never adjusts RSP apart from push %rbp and pop %rbp. That's weird, because foo2 has a local variable g! The stack pointer RSP is supposed to point to the top of the stack at all times, so where is g stored? Let's debug it in GDB:

(gdb) x/10xw $rsp
0x7fffffffe0b0:	0xffffe0c0	0x00007fff	0x00400526	0x00000000
0x7fffffffe0c0:	0x00400530	0x00000000	0xf7a2e830	0x00007fff
0x7fffffffe0d0:	0x00000000	0x00000000
(gdb) x/10xw $rbp
0x7fffffffe0b0:	0xffffe0c0	0x00007fff	0x00400526	0x00000000
0x7fffffffe0c0:	0x00400530	0x00000000	0xf7a2e830	0x00007fff
0x7fffffffe0d0:	0x00000000	0x00000000
(gdb) x/10xw $rsp-4
0x7fffffffe0ac:	0x00000003	0xffffe0c0	0x00007fff	0x00400526
0x7fffffffe0bc:	0x00000000	0x00400530	0x00000000	0xf7a2e830
0x7fffffffe0cc:	0x00007fff	0x00000000
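GDB's x/10xw prints memory as 32-bit words, while each x64 stack slot is 8 bytes wide. x86-64 is little-endian, so the word at the lower address is the low half of a slot. This means the first two words at $rsp form the caller's saved RBP, and the next two form the return address. Below is a minimal sketch of that decoding. join_words and check_foo2_dump are illustrative helpers of our own, not part of GDB.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild one 8-byte stack slot from two consecutive 32-bit words as
   printed by GDB's x/xw. x86-64 is little-endian, so the word at the
   lower address holds the low 32 bits. */
uint64_t join_words(uint32_t low, uint32_t high) {
    return ((uint64_t)high << 32) | low;
}

/* Decodes the first four words of x/10xw $rsp inside foo2. */
void check_foo2_dump(void) {
    /* Slot at $rsp: main's %rbp, saved by foo2's push %rbp. */
    assert(join_words(0xffffe0c0u, 0x00007fffu) == 0x00007fffffffe0c0ull);
    /* Slot at $rsp + 8: return address pushed by main's call to foo2. */
    assert(join_words(0x00400526u, 0x00000000u) == 0x0000000000400526ull);
}
```

The saved RBP decodes to 0x00007fffffffe0c0, which is main's frame pointer. The return address decodes to 0x400526, the instruction in main right after its call to foo2.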

Right before foo2 returns, RSP equals RBP, yet the local variable g (value 3, i.e. 1 + 2) sits at RSP - 4, below the stack pointer. What's going on here? Isn't RSP supposed to point to the top of the stack? Now, let's get to the main part of this article.


Register Extension in x64

x86 has 8 GPRs (General-Purpose Registers): EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI. x64 extends them to 64 bits (RAX, RBX, RCX, RDX, RBP, RSP, RSI, RDI) and adds another 8 GPRs: R8, R9, R10, R11, R12, R13, R14, R15. Remember, although they're called 'general-purpose registers', some of them have specific roles by default. For example, RSP is the stack pointer. The register extension in x64 answered a long-standing criticism of x86: it has too few GPRs. With more GPRs, more values can stay in registers instead of being spilled to memory. This reduces memory traffic and can improve pipeline performance.
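The disassembly in this article mixes 64-bit register names (%rbp, %rsp) with 32-bit names (%edi, %esi, %r8d, %r9d). This is because an int only occupies the low 32 bits of a register. The sketch below tabulates the 16 x64 GPRs alongside the names of their low 32 bits. low32_name is a hypothetical helper used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The 16 x64 general-purpose registers and the names of their low 32 bits,
   in the same order. */
static const char *const gpr64[16] = {
    "rax", "rbx", "rcx", "rdx", "rbp", "rsp", "rsi", "rdi",
    "r8",  "r9",  "r10", "r11", "r12", "r13", "r14", "r15",
};
static const char *const gpr32[16] = {
    "eax", "ebx", "ecx", "edx", "ebp", "esp", "esi", "edi",
    "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d",
};

/* Return the 32-bit name of a 64-bit GPR, or NULL if reg64 is not a GPR. */
const char *low32_name(const char *reg64) {
    assert(reg64 != NULL);
    for (size_t i = 0; i < 16; i++) {
        if (strcmp(gpr64[i], reg64) == 0)
            return gpr32[i];
    }
    return NULL;
}
```

For example, low32_name("r9") returns "r9d", which is the name in main's mov $0x6,%r9d.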


Activation Record Layout in x64

As mentioned above, x64 has 16 general-purpose registers, so some of them can be used to pass parameters. According to the System V x86_64 ABI, the first 6 integer or pointer arguments of a function are passed in registers, in the order RDI, RSI, RDX, RCX, R8, R9. Only the 7th argument onwards is passed on the stack. Thus, in the example above, foo1's arguments a and b arrive in EDI and ESI (the lower 32 bits of RDI and RSI) rather than on the stack. foo1 then copies them to -0x4(%rbp) and -0x8(%rbp).
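The System V rule for integer and pointer arguments is mechanical enough to write down. This sketch makes two assumptions. First, every argument is INTEGER class (int, long, pointers); struct, floating-point and variadic arguments follow extra rules. Second, the callee sets up a frame pointer with push %rbp; mov %rsp,%rbp, as GCC does at -O0. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* System V x86-64 order of registers for INTEGER-class arguments. */
static const char *const int_arg_regs[6] = {
    "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
};

/* Register holding the n-th (1-based) integer/pointer argument,
   or NULL if that argument is passed on the stack (n >= 7). */
const char *int_arg_register(int n) {
    assert(n >= 1);
    return n <= 6 ? int_arg_regs[n - 1] : NULL;
}

/* Offset from %rbp at which the callee finds stack argument n (n >= 7),
   assuming the prologue push %rbp; mov %rsp,%rbp: 0(%rbp) is the saved
   %rbp, 8(%rbp) is the return address, and stack arguments start at
   16(%rbp), lowest-numbered first. */
int stack_arg_offset(int n) {
    assert(n >= 7);
    return 16 + 8 * (n - 7);
}
```

With eight arguments, the 7th and 8th come out at 0x10(%rbp) and 0x18(%rbp). This matches the mov 0x10(%rbp),%eax and mov 0x18(%rbp),%eax in the disassembly of the eight-argument foo1 later in this article.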

But how can a local variable live below the stack pointer? In x64, there is a reserved area below RSP, called the "red zone", that a function can use for its local variables without adjusting the stack pointer RSP. This clears up the puzzle from the beginning of this article. Now, let's look at how the System V x86_64 ABI defines this "red zone":

	The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt
	handlers. Therefore, functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions
	may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue. This area is known as
	the red zone.

In other words, the "red zone" is the 128-byte area just below RSP that signal and interrupt handlers will not modify. That is why foo2 can keep its local variable g in its red zone without altering RSP. Moreover, leaf functions (functions that do not call any other function) can use the red zone as their entire stack frame.
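Concretely, the red zone covers addresses from RSP - 128 up to, but not including, RSP. Below is a small sketch that tests whether an object lies entirely inside the red zone, checked against the addresses from the GDB session above. in_red_zone is a hypothetical helper. The red zone only holds data that is not needed across calls, because a call instruction pushes its return address at RSP - 8, right inside it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RED_ZONE_SIZE 128u  /* bytes below %rsp reserved by the SysV x86-64 ABI */

/* True if the object occupying [addr, addr + size) lies entirely inside
   the red zone of a frame whose stack pointer is rsp, i.e. inside
   [rsp - 128, rsp). */
bool in_red_zone(uint64_t rsp, uint64_t addr, uint64_t size) {
    assert(rsp >= RED_ZONE_SIZE);
    return addr >= rsp - RED_ZONE_SIZE   /* does not start below the zone */
        && addr < rsp                    /* starts below %rsp */
        && size <= rsp - addr;           /* ends at or below %rsp */
}
```

With RSP = 0x7fffffffe0b0 in foo2, the 4-byte g at 0x7fffffffe0ac is inside the red zone. The saved RBP at RSP itself is not.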

Now, let's look at a code example based on the previous one:

/* test.c */
#include <stdio.h>

int foo2(int, int);

int foo1(int a, int b, int c, int d, int e, int f, int g, int h) {
	int sum = a + b + c + d + e + f + g + h;
	int ave = sum / 8;
	int count = foo2(sum, ave);
	return count;
}

int foo2(int a, int b) {
	int g = a / b;
	return g;
}

int main() {
	foo1(1, 2, 3, 4, 5, 6, 7, 8);
	return 0;
}

Now let's use GDB to disassemble each function (on x86_64 Linux):

main

(gdb) disassemble /m main
Dump of assembler code for function main:
18	long main() {
   0x0000000000400583 <+0>:	push   %rbp
   0x0000000000400584 <+1>:	mov    %rsp,%rbp

19		foo1(1, 2, 3, 4, 5, 6, 7, 8);
   0x0000000000400587 <+4>:	pushq  $0x8
   0x0000000000400589 <+6>:	pushq  $0x7
   0x000000000040058b <+8>:	mov    $0x6,%r9d
   0x0000000000400591 <+14>:	mov    $0x5,%r8d
   0x0000000000400597 <+20>:	mov    $0x4,%ecx
   0x000000000040059c <+25>:	mov    $0x3,%edx
   0x00000000004005a1 <+30>:	mov    $0x2,%esi
   0x00000000004005a6 <+35>:	mov    $0x1,%edi
   0x00000000004005ab <+40>:	callq  0x4004d6 <foo1>
   0x00000000004005b0 <+45>:	add    $0x10,%rsp

20		return 0;
   0x00000000004005b4 <+49>:	mov    $0x0,%eax

21	}
   0x00000000004005b9 <+54>:	leaveq 
   0x00000000004005ba <+55>:	retq

End of assembler dump.
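After the call, main releases the two stack arguments with add $0x10,%rsp. The size of that cleanup can be sketched as below. The sketch assumes INTEGER-class arguments only, and that RSP is 16-byte aligned before the arguments are pushed. The ABI requires 16-byte alignment at the call instruction, so the argument area is padded to a multiple of 16. stack_arg_bytes is a hypothetical name.

```c
#include <assert.h>

/* Bytes of stack arguments for a call with n INTEGER-class arguments:
   arguments 7 and up take 8 bytes each, and the area is padded to a
   multiple of 16 so that %rsp is 16-byte aligned at the call (assuming
   it was aligned before the pushes). The caller releases this much
   afterwards, e.g. main's add $0x10,%rsp. */
unsigned stack_arg_bytes(unsigned n) {
    unsigned stack_args = n > 6 ? n - 6 : 0;
    unsigned bytes = (8 * stack_args + 15) & ~15u;
    assert(bytes % 16 == 0);
    return bytes;
}
```

stack_arg_bytes(8) is 16 (0x10), matching main. With seven arguments it would still be 16, since the single 8-byte argument is padded to keep the alignment.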

foo1

(gdb) disassemble /m foo1
Dump of assembler code for function foo1:
6	int foo1(int a, int b, int c, int d, int e, int f, int g, int h) {
   0x00000000004004d6 <+0>:	push   %rbp
   0x00000000004004d7 <+1>:	mov    %rsp,%rbp
   0x00000000004004da <+4>:	sub    $0x30,%rsp
   0x00000000004004de <+8>:	mov    %edi,-0x14(%rbp)
   0x00000000004004e1 <+11>:	mov    %esi,-0x18(%rbp)
   0x00000000004004e4 <+14>:	mov    %edx,-0x1c(%rbp)
   0x00000000004004e7 <+17>:	mov    %ecx,-0x20(%rbp)
   0x00000000004004ea <+20>:	mov    %r8d,-0x24(%rbp)
   0x00000000004004ee <+24>:	mov    %r9d,-0x28(%rbp)

7		int sum = a + b + c + d + e + f + g + h;
   0x00000000004004f2 <+28>:	mov    -0x14(%rbp),%edx
   0x00000000004004f5 <+31>:	mov    -0x18(%rbp),%eax
   0x00000000004004f8 <+34>:	add    %eax,%edx
   0x00000000004004fa <+36>:	mov    -0x1c(%rbp),%eax
   0x00000000004004fd <+39>:	add    %eax,%edx
   0x00000000004004ff <+41>:	mov    -0x20(%rbp),%eax
   0x0000000000400502 <+44>:	add    %eax,%edx
   0x0000000000400504 <+46>:	mov    -0x24(%rbp),%eax
   0x0000000000400507 <+49>:	add    %eax,%edx
   0x0000000000400509 <+51>:	mov    -0x28(%rbp),%eax
   0x000000000040050c <+54>:	add    %eax,%edx
   0x000000000040050e <+56>:	mov    0x10(%rbp),%eax
   0x0000000000400511 <+59>:	add    %eax,%edx
   0x0000000000400513 <+61>:	mov    0x18(%rbp),%eax
   0x0000000000400516 <+64>:	add    %edx,%eax
   0x0000000000400518 <+66>:	mov    %eax,-0xc(%rbp)

8		int ave = sum / 8;
   0x000000000040051b <+69>:	mov    -0xc(%rbp),%eax
   0x000000000040051e <+72>:	lea    0x7(%rax),%edx
   0x0000000000400521 <+75>:	test   %eax,%eax
   0x0000000000400523 <+77>:	cmovs  %edx,%eax
   0x0000000000400526 <+80>:	sar    $0x3,%eax
   0x0000000000400529 <+83>:	mov    %eax,-0x8(%rbp)

9		int count = foo2(sum, ave);
   0x000000000040052c <+86>:	mov    -0x8(%rbp),%edx
   0x000000000040052f <+89>:	mov    -0xc(%rbp),%eax
   0x0000000000400532 <+92>:	mov    %edx,%esi
   0x0000000000400534 <+94>:	mov    %eax,%edi
   0x0000000000400536 <+96>:	callq  0x400543 <foo2>
   0x000000000040053b <+101>:	mov    %eax,-0x4(%rbp)

10		return count;
   0x000000000040053e <+104>:	mov    -0x4(%rbp),%eax

11	}
   0x0000000000400541 <+107>:	leaveq 
   0x0000000000400542 <+108>:	retq   

End of assembler dump.

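As the assembly above shows, arguments a, b, c, d, e and f are passed in registers (EDI, ESI, EDX, ECX, R8D, R9D) and copied into foo1's frame. g and h are pushed onto the stack by main and read back from 0x10(%rbp) and 0x18(%rbp). foo1 is not a leaf function (it calls foo2), so GCC does not apply the red-zone optimization to it. Instead, foo1 reserves 0x30 bytes with sub $0x30,%rsp in its prologue.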
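foo1 computes sum / 8 without an idiv instruction. An arithmetic right shift by 3 divides by 8 but rounds toward negative infinity, while C's / rounds toward zero. So GCC first adds 7 to negative values (lea 0x7(%rax),%edx; test; cmovs) and then shifts (sar $0x3). Below is a C model of that sequence. It assumes >> on a negative int is an arithmetic shift, which is implementation-defined in C but is what GCC does on x86-64. div8_like_gcc and check_div8 are hypothetical names.

```c
#include <assert.h>

/* Mirrors foo1's code for sum / 8:
     lea 0x7(%rax),%edx ; candidate x + 7
     test %eax,%eax
     cmovs %edx,%eax    ; take x + 7 only if x is negative
     sar $0x3,%eax      ; arithmetic shift right by 3 */
int div8_like_gcc(int x) {
    int biased = x < 0 ? x + 7 : x;   /* cannot overflow: x + 7 <= 6 here */
    return biased >> 3;               /* arithmetic shift under GCC */
}

/* Checks the model against C's truncating division for x in [lo, hi]. */
void check_div8(int lo, int hi) {
    for (int x = lo; x <= hi; x++)
        assert(div8_like_gcc(x) == x / 8);
}
```

In this program sum is 1 + 2 + ... + 8 = 36, so ave is 4.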

Here's the stack frame of main calling foo1:

[Figure: stack frame of main calling foo1]
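The layout of foo1's frame can also be read directly off the disassembly above. The sketch below records each slot as an offset from RBP. The values come from this -O0 listing only; other compilers or flags will lay the frame out differently. foo1_slot and check_foo1_frame are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* One slot of foo1's frame: offset from %rbp after the prologue
   push %rbp; mov %rsp,%rbp; sub $0x30,%rsp, and what GCC -O0 keeps there. */
struct slot {
    int rbp_offset;
    const char *what;
};

static const struct slot foo1_frame[] = {
    {  0x18, "h: 8th argument, pushed by main" },
    {  0x10, "g: 7th argument, pushed by main" },
    {  0x08, "return address into main" },
    {  0x00, "main's saved %rbp" },
    { -0x04, "count" },
    { -0x08, "ave" },
    { -0x0c, "sum" },
    { -0x14, "a, copied from %edi" },
    { -0x18, "b, copied from %esi" },
    { -0x1c, "c, copied from %edx" },
    { -0x20, "d, copied from %ecx" },
    { -0x24, "e, copied from %r8d" },
    { -0x28, "f, copied from %r9d" },
};

#define FOO1_SLOTS (sizeof foo1_frame / sizeof foo1_frame[0])

/* What foo1 stores at rbp_offset, or NULL for an unused offset. */
const char *foo1_slot(int rbp_offset) {
    for (size_t i = 0; i < FOO1_SLOTS; i++)
        if (foo1_frame[i].rbp_offset == rbp_offset)
            return foo1_frame[i].what;
    return NULL;
}

/* Every local slot (a 4-byte int) must fit in the 0x30 bytes that
   sub $0x30,%rsp reserves below %rbp. */
void check_foo1_frame(void) {
    for (size_t i = 0; i < FOO1_SLOTS; i++)
        if (foo1_frame[i].rbp_offset < 0)
            assert(foo1_frame[i].rbp_offset >= -0x30);
}
```

foo1_slot(-0x10) returns NULL because the 4 bytes at -0x10(%rbp) are unused padding between sum and a.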

foo2

(gdb) disassemble /m foo2
Dump of assembler code for function foo2:
13	int foo2(int a, int b) {
   0x0000000000400543 <+0>:	push   %rbp
   0x0000000000400544 <+1>:	mov    %rsp,%rbp
   0x0000000000400547 <+4>:	mov    %edi,-0x14(%rbp)
   0x000000000040054a <+7>:	mov    %esi,-0x18(%rbp)

14		int g = a / b;
   0x000000000040054d <+10>:	mov    -0x14(%rbp),%eax
   0x0000000000400550 <+13>:	cltd   
   0x0000000000400551 <+14>:	idivl  -0x18(%rbp)
   0x0000000000400554 <+17>:	mov    %eax,-0x4(%rbp)

15		return g;
   0x0000000000400557 <+20>:	mov    -0x4(%rbp),%eax

16	}
   0x000000000040055a <+23>:	pop    %rbp
   0x000000000040055b <+24>:	retq   

End of assembler dump.
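In foo2, a / b compiles to cltd followed by idivl. cltd (Intel name cdq) sign-extends EAX into the 64-bit pair EDX:EAX. idivl then divides that pair by its 32-bit operand, leaving the quotient in EAX and the remainder in EDX, both truncated toward zero. Below is a C model of the pair. cltd_idivl and struct idiv_result are hypothetical names. Where the hardware would raise a divide error (SIGFPE on Linux), the model asserts instead.

```c
#include <assert.h>
#include <stdint.h>

/* Quotient and remainder as left in %eax and %edx by idivl. */
struct idiv_result {
    int32_t quotient;   /* %eax */
    int32_t remainder;  /* %edx */
};

/* Models foo2's cltd; idivl -0x18(%rbp) with eax = a and divisor = b. */
struct idiv_result cltd_idivl(int32_t eax, int32_t divisor) {
    assert(divisor != 0);                      /* hardware: divide error */
    int64_t edx_eax = (int64_t)eax;            /* cltd: sign-extend into %edx:%eax */
    int64_t q = edx_eax / divisor;             /* truncates toward zero */
    int64_t r = edx_eax % divisor;             /* same sign as the dividend */
    assert(q >= INT32_MIN && q <= INT32_MAX);  /* e.g. INT32_MIN / -1 overflows */
    struct idiv_result res = { (int32_t)q, (int32_t)r };
    return res;
}
```

With the values from foo1, foo2(36, 4) gives a quotient of 9 and a remainder of 0.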

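This brings us back to the puzzle from the beginning of this article: RSP is never adjusted in foo2, even though foo2 has a local variable. foo2 is a leaf function, so GCC keeps g at -0x4(%rbp) in the red zone below RSP (RSP equals RBP here). The copies of a and b at -0x14(%rbp) and -0x18(%rbp) are kept there too. Here's the stack frame of foo2:

[Figure: stack frame of foo2]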

Conclusion

Function calls in x64 differ a lot from those in x86. The first six integer or pointer arguments travel in registers instead of on the stack. The 128-byte "red zone" below RSP is an optimization that lets leaf functions avoid adjusting the stack pointer in the prologue and epilogue. x64 has some interesting quirks that aren't obvious even if you're familiar with 32-bit x86 :)