Binary Exploitation 101 - SSP (Stack Smashing Protector)
This blog series is still a work in progress. The content may change without notice.
In this chapter, we’ll learn about SSP (Stack Smashing Protector) and its bypass. The materials for this chapter can be found in the chapter_06 folder.
Introduction
As we learned in the previous chapter, the NX bit can prevent attacks using shellcode, but it can be bypassed using ROP. Then, is there a better way? Remember that the attacks so far work by overwriting the return address of the main function. If we can detect any tampering with the return address, we can stop these attacks before they succeed. This is the idea behind SSP.
SSP
SSP (Stack Smashing Protector) is a security mechanism that detects tampering with the return address. In the previous chapters, we disabled this by passing the -fno-stack-protector option at compile time. Looking at this chapter’s chal.c, the source code is exactly the same as in the previous chapter, but that option is gone:
When SSP is enabled, the following code is inserted at the beginning and end of the main function (you can check this using the objdump command):
0000000000401278 <main>:
401278: f3 0f 1e fa endbr64
40127c: 55 push rbp
40127d: 48 89 e5 mov rbp,rsp
401280: 48 83 ec 30 sub rsp,0x30
401284: 64 48 8b 04 25 28 00 mov rax,QWORD PTR fs:0x28
40128b: 00 00
40128d: 48 89 45 f8 mov QWORD PTR [rbp-0x8],rax
...
4012f6: 48 8b 55 f8 mov rdx,QWORD PTR [rbp-0x8]
4012fa: 64 48 2b 14 25 28 00 sub rdx,QWORD PTR fs:0x28
401301: 00 00
401303: 74 05 je 40130a <main+0x92>
401305: e8 a6 fd ff ff call 4010b0 <__stack_chk_fail@plt>
40130a: c9 leave
40130b: c3 ret
The fs register is a segment register on x86-64, and glibc uses it to hold the address of the TLS (Thread Local Storage). The actual TLS object is struct pthread. This structure has a header of type tcbhead_t at its beginning. The value at fs:0x28 is the stack_guard member of that header:
typedef struct
{
  void *tcb;        /* Pointer to the TCB. Not necessarily the
                       thread descriptor used by libpthread. */
  dtv_t *dtv;
  void *self;       /* Pointer to the thread descriptor. */
  int multiple_threads;
  int gscope_flag;
  uintptr_t sysinfo;
  uintptr_t stack_guard;
  uintptr_t pointer_guard;
  ...
} tcbhead_t;
/* GCC generates %fs:0x28 to access the stack guard. */
_Static_assert (offsetof (tcbhead_t, stack_guard) == 0x28,
                "stack guard offset");
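To see where 0x28 comes from, the field offsets can be added up by hand. The following is a minimal sketch that assumes the x86-64 layout (8-byte pointers and uintptr_t, 4-byte int) and covers only the fields shown in the excerpt above. The helper name field_offsets is made up for this illustration.

```python
# Field sizes in bytes for the tcbhead_t members shown above.
# Assumption: x86-64 (LP64), so pointers and uintptr_t are 8 bytes, int is 4.
FIELDS = [
    ("tcb", 8),
    ("dtv", 8),
    ("self", 8),
    ("multiple_threads", 4),
    ("gscope_flag", 4),
    ("sysinfo", 8),
    ("stack_guard", 8),
    ("pointer_guard", 8),
]


def field_offsets(fields):
    """Return {name: offset}, aligning each scalar field to its own size."""
    offsets = {}
    pos = 0
    for name, size in fields:
        pos = (pos + size - 1) // size * size  # round up to the field's alignment
        offsets[name] = pos
        pos += size
    return offsets


OFFSETS = field_offsets(FIELDS)
print(hex(OFFSETS["stack_guard"]))
```

Three pointers (24 bytes), two ints (8 bytes), and sysinfo (8 bytes) come before stack_guard, which is what the _Static_assert in the excerpt checks.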
In dynamically linked binaries, stack_guard is initialized in the security_init function, which is called by the dl_main function:
static void
security_init (void)
{
  /* Set up the stack checker's canary. */
  uintptr_t stack_chk_guard = _dl_setup_stack_chk_guard (_dl_random);
#ifdef THREAD_SET_STACK_GUARD
  THREAD_SET_STACK_GUARD (stack_chk_guard);
The _dl_setup_stack_chk_guard function reads sizeof (uintptr_t) bytes (8 on x86-64) from _dl_random and returns a value whose byte at the lowest address is set to 0. On little-endian x86-64, that is the least significant byte. This is the value of stack_guard:
static inline uintptr_t __attribute__ ((always_inline))
_dl_setup_stack_chk_guard (void *dl_random)
{
  union
  {
    uintptr_t num;
    unsigned char bytes[sizeof (uintptr_t)];
  } ret;
  /* We need in the moment only 8 bytes on 32-bit platforms and 16
     bytes on 64-bit platforms. Therefore we can use the data
     directly and not use the kernel-provided data to seed a PRNG. */
  memcpy (ret.bytes, dl_random, sizeof (ret));
#if BYTE_ORDER == LITTLE_ENDIAN
  ret.num &= ~(uintptr_t) 0xff;
#elif BYTE_ORDER == BIG_ENDIAN
  ret.num &= ~((uintptr_t) 0xff << (8 * (sizeof (ret) - 1)));
#else
# error "BYTE_ORDER unknown"
#endif
  return ret.num;
}
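The same masking can be tried out on sample bytes. This is a rough Python model of the function above for a 64-bit target, not glibc code. The input bytes are made up, and the function name only mirrors the C one for readability.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def setup_stack_chk_guard(dl_random: bytes, byteorder: str = "little") -> int:
    """Model of _dl_setup_stack_chk_guard for a 64-bit uintptr_t."""
    # memcpy (ret.bytes, dl_random, sizeof (ret)): take the first 8 bytes.
    num = int.from_bytes(dl_random[:8], byteorder)
    if byteorder == "little":
        num &= ~0xFF & MASK64          # clear the least significant byte
    else:
        num &= ~(0xFF << 56) & MASK64  # clear the most significant byte
    return num


# Made-up stand-in for the 16 bytes that AT_RANDOM points to.
SAMPLE_RANDOM = bytes(range(0x11, 0x21))
CANARY = setup_stack_chk_guard(SAMPLE_RANDOM)
print(hex(CANARY))
```

In both branches the byte that ends up at the lowest memory address is the one that gets zeroed, so the canary always starts with a null byte in memory.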
_dl_random is a pointer to a 16-byte buffer, which is initialized with random values by the kernel in the create_elf_tables function. The pointer to this buffer is then passed to the dynamic linker through the AT_RANDOM entry of the auxiliary vector, and the dynamic linker sets _dl_random to this pointer in the _dl_parse_auxv function:
static int
create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
        unsigned long interp_load_addr,
        unsigned long e_entry, unsigned long phdr_addr)
{
    ...
    /*
     * Generate 16 random bytes for userspace PRNG seeding.
     */
    get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));
    u_rand_bytes = (elf_addr_t __user *)
               STACK_ALLOC(p, sizeof(k_rand_bytes));
    if (copy_to_user(u_rand_bytes, k_rand_bytes, sizeof(k_rand_bytes)))
        return -EFAULT;
    ...
    NEW_AUX_ENT(AT_RANDOM, (elf_addr_t)(unsigned long)u_rand_bytes);
    ...
static inline
void _dl_parse_auxv (ElfW(auxv_t) *av, dl_parse_auxv_t auxv_values)
{
  ...
  _dl_random = (void *) auxv_values[AT_RANDOM];
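The auxiliary vector is an array of (type, value) pairs terminated by an AT_NULL entry. The sketch below parses such an array on a 64-bit little-endian system and looks up AT_RANDOM. The sample blob and the address in it are made up. On Linux, a process's real vector can be read from /proc/self/auxv, which this example does not do so that it runs anywhere.

```python
import struct

AT_NULL = 0     # terminates the vector
AT_PAGESZ = 6   # system page size
AT_RANDOM = 25  # address of the 16 random bytes


def parse_auxv(blob: bytes) -> dict:
    """Parse 64-bit little-endian (a_type, a_val) pairs up to AT_NULL."""
    entries = {}
    for a_type, a_val in struct.iter_unpack("<QQ", blob):
        if a_type == AT_NULL:
            break
        entries[a_type] = a_val
    return entries


# Made-up vector: page size, a fake AT_RANDOM pointer, then the terminator.
SAMPLE_AUXV = struct.pack(
    "<QQQQQQ",
    AT_PAGESZ, 4096,
    AT_RANDOM, 0x7FFC12345678,
    AT_NULL, 0,
)

AUXV = parse_auxv(SAMPLE_AUXV)
print(hex(AUXV[AT_RANDOM]))
```

The value stored for AT_RANDOM is only a pointer. The 16 random bytes themselves live on the stack of the new process, where the kernel copied them.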
In summary, at the beginning of the main function, the value at fs:0x28, which is derived from the random bytes generated by the kernel, is copied into rbp - 0x8. This value is called the stack canary:
401284: 64 48 8b 04 25 28 00 mov rax,QWORD PTR fs:0x28
40128b: 00 00
40128d: 48 89 45 f8 mov QWORD PTR [rbp-0x8],rax
But why does this allow us to detect modifications to the return address? Let’s check with GDB. Start GDB with the following command:
pwndbg -q --ex 'b main' --ex 'r' ./chal_patched
Use disass to display the code of the main function, set a breakpoint at the canary check near the end with b *0x4012f6, and continue execution with c. Next, check the stack frame and the value of the rbp register using x/8xg $rsp and i r rbp. We can see that a random value is stored at rbp - 0x8. Finally, running x/xg $fs_base + 0x28 to inspect the value at fs:0x28 shows that it matches the value stored at rbp - 0x8:
Here, note that the stack canary is located above the variable buf (i.e., at a higher address). As we saw earlier, the end of the main function contains the following instructions:
4012f6: 48 8b 55 f8 mov rdx,QWORD PTR [rbp-0x8]
4012fa: 64 48 2b 14 25 28 00 sub rdx,QWORD PTR fs:0x28
401301: 00 00
401303: 74 05 je 40130a <main+0x92>
401305: e8 a6 fd ff ff call 4010b0 <__stack_chk_fail@plt>
Here, the value at rbp - 0x8 is read and compared with the value at fs:0x28. The sub instruction sets the zero flag only when the two values are equal, in which case je skips the call. If they do not match, the process is terminated by the __stack_chk_fail function. Since the return address of the main function is located above the stack canary, a buffer overflow that writes linearly from buf up to the return address also overwrites the canary, so the check fails and the program terminates. This is how SSP detects tampering with the return address.
By giving a large input to chal_patched, we can see that the canary gets overwritten and the program exits:
SSP Bypass
With SSP enabled, simply overwriting the return address via a buffer overflow, as we did before, will be detected. So how can we bypass SSP and still get a shell? The key point is that SSP relies on the assumption that if the return address of the main function is corrupted, the stack canary will also be corrupted. As long as this assumption holds, checking the stack canary allows the program to detect tampering with the return address.
However, if an attacker knows the value of the stack canary, this assumption can be broken. By writing the original stack canary back to rbp - 0x8 while overwriting the return address of the main function, the attack can succeed. Since the stack canary remains unchanged, SSP does not detect the attack. This type of attack is possible because SSP detects tampering with the return address indirectly.
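The shape of such a payload can be sketched as follows. It assumes the layout seen in the disassembly (canary at rbp - 0x8, saved rbp at rbp, return address at rbp + 0x8). The distance from buf to the canary, the canary value, and the target address are placeholders that have to be determined for the actual binary, and build_payload is a name made up for this illustration.

```python
import struct


def build_payload(buf_to_canary: int, canary: int, ret_addr: int, saved_rbp: int = 0) -> bytes:
    """Padding up to the canary, the original canary, saved rbp, new return address."""
    payload = b"A" * buf_to_canary
    payload += struct.pack("<Q", canary)     # rewrite the canary with its own value
    payload += struct.pack("<Q", saved_rbp)  # saved rbp (arbitrary here)
    payload += struct.pack("<Q", ret_addr)   # overwritten return address
    return payload


# Placeholder values, not taken from chal_patched.
BUF_TO_CANARY = 0x18
LEAKED_CANARY = 0x1817161514131200
TARGET_ADDR = 0x401234

PAYLOAD = build_payload(BUF_TO_CANARY, LEAKED_CANARY, TARGET_ADDR)
print(PAYLOAD.hex())
```

Because the eight bytes at the canary's position are identical before and after the overflow, the comparison against fs:0x28 still succeeds and execution continues to ret.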
Exercise
Based on what you have learned so far, write an exploit that bypasses SSP and launches a shell. Before you start, make sure to execute the following command to disable ASLR (if you are using a Docker container, run it on the host):
sudo sysctl -w kernel.randomize_va_space=0
You can use the template, and the following hints may help. If successful, you should be able to launch a shell like this:
If you have any questions, feel free to leave a comment below. You can see my solution here.
If you have time, try to think about why the _dl_setup_stack_chk_guard function sets the least significant byte of the random value to 0. (Hint: What would happen if an attacker tried to leak this value using a function like puts?) Also, it is a good exercise to consider other ways to bypass SSP.
Hints
- The value of the stack canary can be checked from the output of the dump_stack function.