I feel compelled to write this article in Q&A style. Plato I'm not, but maybe it's because there are so many questions to answer.
What is stack unwinding?
In any language that supports recursive functions (ie. pretty much everything except Fortran 77 and Brainf*ck) the language runtime keeps a stack of what functions are currently executing. Stack unwinding is a way of inspecting, and possibly modifying, that stack.
Why would you want to do that?
The answer may seem obvious, but there are several related, yet subtly different, situations where unwinding is useful or necessary:
- As a runtime control-flow mechanism (C++ exceptions, C
- In a debugger, to show the user the stack.
- In a profiler, to take a sample of the stack.
- From the program itself (like from a crash handler to show the stack).
Ok, lay it on me: how do you unwind the stack?
Let's start with the simplest case, which is the
longjmp()function from C (see The Wikipedia entry for setjmp.h for the basics).
longjmp()jumps down the stack to the corresponding
setjmp(), across as many stack frames as there are in between. Unlike the other approaches we will see,
longjmp()does not iterate over the list of frames, it just blasts through them all at once with wanton disregard for anything precious or beautiful that they might have contained.
The implementations of
longjmpare surprisingly simple, though heavily architecture specific. For example, let's take a look at the x86-64 implementation of
setjmp/longjmpfrom the FreeBSD kernel (you can look at the userspace version from libc too, but it is slightly more complicated because it also has to save/restore other status registers). I've converted this to Intel syntax:
_setjmp: mov QWORD PTR [rdi],rbx ; save rbx mov QWORD PTR [rdi+0x8],rsp ; save rsp mov QWORD PTR [rdi+0x10],rbp ; save rbp mov QWORD PTR [rdi+0x18],r12 ; save r12 mov QWORD PTR [rdi+0x20],r13 ; save r13 mov QWORD PTR [rdi+0x28],r14 ; save r14 mov QWORD PTR [rdi+0x30],r15 ; save r15 mov rdx,QWORD PTR [rsp] ; get return address mov QWORD PTR [rdi+0x38],rdx ; save return address xor eax,eax ; return 0 ret _longjmp: mov rbx,QWORD PTR [rdi] ; restore rbx mov rsp,QWORD PTR [rdi+0x8] ; restore rsp mov rbp,QWORD PTR [rdi+0x10] ; restore rbp mov r12,QWORD PTR [rdi+0x18] ; restore r12 mov r13,QWORD PTR [rdi+0x20] ; restore r13 mov r14,QWORD PTR [rdi+0x28] ; restore r14 mov r15,QWORD PTR [rdi+0x30] ; restore r15 mov rdx,QWORD PTR [rdi+0x38] ; get return address mov QWORD PTR [rsp],rdx ; restore return address xor eax,eax inc eax ; return 1 retMaybe it's just me, but I expected these functions to look way more gnarly. All
setjmpis doing is saving a bunch of registers including
rsp(the stack pointer) and the return address into the
jmp_bufarray that was passed as a parameter. All
longjmpis doing is restoring those registers and the return address of the original
setjmpcall. This tweaking of the return address is how the call to
longjmpmanages to "return from" the original
This technique is simple, but so primitive that you might not even call it "unwinding." We are just setting the stack pointer
rspto its old value, which instantly discards the entire contents of the stack between the
setjmpcalls. Any resources that were stored there are leaked! Any other data that was in the middle of being modified might be corrupted, unless you took care to leave it in a consistent state before making the function call that ultimately called
What about the other registers, like
Good eye! The x86-64 architecture has 16 general-purpose registers, but we only saved 7 of them. What about the others? Well recall that
longjmpare implemented as functions and as such they follow the standard x86-64 calling convention. The x86-64 calling convention dictates that the 7 registers above are owned by the caller (also known as callee-save), which means that
setjmpis responsible for restoring these registers before it returns. The other registers are owned by the callee function, the callee is allowed to clobber them however it likes before returning. So we're under no obligation to restore these registers before returning, and the caller can't make any assumptions about what they will contain.
Who decides these calling conventions?
Probably the same gurus who answer all my linker/loader questions. In my mind they all sit in a circle and meditate, reflecting on ancient texts from The Cult of the Bound Variable.
Ok, that makes sense, stack unwinding doesn't seem so bad.
That's because I started with the simplest case. Every other stack-unwinding mechanism has to walk the stack, frame-by-frame.
How do you walk the list of frames?
There are two main ways. I'll start with the simpler one. If you've ever looked at a function in assembly, you might have seen a prologue and epilogue that look like this:
push rbp mov rbp, rsp ; other stuff... pop rbp retI remember when I was taking assembly language in college I could not understand why I had to do this. What's the point of having both a base pointer and a stack pointer, both pointing at the stack? As long as you adjust the stack pointer back to where it started before you call
retyou should be fine, right?
It turns out this convention comes in really handy when you want to walk the stack (like to generate a backtrace), because it connects all the stack frames together into a linked list whose head pointer is
rbp! This means that generating the back trace is really as simple as a linked list traversal.
Aha! So that's how debugger backtraces and
backtrace()from libc work!
Actually... no. But that's how they used to work! On 32-bit x86, it used to be that everyone compiled with a frame pointer, because this was the only way of getting stack traces in the debugger. But this scheme has two significant overheads:
- One general-purpose register is used up at all times. On x86, this meant 1 out of only 8 registers was gone. On x86-64, it would be 1 out of 16.
- Every function has to have this prologue and epilogue I showed above.
How can you walk up the stack without a base pointer?
I know, it seems hopeless. Without a base pointer stored in each stack frame, how do you find the previous frame?
What the tools do nowadays is store some information in a different part of the file that provides "unwind information." Unwind information tells you, for an arbitrary instruction:
- how much the stack pointer has been adjusted (or equivalently, how much the unwinder has to adjust it back to get to the base of the frame, which is necessary to find the return address).
- where in the frame the callee-save registers are stored, if they have been clobbered since the function began.
- (optionally) a pointer to a custom "personality function" for unwinding the stack
- lots of other things? I don't fully understand this yet.
That sounds useful! Who defines the format of debug information?
Debug information on most modern UNIX-like OS's is stored in the DWARF format (a clever pun on the ELF object file format). DWARF specifies that this unwind information should go in a section called
I see, so the modern way to specify unwind information is in the
That would be too easy. When the x86-64 ABI was being developed, they decided to specify that debug information would go in the
.eh_framesection, in a slightly different format than is defined by DWARF.
.eh_frameis what all the tools appear to use in practice. And there is a standard, portable library for unwinding based on the
.debug_framemechanism called libunwind.
Gotcha, so gdb and g++'s exception support use libunwind?
No, they have their own unwinder implementations that are completely independent of libunwind (gcc's unwind-dw2.c and gdb's dwarf2-frame.c). Gcc can link against libunwind but uses its own unwinder by default. gdb has a whole bunch of unwinders, including one that scans the machine code to figure out how the frame was setup.
Ok, so what about C++ exceptions?
C++ exceptions use
.eh_frameinformation also. They use this information to not only recreate the frame, but execute code in it (C++ destructors and
catchblocks) and finally to dismantle it before transferring control to the frame with the matching
How on Earth does that work?
I don't really know, but when I find out I'll post another blog entry.