Using dtrace on OS X to debug a performance problem
I recently ported upb’s table-based decoder to use setjmp/longjmp-based error handling. I did this largely for code simplicity and readability, so that the non-error code paths didn’t have to check for errors all the time. But unfortunately I noticed a dramatic 75% performance decrease. What was going on?
A profile in Shark showed the majority of my time being spent in system calls, but didn’t make it clear which system calls were involved. It sounded like a job for dtrace.
I used the following script to dump all the system calls that were being issued by my process:
syscall:::entry
/pid == $target/
{
    @[probefunc] = count();
}
I then ran this script on both my old (fast) benchmark and the new (unexpectedly slow) one:
$ sudo dtrace -c ./old-benchmark -s trace.d
exit 1
fstat64 1
ioctl 1
write_nocancel 1
munmap 93
mmap 94
getrusage 7728
$ sudo dtrace -c ./new-benchmark -s trace.d
exit 1
fstat64 1
ioctl 1
munmap 1
write_nocancel 1
getrusage 438
sigaltstack 111773
sigreturn 111774
sigprocmask 223546
Sure enough, the new version is making a ton of system calls that weren’t happening before, and they appear to be signal-related. The sigprocmask count is almost exactly twice the sigaltstack count, which is consistent with the signal mask being saved and restored around every setjmp/longjmp pair. I checked the setjmp/longjmp manpage and found:
The setjmp()/longjmp() pairs save and restore the signal mask while _setjmp()/_longjmp() pairs save and restore only the register set and the stack. (See sigprocmask(2).) The sigsetjmp()/siglongjmp() function pairs save and restore the signal mask if the argument savemask is non-zero; otherwise, only the register set and the stack are saved.
I replaced my calls to setjmp/longjmp with sigsetjmp/siglongjmp, passing 0 for the savemask argument so the signal mask is neither saved nor restored, and the performance problems went away.
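In terms of the earlier sketch (still hypothetical code, not upb’s), the change is small; the 0 passed as sigsetjmp’s savemask argument is what keeps the mask-related system calls off the fast path:

#include <setjmp.h>
#include <stdio.h>

typedef struct {
  sigjmp_buf err;       /* was: jmp_buf */
  const char *errmsg;
} decoder;

static int decode(decoder *d, const char *buf, size_t len) {
  (void)buf; (void)len;
  /* savemask == 0: don't save or restore the signal mask on setjmp,
     so the per-call sigprocmask/sigaltstack traffic disappears. */
  if (sigsetjmp(d->err, 0)) {
    fprintf(stderr, "decode error: %s\n", d->errmsg);
    return -1;
  }
  /* ...error paths now call siglongjmp(d->err, 1) instead of longjmp()... */
  return 0;
}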
Score one for dtrace.