I recently ported upb’s table-based decoder to use setjmp/longjmp-based error handling. I did this largely for code simplicity and readability, so that the non-error code-paths didn’t have to check for errors all the time. But unfortunately I noticed a dramatic 75% performance decrease. What was going on?
A profile in Shark showed the majority of my time being spent in system calls, but didn’t make it clear which system calls were involved. It sounded like a job for dtrace.
I used the following script to dump all the system calls that were being issued by my process:
syscall:::entry /pid == $target/ { @[probefunc] = count(); }
I then ran this script on both my old (fast) benchmark and the new (unexpectedly slow) one:
$ sudo dtrace -c ./old-benchmark -s trace.d exit 1 fstat64 1 ioctl 1 write_nocancel 1 munmap 93 mmap 94 getrusage 7728 $ sudo dtrace -c ./new-benchmark -s trace.d exit 1 fstat64 1 ioctl 1 munmap 1 write_nocancel 1 getrusage 438 sigaltstack 111773 sigreturn 111774 sigprocmask 223546
Sure enough, the new version is making a ton of system calls that weren’t happening before, and they appear to be signal related. I investigated the manpage of setjmp/longjmp and found:
The setjmp()/longjmp() pairs save and restore the signal mask while _setjmp()/_longjmp() pairs save and restore only the register set and the stack. (See sigprocmask(2).)
The sigsetjmp()/siglongjmp() function pairs save and restore the signal mask if the argument savemask is non-zero; otherwise, only the register set and the stack are saved.
I replaced my calls to setjmp/longjmp with sigsetjmp/siglongjmp, passing 0 for my signal mask, and the performance problems went away. Score one for dtrace.