Using dtrace on OS X to debug a performance problem

I recently ported upb’s table-based decoder to use setjmp/longjmp-based error handling. I did this largely for code simplicity and readability, so that the non-error code-paths didn’t have to check for errors all the time. But unfortunately I noticed a dramatic 75% performance decrease. What was going on?

A profile in Shark showed the majority of my time being spent in system calls, but didn’t make it clear which system calls were involved. It sounded like a job for dtrace.

I used the following script to dump all the system calls that were being issued by my process:

syscall:::entry
/pid == $target/
{
     @[probefunc] = count();
}

I then ran this script on both my old (fast) benchmark and the new (unexpectedly slow) one:

$ sudo dtrace -c ./old-benchmark -s trace.d

  exit                                             1
  fstat64                                          1
  ioctl                                            1
  write_nocancel                                   1
  munmap                                          93
  mmap                                            94
  getrusage                                     7728

$ sudo dtrace -c ./new-benchmark -s trace.d

  exit                                             1
  fstat64                                          1
  ioctl                                            1
  munmap                                           1
  write_nocancel                                   1
  getrusage                                      438
  sigaltstack                                 111773
  sigreturn                                   111774
  sigprocmask                                 223546

Sure enough, the new version is making a ton of system calls that weren’t happening before, and they appear to be signal related. I investigated the manpage of setjmp/longjmp and found:

The setjmp()/longjmp() pairs save and restore the signal mask while _setjmp()/_longjmp() pairs save and restore only the register set and the stack. (See sigprocmask(2).) The sigsetjmp()/siglongjmp() function pairs save and restore the signal mask if the argument savemask is non-zero; otherwise, only the register set and the stack are saved.

I replaced my calls to setjmp/longjmp with sigsetjmp/siglongjmp, passing 0 for my signal mask, and the performance problems went away. Score one for dtrace.