EINTR and PC loser-ing (The "Worse Is Better" case study)

Richard Gabriel’s 1989 essay Worse Is Better is a famous comparison between LISP and Unix/C that pops up from time to time and is guaranteed to spark a spirited discussion. The philosophical argument itself is not something I want to get into right now; I am interested in the technical content of the essay. What always bothered me about this paper is that I never fully understood Gabriel’s primary example of a dirty hack vs. “the right thing.”

His example is “the PC loser-ing problem,” which he describes thus:

Two famous people, one from MIT and another from Berkeley (but working on Unix) once met to discuss operating system issues. The person from MIT was knowledgeable about ITS (the MIT AI Lab operating system) and had been reading the Unix sources. He was interested in how Unix solved the PC loser-ing problem. The PC loser-ing problem occurs when a user program invokes a system routine to perform a lengthy operation that might have significant state, such as IO buffers. If an interrupt occurs during the operation, the state of the user program must be saved. Because the invocation of the system routine is usually a single instruction, the PC of the user program does not adequately capture the state of the process. The system routine must either back out or press forward. The right thing is to back out and restore the user program PC to the instruction that invoked the system routine so that resumption of the user program after the interrupt, for example, re-enters the system routine. It is called ``PC loser-ing'' because the PC is being coerced into ``loser mode,'' where ``loser'' is the affectionate name for ``user'' at MIT. The MIT guy did not see any code that handled this case and asked the New Jersey guy how the problem was handled. The New Jersey guy said that the Unix folks were aware of the problem, but the solution was for the system routine to always finish, but sometimes an error code would be returned that signaled that the system routine had failed to complete its action. A correct user program, then, had to check the error code to determine whether to simply try the system routine again. The MIT guy did not like this solution because it was not the right thing. The New Jersey guy said that the Unix solution was right because the design philosophy of Unix was simplicity and that the right thing was too complex.

When I read this I always had a burning desire to know: how did the story end? How do modern operating systems resolve this problem – the “dirty hack” way or the “right way?” What part of our modern POSIX interfaces are affected by this question?

There are several things that never made sense to me about this example. First of all, why would you need to abort a system call just because an interrupt occurred? I investigated the Linux source and it seems quite clear that interrupt handlers can return to either the kernel or userspace – whichever was running when the interrupt fired. So I don’t see why you’d need to “coerce” the system into “loser mode” at all.

But let’s suppose you accept this as a given – we will assume that when a hardware interrupt occurs, you must exit to user mode. I still don’t see the difficulty in automatically re-invoking the system call. It’s true that invoking the system routine is a single instruction, but why is it that “the PC of the user program does not adequately capture the state of the process,” as Gabriel’s essay states? What other process state do we need to capture? The registers must already be saved when the syscall is entered, because they must be restored even with a completely normal syscall return. So if we want to re-invoke the system routine, it should be as easy as simply re-executing the instruction that made the system call. Right?

The whole example confused me quite a lot until I had the idea to replace “interrupt” in the above description with “signal.” This is not such a stretch, since signals are essentially user-space software interrupts. With this small change, everything started to make a lot more sense. If a signal was delivered to a process that was currently inside a system call, that signal handler could invoke a system call itself, which would cause us to re-enter the kernel. I could easily see how the complexity of dealing with this could have led early UNIX implementors to simply abort the original system call before delivering the signal.

But this is only speculation about what UNIX was like in the mid to late 80s when “Worse is Better” was written. I could be completely off the mark in this analysis – maybe returning to the kernel from a hardware interrupt handler really wasn’t implemented at that time. Or maybe saving user state really was difficult for some reason. I’d love to hear from anyone who has more historical context about this. But the essay contains an important clue that seems to reinforce my speculation that it’s actually about signals.

UNIX and EINTR

If we look closely at the “Worse is Better” essay, we get a strong clue about what the Unix guy in the story might have been talking about:

The New Jersey guy said that the Unix folks were aware of the problem, but the solution was for the system routine to always finish, but sometimes an error code would be returned that signaled that the system routine had failed to complete its action. A correct user program, then, had to check the error code to determine whether to simply try the system routine again.

As someone who has done a lot of Unix system-level programming, this sounds to me like it must be describing EINTR, the error code in Unix that means “Interrupted system call.” To give a quick description of EINTR I’ll enlist the help of my trusty copy of “Advanced Programming in the Unix Environment” by W. Richard Stevens:

A characteristic of earlier UNIX systems is that if a process caught a signal while the process was blocked in a "slow" system call, the system call was interrupted. The system call returned an error and `errno` was set to `EINTR`. This was done under the assumption that since a signal occurred and the process caught it, there is a good chance that something has happened that should wake up the blocked system call. [...] The problem with interrupted system calls is that we now have to handle the error return explicitly. The typical code sequence (assuming a read operation and assuming that we want to restart the read even if it's interrupted) would be
again:
  if ((n = read(fd, buf, BUFFSIZE)) < 0) {
    if (errno == EINTR)
      goto again;  /* just an interrupted system call */
    /* handle other errors */
  }

This sounds an awful lot like the the New Jersey guy’s approach from the story, which required a correct program “to check the error code to determine whether to simply try the system routine again.” And there’s nothing else in Unix that I’ve ever heard of that’s anything like this. This must be what the New Jersey guy from the story was talking about!

But note that in W. Richard Stevens’ explanation this isn’t some dirty hack! It’s not a case of cutting corners that is justified by favoring implementation simplicity over interface simplicity. Stevens describes it as a deliberate design decision that gives users the capability to abort a long-running system call if you catch a signal in the meantime. Now you could easily see this as a rationalization of a dirty hack (“it’s not a bug, it’s a feature!”), but it certainly seems plausible that if you catch a signal while you’re blocked on a long system call, the signal might make you decide that you don’t want to wait for the long system call any more. Indeed, Ulrich Drepper claimed in 2000 that “Returning EINTR is necessary for many applications,” though it would have been helpful if he had expanded on this point by giving some examples.

Of course, the price we have paid for this capability is that we have to wrap all of our potentially long system calls in a loop like the example above. If we don’t, our system calls can start failing and causing program errors whenever we catch a signal. You may think that you don’t use any signals yourself, but are you sure that none of your libraries do? On the flip side, if you’re implementing a library you can never know if the main application will use signals or not, so any library that wants to be robust will have to wrap these system calls in a retry loop.

Since the vast majority of programs will always want their system calls to continue even when a signal is received, 4.2BSD (released in 1983) implemented support for automatically retrying most system calls that could previously fail with EINTR. To me this sounds exactly like what the MIT guy in Richard Gabriel’s story was saying is “the right thing.” In other words, Berkeley UNIX was already doing “the right thing” five years before “Worse is Better” was written!

Modern POSIX APIs allow both behaviors (either restarting the system call automatically or returning EINTR) – this is controlled by the SA_RESTART flag. The following program illustrates both behaviors:

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

void doread() {
  char buf[128];
  printf("doing read() into buf %pn", buf);
  ssize_t ret = read(STDIN_FILENO, buf, sizeof(buf));
  if (ret < 0) {
    printf("read() for buf %p returned error: %sn", buf, strerror(errno));
  } else {
    printf("read() for buf %p returned data: %.*s", buf, (int)ret, buf);
  }
}

void sighandler(int signo) {
  printf("received signal %dn", signo);
  doread();
}

int main(int argc, char *argv[]) {
  // Register SIGHUP handler.  Pass any argument to get SA_RESTART.
  struct sigaction action;
  action.sa_handler = &sighandler;
  sigemptyset(&action.sa_mask);
  action.sa_flags = (argc > 1) ? SA_RESTART : 0;
  sigaction(SIGHUP, &action, NULL);

  doread();
  return 0;
}

Here are the results of running the program three different times. I’ve bolded the parts where I typed to give the program input on stdin. You can also see where I sent the program a SIGHUP.

$ ./test
doing read() into buf 0x7ec7959c
**INPUT FROM TERMINAL**
read() for buf 0x7ec7959c returned data: INPUT FROM TERMINAL
$ ./test
doing read() into buf 0x7ef6659c
received signal 1
doing read() into buf 0x7ef66204
**INPUT FROM TERMINAL**
read() for buf 0x7ef66204 returned data: INPUT FROM TERMINAL
read() for buf 0x7ef6659c returned error: Interrupted system call
$ ./test give_me_sa_restart
doing read() into buf 0x7eb7657c
received signal 1
doing read() into buf 0x7eb761e4
**INPUT FROM TERMINAL**
read() for buf 0x7eb761e4 returned data: INPUT FROM TERMINAL
**INPUT FROM TERMINAL AGAIN**
read() for buf 0x7eb7657c returned data: INPUT FROM TERMINAL AGAIN

Conclusion

You might ask “why all the fuss over a little example?” As I mentioned, my primary motivation in researching all of this was to get to the bottom of this issue and understand how it plays out in modern operating systems.

But if we were going to take all of this information and reflect on the “Worse is Better” argument, my personal observations/conclusions would be:

The "worse" system (Unix) did indeed do "the right thing" eventually, even if it didn't at first. "Worse is better" systems incrementally improve by responding to user needs. Since users got tired of checking for EINTR, the "worse" system added the functionality for addressing this pain point.
The whole thing did leave a rather large wart, though -- all Unix programs have to wrap these system calls in an EINTR retry loop unless they can be absolutely sure the process will never catch signals that don't have SA_RESTART set. So there is a price to pay for this incremental evolution.