Sunday, April 17, 2011

EINTR and PC loser-ing (The "Worse Is Better" case study)

Richard Gabriel's 1989 essay Worse Is Better is a famous comparison between LISP and Unix/C that pops up from time to time and is guaranteed to spark a spirited discussion. The philosophical argument itself is not something I want to get into right now; I am interested in the technical content of the essay. What always bothered me about this paper is that I never fully understood Gabriel's primary example of a dirty hack vs. "the right thing."

His example is "the PC loser-ing problem," which he describes thus:
Two famous people, one from MIT and another from Berkeley (but working on Unix) once met to discuss operating system issues. The person from MIT was knowledgeable about ITS (the MIT AI Lab operating system) and had been reading the Unix sources. He was interested in how Unix solved the PC loser-ing problem. The PC loser-ing problem occurs when a user program invokes a system routine to perform a lengthy operation that might have significant state, such as IO buffers. If an interrupt occurs during the operation, the state of the user program must be saved. Because the invocation of the system routine is usually a single instruction, the PC of the user program does not adequately capture the state of the process. The system routine must either back out or press forward. The right thing is to back out and restore the user program PC to the instruction that invoked the system routine so that resumption of the user program after the interrupt, for example, re-enters the system routine. It is called PC loser-ing'' because the PC is being coerced into loser mode,'' where loser'' is the affectionate name for user'' at MIT.

The MIT guy did not see any code that handled this case and asked the New Jersey guy how the problem was handled. The New Jersey guy said that the Unix folks were aware of the problem, but the solution was for the system routine to always finish, but sometimes an error code would be returned that signaled that the system routine had failed to complete its action. A correct user program, then, had to check the error code to determine whether to simply try the system routine again. The MIT guy did not like this solution because it was not the right thing.

The New Jersey guy said that the Unix solution was right because the design philosophy of Unix was simplicity and that the right thing was too complex.

When I read this I always had a burning desire to know: how did the story end? How do modern operating systems resolve this problem -- the "dirty hack" way or the "right way?" What part of our modern POSIX interfaces are affected by this question?

There are several things that never made sense to me about this example. First of all, why would you need to abort a system call just because an interrupt occurred? I investigated the Linux source and it seems quite clear that interrupt handlers can return to either the kernel or userspace -- whichever was running when the interrupt fired. So I don't see why you'd need to "coerce" the system into "loser mode" at all.

But let's suppose you accept this as a given -- we will assume that when a hardware interrupt occurs, you must exit to user mode. I still don't see the difficulty in automatically re-invoking the system call. It's true that invoking the system routine is a single instruction, but why is it that "the PC of the user program does not adequately capture the state of the process," as Gabriel's essay states? What other process state do we need to capture? The registers must already be saved when the syscall is entered, because they must be restored even with a completely normal syscall return. So if we want to re-invoke the system routine, it should be as easy as simply re-executing the instruction that made the system call. Right?

The whole example confused me quite a lot until I had the idea to replace "interrupt" in the above description with "signal." This is not such a stretch, since signals are essentially user-space software interrupts. With this small change, everything started to make a lot more sense. If a signal was delivered to a process that was currently inside a system call, that signal handler could invoke a system call itself, which would cause us to re-enter the kernel. I could easily see how the complexity of dealing with this could have led early UNIX implementors to simply abort the original system call before delivering the signal.

But this is only speculation about what UNIX was like in the mid to late 80s when "Worse is Better" was written. I could be completely off the mark in this analysis -- maybe returning to the kernel from a hardware interrupt handler really wasn't implemented at that time. Or maybe saving user state really was difficult for some reason. I'd love to hear from anyone who has more historical context about this. But the essay contains an important clue that seems to reinforce my speculation that it's actually about signals.

UNIX and EINTR

If we look closely at the "Worse is Better" essay, we get a strong clue about what the Unix guy in the story might have been talking about:

The New Jersey guy said that the Unix folks were aware of the problem, but the solution was for the system routine to always finish, but sometimes an error code would be returned that signaled that the system routine had failed to complete its action. A correct user program, then, had to check the error code to determine whether to simply try the system routine again.

As someone who has done a lot of Unix system-level programming, this sounds to me like it must be describing EINTR, the error code in Unix that means "Interrupted system call." To give a quick description of EINTR I'll enlist the help of my trusty copy of "Advanced Programming in the Unix Environment" by W. Richard Stevens:

A characteristic of earlier UNIX systems is that if a process caught a signal while the process was blocked in a "slow" system call, the system call was interrupted. The system call returned an error and errno was set to EINTR. This was done under the assumption that since a signal occurred and the process caught it, there is a good chance that something has happened that should wake up the blocked system call.

[...]

The problem with interrupted system calls is that we now have to handle the error return explicitly. The typical code sequence (assuming a read operation and assuming that we want to restart the read even if it's interrupted) would be

again:  if ((n = read(fd, buf, BUFFSIZE)) < 0) {    if (errno == EINTR)      goto again;  /* just an interrupted system call */    /* handle other errors */  }

This sounds an awful lot like the the New Jersey guy's approach from the story, which required a correct program "to check the error code to determine whether to simply try the system routine again." And there's nothing else in Unix that I've ever heard of that's anything like this. This must be what the New Jersey guy from the story was talking about!

But note that in W. Richard Stevens' explanation this isn't some dirty hack! It's not a case of cutting corners that is justified by favoring implementation simplicity over interface simplicity. Stevens describes it as a deliberate design decision that gives users the capability to abort a long-running system call if you catch a signal in the meantime. Now you could easily see this as a rationalization of a dirty hack ("it's not a bug, it's a feature!"), but it certainly seems plausible that if you catch a signal while you're blocked on a long system call, the signal might make you decide that you don't want to wait for the long system call any more. Indeed, Ulrich Drepper claimed in 2000 that "Returning EINTR is necessary for many applications," though it would have been helpful if he had expanded on this point by giving some examples.

Of course, the price we have paid for this capability is that we have to wrap all of our potentially long system calls in a loop like the example above. If we don't, our system calls can start failing and causing program errors whenever we catch a signal. You may think that you don't use any signals yourself, but are you sure that none of your libraries do? On the flip side, if you're implementing a library you can never know if the main application will use signals or not, so any library that wants to be robust will have to wrap these system calls in a retry loop.

Since the vast majority of programs will always want their system calls to continue even when a signal is received, 4.2BSD (released in 1983) implemented support for automatically retrying most system calls that could previously fail with EINTR. To me this sounds exactly like what the MIT guy in Richard Gabriel's story was saying is "the right thing." In other words, Berkeley UNIX was already doing "the right thing" five years before "Worse is Better" was written!

Modern POSIX APIs allow both behaviors (either restarting the system call automatically or returning EINTR) -- this is controlled by the SA_RESTART flag. The following program illustrates both behaviors:

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

char buf[128];
printf("doing read() into buf %pn", buf);
ssize_t ret = read(STDIN_FILENO, buf, sizeof(buf));
if (ret < 0) {
printf("read() for buf %p returned error: %sn", buf, strerror(errno));
} else {
printf("read() for buf %p returned data: %.*s", buf, (int)ret, buf);
}
}

void sighandler(int signo) {
}

int main(int argc, char *argv[]) {
// Register SIGHUP handler.  Pass any argument to get SA_RESTART.
struct sigaction action;
action.sa_handler = &sighandler;
action.sa_flags = (argc > 1) ? SA_RESTART : 0;
sigaction(SIGHUP, &action, NULL);

return 0;
}

Here are the results of running the program three different times. I've bolded the parts where I typed to give the program input on stdin. You can also see where I sent the program a SIGHUP.

$./testdoing read() into buf 0x7ec7959cINPUT FROM TERMINALread() for buf 0x7ec7959c returned data: INPUT FROM TERMINAL$ ./testdoing read() into buf 0x7ef6659creceived signal 1doing read() into buf 0x7ef66204INPUT FROM TERMINALread() for buf 0x7ef66204 returned data: INPUT FROM TERMINALread() for buf 0x7ef6659c returned error: Interrupted system call\$ ./test give_me_sa_restartdoing read() into buf 0x7eb7657creceived signal 1doing read() into buf 0x7eb761e4INPUT FROM TERMINALread() for buf 0x7eb761e4 returned data: INPUT FROM TERMINALINPUT FROM TERMINAL AGAINread() for buf 0x7eb7657c returned data: INPUT FROM TERMINAL AGAIN

Conclusion

You might ask "why all the fuss over a little example?" As I mentioned, my primary motivation in researching all of this was to get to the bottom of this issue and understand how it plays out in modern operating systems.

But if we were going to take all of this information and reflect on the "Worse is Better" argument, my personal observations/conclusions would be:

• The "worse" system (Unix) did indeed do "the right thing" eventually, even if it didn't at first. "Worse is better" systems incrementally improve by responding to user needs. Since users got tired of checking for EINTR, the "worse" system added the functionality for addressing this pain point.

• The whole thing did leave a rather large wart, though -- all Unix programs have to wrap these system calls in an EINTR retry loop unless they can be absolutely sure the process will never catch signals that don't have SA_RESTART set. So there is a price to pay for this incremental evolution.

1. "Sometimes to do what's right you have to do what's wrong."

I'd argue the layering approach the Unix took, is actually the more correct way of handling things. Provide an error code to be explicitly handled, then wrap it up with an optional layer on top. Burying the solution only works when there is only one option to take.

Great write up!

2. Thanks for the comment! I agree that providing EINTR for applications that want it can possibly be useful. But I think the way it's exposed in the API breaks the rule of "make simple things easy and hard things possible." Since it's very uncommon that a program wants to handle EINTR, I think it would be better to have separate versions of the functions (or a boolean parameter to these functions) that lets you enable EINTR only for calls where you want to handle it.

// Standard write() call. Will never return EINTR.
ssize_t write(int fildes, const void *buf, size_t nbyte);

// Interruptible write() call, may return EINTR.
ssize_t interruptible_write(int filedes, const void *buf,
size_t nbyte);

As it's written, since SA_RESTART is a property of the signal handler and not of the interruptible system call itself, all such system calls are potentially interruptible, so they all have to handle EINTR.

3. "Worse is betterâ€ systems incrementally improve by responding to user needs.

Are your conclusions meant to be generalized, and, if so, in what way?

For example, web browsers adopted the Robustness Principle and now we have the messy result of software that has to know how to interpret its formats, rather than distributing the format with the interpreter (Joel Spolsky's rant titled "Martian Headsets" gives a good summary of why this is such a bad approach, and what we are stuck with). The web browser example is actually not mine; it is Alan Kay's classic example of Doghouse and Pyramid architecture from his classic OOPSLA talk, "The Computer Revolution Hasn't Happened Yet" (google it). He essentially argues, implicitly, the Smalltalk way is better -- and before Tim Berners-Lee invented the Web Browser, Xerox PARC had an Object Browser built into the operating system via the Smalltalk system.

4. Hi Josh,

I found this blog entry really interesting... I hadn't thought about the fact that the PC losering problem is really about signals.

One example of where EINTR is important to me is when doing graceful server shutdowns. In particular, if I have a SIGTERM handler that sets a global "shutdown" variable, then I need to check if accept() returned EINTR and if so, did the "shutdown" global variable get set.

I like your idea of having separate system calls that retry if EINTR is returned, but it is simple enough to write those functions in a user program. In fact, it is probably a good practice to wrap most system calls to deal with other notoriously difficult to deal with errors such as ENOMEM.

5. I don't think you're interpreting the behavior of Berkeley Unix correctly with respect to the 'right thing'. The 'right thing' in the original story would have been to somehow save the state of the system call so that the call wouldn't have to be re-initiated by the user program but instead the system could restart it from the point it was interrupted after the interrupt had been handled. The Berkeley approach is the same as the layering approach suggested by the New Jersey guy except that the system does the layering and not the user.

6. An example of when EINTR is necessary: I have a program that listens to connections on a socket until its sole child process exits. The code is simple: it forks off a child and then blocks on accept(). This works because SIGCHLD interrupts the blocked accept(), the signal handler sets a flag, accept() returns EINTR, the program checks the flag, and instantly bails. If the accept() was automatically restarted for me, how would I implement this? I’d have to arrange for explicit abortability of the accept() call in some way – and timeouts aren’t great for this, because they mean that the program will respond to the death of the child process with some average delay. It’s not easy to imagine any other mechanism outside of EINTR that is comparably general without being terribly complex.