Monday, April 9, 2012

It's not a bug, it's a subtle unsupported corner case

Twice in the last two weeks I've had the unpleasant experience of reporting bugs, only to be told that it was my fault for triggering the broken behavior.

First it was GCC. I was writing some inline assembly as an experiment in implementing setjmp/longjmp myself. Since setjmp() has to behave as if all registers could be clobbered, I tried listing all registers (including rsp, the stack pointer) as clobbered. I was curious what GCC would do, because I expected it to need rsp to find the rest of the saved register values. If you clobber all registers, how can you find anything?
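For readers who haven't used clobber lists, here is a minimal sketch of the mechanism. It is not my setjmp code. It assumes x86-64 and GCC/Clang extended asm in AT&T syntax, and the function name `clobber_demo` is made up for illustration. The asm body trashes several callee-saved registers, and the clobber list tells the compiler so. That obliges the compiler to stop relying on those registers and to preserve them for the caller. Note that %rsp is deliberately not in the list.

```c
#include <assert.h>

/* Hypothetical example, assuming x86-64 and GCC/Clang extended asm
   (AT&T syntax). Not the setjmp implementation described in the post. */
long clobber_demo(long a, long b) {
    long sum = a + b; /* the compiler may keep this in a callee-saved register */
#if defined(__x86_64__) && defined(__GNUC__)
    /* The asm zeroes rbx and r12-r15. Listing them as clobbered tells the
       compiler their old contents are gone, so it must not rely on them
       afterwards and must save/restore them for our caller. */
    __asm__ volatile(
        "xorl %%ebx, %%ebx\n\t"
        "xorl %%r12d, %%r12d\n\t"
        "xorl %%r13d, %%r13d\n\t"
        "xorl %%r14d, %%r14d\n\t"
        "xorl %%r15d, %%r15d"
        : /* no outputs */
        : /* no inputs */
        : "rbx", "r12", "r13", "r14", "r15", "cc");
#endif
    return sum * 2; /* still correct because the compiler honored the clobbers */
}
```

For example, `assert(clobber_demo(20, 1) == 42);` holds whether or not the asm branch is compiled in.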

I was disappointed to find that GCC silently ignored that I had listed rsp in the clobber list. It emitted code to restore all other registers (except rip obviously), but it depended on having an unadulterated rsp. So I filed a bug: %rsp in clobber list is silently ignored.

It was immediately closed with the comment "The compiler doesn't analyse asm string." This is true, but irrelevant to my situation. This is another pet peeve, by the way. I take a lot of care to create well-explained bug reports with minimal test cases. I'll often spend 30 minutes to an hour creating a bug report. I should be a software maintainer's dream come true. So I'm annoyed when my bug reports are closed out of hand by someone who hasn't even made the effort to understand what I was saying.

I reopened the bug and explained in greater detail why this was incorrect behavior. The response this time was:
%rsp is considered a "fixed" register, used for fixed purposes all throughout
the compiled code and are therefore not available for general allocation.

So, save %rsp at the beginning of your asm code and restore it at the end.

This is a textbook response of "it's not a bug, it's a subtle unsupported corner case." GCC has a feature called the clobber list. This clobber list works correctly in most cases, but cannot work in the case of %rsp. The maintainer has given the technical reason why this is the case. But instead of doing something to fix this corner case, he tells me how to work around this broken behavior.
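For concreteness, this is roughly what the suggested workaround looks like. It is a minimal sketch, again assuming x86-64 and GCC/Clang extended asm, with a made-up function name (`rsp_roundtrip`). Instead of putting %rsp in the clobber list, the asm body stashes it in a scratch register (r11 here, an arbitrary choice) and restores it before the asm ends, so the compiler's assumption that %rsp is unchanged still holds.

```c
#include <assert.h>

/* Hypothetical example, assuming x86-64 and GCC/Clang extended asm
   (AT&T syntax). %rsp is NOT in the clobber list; the asm restores it itself. */
long rsp_roundtrip(void) {
    long delta;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile(
        "movq %%rsp, %%r11\n\t" /* save the stack pointer in scratch register r11 */
        "subq $128, %%rsp\n\t"  /* move %rsp inside the asm (no memory is touched) */
        "movq %%rsp, %0\n\t"    /* record the temporarily modified value */
        "movq %%r11, %%rsp\n\t" /* restore %rsp before the asm ends */
        "subq %%rsp, %0"        /* delta = modified - original */
        : "=&r"(delta)
        : /* no inputs */
        : "r11", "cc");
#else
    delta = -128; /* non-x86-64 fallback so the example still compiles */
#endif
    return delta;
}
```

`rsp_roundtrip()` should return -128, the offset observed while %rsp was temporarily moved.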

But a bug with a workaround is still a bug. A person who puts %rsp in their clobber list and expects it to work will only discover why it doesn't work if they inspect GCC's generated code. It's not documented anywhere that you can't use %rsp in your clobber list. And why even document that when you could just as easily throw an error at compile time saying that %rsp is not supported as a clobbered register?

I tried to explain this in the bug. The bug at least stayed open this time, but is still listed as "UNCONFIRMED."

Then yesterday I reported a bug that can crash the Lua interpreter. Lua is an interpreted language, and the general expectation is that no Lua program should be able to SEGV the interpreter. I spent 30 minutes or more reducing this crash to a very short test case that can trigger the SEGV. There is a case where Lua will unload a .so that you loaded with "require" even though you still have a reference to it. When you call into it, the Lua interpreter jumps to unmapped memory and the process crashes.

The first substantive response I got was similarly dismissive. (To be fair, I don't believe the responder speaks for the Lua team.) The responder argued that the collection order was correct even though I was managing to call into a collected library, and suggested a workaround.

(Update: One of the Lua authors acknowledged that this is indeed a bug that needs fixing, which I was very happy to hear.)

Again, even a bug with a workaround is a bug. There's no documentation anywhere that this scenario can cause a SEGV, and in the context of a large program there's no easy way to know if or how you are triggering this case.

The responses to both of these bugs were very implementation-focused: "the reason you're experiencing this behavior is for technical reason X." They don't consider the question of what the behavior ought to be from a user's perspective. They suggest workarounds that do indeed work, but that a user would have no reason to try, since nothing would lead them to suspect that the intended behavior is completely broken.

I don't think this approach is acceptable when it comes to software, particularly low-level system software. It shows a disregard for other people's time. It requires users to respond to unexpected behavior by doing a deep dive into the implementation of their tools to see if they triggered some subtle unsupported (and undocumented) corner case. Particularly in the GCC case, where it seems so easy to just emit a compile error, there's no good reason not to do so.

10 comments:

  1. Congratulations! You have discovered the GCC bug tracker (the popular German blogger Fefe from time to time posts similar bug reports and rants about the mentality in the GCC and glibc projects). Seriously: they should handle bug reports properly and be thankful for people devoting their time to them (especially for bugs involving rarely used features and corner cases).

  2. I agree completely. I have also, on occasion, taken a few hours to develop simple test cases for compiler bugs that took me days to hunt down and figure out (within the context of highly complex systems). To then simply have these brushed off as insignificant is insulting and extremely annoying.

  3. Well argued, and I do agree with your assessment.
    Just one note: I think most developers disregard the time of other developers; most of us have more than enough time leeches to worry about other people.

  4. I agree with you on the Lua bug. However, in the case of gcc, IMO it's a matter of spec. If gcc specifies this behavior, it's not a bug. It may be an odd or uninspired decision by the gcc maintainers, possibly even a departure from a standard, but it's nevertheless the spec.

  5. In my company, the web developers call a bug that has been around for 5 years (no lie) an "undocumented feature", then file it with the other "undocumented features" in an Outlook folder and forget about it until the bug rears its head during testing on another project. Then the process begins again.

    Good article.

    p.s. Where was the picture in your header taken? It reminds me of growing up on my dad's farm.

  6. Very good article. I agree 100%.
    My opinion: they should listen more carefully to the users!!!!!
    A compiler especially should give warnings and errors whenever possible!

  7. No update that the Lua authors acknowledged the (hmm, let's call it an error) Lua error and plan to fix it? You are correct that the original responders do not speak for the language, yet they are part of the community and were just trying to help.

  8. There's a tradeoff between fixing obscure bugs (ones that might take a lot of time to fix, affect very little functionality, and can be worked around) and developing major new features that the software needs. A bug report might be great, clear, and reproducible, but it doesn't account for the work that might go into locating and implementing the proper fix.

  9. I agree with everything you say, Vadim. But there is a big difference between:

    "Confirmed that this is a bug, but it's not prioritized to fix."

    and

    "I'm closing this, it's not a bug."

    1. Agree with the sentiment in your post so much. I am tired of having bug reports (especially for open source projects) closed as "not a bug" despite behavior that clearly contravenes what is documented. Putting together a bug report (especially with a test case) takes some effort. It's particularly galling when they ask for a reproducible test case and _then_, once you go to lengths to produce one, claim it's not a bug. It's also sad how many bug reports are left to languish. Frankly, if you're going to have a bug database that allows the public to enter bugs, you should make an effort to fix the legitimate ones (and personally I think bug fixes should always have higher priority than new development).
