It's not a bug, it's a subtle unsupported corner case

Twice in the last two weeks I’ve had the unpleasant experience of reporting bugs, only to be told that it was my fault for triggering the broken behavior.

First it was GCC. I was writing some inline assembly, experimenting with writing my own implementation of setjmp/longjmp. Since setjmp() has to behave as if all registers could be clobbered, I tried listing all registers (including rsp, the stack pointer) as being clobbered. I was curious what GCC would do, because I would expect it to need rsp to find the rest of the saved register values. If you clobber all registers, how can you find anything?

I was disappointed to find that GCC silently ignored the rsp entry in my clobber list. It emitted code to restore all the other registers (except rip, obviously), but that code depended on having an unadulterated rsp. So I filed a bug: “%rsp in clobber list is silently ignored.”
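For concreteness, here is a minimal sketch of the kind of statement in question. It is not the test case from the bug report: the function name survive_clobber is made up for illustration, the asm body is empty so that only the clobber list matters, and the x86-64 guard is an assumption added so the snippet still compiles elsewhere. The rsp entry is left in a comment to mark where it would go. rbp is also omitted, because GCC can reject it when it is in use as the frame pointer.

```c
/* Hypothetical helper, not the code from the bug report. */
static long survive_clobber(long a, long b)
{
    long sum = a + b;   /* must stay live across the asm statement */

#if defined(__x86_64__)
    /* Empty asm body: only the clobber list matters here. The compiler
     * has to assume every listed register is destroyed, so it must keep
     * `sum` somewhere else (an unlisted register or the stack). */
    __asm__ volatile (""
        :
        :
        : "rax", "rbx", "rcx", "rdx", "rsi", "rdi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
          /* "rsp",   <- the entry this post is about */
          "cc", "memory");
#else
    /* Assumption: on other targets, fall back to a plain compiler barrier. */
    __asm__ volatile ("" ::: "memory");
#endif

    return sum * 2;
}
```

Calling survive_clobber(2, 3) should still return 10, because the compiler is responsible for keeping sum out of the clobbered registers. That is exactly the bookkeeping that cannot work for rsp, since the spilled values are found through rsp in the first place.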

It was immediately closed with the comment “The compiler doesn’t analyse asm string.” This is true, but irrelevant to my situation. This is another pet peeve, by the way. I take a lot of care to create well-explained bug reports with minimal test cases. I’ll often spend 30 minutes to an hour creating a bug report. I should be a software maintainer’s dream come true. So I’m annoyed when my bug reports are closed out of hand by someone who hasn’t even made the effort to understand what I was saying.

I reopened the bug and explained in greater detail why this was incorrect behavior. The response this time was:

“%rsp is considered a “fixed” register, used for fixed purposes all throughout the compiled code and are therefore not available for general allocation.

“So, save %rsp at the beginning of your asm code and restore it at the end.”

This is a textbook response of “it’s not a bug, it’s a subtle unsupported corner case.” GCC has a feature called the clobber list. This clobber list works correctly in most cases, but cannot work in the case of %rsp. The maintainer has given the technical reason why this is the case. But instead of doing something to fix this corner case, he tells me how to work around this broken behavior.
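A minimal sketch of what that workaround might look like, under some assumptions of mine: the function name aligned_sp_probe is hypothetical, r11 is just one possible scratch register, and rounding the stack pointer down to a 64-byte boundary stands in for whatever the asm really wants to do with it. Nothing is written to the stack while rsp is changed, and rsp is not listed as a clobber.

```c
#include <stdint.h>

/* Hypothetical helper: changes the stack pointer inside the asm
 * statement, but saves it first and restores it before leaving. */
static uintptr_t aligned_sp_probe(void)
{
    uintptr_t aligned = 0;

#if defined(__x86_64__)
    __asm__ volatile (
        "mov %%rsp, %%r11\n\t"   /* save the stack pointer by hand      */
        "and $-64, %%rsp\n\t"    /* modify it: round down to 64 bytes   */
        "mov %%rsp, %0\n\t"      /* record the modified value           */
        "mov %%r11, %%rsp"       /* restore it before the asm ends      */
        : "=r"(aligned)
        :
        : "r11", "cc", "memory");
#endif
    /* Assumption: on non-x86-64 targets this simply returns 0. */
    return aligned;
}
```

On x86-64 the returned value is a multiple of 64, and the compiler’s own rsp-relative code keeps working afterwards because rsp is back to its original value. It works, but only if you already know that the clobber list will not do this job for you.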

But a bug with a workaround is still a bug. A person who puts %rsp in their clobber list and expects it to work will only discover why it doesn’t work if they inspect GCC’s generated code. It’s not documented anywhere that you can’t use %rsp in your clobber list. And why even document that when you could just as easily throw an error at compile time saying that %rsp is not supported as a clobbered register?

I tried to explain this in the bug. The bug at least stayed open this time, but is still listed as “UNCONFIRMED.”

Then yesterday I reported a bug that can crash the Lua interpreter. Lua is an interpreted language that generally supports the expectation that no Lua program should be able to SEGV the interpreter. I spent 30 minutes or more reducing this crash to a very short test case that can trigger the SEGV. There is a case where Lua will unload a .so that you loaded with “require” even though you still have a reference to it. When you call into it, the Lua interpreter jumps to unmapped memory and the process crashes.

The first substantive response I got was similarly dismissive. (To be fair, I don’t believe the responder speaks for the Lua team.) The responder argued that the collection order was correct even though I was managing to call into a collected library, and suggested a workaround.

(Update: One of the Lua authors acknowledged that this is indeed a bug that needs fixing, which I was very happy to hear.)

Again, even a bug with a workaround is a bug. There’s no documentation anywhere that this scenario can cause a SEGV, and in the context of a large program there’s no easy way to know if or how you are triggering this case.

The responses to both of these bugs were very implementation-focused: “the reason you’re experiencing this behavior is for technical reason X.” They don’t consider the question of what the behavior ought to be from a user’s perspective. They suggest workarounds that do indeed work, but a user would have no reason to try them, and no reason to suspect that the behavior they expected is completely broken.

I don’t think this approach is acceptable when it comes to software, particularly low-level system software. It shows a disregard for other people’s time. It requires users to respond to unexpected behavior by doing a deep dive into the implementation of their tools to see if they triggered some subtle unsupported (and undocumented) corner case. Particularly in the GCC case, where it seems so easy to just emit a compile error, there’s no good reason not to do so.