Brilliant programmers

Posted by josh at February 2nd, 2008

Last November Bruce Eckel (the “Thinking in $FOO” guy) gave a speech claiming that 5% of programmers are 20x more productive than the other 95%. Some people flat out denied that was possible. I think it’s a mistake to think of “20x more productive” as “well, Billy wrote class Foo in 3 hours so I guess a superstar programmer would have written the same class in 9 minutes.” It’s not like that at all.

To me, what makes that 5% so bad-ass is that they write code that the mediocre programmer will never write, not if you let them work until the heat death of all the stars in the sky.

I want to take a moment and recognize a few of the people I consider to be the most bad-ass programmers around. This will necessarily be people whose bad-ass work is open source.

  • Fabrice Bellard. This guy is an animal. He wrote QEMU, an hardware emulator that achieves near-native performance by using dynamic translation (translating code from one instruction set to another on-the-fly). He’s won the obfuscated C code contest twice, once with a C compiler capable of compiling itself (in less than 4k, according to the rules of the contest). He later adapted this compiler to a boot disk in which the boot loader loads and compiles the entire Linux kernel (in approximately 15 seconds on modern hardware) and proceeds to boot from it. Also, though its not exactly computer programming, in 1997 he discovered the fastest known algorithm for computing an arbitrary digit of pi.
  • Julian Seward. He wrote Valgrind, which started as the best memory debugger money can buy, and has grown into far more. He also wrote bzip2, as well as GHC, the most popular native code compiler for Haskell.
  • Mike Pall. You want this guy working on your programming language. He created LuaJIT, one of the best and lightweight JITs available for any dynamic language today. And his plans for where he’s going to take LuaJIT in the next year are going to put Lua so far ahead of other dynamic language’s implementations that it hurts.
  • Linus Torvalds: this one is almost too obvious to include, but I have to mention it because with Git he showed that Linux wasn’t just a fluke. He didn’t just get lucky by being in the right place at the right time. This guy’s the real deal.
  • anyone who has ever won the Obfuscated C Code Contest. Maybe not the must useful code ever, but what these people manage to pull off in less than 4k of code impresses me beyond measure. Special mention to Brian Westley, who has won it nine times! My favorite entry: A letter from Charlie to Charlotte.

Guys like this singlehandedly turn out code that 99.9% of programmers will simply never write. More time won’t help, and putting them into teams of people all working on the problem certainly won’t help. There’s just a very small subset of people who will ever create such brilliant solutions to such hard problems, and I have nothing but respect and awe for people of this caliber.

Posted in Uncategorized| 3 Comments | 

New Job!

Posted by josh at November 16th, 2007

A bit of personal news: I just accepted a job offer at Google Fremont (Seattle)! I’m very excited about it, and I’ll be starting on my birthday (Jan 7).

In other news, my Parsing Framework (which I’m calling by a codename: Philodendron) is making really great progress. I’ve written code to dump lexing DFAs to byte-code, and a program that can read the byte-code and spit out diagrams of the DFAs in dot format (Graphviz). I’ve converted to Git (more about that another time), and the repository is here:

http://repo.or.cz/w/philodendron.git

I’m going to refrain from any estimates about when I’ll release something, but it is truly making progress and getting closer to something usable all the time.

Posted in Uncategorized| 3 Comments | 

DJB hating on the principle of least privilege?

Posted by josh at November 3rd, 2007

Dan Bernstein, famous security expert and author of qmail, just published a paper called Some thoughts on security after ten years of qmail 1.0. The paper reflects on ten years of qmail, which has a very impressive security record, and explains Dan’s philosophy behind writing secure software.

Most of the content is what you would expect, but I found one of Dan’s conclusions quite surprising. He argues against the principle of least privilege, going so far as to call it “fundamentally wrong.”

I have become convinced that this “principle of least privilege” is fundamentally wrong. Minimizing privilege might reduce the damage done by some security holes but almost never fixes the holes. Minimizing privilege is not the same as minimizing the amount of trusted code, does not have the same benefits as minimizing the amount of trusted code, and does not move us any closer to a secure computer system

To understand DJB’s argument here, you have to back up a bit and understand what he means by “trusted code.” In his view, there are only two buckets into which code falls: trusted code and untrusted code. It is a binary thing — there is nothing in between. And here is his definition for untrusted code:

We can architect computer systems to place most of the code into untrusted prisons. “Untrusted” means that code in these prisons — no matter what the code does, no matter how badly it behaves, no matter how many bugs it has — cannot violate the user’s security requirements.

This is an incredibly restrictive definition. For code to qualify as “untrusted” in this view, a bug that gives an attacker full control over the compromised code must have no security impact over the system as a whole. In practical terms, this means that the code must satisfy two criteria:

  • It must not have direct access to any resources that could possibly be sensitive from a security perspective. For example, it must not have any filesystem access or network access. essentially, it must be nothing more than a data processing node that takes input, performs some transformation, and produces output.
  • its output must not be any more trusted than its input. for example, you can’t count on untrusted code to sanitize your input in any way, because bugs in the code could prevent it from being sanitized properly.

So what does fit into this category? DJB’s example is the address extracting code in email software — a part of the code that takes an email header as input and spits out an email address as output. If you run this code inside an interpreter that properly sandboxes it from accessing anything other than its input, then the code qualifies as untrusted code, according to DJB. Even if you find a way to exploit a bug in the program by sending it malformed data, all you are capable of doing is changing the output email address. But you already had the means of controlling that, since you supplied the input. Ergo, the code is untrusted.

The paper claims that “we can architect computer systems to place most of the code into untrusted prisons” (emphasis mine). I find this claim hard to believe. How much real-world code can you really remove all I/O from? How much of a web application can really be written in a sandbox that doesn’t have access to your database (or whatever tier interfaces you with the database)? How much of a desktop application can run in a context where no filesystem access is allowed? How much of an operating system can run without having any access to hardware? Even more significantly, how many components in any of these systems is it acceptable to say that we don’t trust their output?

Setting that aside, is it really the case that the principle of least privilege is useless? Is it really the case that a binary “trusted/untrusted” designation is the only important distinction? A surprising conclusion of that argument is that having non-privileged (eg. non-root) user accounts offers no significant advantage over running everything as root. We all know that non-root accounts aren’t a panacea, but taking the argument this far seems absurd. An exploit that gives an attacker control over a single user account is not as bad as a full root exploit, end of story. It is damage control; it is limiting the scope of the impact.

DJB is brilliant and has security credentials far above my own, but in this case I think he is overreacting. It is true that the principle of least privilege isn’t a way to prevent security holes from happening, but that doesn’t mean there isn’t merit to having multiple layers of defense. Not all code can be written to satisfy his definition of untrusted code, but that doesn’t mean that all of the rest of the code needs to be completely trusted with root-level access. There is a middle ground.

Posted in Uncategorized| 3 Comments | 

Why ALL OS’s (except BeOS) have failed on the desktop

Posted by josh at July 25th, 2007

My most recent posts have been far too constructive and boring. A fun little rant is overdue. And Con Kolivas’s Interview “Why I quit”, which is hot on the net right now, is the perfect subject matter for doing so.

Con Kolivas is a Linux Kernel hacker who’s had it with the Linux development process. He’s seen his good ideas repeatedly get stonewalled, then assimilated into competing patches from more senior Linux developers. That reeks of kernel development stagnating into an old boys club, but that’s not really what I’m here to talk about today. I want to write about my resounding agreement with Con’s opinion of desktop computing today. From the interview:

[T]he desktop PC is crap. It’s rubbish. The experience is so bloated and slowed down in all the things that matter to us. We all own computers today that were considered supercomputers 10 years ago. 10 years ago we owned supercomputers of 20 years ago.. and so on. So why on earth is everything so slow? If they’re exponentially faster why does it take longer than ever for our computers to start, for the applications to start and so on? Sure, when they get down to the pure number crunching they’re amazing (just encode a video and be amazed). But in everything else they must be unbelievably slower than ever.

I’ve been saying this to anyone who will listen for over a year now (and all my friends are tired of it and give me the hand whenever I start). But I want to take just a moment and spell out why I think current operating systems AND user-space (GUI frameworks and such) suck so badly on the desktop.

GUI frameworks (and the applications that use them) suck

Here’s how 99% of desktop applications today work. There’s an event loop in the app’s main thread. It works something like this:

while(1) {
  event = GetNextEvent();
  if(event.type == NEED_REDRAW) do_redraw();
  else if(event.type == USER_CLICK) process_user_click();
  else if(event.type == DO_SOMETHING_SLOW) something_slow();
  ...
}

In your run-of-the-mill application, the app will happily sit there doing something_slow() in the main thread (which could involve I/O to the disk or network, or just something CPU-intensive), blissfully ignoring USER_CLICK events, NEED_REDRAW events, etc. until it finishes. It could even be something as benign-seeming as a malloc(), which in stock implementations will grab a process-wide mutex and trounce around the heap reading bookkeeping information (which, mind you, could trigger page faults and force pages to be swapped in).

And while this is happening, what do you the user get? That’s right, the spinning beach ball, which I think would be more accurately rendered as a hand flipping you the bird, because that’s the message the OS is sending you at that point. You as a user are totally shut out at that point; your only option is to wait for it to subside or to kill the process completely.

Even if the something_slow() task doesn’t take long enough to show the beach ball, it’s still a delay before the UI can be redrawn. Delays like this are the cause of stuttering, which makes a UI feel sluggish even if the UI is perfectly caught up every time it does redraw.

I know what you’re thinking. “OK, so some applications suck and aren’t multithreaded. What’s your point?” The point is that we can’t pin the on application authors — I think the fault lies squarely with the GUI frameworks. Sure, if you’re really good at threading and invest a lot of effort, it’s possible to write applications that never ever block the event loop, but does the framework make it easy or provide the tools for easily doing so? Mainly, does the framework provide a threading model that is well-supported by the framework?

Only one GUI framework that I’ve ever encountered does this: BeOS. Notably:

Every BWindow object spawns a thread (in the application’s address space) where it receives and responds to interface messages from the server. –The InterfaceKit Intro

The programmer doesn’t have to be diligent, the programmer doesn’t have to have experience with the threading library; but automatically, without even trying, the app gets a separate thread for every window that is distinct from the main event loop. That means that even if the event loop spends some time doing something expensive, the GUI will continue to redraw, resize, render button clicks, etc. just as smoothly as if nothing were happening.

That’s a big part of what made the BeOS GUI faster on late 90s hardware than Windows, Linux, OR OS X are on today’s hardware.

So problem #1 is that today’s GUI frameworks suck. They let the GUI become unresponsive far too often by failing to isolate the UI’s responsiveness from long-running work that the application should be doing in the background.

Neither OS’s nor applications have any freaking clue how to do real-time

Real-time computing is one of my favorite topics, because nothing else brings out so many self-proclaimed experts regurgitating the BS they learned when their OS class spent two days studying it. (I never took an OS class and therefore clearly cannot be such a person, so ha!) Such people don’t think a process is “real-time” unless a missed deadline will launch a nuclear missile or stop somebody’s heart. I’m not kidding: here’s an example of the sort of thing that gets modded +5 on Slashdot when real-time computing comes up. Someone asked the question “would new real-time features in Linux help audio/video playback in Linux,” and the +5 modded response is:

Desktops and most servers do not get any benefit from a RTOS. RT makes it so that the system purposefully downgrades less-useful things like user input for maximum priority things like, say, polling a fetal heart monitor every n milliseconds or responding to an automobile collision to deploy an airbag.

Please, don’t fall for people like this who have such a shallow understanding of what real-time means.

Multimedia playback is absolutely real-time: it has deadlines, and the system will malfunction if those deadlines are missed. “Hard” vs. “soft” real-time is a pretty meaningless distinction; it’s an arbitrary and uninteresting way of classifying how bad it is to miss a deadline.

Now that we have that out of the way, we can move on to how current OS’s fail to provide the tools to write reliable RT applications (like, say, a movie playback application).

A movie playback application’s real-time needs are so very simple. Say I’m the application:

  • I need to read the next chunk of movie data from disk in time to feed it to the decoder.
  • Once I have the data, I need to decode it in time to feed it to the graphics card.

Every time your movie playback stutters, it’s because the OS failed to satisfy these two simple needs. Let’s look at them one at a time.

The first need is a disk I/O deadline. The app needs to read a chunk of data (of known size, or at least known upper bound) by a certain time. In other words, the OS needs to allow the movie player to reserve a certain amount of disk bandwidth, and guarantee that it got first dibs on that bandwidth. I was about to claim that no commercially-offered operating system provides this, but I was just surprised to learn that apparently it has been added to Vista. Let me save you the suspense: Linux doesn’t have it.

The second need is a CPU deadline. The app needs a fixed amount of CPU time by a certain time, or it won’t be able to decode the video in time.

In the bad old days before Linux got PREEMPT_RT (which was not very long ago), a leading reason why this part would fail was simple scheduling latency. Say the movie player is waiting for the short window of time when it needs to blast the new frame into the framebuffer. The time comes, but…! the kernel has unfortunately acquired a lock or disabled interrupts or done something else uninterruptible. My movie player, even if it is the highest priority task in the whole flipping system, won’t run until the kernel finishes what it was doing. By that time it may be too late.

The single accomplishment of the PREEMPT_RT patch is to make the two highest scheduling priorities in the system (SCHED_FIFO and SCHED_RR) actually work as advertised: processes with these priorities will actually, truly be run as soon as they are runnable, without being held hostage by obliviously latent routines in the kernel.

But do you know what’s missing from The PREEMPT_RT overview? Any mention of deadlines. In other words, there is no facility for telling the OS what a process’s deadlines are, and therefore no intelligent way for the OS to pick between scheduling two real-time processes that are both runnable. Sorry folks, but without deadlines, it ain’t real-time.

OK, so there’s one little exception. If you only have one real-time task in the system, and that ONE real-time task is scheduled SCHED_FIFO or SCHED_RR, then it will always be run when it’s runnable, and will therefore make all its deadlines provided there is enough CPU time available for that to be possible.

But TWO real-time processes? The best you can do is given them equal time using SCHED_RR, but that will fail in many cases that a proper real-time scheduler would cover.

Also remember that a process scheduled SCHED_FIFO runs to the exclusion of absolutely everything else. You’d better hope that it doesn’t get into an infinite loop, or it will lock your entire computer. A true real-time scheduler could allow real-time tasks only a limited amount of highest-priority time per period.

So problem #2: existing operating systems don’t give programmers the tools they need to write good RT applications. As a result, it is impossible to write something as simple as a movie player that will never glitch.

No one realizes it’s all about I/O

Everyone continues to be fixated on CPU cycles as a measure of speed. But I/O is an ever-present bottleneck that no one really talks about. Think about it:

  • every cache miss is a pipeline stall. so yes, binary size does matter: 32k of L1 and 4MB of L2 don’t go very far.
  • every time you page fault on a page that isn’t in memory, your application blocks on a disk read. fat lot of good your SCHED_FIFO priority does now!
  • even for meticulously-written applications, these days you just gotta move a lot of data around! a user’s photo library or music library could be hundreds of megabytes in size, and they shouldn’t have to wait just to scroll through it all.

There’s more I could write about this subject, but this entry has gotten pretty long already. The gist of it is that few OS’s or applications recognize the integral part that I/O should play in the overall design. For example, an OS should know better than to schedule a process that is swapped out — such a process can’t do a thing until it’s swapped back in. But it’s beyond the scope of most scheduler designs to take a thing like that into account.

Scheduling of non-RT tasks sucks sucked

I won’t belabor this point, because apparently the Staircase Deadline (SD) Completely Fair Scheduler solves this problem. But seriously, was it that hard to figure out that a scheduler should aim to give equal-priority processes an equal amount of CPU time? Just sayin.

Closing remarks

In summary, I agree with Con Kolivas about how sub-par an experience the Linux desktop of today is. I really want to put my money where my mouth is and write something better (and I have some plans for doing just that), but first things first: parsing.

Posted in Uncategorized| 5 Comments | 

Wishlist for JSON

Posted by josh at June 28th, 2007

JSON is great. It’s a nice implementation of the “heterogeneous structure of hashes and arrays” pattern that I have come across over and over. Just off the top of my head:

  • Perl, Python, Ruby, Javascript, and pretty much any language like them uses them as their basic data structure
  • The Tibco Rendezvous message format.
  • The NeXTSTEP property list file format.
  • The XML-RPC data format (even though it’s encoded in XML).
  • PDF files use them to specify data
  • Other times I can’t mention because I encountered them inside companies

This pattern is seriously everywhere. This is why JSON is nice: now there’s a sort of quasi-standard encoding people can reach for when they encounter this problem.

However, there are two things I reeeaallly wish it had, that many of the formats I mentioned above do have.

  1. A raw binary type. All it’s got right now are unicode strings, which are great if you’ve got known-good unicode data. But if you want to dump data of unknown encoding, or that might be malformed unicode, or that is just a big binary blob, JSON can’t help you. You’d have to convert it to 7-bit first using an encoding like quoted-printable or Base64. This is a bummer, and as an extra bummer, the data would no longer be self-describing — you’d have to store elsewhere (or know out-of-band) that the data is encoded this way.
  2. A date type. Right now your only option is to encode a date as a string or a number. I kind of like this guy’s proposal for doing that. He proposes encoding dates like so:
    ["FOO": @2007-06-28@]

With these two features added, JSON would be a lot more useful. Yeah, it wouldn’t be JavaScript compatible any more, but these features should make it into JavaScript also! Apparently there is a chance that the date literal notation will make it into JavaScript v2…

Posted in Uncategorized| 4 Comments | 

« Previous Postings | Next Postings »