OS’s that have better things to do than listen to you
All OS’s suck. I’ll be ranting about this often, largely because it’s so easy. (What, you didn’t think I was writing my own OS because I thought all my other options were hunky dory, did you?)
So as I was opening my web browser to configure my new blog, my web browser inexplicably showed me a beach ball for about 20 seconds. I find the beach ball really offensive; it’s like my computer is flipping me off. I’m saying “dammit, I REALLY want to do X,” and my computer is saying in reply “I’m deaf to you.” So a few seconds into this wait I opened a different web browser, and it got caught in beach ball land too. Maddening! Now mind you, I’m on an almost-new MacBook Pro, but even if that weren’t the case this would be totally unacceptable.
Users are often abused by their computers, and it is an unfortunate pattern that the abused start making excuses for their abusers. If you ask your everyday computer user why they’re sitting around waiting for a beach ball to turn back into a cursor, they will say that their computer is “busy” or “thinking.”
This rationale is amazing when you think about it. Computers are the most loyal servants you can possibly have. When you command a computer through programming, it does exactly what you say, without tiring or ceasing, every second of every day of its natural life.
Now imagine that you had a human servant, and you wanted him to do something for you, but he replied “I’m busy doing something else — I’ll tell you when I’m ready.” Nonsense! If you are the master, you do not wait for your servant.
I remember what a revelation it was when I first read the thread scheduler in the Ruby interpreter. Since Ruby’s threads are “green” threads, not OS threads, the Ruby interpreter itself contains the code to switch between Ruby threads. Whenever it’s time for a new thread to run, the scheduler picks what thread deserves to run next. Threads that aren’t runnable have a field describing what they’re waiting for — it could be a specific time in the future or some kind of I/O.
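To make that concrete, here is a minimal sketch of a green-thread scheduler. It is not Ruby's actual scheduler: it uses Python generators as the "threads", a simulated tick counter instead of a real clock, and only models waiting on a future time (not I/O). The names `run`, `worker`, and `sleeper` are made up for illustration.

```python
import heapq
from collections import deque

def run(threads):
    """Run generator-based 'green threads' until all finish; return the event log."""
    clock = 0                       # simulated time, in ticks
    ready = deque(threads.items())  # (name, generator) pairs that can run now
    sleeping = []                   # heap of (wake_time, seq, name, generator)
    seq = 0                         # tie-breaker so the heap never compares generators
    log = []
    while ready or sleeping:
        if not ready:
            # Nothing is runnable: jump the clock to the earliest wake-up time.
            clock = max(clock, sleeping[0][0])
        # Move every thread whose wake time has arrived back to the ready queue.
        while sleeping and sleeping[0][0] <= clock:
            _, _, name, gen = heapq.heappop(sleeping)
            ready.append((name, gen))
        name, gen = ready.popleft()
        try:
            request = next(gen)     # run the thread until it gives up control
        except StopIteration:
            log.append((clock, name, "done"))
            continue
        log.append((clock, name, request))
        if request[0] == "sleep":
            # Not runnable: record what it is waiting for (a time in the future).
            heapq.heappush(sleeping, (clock + request[1], seq, name, gen))
            seq += 1
        else:
            # Still runnable: it goes to the back of the ready queue.
            ready.append((name, gen))
        clock += 1                  # each time slice costs one tick
    return log

def worker():
    yield ("yield",)
    yield ("yield",)

def sleeper():
    yield ("sleep", 5)

log = run({"worker": worker(), "sleeper": sleeper()})
```

The point is the loop itself: at every step it is plain software choosing which thread goes next, and a sleeping thread is just an entry in a data structure.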
What amazed me is realizing that at every turn, the software gets to decide what to do next! Let me contrast this with my previous understanding. Several years ago, I was a fairly active participant on the linux-audio-dev mailing list, where Linux audio developers help each other out and collaborate on shared projects. One of the most important challenges of running an audio system is having glitch-less performance. You might think “easy — when was the last time my MP3 player skipped?” But things get more difficult when you want to make interactive, responsive audio applications — you want the lowest possible sustainable latency, so that if you’re tweaking knobs on some synth in real-time, you hear the results right away.
To the linux-audio-dev crowd, the holy grail was finding a combination of hardware (sound card, CPU), kernel (properly patched and configured), and appropriate user space configuration to allow glitch-less audio performance with reasonably low latency. There was a whole series of kernel patches by Ingo Molnar that were supposed to help, and conventional wisdom about how to achieve this elusive low latency. But every time this goal seemed achievable, someone would post to the list that they followed ALL the recommendations including the mysterious incantation, and they were STILL seeing glitches. And then someone would say something about a driver that was holding a lock, blah blah blah, and to my ears it sounded like the problem was just that we were asking computers to do something they could not reasonably do.
Rubbish! Once I read the Ruby thread scheduler, a truth hit me that was simultaneously horrible and wonderful. The horrible news is that the whole time, it was Linux that was screwing us! The wonderful news is that OS’s can do better.
It comes down to this. Every cycle of every minute that a CPU runs, it decides what to do. A CPU is never too busy for you. It is never indisposed with tasks that are much more important than you. It is ALWAYS working wholly and completely for you, and no one else!
OK, this is only true if you are the author of all software that is running on the CPU. But as author of the OS, you can organize the world in such a way that the wishes of users and applications are unconditionally respected! If a system owner / user sets the priority of an audio thread to the max, it truly is possible to guarantee that it will run right away (modulo scheduling overhead). None of this “oh but I’m so busy!” or “sorry, I’m not actually listening to you, I’ve disabled interrupts!”
This world is possible. The incumbents are making special-case fixes, like Vista’s Multimedia Class Scheduler Service, or extremely complex fixes like the PREEMPT_RT patchset for Linux (I totally agree with this guy’s assessment of that approach). But doing it right from the ground up can be small and beautiful. That is the world I want to help create.
I thought the problem was that the user often wants the computer to do two or more things at the same time, but computers can only do one thing at once. (Okay, one thing per processor, smarty-pants.) If you do something simplistic like always choose the highest-priority thread, then you risk starving all other threads. So if you want to have real-time audio processing and a responsive UI and stream the results over a network link, you may have to make compromises.
And what does any of this have to do with the beachball in your web browser? Is that really the result of the computer doing something other than what you asked it? The computer wasn’t unresponsive – you were able to launch another program while waiting. It sounds like the original browser was doing what you want but taking too long. The beachball is feedback to let you know your request is still in progress. What *should* the user experience be in this case?
Yes, your second point is totally valid, and is the one thing I wanted to clear up before I took this entry live. In this entry I sloppily equate several screwing-the-user scenarios. I’ll revise the entry soon, but the gist is: in each of these cases the CPU still always has the option of doing something useful (listening to ME, the user!). The obstinance might be in the OS (refuses to schedule something I want) or in the application (refuses to service my UI because it’s waiting for I/O or doing something CPU-intensive). The user experience should be: the application always gives me responsive controls, even if the only control available to me is “cancel — what you’re busy doing is not worth waiting for.”
To your first point: there are ways to mitigate this. You might configure a system policy where your real-time thread gets absolute first priority, BUT only for 20ms every 40ms. The goal of the OS should be to allow the system owner/administrator to allocate resources as precisely as necessary, and to respect those allocations as fully as absolutely possible. If you can write an audio thread that promises to only use 20ms every 40ms, then you can give that thread the guarantees it needs without starving the rest of the system.
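A rough sketch of that kind of policy, simulated in 1ms ticks. Everything here is an assumption for illustration: the function name `simulate`, the thread labels, and the worst-case premise that the audio thread wants the CPU every single tick.

```python
def simulate(total_ms, period_ms=40, budget_ms=20):
    """Return CPU time (ms) per thread when the real-time thread is capped per period."""
    used = {"audio": 0, "other": 0}
    budget_left = 0
    for now in range(total_ms):
        if now % period_ms == 0:
            budget_left = budget_ms   # replenish at the start of each period
        if budget_left > 0:
            # The real-time thread wins outright while it still has budget.
            used["audio"] += 1
            budget_left -= 1
        else:
            # Budget exhausted: everything else runs until the next period.
            used["other"] += 1
    return used

capped = simulate(400)                   # 20ms of every 40ms
uncapped = simulate(400, budget_ms=40)   # plain "highest priority always wins"
```

With the cap, the audio thread gets exactly half the CPU and the rest of the system gets the other half. With the budget set equal to the period, the same loop degenerates into the starvation case from the comment above: `other` never runs.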
Note to self: When Josh is a millionaire and looking for help, do not consider a career change and become one of his “human servants.”
Me and Josh, for people who actually seem to like each other pretty well, are often so much at odds with one another from a technical perspective it almost stops making sense.
The things that seem to stress Josh about technical things never really seem to be particularly problematic to me, and this is especially true of waiting for stuff. It seems crazy to me. You know how when you’re in line at the bank, and you’ve been there for *for-ev-er* and then you notice two perfectly good tellers hanging in the break room talking about last night’s American Idol episode? My opinion is that if you let that annoy you, you’re gonna spend a lot of time angry in banks. Talking about American Idol is part of bank tellers’ jobs too, you know?
What I really don’t want to do is have my OS allow me to allocate resources as precisely as possible and then respect these allocations. I don’t even know how much RAM any of the 3 computers in my house have.
Next time you’re sitting there being annoyed by how slow your computer is being, do what I do: grab a magazine from your nearby stash (I personally like hardcore porn but it can be about efficient realtime OSes if you really want…whatever gets you off) and “browse” for about 20 seconds.
Wise and insightful argumenters may cleverly notice that my last point has no technical points whatsoever. Nor will this one, incidentally.
For the first post in Josh’s new blog I figured I ought to lay it all on the line. My essential disagreement with Josh isn’t technical so much as philosophical. Now I may from time to time try to pretend like I’m coming at this problem from an unbiased technical perspective, but I thought I’d lay things on the line from the outset. I can pretend to be objective later.
What do you do when the porn you want to look at is on the computer that is being slow? Wouldn’t you rather the computer display the pron, so you can look at it while it is being slow?
Josh, I’m sorry but you are just so totally wrong.
When a computer is working, at least when it is handling real-time streaming of continuous data flows, it is trying to satisfy *two* masters: the user and the hardware.
the hardware places demands on the CPU by interrupting it. when it does, the OS has to decide what to do. interrupts can be blocked, handled by just noting that they happened, or fully serviced.
if your Very Very Important Super-Dooper RealTime Application is in the middle of its work when an interrupt arrives, what should the CPU do? Should it waste cycles servicing it? Should it block interrupts? Or something else? It appears after a little thought that the correct answer depends on what the interrupt was from. If your VVISDRT application was processing audio, and the interrupt was from the audio interface, you should probably handle it, because it probably implies that you’ve missed some deadline. Or might. But if it was from the keyboard controller, you can and probably should defer doing much more than noting that it occurred.
Ingo’s latest RT patches basically allow you to configure even interrupt handlers themselves to have scheduling priority at the same level as application threads.
There is much more here that you are wrong about, but alas I don’t have the time to comment further – I have to get back to writing real time audio applications, which amazingly enough, actually work.
Good to hear from you Paul. I respect your opinion, but I think you’re a bit premature in calling me totally wrong. You also seem to be unfamiliar with L4, particularly its model for dealing with interrupts.
Interrupts are not demands, they are notifications. Sure, you might starve hardware by not listening to these notifications, but they are nothing more than tools for getting information about external stimuli as soon as possible.
IRQ lines in L4 are modeled as processes, and interrupts are modeled as IPCs from those processes. The priority of the drivers that subscribe to these IRQ lines determines whether they preempt the currently running thread when an interrupt comes in. So in that sense, L4 already provides what Ingo’s patches do.
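As a toy illustration of that rule (this is not L4's real API; the `Thread` class, the `on_interrupt` function, and the convention that a larger number means higher priority are all invented for this sketch):

```python
from dataclasses import dataclass

@dataclass
class Thread:
    name: str
    priority: int   # larger number = more urgent (an assumption of this sketch)

def on_interrupt(current, driver, pending):
    """Deliver an interrupt as a message to `driver`; return the thread that runs next."""
    if driver.priority > current.priority:
        return driver          # the driver preempts the running thread right away
    pending.append(driver)     # otherwise the notification simply waits its turn
    return current

audio_app = Thread("audio_app", 50)
audio_driver = Thread("audio_driver", 90)
keyboard_driver = Thread("keyboard_driver", 10)

pending = []
after_audio_irq = on_interrupt(audio_app, audio_driver, pending)
after_key_irq = on_interrupt(audio_app, keyboard_driver, pending)
```

An audio interrupt gets serviced immediately because the system owner ranked its driver above the application; a keypress is noted and left for later because its driver is ranked below.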
The specifics of how the system should behave when an interrupt comes in are more complicated than belong in a blog comment. But the point I want to leave here is that my plans are to give user-space programmers as many tools as possible for designing robust systems, and for the OS never to be the bottleneck. Nothing you’ve written gives me reason to believe I cannot do that, or that I fundamentally misunderstand the problem space. I’ve been thinking about this a lot, and I think my ideas have merit.
you seem unaware of how the RT patch works these days too.
it also models IRQ lines as processes (well, strictly, as kernel threads). the thread doesn’t get run just because the IRQ line was raised; it’s scheduled along with everything else, just as you describe for L4.
you’re also slightly wrong that interrupts are not demands. for many devices, that’s true. for audio interfaces, it’s not true. the interrupt is a *demand* for more data. if you don’t meet the demand, the device “malfunctions” from a user perspective.
you might not like that linux started out as so-non-RT and requires a substantial patch to make it so. but … more and more of Ingo’s work is now in the mainstream kernel, and his work does actually create a system where, given correct IRQ thread priorities and no drivers with absurd code in their interrupt handlers (yes, it’s prioritized but it can still do weird, illegal stuff), you can get hard realtime from 2.6.
starting a new OS based on L4 to accomplish this seems basically nothing more than a self-learning experience. an RT OS is useless without drivers and the driver situation on linux is bad enough already. it is guaranteed to be at least as bad on anything you eventually cook up.
I wasn’t saying that the RT patches do not do this, or that you can’t get hard realtime from 2.6 with RT patches. My indictment against Linux with RT patches is only that it’s too complicated and monolithic.
These negative characteristics bleed into other areas like security and modularity — supporting RT is not the only goal of this OS. Basically I think that innovations in the OS space are hurting from the fact that no one can justify investing the resources to wipe the slate clean and start over. Companies that did (BeOS) came up with some really good stuff, but the applications problem was too high a barrier and they eventually shriveled and died.
Yes, the driver problem is obviously huge. All I can hope is that if I deliver something compelling enough, it will drive people to invest the work of porting ALSA drivers or something. Or if there’s an embedded device that wanted to use the OS and will have to write drivers for their new hardware *anyway*, they can write drivers that work with my OS from the get-go.
This is obviously very hand-wavey. I’m working on this on my own time, and my approach/motivations are somewhat of a mix between academic and commercial. I want to really break new ground by integrating a lot of good ideas that are floating around in academia, but I also want it to be a usable system. Whether I can convince anyone *else* to use it is totally unknown, but I have to try.
Sounds like an interesting project – would be nice to see another scalable OS which will work well on old hardware and superbly on new hardware.