(Not) Porting Gazelle from Lua to JavaScript

Posted by josh at February 18th, 2008

Update: Since writing this entry, I opted not to actually do a port, for reasons that are discussed in the comments. However, I’m leaving the entry around to show the thought process I went through.

I’m strongly considering porting Gazelle from Lua to JavaScript. I haven’t fully decided whether I’ll do this, and to be honest I’m not terribly thrilled about the switch. I’m going to lay the case for switching out and leave it to you, my dear readers (all 3 of you) to weigh in about whether this is a good idea.

This isn’t just about Gazelle — it’s really addressing the bigger question of what language best fills the niche I’m going after. I would describe the niche as: “small, fast, flexible language for embedding.”

Sounds just like what Lua was designed for, no? Indeed, Lua has been something of a dream to use. It’s fast, flexible, easy to learn, easy to embed, tiny, and has a tiny JIT available (disabusing me of the notion that JITs can only accompany bloated monstrosities like the JVM). So why switch from Lua?

Three main reasons. The first is that, as Steve Yegge noted when he explained why he ported Rails to JavaScript, JavaScript is the most accepted language of its type inside Google, and I want to see Gazelle take on a life of its own inside Google. I want to be able to write tools using it and have at least the slightest chance that my colleagues will take it seriously. If the imperative language that accompanies Gazelle is JavaScript, there will be an element of familiarity, and hopefully it will be possible to find someone to hack on it with me. Lua is unknown in comparison.

The second reason is much like the first: JavaScript is just more familiar to programmers in general. People know it from web browsers (even if they still think that the language itself is to blame for their browser compatibility nightmares). It is based in “curly brace syntax,” which gives Java, C, and C++ programmers warm fuzzies. It’s kind of lucky that a language as totally decent as JavaScript is set to become one of the most widely used languages, at least for the next several years.

The third reason is that it gives me a strategy for making Gazelle work on the JVM. There isn’t any real Lua implementation on the JVM, and Java heads HATE anything that uses JNI to talk to the outside world (because then it isn’t “100% pure Java”). But Rhino, on the other hand, is a pretty reasonable JavaScript implementation and it is 100% pure Java. So using JavaScript gives me a good JVM portability strategy.

So why am I less than thrilled about making the switch? The main problem is that there aren’t any JavaScript implementations that are remotely as nice as the Lua interpreter. Lua is idyllic.

  • the entire Lua interpreter download is 200Kb
  • it compiles in 4 seconds on my desktop
  • it compiles to a 200Kb executable (130Kb stripped)
  • it runs “Hello, World” in 3ms (startup time is nearly 0)
  • it runs nontrivial benchmarks among the fastest of the scripting languages
  • it has a JIT available that adds only 32Kb of code (compiled) to the core, and compiles most functions in microseconds (very low overhead and memory footprint)
  • it has a really good and well-documented embedding/extension API

I believe there is a profound benefit to making software as lightweight as it can possibly be while still accomplishing its task. Most software fails miserably by this criteria. I often think of Pascal’s quote “I would have written a shorter letter, but I did not have the time” and think how much better off we would be if software developers had the same attitude. Lua is one of those rare jewels that takes this to heart. But let’s take a look at Tamarin, which is the favored next-gen JavaScript implementation. This is from the documentation about it’s garbage collector:.

MMgc is not only a garbage collector, but a general-purpose memory manager. The Flash Player uses it for nearly all memory allocations.

Oh fantastic! Because definitely the one thing that my application doesn’t already have is a memory manager. I am so glad that embedding JavaScript into my program is going to drag in a 15,000 lines of code implementing heaps, spin locks, memory barriers, and complicated C++ macros. Couldn’t you have taken the time to write a shorter letter — err, language implementation? Does JavaScript really need that entire 15,000 line memory manager? Let’s get some perspective on what you can do in 15,000 lines:

  • Lua is 14,000 lines total. That includes the lexer, parser, vm, compiler, byte code format, extension APIs, all of the standard libraries, and the garbage collector.
  • Gazelle is currently 5000 lines total. That includes the parser, LL lookahead calculation, NFA construction, NFA to DFA conversion, DFA minimization, code to read and write byte code in Bitcode format, and code to do the actual parsing.

If you add the code in Tamarin core, you’re now up to 70,000 lines of code. Add in the regular expression library and you’re up to 95,000. Sure, JavaScript as a language isn’t as minimal as Lua, but is it really 6 times more difficult to implement?

I know that Tamarin is a gift from Adobe, and that I shouldn’t kick a gift horse in the mouth. Their software from which they have taken Tamarin is probably happy to use MMgc everywhere. I just wish that our profession gave greater value to the virtue of brevity.

You might write me off as a sad, strange man to care so much about this, but are you so willing to write off Steve Yegge? Code’s worst enemy is size, he says. So there! Proof by Steve Yegge.

Back to JavaScript and Lua. I think JavaScript is a fine language for my porpoises. I just wish there was an implementation of it that was as good as Lua. Size, speed, flexibility, choose any three. No? But Lua does!

I’ve looked over all of the JavaScript implementations listed on Wikipedia. Spidermonkey is small-ish and flexible, but slow. Of the other implementations I like NJS the best (but it looks unmaintained), and SEE looks ok too. But none of these is nearly as nice and small as Lua. It will be hard to let it go.

Posted in Gazelle| 3 Comments | 

More thoughts on good programmers

Posted by josh at February 12th, 2008

I’m finally responding to Buffalo’s perspective on my last post about “Brilliant programmers.” I didn’t say anything at first because I couldn’t think of anything insightful to say. I still can’t, so I’m going to have to just make shit up.

Buffalo approaches the question from an educator’s perspective:

The real million dollar question in my mind, Josh, is what are these super-programmers doing that others are not? Even if there’s just something irreproducible in their genes - shouldn’t this be the sort of thing we could detect somehow? Or if there are strategies involved, could we use them to make the 90 percentile 5x better than everybody else?

[…]

Which of course brings me around to my research - say you had a group of the 50 cs students who enter an average program around the country. And say your goal was to get absolutely as many of them as possible into that 5% and screw everything else. What would you say to them? What would you do with them?

I guess it’s depressing to put this way, but my true belief is that programmers are born, not made. So I think the absolute number one thing a CS education can do for its students is help them understand whether they were born programmers or not.

This isn’t a binary thing, of course, and I don’t even think it’s single-dimensional. Everyone has areas of strength and weakness. And it’s a big world out there — succeeding at Amazon or Microsoft or Google is different than succeeding in the consulting world or at a startup or the technology department of an unrelated industry. I know at least one person who is a decent programmer but is doing quite well by combining good business understanding and fantastic people skills with his competent but not outstanding technical background. There are a lot of paths for people to take.

Then there is the question of what productivity is. In a corporate setting, productivity is making progress toward keeping your customer happy, whoever your customer may be and however bizarre their wants. Well I doubt that writing a self-compiling C compiler or figuring out how to calculate pi was at the top of any customer’s wishlist. Fabrice Bellard, for all his badassery, could end up being fairly unremarkable when you put him in a cube and tell him to write web applications. Or he might find it excruciating and spend the whole time inventing a framework that lets him write web apps the way he thinks. After all, that’s what Paul Graham did.

So disclaim, disclaim, disclaim. My point so far is just to highlight what is probably already obvious: that the landscape of talent is complex and multidimensional.

That said, I think it’s really key that students understand their strengths and weaknesses as early as possible. It will help them decide whether CS is right for them, and if it is what direction they should go within CS. And the best way to achieve this understanding is to expose students to lots of different things. As a bonus, this is also exactly what the overachievers need and want too; more problems to feast their minds on. So if I could sum it up in a word, I think the number one thing that university can do is expose young programmers to lots of topics and give them feedback about how they do in each one.

As to how we can help budding programmers be in the super-productive elite? I’m somewhat hesitant to answer that question, because I feel the answer reveals what the answerer thinks makes him/her a member of this elite class. But my biased answer is that the #1 most valuable trait a programmer can have is resourcefulness. It’s knowing what’s out there — what tools, strategies, file formats, blogs, or best practices — and knowing how they apply to the problem in front of you. If some programmers truly are outperforming others by 20x, it’s not because they’re typing 20x faster — it’s because they have such a solid understanding of the problem’s context that they have laser-like focus on the shortest possible path between them and their goal.

On the flip side, I think the worst thing that can happen for programmers is for them to get caught in their own little worlds where they only know one way of dealing with problems. Java-only programmers are the easiest to pick on example of this vice, but not the only one. You can probably get by only using/liking Java if you work at an all-Java shop where you are a cog in a machine, but you’ll never have any perspective of the big picture. I think we all fall prey to getting comfortable with our favorite languages/tools and thinking of everything through those lenses, but the more we can avoid that, the better we do IMO.

So that’s my answer: show students what’s out there, and encourage them to be resourceful. If there is any trace of programmer in them, it will come to life and demand that you feed it more. Anyone who doesn’t self-motivate at that point is not destined to be a programmer. They might not have the talent, or they might have the talent but not the motivation; either way, they probably should find something else instead.

Posted in Uncategorized| 1 Comment | 

Brilliant programmers

Posted by josh at February 2nd, 2008

Last November Bruce Eckel (the “Thinking in $FOO” guy) gave a speech claiming that 5% of programmers are 20x more productive than the other 95%. Some people flat out denied that was possible. I think it’s a mistake to think of “20x more productive” as “well, Billy wrote class Foo in 3 hours so I guess a superstar programmer would have written the same class in 9 minutes.” It’s not like that at all.

To me, what makes that 5% so bad-ass is that they write code that the mediocre programmer will never write, not if you let them work until the heat death of all the stars in the sky.

I want to take a moment and recognize a few of the people I consider to be the most bad-ass programmers around. This will necessarily be people whose bad-ass work is open source.

  • Fabrice Bellard. This guy is an animal. He wrote QEMU, an hardware emulator that achieves near-native performance by using dynamic translation (translating code from one instruction set to another on-the-fly). He’s won the obfuscated C code contest twice, once with a C compiler capable of compiling itself (in less than 4k, according to the rules of the contest). He later adapted this compiler to a boot disk in which the boot loader loads and compiles the entire Linux kernel (in approximately 15 seconds on modern hardware) and proceeds to boot from it. Also, though its not exactly computer programming, in 1997 he discovered the fastest known algorithm for computing an arbitrary digit of pi.
  • Julian Seward. He wrote Valgrind, which started as the best memory debugger money can buy, and has grown into far more. He also wrote bzip2, as well as GHC, the most popular native code compiler for Haskell.
  • Mike Pall. You want this guy working on your programming language. He created LuaJIT, one of the best and lightweight JITs available for any dynamic language today. And his plans for where he’s going to take LuaJIT in the next year are going to put Lua so far ahead of other dynamic language’s implementations that it hurts.
  • Linus Torvalds: this one is almost too obvious to include, but I have to mention it because with Git he showed that Linux wasn’t just a fluke. He didn’t just get lucky by being in the right place at the right time. This guy’s the real deal.
  • anyone who has ever won the Obfuscated C Code Contest. Maybe not the must useful code ever, but what these people manage to pull off in less than 4k of code impresses me beyond measure. Special mention to Brian Westley, who has won it nine times! My favorite entry: A letter from Charlie to Charlotte.

Guys like this singlehandedly turn out code that 99.9% of programmers will simply never write. More time won’t help, and putting them into teams of people all working on the problem certainly won’t help. There’s just a very small subset of people who will ever create such brilliant solutions to such hard problems, and I have nothing but respect and awe for people of this caliber.

Posted in Uncategorized| 3 Comments | 

Why Gazelle Matters

Posted by josh at January 7th, 2008

I’m not sure if I’ve managed to convince very many people that my parsing framework Gazelle is a big deal. Most people don’t think of parsing as something they do a lot of. A few quick and dirty regular expressions here and there is what most programmers get by on. “Oh, I need to validate an email address? Hrm, that’s something like /[\w\.\-]*@[\w\.]*/ right? Oh wait, the username can have escaped @-signs in it? Ah hell, my regex is close enough.

But forget about all that for a moment. Let’s forget about the crazy productivity gains you could get from just being able to pull a standard email address parsing module that works from any language. Let’s forget about all that and just talk about performance.

I want to talk about Mongrel for a second. Mongrel has been the star of the Ruby Webserver world for a long time because it vastly outperforms the painfully slow pure-Ruby server “Webrick” that ships with Ruby. Mongrel has become something of a soap opera; I won’t go into it, but while that escapade spiraled into a wild frenzy of name calling, dramatic exits, and infighting, another little webserver called Thin came up out of nowhere and poised itself to do Mongrel better than Mongrel.

I haven’t tried Thin myself or verified any of their claims, but they claim to be even faster than Mongrel. How? Well according to them, it is by using “the Mongrel parser, the root of Mongrel speed and security.”

Did you catch that?

Amidst all this drama, time, and effort, the core of Mongrel technology is a fast parser. Mongrel’s technical edge boils down to this, which is a description of the HTTP language written with the regular language parsing tool Ragel.

Imagine if parsers this fast and this powerful (more powerful actually, since Ragel can’t handle context-free grammars) were available directly from Ruby. You would be able to get the performance of Mongrel or Thin without writing any C code at all. Using the Ragel parser, on the other hand, requires writing a custom extension for Ruby, because Ragel generates plain C code.

What’s better, you wouldn’t have to write the grammar file yourself, because chances are you won’t be the first person who wants to parse HTTP.

That is why Gazelle matters. Am I making any sense?

Posted in Gazelle| 5 Comments | 

Gazelle 0.1 released!

Posted by josh at December 16th, 2007

What is “Gazelle,” you might ask? Great question! It’s the name I finally chose for my parsing framework, and I’m really happy with it. It’s in the tradition of Yacc, Bison, ANTLR, Elkhound, etc, but evokes images of fast and graceful. Booyah.

It turns out there’s already a Gazelle Movie Editor, but I contacted the author and we’re cool. The two projects target really different audiences.

Anyway, Gazelle 0.1 is released! Read about it on the Gazelle web page and in the README.

I used Gazelle 0.1 to create a version of recs-collate that is 10-20x faster than the Perl version! My version is here.

So this is all very exciting. It took a little longer than the 1 month I predicted on July 10, but hey, better late than never.

What’s really exciting though is that this book just came in the mail:

This book is awesome. It strikes the perfect balance of “I’ve read 1700 academic papers on the subject and I will summarize them for you” and “I care about things that are actually practical.” I mean to mine this book for the algorithm or algorithms that will best suit my goals for Gazelle.

Posted in Gazelle| 3 Comments | 

« Previous Postings | Next Postings »