(Not) Porting Gazelle from Lua to JavaScript
Posted by josh at February 18th, 2008
Update: Since writing this entry, I opted not to actually do a port, for reasons that are discussed in the comments. However, I’m leaving the entry around to show the thought process I went through.
I’m strongly considering porting Gazelle from Lua to JavaScript. I haven’t fully decided whether I’ll do this, and to be honest I’m not terribly thrilled about the switch. I’m going to lay out the case for switching and leave it to you, my dear readers (all 3 of you), to weigh in on whether this is a good idea.
This isn’t just about Gazelle — it’s really addressing the bigger question of what language best fills the niche I’m going after. I would describe the niche as: “small, fast, flexible language for embedding.”
Sounds just like what Lua was designed for, no? Indeed, Lua has been something of a dream to use. It’s fast, flexible, easy to learn, easy to embed, tiny, and has a tiny JIT available (disabusing me of the notion that JITs can only accompany bloated monstrosities like the JVM). So why switch from Lua?
Three main reasons. The first is that, as Steve Yegge noted when he explained why he ported Rails to JavaScript, JavaScript is the most accepted language of its type inside Google, and I want to see Gazelle take on a life of its own inside Google. I want to be able to write tools using it and have at least the slightest chance that my colleagues will take it seriously. If the imperative language that accompanies Gazelle is JavaScript, there will be an element of familiarity, and hopefully it will be possible to find someone to hack on it with me. Lua is unknown in comparison.
The second reason is much like the first: JavaScript is just more familiar to programmers in general. People know it from web browsers (even if they still think that the language itself is to blame for their browser compatibility nightmares). It is based on “curly brace syntax,” which gives Java, C, and C++ programmers warm fuzzies. It’s kind of lucky that a language as totally decent as JavaScript is set to become one of the most widely used languages, at least for the next several years.
The third reason is that it gives me a strategy for making Gazelle work on the JVM. There isn’t any real Lua implementation on the JVM, and Java heads HATE anything that uses JNI to talk to the outside world (because then it isn’t “100% pure Java”). But Rhino, on the other hand, is a pretty reasonable JavaScript implementation and it is 100% pure Java. So using JavaScript gives me a good JVM portability strategy.
So why am I less than thrilled about making the switch? The main problem is that there aren’t any JavaScript implementations that are remotely as nice as the Lua interpreter. Lua is idyllic.
- the entire Lua interpreter download is 200KB
- it compiles in 4 seconds on my desktop
- it compiles to a 200KB executable (130KB stripped)
- it runs “Hello, World” in 3ms (startup time is nearly 0)
- it runs nontrivial benchmarks among the fastest of the scripting languages
- it has a JIT available that adds only 32KB of code (compiled) to the core, and compiles most functions in microseconds (very low overhead and memory footprint)
- it has a really good and well-documented embedding/extension API
I believe there is a profound benefit to making software as lightweight as it can possibly be while still accomplishing its task. Most software fails miserably by this criterion. I often think of Pascal’s quote “I would have written a shorter letter, but I did not have the time” and think how much better off we would be if software developers had the same attitude. Lua is one of those rare jewels that takes this to heart. But let’s take a look at Tamarin, which is the favored next-gen JavaScript implementation. This is from the documentation about its garbage collector:
MMgc is not only a garbage collector, but a general-purpose memory manager. The Flash Player uses it for nearly all memory allocations.
Oh fantastic! Because definitely the one thing that my application doesn’t already have is a memory manager. I am so glad that embedding JavaScript into my program is going to drag in 15,000 lines of code implementing heaps, spin locks, memory barriers, and complicated C++ macros. Couldn’t you have taken the time to write a shorter letter (err, language implementation)? Does JavaScript really need that entire 15,000-line memory manager? Let’s get some perspective on what you can do in 15,000 lines:
- Lua is 14,000 lines total. That includes the lexer, parser, vm, compiler, byte code format, extension APIs, all of the standard libraries, and the garbage collector.
- Gazelle is currently 5,000 lines total. That includes the parser, LL lookahead calculation, NFA construction, NFA to DFA conversion, DFA minimization, code to read and write byte code in Bitcode format, and code to do the actual parsing.
If you add the code in Tamarin core, you’re now up to 70,000 lines of code. Add in the regular expression library and you’re up to 95,000. Sure, JavaScript as a language isn’t as minimal as Lua, but is it really 6 times more difficult to implement?
I know that Tamarin is a gift from Adobe, and that I shouldn’t kick a gift horse in the mouth. The software they took Tamarin from is probably happy to use MMgc everywhere. I just wish that our profession gave greater value to the virtue of brevity.
You might write me off as a sad, strange man to care so much about this, but are you so willing to write off Steve Yegge? Code’s worst enemy is size, he says. So there! Proof by Steve Yegge.
Back to JavaScript and Lua. I think JavaScript is a fine language for my porpoises. I just wish there was an implementation of it that was as good as Lua. Size, speed, flexibility, choose any three. No? But Lua does!
I’ve looked over all of the JavaScript implementations listed on Wikipedia. SpiderMonkey is small-ish and flexible, but slow. Of the other implementations I like NJS the best (but it looks unmaintained), and SEE looks OK too. But none of these is nearly as nice and small as Lua. It will be hard to let it go.
So this is a tough question and I tried to give it a little mulling before I chimed in. Take my advice with more than the usual salt.
I fall into the pro-Lua camp, and I have two arguments, neither of which has much to do with technology. Both have to do with social issues, and as Paul will tell you, I am an absolute social political genius whom everyone should desperately try to emulate. Also, I am a graduate student in a school involving human-computer interaction, where we learn all sorts of secret interaction techniques that I can’t tell you about. Woooooo!
1. Seems to me that there are very few good reasons to take a technology that you know works and you actually really like and replace it with a technology that may turn out to be neither of those things. Considering that the payoff we’re looking for here is your personal enjoyment in a lot of ways, that’s a risk.
Of course, if this gets really popular you might be remembered forever just like L.R. Grammer, Ichabald Von Regex, or many other of those famous parser guys we would have all learned about if we had been paying any attention in our compilers class. And if somebody had come to you and said “your thing is really cool and I can think of 100 specific ways to use it tomorrow if only it was JavaScript”, I think I’d be on the other side. But just your intuition that people might not be willing to go for a non-curly-brace language - I’m not convinced it will be a bigger win than adding cool sexy features (which will feel better in the neat language you already like).
2. Speaking of cool sexy features, it seems maybe a bit early to be rewriting from scratch. Let this current path get some play first, figure out what features you really need and what features you don’t, let the source code get a little uglier. However much you think you know now, after some real applications you’ll know more and you’ll have brilliant new ideas. Then you can write a new even better parser thing from the ground up.
PS. I dig the Outspoken blog’s fancy new do!
Buffalo
Thanks for the input, Buffalo. I think you’re right: for the moment at least I’m going to stick with Lua. Its technical strengths are just so much better suited to what I’m trying to do. And you’re also right to say that a rewrite is the last thing I need right now.
But let my source code get uglier?? Never!!
I do wish I had a better strategy for the JVM though. There are just so many people and organizations that are totally bought into it, and won’t look twice at something that doesn’t run on it. How am I going to achieve world domination if I am not also dominating the JVM?
josh
I’m going to have to agree with Buffalo on this one.
Since you wrote a lot of code using Lua, I have gotten interested. After I read that compiler book you suggested (just got it off Amazon a week or so ago), I hope to jump into learning Lua. If Lua is all that you make it out to be, then it could be a very useful tool in the toolbox. BTW, have you looked into Tcl? It seems like Tcl and Lua have a very similar “target audience”.
Personally, I don’t see what is so attractive about running on the JVM. There is already a major competitor to the JVM which might “win” at some point in time: the CLR. I mean, the world’s largest software maker is strongly pushing the CLR. We all know that at some point in the future 95% of the computers “out there” will have the CLR running on them (especially if Silverlight takes off). There is even an open-source implementation of the CLR: Mono. Disclaimer: At work I have zero hosts running Windows, and at home, one of my 3 computers is Windows.
BTW, I think there are a couple of reasons Java folks refuse to let JNI code touch their VM:
- “platform” dependency, meaning that you can’t just ship some *.jar files and call it good. In the end, I think this is really a problem about deployment.
- Fear of the SEGV (and other non-Java’isms)!
Personally, I think you have done a nice job of splitting Gazelle into two pieces: The compiler generator and the interpreter. If you can keep the number of lines necessary to implement the interpreter to a minimum, and make those lines as portable as possible… perhaps even having multiple implementations (one in Java, one in C#, one in C), then you will “win” everyone over.
People don’t care what language their build tools are implemented in, they only care about the “bits” that need to be shipped.
P.S. I like the new look on the blog :-).
Brian Maher