Python threading blues

Posted by josh at March 20th, 2008

Some Python fan please tell me that I’m missing something.

Is this really the boilerplate necessary for creating even the simplest thread in Python?

import threading

class MyThread(threading.Thread):
  def __init__(self, arg, **kwargs):
    threading.Thread.__init__(self, **kwargs)
    self.arg = arg

  def run(self):
    print "I’m running in a thread, with arg %d!" % (self.arg)

thread = MyThread(5)
thread.start()
 

This is making me miss Ruby, for which the equivalent is:

thread = Thread.new(5) { |arg|
  puts "I’m running in a thread, with arg #{arg}!"
}
 

P.S. Gazelle 0.2 is making a lot of progress, but unfortunately won’t hit the 1 month mark I hoped for. Surprise surprise. But when it does come, it’s going to be awesome.

Posted in Uncategorized| 3 Comments | 

What’s the best way to visualize a parse tree?

Posted by josh at February 26th, 2008

I’m asking this question, not answering it!

While you’re sitting tight waiting for Gazelle 0.2, I have a challenge I’m putting to my readers. I want my program gzlparse to parse some input text and output the parse tree in some useful format. What is the most useful format for visualizing a parse tree?

I want both a good text-based format and a good graphical format, if possible. For text formats there’s:

  • XML (ugh. I didn’t want to say it, but I knew someone else would if I didn’t first. I’ll probably support it, but I’ll put ambivalent emoticons in the source code).
  • S-expressions. Maybe I’ll win over some LISPers.
  • ??

A good useful text format would be nice, but a good graphical format could be groundbreaking. I could always draw it as a tree, but I’m wondering if there isn’t something better. Something that keeps the text in its original format, but uses color or borders or something like that to represent the parse tree structure.

Here’s an example of the kind of visualization I think is really great and innovative. It’s the way that Lurker displays an email thread:

lurker

What’s so brilliant about this view is that it shows you both time-order of the messages and complete threading information in an attractive way. Of course, this is simpler than a parse tree, and such a nice view of a parse tree might not be possible. But what I’d really love to see is a parse tree visualization that:

  • kept the original text recognizable (viewing it purely as a tree throws away the original text formatting completely)
  • shows the parse tree structure somehow
  • major bonus: can be rendered in a browser using a DOM. like, would allow me to write JavaScript to create this DOM inside the browser.

If this were possible, then I could write a web-based syntax analyzing text box that parsed your text as you typed it and showed you beautiful graphical representations of the parse. Something like this extremely awesome interactive regex visualizer, but for full context-free grammars. Or something like what ANTLRWorks provides, but on the web.

That would be SO HOT.

Posted in Gazelle| 5 Comments | 

Setting the sights for Gazelle 0.2

Posted by josh at February 26th, 2008

I’m really excited about the interest the Gazelle manual has generated! Thanks for checking it out, and for your feedback.

I got a little scared when people said they were going to start trying Gazelle out now, because it immediately made me think of how many things I’ve been meaning to fix before I unleashed it on anybody else! But on the other hand, it gives me all the more motivation to get it to a point where other people can try it out. And I’m never more motivated or get as much done as when I know people are waiting for me!

So here’s my line for the moment. Don’t try out Gazelle just yet. There are too many things for me to fix at the moment that I know are broken. But I want to fix those things ASAP and get Gazelle 0.2 out the door, so I can finally have a release that I can recommend people try out.

When will Gazelle 0.2 come? I’m hoping no more than a month. Here’s the target feature set:

  • complete Strong-LL(k) lookahead support. I have the code to generate Strong-LL(k) lookahead, I just need to support this at the bytecode and runtime stage.
  • a command-line compiler program (gzlc) that takes reasonable options and is simple enough to use by reading its --help
  • a “tour” section for the manual
  • a command-line program (gzlparse) that can output the parse tree in a useful format, so you can see how Gazelle parses your input text.
  • a test suite, so that when people report bugs I can add the bugs to the test suite and not regress. this will be important for keeping my sanity.
  • (stretch): make Gazelle self-hosting, so that the parser is more robust and easier to understand than the hand-written recursive descent parser I’m currently using. I don’t want people to have to deal with corner-case parser bugs.

Posted in Gazelle| No Comments | 

Gazelle Manual

Posted by josh at February 25th, 2008

Check out the Gazelle manual that I just put online. I’ve put a ton of work into it, and it’s surprisingly substantial for a project at this stage. I invested the work now because I want something I can point people to that demonstrates my plan for Gazelle, even if the implementation isn’t there yet. For people who are veterans in the parsing field, I want to be able to point them here when they ask the question “so what kind of algorithm are you using?” For people who are skeptical that I can really improve on existing tools, I want to point them here to demonstrate my concrete plans for doing so.

Think of it as “the anti-fluff piece.” And as a bonus, when Gazelle is actually ready for general consumption, it will have a great manual ready-to-go.

Posted in Gazelle| 6 Comments | 

(Not) Porting Gazelle from Lua to JavaScript

Posted by josh at February 18th, 2008

Update: Since writing this entry, I opted not to actually do a port, for reasons that are discussed in the comments. However, I’m leaving the entry around to show the thought process I went through.

I’m strongly considering porting Gazelle from Lua to JavaScript. I haven’t fully decided whether I’ll do this, and to be honest I’m not terribly thrilled about the switch. I’m going to lay the case for switching out and leave it to you, my dear readers (all 3 of you) to weigh in about whether this is a good idea.

This isn’t just about Gazelle — it’s really addressing the bigger question of what language best fills the niche I’m going after. I would describe the niche as: “small, fast, flexible language for embedding.”

Sounds just like what Lua was designed for, no? Indeed, Lua has been something of a dream to use. It’s fast, flexible, easy to learn, easy to embed, tiny, and has a tiny JIT available (disabusing me of the notion that JITs can only accompany bloated monstrosities like the JVM). So why switch from Lua?

Three main reasons. The first is that, as Steve Yegge noted when he explained why he ported Rails to JavaScript, JavaScript is the most accepted language of its type inside Google, and I want to see Gazelle take on a life of its own inside Google. I want to be able to write tools using it and have at least the slightest chance that my colleagues will take it seriously. If the imperative language that accompanies Gazelle is JavaScript, there will be an element of familiarity, and hopefully it will be possible to find someone to hack on it with me. Lua is unknown in comparison.

The second reason is much like the first: JavaScript is just more familiar to programmers in general. People know it from web browsers (even if they still think that the language itself is to blame for their browser compatibility nightmares). It is based in “curly brace syntax,” which gives Java, C, and C++ programmers warm fuzzies. It’s kind of lucky that a language as totally decent as JavaScript is set to become one of the most widely used languages, at least for the next several years.

The third reason is that it gives me a strategy for making Gazelle work on the JVM. There isn’t any real Lua implementation on the JVM, and Java heads HATE anything that uses JNI to talk to the outside world (because then it isn’t “100% pure Java”). But Rhino, on the other hand, is a pretty reasonable JavaScript implementation and it is 100% pure Java. So using JavaScript gives me a good JVM portability strategy.

So why am I less than thrilled about making the switch? The main problem is that there aren’t any JavaScript implementations that are remotely as nice as the Lua interpreter. Lua is idyllic.

  • the entire Lua interpreter download is 200Kb
  • it compiles in 4 seconds on my desktop
  • it compiles to a 200Kb executable (130Kb stripped)
  • it runs “Hello, World” in 3ms (startup time is nearly 0)
  • it runs nontrivial benchmarks among the fastest of the scripting languages
  • it has a JIT available that adds only 32Kb of code (compiled) to the core, and compiles most functions in microseconds (very low overhead and memory footprint)
  • it has a really good and well-documented embedding/extension API

I believe there is a profound benefit to making software as lightweight as it can possibly be while still accomplishing its task. Most software fails miserably by this criteria. I often think of Pascal’s quote “I would have written a shorter letter, but I did not have the time” and think how much better off we would be if software developers had the same attitude. Lua is one of those rare jewels that takes this to heart. But let’s take a look at Tamarin, which is the favored next-gen JavaScript implementation. This is from the documentation about it’s garbage collector:.

MMgc is not only a garbage collector, but a general-purpose memory manager. The Flash Player uses it for nearly all memory allocations.

Oh fantastic! Because definitely the one thing that my application doesn’t already have is a memory manager. I am so glad that embedding JavaScript into my program is going to drag in a 15,000 lines of code implementing heaps, spin locks, memory barriers, and complicated C++ macros. Couldn’t you have taken the time to write a shorter letter — err, language implementation? Does JavaScript really need that entire 15,000 line memory manager? Let’s get some perspective on what you can do in 15,000 lines:

  • Lua is 14,000 lines total. That includes the lexer, parser, vm, compiler, byte code format, extension APIs, all of the standard libraries, and the garbage collector.
  • Gazelle is currently 5000 lines total. That includes the parser, LL lookahead calculation, NFA construction, NFA to DFA conversion, DFA minimization, code to read and write byte code in Bitcode format, and code to do the actual parsing.

If you add the code in Tamarin core, you’re now up to 70,000 lines of code. Add in the regular expression library and you’re up to 95,000. Sure, JavaScript as a language isn’t as minimal as Lua, but is it really 6 times more difficult to implement?

I know that Tamarin is a gift from Adobe, and that I shouldn’t kick a gift horse in the mouth. Their software from which they have taken Tamarin is probably happy to use MMgc everywhere. I just wish that our profession gave greater value to the virtue of brevity.

You might write me off as a sad, strange man to care so much about this, but are you so willing to write off Steve Yegge? Code’s worst enemy is size, he says. So there! Proof by Steve Yegge.

Back to JavaScript and Lua. I think JavaScript is a fine language for my porpoises. I just wish there was an implementation of it that was as good as Lua. Size, speed, flexibility, choose any three. No? But Lua does!

I’ve looked over all of the JavaScript implementations listed on Wikipedia. Spidermonkey is small-ish and flexible, but slow. Of the other implementations I like NJS the best (but it looks unmaintained), and SEE looks ok too. But none of these is nearly as nice and small as Lua. It will be hard to let it go.

Posted in Gazelle| 3 Comments | 

« Previous Postings | Next Postings »