Gazelle v0.2 is here!

Posted by josh at June 29th, 2008

It’s been a long time coming, but Gazelle v0.2 is finally here!

To me, Gazelle 0.2 represents a significant shift. With 0.2, Gazelle is finally in a place where I think it’s ready for people to tinker with. In 0.2 there is enough documentation to figure out what’s going on, and the command-line programs like gzlparse have reasonable --help messages and can do useful things. Starting with Gazelle 0.2, your problems are my problems: things that don’t work right should either be fixed or be written down as TODO items.

Gazelle 0.1 to 0.2 was a major overhaul — implementing LL(k) lookahead took major surgery. Half the time the code didn’t actually work, because major rewrites were only partially done. I expect all that to change with Gazelle 0.2 — I want future releases to be far more incremental, and for every commit to leave the repository in a working state. I want a 0.2.1 and 0.2.2 that fix a lot of the edge cases that still aren’t right in 0.2 (you can read more about these shortcomings in the “Tour” section of the manual or in the TODO).

There are still many major features to add to Gazelle in the future — you can see a list in the TODO. But again, I think these can be added without breaking the tree in the meantime.

There’s one major bummer about 0.2. I had to completely remove the “@ignore” feature, which was Gazelle’s answer to letting you ignore whitespace/comments without having a separate lexer. I removed it because I realized that the abstraction I had invented for expressing this concept was not quite right, and that a more general-purpose abstraction was the right answer — the abstraction I have in mind will also handle things like languages embedded in other languages (like Ruby inside HTML: RHTML). But the bummer is that for the moment, Gazelle has no answer for how to ignore whitespace/comments. So it’s clearly not useful for real work yet.

So try it out and send any feedback you have to gazelle-users. Thanks!

Posted in Gazelle| No Comments | 

Gazelle Grammar Visualization

Posted by josh at April 10th, 2008

I’ve been quiet about Gazelle news lately, but since I last wrote I’ve hit 3 of my 6 goals for Gazelle 0.2, plus one that I hadn’t thought to include. Here are those goals again, with the extra one at the end:

  • complete Strong-LL(k) lookahead support. (it’s not 100% complete yet, but it’s definitely solid enough for a 0.2 release)
  • a command-line compiler program (gzlc) that takes reasonable options and is simple enough to use by reading its --help
  • a “tour” section for the manual
  • a command-line program (gzlparse) that can output the parse tree in a useful format, so you can see how Gazelle parses your input text.
  • a test suite, so that when people report bugs I can add the bugs to the test suite and not regress.
  • (stretch): make Gazelle self-hosting, so that the parser is more robust and easier to understand than the hand-written recursive descent parser I’m currently using. I don’t want people to have to deal with corner-case parser bugs.
  • a way to visualize grammars, to spot-check them against your expectations

It’s the grammar visualization that I forgot to include. I mentioned parse tree visualization a few blog posts ago, but this is different — one is visualizing how a bunch of text got parsed, the other is visualizing the grammar itself.

It still has room for improvement, but here is what my grammar visualization currently looks like for JSON. You can see an NFA for each one of your rules, a DFA for each state of lookahead, and the DFAs that do the lexing.

The latest code from Git (note that I recently moved from repo.or.cz to Github) can generate these grammar dumps — just pass ‘-d’ to gzlc.

Posted in Gazelle| 1 Comment | 

What’s the best way to visualize a parse tree?

Posted by josh at February 26th, 2008

I’m asking this question, not answering it!

While you’re sitting tight waiting for Gazelle 0.2, I have a challenge I’m putting to my readers. I want my program gzlparse to parse some input text and output the parse tree in some useful format. What is the most useful format for visualizing a parse tree?

I want both a good text-based format and a good graphical format, if possible. For text formats there’s:

  • XML (ugh. I didn’t want to say it, but I knew someone else would if I didn’t first. I’ll probably support it, but I’ll put ambivalent emoticons in the source code).
  • S-expressions. Maybe I’ll win over some LISPers.
  • ??
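
As a rough illustration of the two text formats listed above, here is a minimal Python sketch that prints one small parse tree both as an S-expression and as XML. The `Node` class, the rule names, and the sample tree are all hypothetical, made up for this example; this is not what gzlparse actually emits.

```python
from dataclasses import dataclass, field
from typing import List, Union
from xml.sax.saxutils import escape


@dataclass
class Node:
    """One parse tree node (hypothetical): a rule name plus child nodes or token strings."""
    rule: str
    children: List[Union["Node", str]] = field(default_factory=list)


def to_sexpr(node: Union[Node, str]) -> str:
    """Render the tree as an S-expression; tokens become quoted strings."""
    if isinstance(node, str):
        return '"' + node.replace("\\", "\\\\").replace('"', '\\"') + '"'
    parts = [node.rule] + [to_sexpr(child) for child in node.children]
    return "(" + " ".join(parts) + ")"


def to_xml(node: Union[Node, str]) -> str:
    """Render the same tree as XML; tokens become escaped text content."""
    if isinstance(node, str):
        return escape(node)
    inner = "".join(to_xml(child) for child in node.children)
    return f"<{node.rule}>{inner}</{node.rule}>"


# A made-up parse of the JSON text [1,2]
tree = Node("array", ["[", Node("number", ["1"]), ",", Node("number", ["2"]), "]"])
print(to_sexpr(tree))  # (array "[" (number "1") "," (number "2") "]")
print(to_xml(tree))    # <array>[<number>1</number>,<number>2</number>]</array>
```

Both functions walk the same tree, so the choice between the two formats is mostly about which one is easier to read and to feed into other tools.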

A good useful text format would be nice, but a good graphical format could be groundbreaking. I could always draw it as a tree, but I’m wondering if there isn’t something better. Something that keeps the text in its original format, but uses color or borders or something like that to represent the parse tree structure.

Here’s an example of the kind of visualization I think is really great and innovative. It’s the way that Lurker displays an email thread:

[Image: Lurker’s view of an email thread]

What’s so brilliant about this view is that it shows you both time-order of the messages and complete threading information in an attractive way. Of course, this is simpler than a parse tree, and such a nice view of a parse tree might not be possible. But what I’d really love to see is a parse tree visualization that:

  • keeps the original text recognizable (viewing it purely as a tree throws away the original text formatting completely)
  • shows the parse tree structure somehow
  • major bonus: can be rendered in a browser using the DOM, so that I could write JavaScript to build the visualization inside the browser.
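
One way the wish list above might be approached, sketched in Python under stated assumptions: each parse tree node records the character range it covers, and the renderer wraps that range in a nested `<span>` without altering the text. The `Span` class, its fields, and the sample tree are hypothetical and are not part of Gazelle.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class Span:
    """A parse tree node (hypothetical) described by the slice of source text it covers."""
    rule: str
    start: int  # offset of the first character covered
    end: int    # offset one past the last character covered
    children: List["Span"] = field(default_factory=list)


def render(text: str, node: Span) -> str:
    """Wrap the text covered by each node in a <span>, leaving the text itself intact."""
    out = []
    pos = node.start
    for child in node.children:  # children are assumed sorted and non-overlapping
        out.append(escape(text[pos:child.start]))  # text owned directly by this node
        out.append(render(text, child))
        pos = child.end
    out.append(escape(text[pos:node.end]))
    return f'<span class="{escape(node.rule)}">' + "".join(out) + "</span>"


source = "[1, 2]"
tree = Span("array", 0, 6, [Span("number", 1, 2), Span("number", 4, 5)])
markup = render(source, tree)
print(markup)
# <span class="array">[<span class="number">1</span>, <span class="number">2</span>]</span>
```

Stripping the tags gives back the original text, and a stylesheet could put a border or background color on each class to make the nesting visible. The same walk could be done in JavaScript with `document.createElement` to build the DOM in the browser.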

If this were possible, then I could write a web-based syntax analyzing text box that parsed your text as you typed it and showed you beautiful graphical representations of the parse. Something like this extremely awesome interactive regex visualizer, but for full context-free grammars. Or something like what ANTLRWorks provides, but on the web.

That would be SO HOT.

Posted in Gazelle| 5 Comments | 

Setting the sights for Gazelle 0.2

Posted by josh at February 26th, 2008

I’m really excited about the interest the Gazelle manual has generated! Thanks for checking it out, and for your feedback.

I got a little scared when people said they were going to start trying Gazelle out now, because it immediately made me think of how many things I’ve been meaning to fix before I unleashed it on anybody else! But on the other hand, it gives me all the more motivation to get it to a point where other people can try it out. And I’m never more motivated, and never get as much done, as when I know people are waiting for me!

So here’s my line for the moment. Don’t try out Gazelle just yet. There are too many things for me to fix at the moment that I know are broken. But I want to fix those things ASAP and get Gazelle 0.2 out the door, so I can finally have a release that I can recommend people try out.

When will Gazelle 0.2 come? I’m hoping no more than a month. Here’s the target feature set:

  • complete Strong-LL(k) lookahead support. I have the code to generate Strong-LL(k) lookahead; I just need to support it at the bytecode and runtime stages.
  • a command-line compiler program (gzlc) that takes reasonable options and is simple enough to use by reading its --help
  • a “tour” section for the manual
  • a command-line program (gzlparse) that can output the parse tree in a useful format, so you can see how Gazelle parses your input text.
  • a test suite, so that when people report bugs I can add the bugs to the test suite and not regress. This will be important for keeping my sanity.
  • (stretch): make Gazelle self-hosting, so that the parser is more robust and easier to understand than the hand-written recursive descent parser I’m currently using. I don’t want people to have to deal with corner-case parser bugs.

Posted in Gazelle| No Comments | 

Gazelle Manual

Posted by josh at February 25th, 2008

Check out the Gazelle manual that I just put online. I’ve put a ton of work into it, and it’s surprisingly substantial for a project at this stage. I invested the work now because I want something I can point people to that demonstrates my plan for Gazelle, even if the implementation isn’t there yet. For people who are veterans in the parsing field, I want to be able to point them here when they ask the question “so what kind of algorithm are you using?” For people who are skeptical that I can really improve on existing tools, I want to point them here to demonstrate my concrete plans for doing so.

Think of it as “the anti-fluff piece.” And as a bonus, when Gazelle is actually ready for general consumption, it will have a great manual ready-to-go.

Posted in Gazelle| 6 Comments | 
