Posted by josh at February 26th, 2008

I’m asking this question, not answering it!

While you’re sitting tight waiting for Gazelle 0.2, I have a challenge I’m putting to my readers. I want my program gzlparse to parse some input text and output the parse tree in some useful format. What is the most useful format for visualizing a parse tree?

I want both a good text-based format and a good graphical format, if possible. For text formats there’s:

  • XML (ugh. I didn’t want to say it, but I knew someone else would if I didn’t first. I’ll probably support it, but I’ll put ambivalent emoticons in the source code).
  • S-expressions. Maybe I’ll win over some LISPers.
  • ??
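
To make the S-expression option concrete, here is a minimal sketch in Python. The `Node` class and `to_sexpr` function are made up for illustration. They are assumptions, not the actual data structures or output format of Gazelle or gzlparse.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """Hypothetical parse tree node: a rule name plus child nodes or token strings."""
    rule: str
    children: List[Union["Node", str]] = field(default_factory=list)


def to_sexpr(node: Union[Node, str]) -> str:
    """Serialize a node as (rule child child ...), with tokens as quoted strings."""
    if isinstance(node, str):
        # Escape backslashes and double quotes so the token stays one atom.
        escaped = node.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    parts = [node.rule] + [to_sexpr(child) for child in node.children]
    return "(" + " ".join(parts) + ")"


# Parse tree for the input "1 + 2" under an imagined expr/term grammar.
tree = Node("expr", [Node("term", ["1"]), "+", Node("term", ["2"])])
print(to_sexpr(tree))  # (expr (term "1") "+" (term "2"))
```

Quoting the tokens keeps them distinct from rule names, at the cost of a little extra noise.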

A useful text format would be nice, but a good graphical format could be groundbreaking. I could always draw it as a tree, but I’m wondering if there isn’t something better: something that keeps the text in its original format, but uses color or borders or something like that to represent the parse tree structure.

Here’s an example of the kind of visualization I think is really great and innovative. It’s the way that Lurker displays an email thread:

[Image: Lurker’s view of an email thread]

What’s so brilliant about this view is that it shows you both time-order of the messages and complete threading information in an attractive way. Of course, this is simpler than a parse tree, and such a nice view of a parse tree might not be possible. But what I’d really love to see is a parse tree visualization that:

  • keeps the original text recognizable (viewing it purely as a tree throws away the original text formatting completely)
  • shows the parse tree structure somehow
  • major bonus: can be rendered in a browser using a DOM, so that I could write JavaScript to build the visualization inside the browser.
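
One possible direction, sketched here in Python rather than JavaScript: wrap each parse tree node in a nested `<span>` around the original text, and leave the coloring or borders to CSS. This assumes each node carries character offsets into the source text and that children are sorted and non-overlapping. The `Span` class and `render` function are hypothetical, not anything gzlparse provides.

```python
import html
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """Hypothetical parse tree node covering text[start:end]."""
    rule: str
    start: int
    end: int
    children: List["Span"] = field(default_factory=list)


def render(text: str, node: Span) -> str:
    """Emit nested <span> elements that wrap the original text unchanged."""
    out = []
    pos = node.start
    for child in node.children:
        # Text between children (whitespace, operators) is kept as-is.
        out.append(html.escape(text[pos:child.start]))
        out.append(render(text, child))
        pos = child.end
    out.append(html.escape(text[pos:node.end]))
    return f'<span class="{html.escape(node.rule)}">' + "".join(out) + "</span>"


source = "1 + 2"
tree = Span("expr", 0, 5, [Span("term", 0, 1), Span("term", 4, 5)])
print(render(source, tree))
# <span class="expr"><span class="term">1</span> + <span class="term">2</span></span>
```

Because the spans only wrap the text, stripping the tags gives back the original input, and a stylesheet rule per class (a border or background color, say) would show the nesting.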

If this were possible, then I could write a web-based, syntax-analyzing text box that parsed your text as you typed it and showed you beautiful graphical representations of the parse. Something like this extremely awesome interactive regex visualizer, but for full context-free grammars. Or something like what ANTLRWorks provides, but on the web.

That would be SO HOT.