Archive

Archive for January 12th, 2009

Gazelle in the browser

January 12th, 2009 Josh No comments

While I’m on a roll with these blog entries, I thought I’d share just one more secret that’s been kicking around in my head for a while. I desperately want to see a Gazelle development environment that runs in your web browser.

I first had this idea when I visited the excellent regular expression reanimator. It’s this beautiful little visualization that runs in your browser and shows you how a DFA transitions in response to text. And all you have to do to use it is click that little link!

My dream is that you can develop your grammars interactively. You type rules and as you type you see little graphs build themselves up, state by state. You add a kleene star and you see the extra transitions add themselves to the existing graph. You put some sample text in another textarea and watch it start syntax-highlighting itself as you add the rules to properly recognize its structure. Or you can develop your own syntax highlighting color schemes by typing syntax highlighting rules and again seeing them incrementally take effect on a block of sample text.

None of these ideas are brand-new. ANTLR of course has ANTLRWorks, which is an IDE for developing ANTLR grammars. But I feel that having this on the web would really lower the barrier to entry and make it accessible to more people. And despite how people poo-poo JavaScript and HTML, the web is not such a bad platform these days — especially now that SVG support is getting really good. SVG is the perfect tool for drawing graphs, styling the nodes in various ways, and having them do useful things when you hover/click on them.

So anyway, I really want this. But the hurdles to making this happen are great!

  • Gazelle’s compiler is written in Lua. No real way to run Lua in a browser right now. Possible options for doing this in the future: a Lua -> JavaScript compiler, a Lua VM written in JavaScript, porting the Gazelle compiler to JavaScript, running the Lua interpreter inside NativeClent, and running the Lua interpreter inside Alchemy.
  • Gazelle’s runtime is written in C. Possible options: port the runtime to JavaScript, run it inside Alchemy, or run it inside NativeClient.
  • The Graphviz graph layout package is also written in C. Possible options are the same as the Gazelle runtime. I was seriously considering porting just the layout algorithms to JavaScript and had even started the port, but have recently been realizing how CPU-intensive these algorithms are even in C. In JavaScript they could only be slower. With Alchemy they would be slower than pure C, though it’s hard to say how much. With NativeClient they would run at native speed.

So it pretty much looks like I’m stuck waiting for either Alchemy or NativeClient to be prime-time. While I could always run all this software on the server and AJAX the results in constantly, I don’t think that would provide a smooth enough user experience (not to mention the load it would put on my server — these algorithms are kind of expensive).

As far as Alchemy vs. NativeClient: Alchemy will be slower (2-10x according to them) since it compiles your C/C++ to the Flash VM instead of to machine code, and it will require the Flash plugin. NativeClient will run at roughly native speeds, but will require a plugin download (most people will have flash already) and one that has potentially scary security implications if the NativeClient guys made any mistakes.

Time to play with Alchemy and see if I can get it to work! And hope that NativeClient makes a lot of progress in no time!

Categories: Gazelle Tags:

Playing with LLVM

January 12th, 2009 Josh 3 comments

For no reason in particular I decided to dig into LLVM today. I decided to try out its assembly language by writing a line count program in it (I chose this problem since I had previously written an optimized x86 version using SSE). This was of course significantly trickier than I initially expected and took me several hours to figure out.

SSA took some getting used to. I understood the basic idea before but had never worked with it. The surprising thing is that you don’t get registers, which means that to do something as simple as the “i++” of a for loop you have to:

  %current_i_val = load i32* %i
  %new_i_val = add i32 %current_i_val, 1
  store i32 %new_i_val, i32* %i

In this example, “%i” is the address of a stack variable where “i” lives. I have to load the current value, increment it, and store it again.

This load/modify/store business made me uncomfortable because it felt really inefficient. You just have to trust that when LLVM does register allocation that it’s smart enough to know that it doesn’t actually have to do the load/store each time.

I was also surprised that LLVM assembly language is more strict that real assembly languages about the structure of each function. Each function has a series of “code blocks” and each code block must begin with a label and end with a “termination instruction.” As an example, here’s a function definition and the first two blocks in that function:

; Returns the number of newlines found in a buffer.
define i32 @count_newlines(i8* %buffer, i32 %bufsize) {
  ; Local variables.
  %i = alloca i32
  %num_lines = alloca i32

  store i32 0, i32* %i           ; Initialize to zero.
  store i32 0, i32* %num_lines   ; Initialize to zero.
  br label %loop  ; can't just fall through to %loop

loop:
  %i_val = load i32* %i
  %is_done = icmp uge i32 %i_val, %bufsize
  br i1 %is_done, label %done, label %continue_loop

So you can’t just put labels anywhere willy-nilly, you only get one label at the beginning of the block. The termination instructions are mostly branches of various sorts. Basically this means that you can’t just “fall through” to the next label without branching — if you want to proceed to the next block you have to explicitly end your current block by branching to the next one. So in this sense the order of the blocks as you write them in the file is insignificant — the only significance is the first one, which is automatically run when you enter the function.

Once you write your LLVM assembly language, you can assemble it with llvm-as, and then either run it with “lli” (which will JIT+run it) or compile it to x86 assembly with “llc” and then assemble it with gcc.

$ llvm-as wc.ll
$ lli wc.bc < /usr/share/dict/american-english-huge
638818
$ llc wc.bc
$ gcc -o wc wc.s
$ ./wc < /usr/share/dict/american-english-huge
638818
Categories: Uncategorized Tags: