Playing with LLVM

For no reason in particular I decided to dig into LLVM today. I decided to try out its assembly language by writing a line count program in it (I chose this problem since I had previously written an optimized x86 version using SSE). This was of course significantly trickier than I initially expected and took me several hours to figure out.

SSA took some getting used to. I understood the basic idea before but had never worked with it. The surprising thing is that you don’t get registers, which means that to do something as simple as the “i++” of a for loop you have to:

  %current_i_val = load i32* %i
  %new_i_val = add i32 %current_i_val, 1
  store i32 %new_i_val, i32* %i

In this example, “%i” is the address of a stack variable where “i” lives. I have to load the current value, increment it, and store it again.

This load/modify/store business made me uncomfortable because it felt really inefficient. You just have to trust that when LLVM does register allocation that it’s smart enough to know that it doesn’t actually have to do the load/store each time.

I was also surprised that LLVM assembly language is more strict that real assembly languages about the structure of each function. Each function has a series of “code blocks” and each code block must begin with a label and end with a “termination instruction.” As an example, here’s a function definition and the first two blocks in that function:

; Returns the number of newlines found in a buffer.
define i32 @count_newlines(i8* %buffer, i32 %bufsize) {
  ; Local variables.
  %i = alloca i32
  %num_lines = alloca i32

  store i32 0, i32* %i           ; Initialize to zero.
  store i32 0, i32* %num_lines   ; Initialize to zero.
  br label %loop  ; can't just fall through to %loop

loop:
  %i_val = load i32* %i
  %is_done = icmp uge i32 %i_val, %bufsize
  br i1 %is_done, label %done, label %continue_loop

So you can’t just put labels anywhere willy-nilly, you only get one label at the beginning of the block. The termination instructions are mostly branches of various sorts. Basically this means that you can’t just “fall through” to the next label without branching — if you want to proceed to the next block you have to explicitly end your current block by branching to the next one. So in this sense the order of the blocks as you write them in the file is insignificant — the only significance is the first one, which is automatically run when you enter the function.

Once you write your LLVM assembly language, you can assemble it with llvm-as, and then either run it with “lli” (which will JIT+run it) or compile it to x86 assembly with “llc” and then assemble it with gcc.

$ llvm-as wc.ll
$ lli wc.bc < /usr/share/dict/american-english-huge
638818
$ llc wc.bc
$ gcc -o wc wc.s
$ ./wc < /usr/share/dict/american-english-huge
638818
This entry was posted in Uncategorized. Bookmark the permalink.

4 Responses to Playing with LLVM

  1. Eugene O'neil says:

    Actually, LLVM has an infinite supply of registers: for example, current_i_value is a register. They just aren’t rewritable. That’s what “static single assignment” means.

    An easy way to get around this limitation is to use loads and stores, as you have discovered, but the “correct” way is to use the mysterious and inscrutable phi instruction:

    http://llvm.org/docs/LangRef.html#i_phi

    This documentation even provides a simple example of how to write a counting loop, without loads and stores.

    There is an optimization pass that tries to convert the load-and-store method to the phi method, but it isn’t in the register allocation phase.

  2. Alex says:

    The pass Eugene mentioned is called ‘mem2reg’.

    http://llvm.org/docs/tutorial/LangImpl7.html

    Actually using this method saves a lot of efforts because it’s not trivial to find where to insert these phi nodes.

  3. Anton Korobeynikov says:

    As mentioned previously – you’ll get SSA ‘for free’. Write everything in load/store manner, then run opt -std-compile-opts and it will try to optimize your bytecode hard (e.g. convert to ssa form with phi’s, etc).

  4. Oh, THAT’S how to do a for loop in LLVM. I’ve chosen LLVM as my first assembly language. Tutorials like this are a dire necessity; Assembly is hard enough without SSA.

    Hello LLVM
    http://www.yellosoft.us/hello-llvm

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>