Reader Challenge: Optimize the BF JIT
In my previous entry Hello, JIT World: The Joy of Simple JITs, the final example was a JIT for BF. The performance of this JIT beat an optimizing interpreter, but lost to a translation to C (which I compiled with gcc). I speculated that the reason for this was that my JIT didn’t have a register allocator (or perform any optimizations at all).
When I asked Mike Pall, author of LuaJIT and DynASM for feedback on the article, he suggested several optimizations that would be much simpler than register allocation. I’ve implemented a few of these already on my own, and already was able to increase performance by 2-3x. Implement all of them, and the JIT would likely beat the compiler.
The process of implementing these optimizations was extremely fun, so I want to give you, my readers, a chance to try it out before I reveal my answers. I encourage you to start by forking my GitHub repository.
Here are the optimizations Mike suggested (I’ll try to give the idea without giving away what you’d optimize these patterns into):
- implement a peephole
optimizer that optimizes runs of
<<<
,>>>
,+++
, and---
(each run can be optimized into a single instruction). - fold
PTR
movements into theinc/dec/add/sub
arguments themselves (use x86 addressing to avoid having to move the pointer between every+/-
instruction). - optimize patterns like
[-]
and[->>>+<<<]
, etc. - take advantage of flag propagation from
add/sub
to optimize awaycmp
instructions when possible (but beware ofinc/dec
and partial flag stalls (see section 5.8, p.58 in The microarchitecture of Intel, AMD and VIA CPUs)).
These are just ideas – not all of them are necessary for good performance and there are almost certainly other ideas I haven’t listed here. Try it out – it’s fun!