Posts Tagged ‘llvm’

Demystifying Unladen Swallow

Tuesday, October 13th, 2009

Unladen Swallow is “an optimization branch of CPython, intended to be fully compatible and significantly faster”. Interesting. In order to understand Unladen Swallow, we first needed to understand the CPython compiler. It’s  written in C and fairly easy to understand, very unlike the self-hosting complex beasts Steel Bank Common Lisp and Glasgow Haskell Compiler. The design of the compiler is explained fully in PEP 339, and I’m just including a small warm-up discussion.

Start off at the top level: the evaluator (Python/cevel.c). It executes Python bytecode. From the Python program you write to the Python bytecode that is produced, there are several stages with optimizations at every stage. It goes like this:

  1. Using the SPARK parser generator, generate the Parse Tree from the Python code  (Parser/pgen.c)
  2. Convert the Parse Tree into an Abstract Syntax Tree (Python/ast.c)
  3. Transform the AST into a Control Flow Graph. Several compiler optimizations like removing unreachable code are implemented here (Python/compile.c)
  4. Post-order DFS the CFG and flatten it to emit the final Python bytecode. The bytecode is then subjected to various peephole optimizations (Python/compile.c)
  5. Instead of having to go through this this long procedure everytime we want to run the same program, the Python bytecode produced is stored in a .pyc file.

Enough warming up. The Unladen Swallow developers want to optimize CPython very smoothly, preferably without breaking any existing code. Their Project Plan is laid out on the Wiki page; I’ll mainly discuss 2009Q2.

If they work at any stage higher than the Python bytecode, they’ll have to re-implement the fantastic peephole optimizations written by the community over the years. The plan is simple: instead of executing the high-level bytecode in the eval loop (which can be painfully slow), compile atleast some part of it into a lower representation that can be executed faster. Why not compile all of it to x86 assembly? Firstly, we’d lose the interactive shell, a very major part of Python. Secondly, we’d lose out on Python’s high portability, that arises because it’s not tied down to any single architecture’s assembly language (I even run Python on my phone using the PyS60 package). Thirdly, it’s terribly difficult to do, makes Python uglier and unmaintainable, and it’s utterly pointless- we’re trading off too much for execution speed.

Just-in-time (JIT) compilation is the solution. Compile the code only when needed. Then again, don’t compile everything; compile when the new execution time is less than the old execution time by an extent enough to justify the compile time. Otherwise, the startup delay will become unbearable and the interactive interpreter won’t really be interactive anymore. The perfect low-level platform independent representation to JIT to is LLVM (Low-level virtual machine). So, the Unladen Swallow developers first wrote a compiler from Python bytecode to LLVM IR from scratch (Python/llvm_compile.cc). The LLVM IR (Intermediate Representation) looks a lot like assembly, except that it’s architecture-independent,  and can be subject to a nice set of optimizations. Then they modified the eval loop to execute LLVM IR, while keeping the existing Python bytecode evaluator (file renamed: Python/eval.cc).

Q3 and Q4 aren’t updated on the Wiki page, but it should be easy enough to find out the work in progress from the IRC channel, mailing list, and repository commits.