Inside a register allocator


We are going to be discussing LLVM's Fast Register Allocator: you might like to open [RegAllocFast.cpp](https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/RegAllocFast.cpp) and refer to it as we go through the article.

FastRegAlloc allocates linearly, going through instructions and their operands in order. It uses `PhysRegState` to keep track of the state of various physical registers: they can be 0 (`regDisabled`), 1 (`regFree`), 2 (`regReserved`), or a virtual register number (a large number). At the time of allocation, the full UseDef information is available (so you have information like `LR.LastUse`); a register can either be a `<def>` or a use. If `IsImplicit` is flipped, it could also be a implicit-def (`<imp-def>`) or implicit-use (`<imp-use>`).

%vreg349 = MOVZX32rm8 %vreg351, 1, %noreg, 0, %noreg

turns into

%ESI = MOVZX32rm8 %RDX, 1, %noreg, 0, %noreg

after allocation. The `<kill>` indicates that the instruction is the last use of `%RDX`. In another example,

CQO %RAX, %RDX, %RAX

turns into

CQO %RAX, %RDX, %RAX

Notice that the instruction references physical registers even before the allocation. That's because this instruction specifically wants to sign extend `%RAX` into `%RDX` (not any other register) for use in a later `IDIV`. Remember that `IDIV` operates on `%RDX:%RAX` as the numerator, and writes the quotient in `%RAX`, reminder in `%RDX`.

The instruction is modeled as `MachineInstr`, and the operands as `MachineOperand`. Note that a `MO` doesn't have to be a register: it could also be an immediate value; we use `MO.isReg()` to find out if it's a register, physical or virtual. In addition to the states shown in the pretty-print, `MO` can also be `EarlyClobber`, `Dead`, or `Undef`. A Dead `MO` implies Def, and indicates that the value defined is used no longer.

`AllocateBasicBlock`, which operates on `MachineBasicBlock`, can be separated into three different scans, that operate on register MOs. Before the first scan, we set up the `LiveIns` (registers coming in live from the previous `MBB`) as `regReserved` so they aren't clobbered. The first scan operates on physical registers that are allocatable Uses; it calls `usePhysReg` on them. At this stage, the physical register must be either `regDisabled`, `regReserved`, or `regFree`. It cannot be allocated to a virtual register. Why? Imagine you see:

CQO %RAX, %RDX, %RAX

Now, if `%RAX` were already allocated to a virtual register, we're basically screwed in this linear walk. Otherwise, kill it, mark it as `regFree`, and move on: we have done our part by completing the use of the register that was reserved earlier.

The second scan operates only on virtual register Uses. We add the register to `LiveVirtRegs`, via `reloadVirtReg`. If it didn't already exist in the map, we `allocVirtReg` it, which essentially gets the first `regFree` register not used in an instruction, and calls `assignVirtToPhysReg` on it. `assignVirtToPhysReg` is very simple: it just updates the `PhysRegState` mapping. Finally, `reloadVirtReg` updates the `UsedInInstr` map.

The third scan operates on physical and virtual registers that are Defs. If the register is a physical register, it does `definePhysReg` with `regReserved` unless `MO.isDead()`, in which case it's regFree. `definePhysReg` is very simple: it just puts the register in the state that was requested (`regReserved` or `regFree`, in this case).

To think about the problem, if we have,

%RAX = COPY %vreg342

we should always `regReserve` `%RAX`, right? Except if `%RAX` is dead. What about if it's an `<imp-def>`?

CQO %RAX, %RDX, %RAX

We didn't pass `CQO` `%RAX` explicitly, but that doesn't mean that a later instruction (for instance, `IDIV`) is not relying on this register's value. If we have an instruction between `CQO` and `IDIV`, that can potentially clobber `%RAX`, leading to a regalloc crash. Hence, we `regReserve` even if `MO.isImplicit()`.

The third scan does `defineVirtReg` on virtual registers, to grab the first free physical register for the virtual register.

Note: the "pretty-prints" are generated by setting a breakpoint in `AllocateBasicBlock` and executing `p MBB->dump()`.