You can also read this chapter in a linear Git revision history on the linear branch and compare it to the previous chapter.
The output of Instruction Selection - the mapping from idealized Simple Nodes to machine-specific ones - is difficult to test in bulk without an execution strategy. We won't have an execution stratgy until all three of Instruction Selection, Register Allocation and Encodings are done. Hence in and around the completion of Encoding we expect to find lots of bugs in Instruction Selection and Register Allocation. We certainly hand-inspect the output and fix obvious problems, but still lots bugs linger until we can actually run the code.
This means that other chapters have updates and bug-fixes to the work being described here.
A new abstract Machine class is added, one per unique CPU port. Machine
defines a name for the port, names for all registers, and how to generate
machine Nodes from ideal Nodes. Machine also allows generating split ops
and jumps, both used by the machine independent compiler parts during
code-gen.
A new CallingConv enum is added, to allow picking between various calling conventions.
A new InstructionSelection compile phase is added before Global Code Motion. It is optional, but will replace the ideal Nodes with machine- specific variants. Instruction selection is called with the target CPU name; the porting code is looked up and used then. This allows the Java implementation at least the ability to add CPU ports lazily.
We add a notion of "machine specific" Nodes, which implement the MachNode
interface. MachNode nodes define incoming and outgoing registers for every
instruction, their bit encodings and user representations. Later we'll add
function unit information for a better local schedule.
More static globals move into the CodeGen object, which is passed around at all the top level drivers. It's not passed into the Node constructors, hence not into the idealize() calls, which remains the majority of cases needing a static global. In short the compiler gets a lot more functional, and a big step towards concurrent or multi-threaded compilation.
Machines define registers as small dense integers. The Register Allocator
will use these numbers when allocating. The mappings from small integers to
machine registers is up to the porter, but there's usally an obvious
one-to-one. For the X86, RAX will be register 0, RCX register 1 and so on up
to the 16 GPRs. The XMM registers at 16 and go up to 32. Register numbers
must be unique; this is how the register allocator tracks them.
A RegMask Register Mask class holds onto bitsets of allowed registers. Most
machine ops are limited to some subset of all registers, and the allowed sets
are written as a RegMask. Most RegMasks are immutable, e.g. the set of allowed
registers into a Mul operation, but the register allocator will make
extensive use of mutable register sets.
Calling conventions dictate many fixed registers, and in practice tend to really "bind up" the set of allowed registers.
In the node/cpus/x86_64_v2 directory is a x86_64_v2.java port to an X86 64
V2. This port supports 16 64-bit GPRs, 32 XMM registers and the X86 flags.
There is a pattern matcher for matching X86 ops from idealized Simple
Sea-of-Nodes. Except for the addressing modes pattern matcher matches nearly
one-for-one X86 ops from ideal nodes.
The instSelect phase walks the ideal graph, and maintains a mapping
from ideal-to-machine ops. It does a greedy pattern match, first come first
match. Ops without any special behavior simply do the local lookup e.g. an
Mul exactly matches to an X86 MulX86 directly. The RHS of most matches can
be an immediate constant, matching an immediate-flavored version (MulIX86).
For many X86 operations, if one of the inputs is a Load, the X86 can use an
op-from-memory variant. There's a specific address() call in the X86 matcher
that looks for the various kinds of addressing modes and will build memory-
specific variations. These all need to be called out specifically, but with a
simple one-line match and one-line allocation of the memory version.
In simple, a BoolNode produces a 0/1 value. On an X86 this is done with
the Cmp/SetXX opcode sequence. The X86 Jxx takes in FLAGS and not a
0/1, so the Jxx will skip the SetXX and directly use the Cmp. Using
the ideal Bool directly will require the SetXX. i.e. return a==3 will
match to a CmpX86 RAX,#3; SetEQ RAX; ret; whereas a if( a==3 ) will match
to a CmpX86 RAX,#3; jeq;
In the node/cpus/riscv directory is a riscv.java port to a RISC-V.
This port supports 32 64-bit GPRs, and 32 floating-point registers.
There is a pattern matcher for matching riscv ops from idealized Simple
Sea-of-Nodes. Currently, we are targeting RVA23U64.
Works the same way as X86_64_V2, since riscv doesn't have complex addressing modes, greedy pattern matching doesn't apply to most of the cases. Ops without any special behavior simply do the lookup.
RISC-V is a restricted ISA with simple addressing modes and overloads.
E.g lea is not present.
In x86_64_v2, branching requires a comparison instruction to set flags,
followed by a jump instruction that uses those flags. In contrast, RISC-V
includes all necessary operands within the branch instruction itself,
eliminating the need for a separate comparison step.
E.g
beq x10, x11, some_label
CBranchRISC implements this behavior.
No R-flags exist in RiSC-V. To obtain similar functionality: We can set the value of registers based on the result of the comparison this way explicitly.
xor a0, a0, a1
seqz a0, a0
After the first xor - a0 will be zero if a0 and a1 are equal.
The seqz instruction will set a0 to 1 if a0 is zero from the previous operation.
This matches SetRISC in the nodes/cpus/riscv port.
RISC-V instructions have fixed length of 32 bits(in most cases) which means that we can benefit from displacements in the encoding side.
E.g
flw fa5, 0(a0)
flw fa4, 12(a0)
fadd.s fa0, fa5, fa4
(No load is needed prior to flw)
In RISC-V, general-purpose registers (GPRs) range from x0 to x31, and floating-point registers range
from f0 to f31.
To enhance readability and align with calling conventions, RISC-V defines ABI names for registers, providing more intuitive aliases instead of raw register numbers.
| Register | Alias | Usage | Saved By |
|---|---|---|---|
| x10 | a0 | Function arguments/return values | Caller |
| x11 | a1 | Function arguments/return values | Caller |
| x12 | a2 | Function arguments | Caller |
| x13 | a3 | Function arguments | Caller |
| x14 | a4 | Function arguments | Caller |
| x15 | a5 | Function arguments | Caller |
| x16 | a6 | Function arguments | Caller |
| x17 | a7 | Function arguments | Caller |
When passing in arguments to a function, the first 8 arguments from the caller are passed in registers a0 to a7.
Currently, floating-point constants are first loaded into an integer GPR before being converted into a floating-point register.
This process is rewritten in the RA stage as:
lui a0, 262144
fmv.w.x fa0, a0
For non-constant values, we use a broader register mask that
supports both GPRs and FPRs. This implementation can be found in:
nodes/cpus/riscv/LoadRISC.