Skip to end of metadata
Go to start of metadata

The optimizing compiler intermediate representation (IR) is held in an object of type IR and includes a list of instructions. Every instruction is classified into one of the pre-defined instruction formats. Each instruction includes an operator and zero or more operands. Instructions are grouped into basic blocks; basic blocks are constrained to having control-flow instructions at their end. Basic blocks fall-through to other basic blocks or contain branch instructions that have a destination basic block label. The graph of basic blocks is held in the cfg (control-flow graph) field of IR.

This section documents basic information about the intermediate instruction. For a tutorial based introduction to the material it is highly recommended that you read "Jikes RVM Optimizing Compiler Intermediate Code Representation".

IR Operators

The IR operators are defined by the class Operators, which in turn is automatically generated from a template by a driver. The input to the driver are two files, both called OperatorList.dat. One input file resides in $RVM_ROOT/rvm/src-generated/opt-ir and defines machine-independent operators. The other resides in $RVM_ROOT/rvm/src-generated/opt-ir/${arch} and defines machine-dependent operators, where ${arch} is the specific instruction architecture of interest.

Each operator in OperatorList.dat is defined by a five-line record, consisting of:

  • SYMBOL: a static symbol to identify the operator
  • INSTRUCTION_FORMAT: the instruction format class that accepts this operator.
  • TRAITS: a set of characteristics of the operator, composed with a bit-wise or (|) operator. See for a list of valid traits.
  • IMPLDEFS: set of registers implicitly defined by this operator; usually applies only to machine-dependent operators
  • IMPLUSES: set of registers implicitly used by this operator; usually applies only to machine-dependent operators

For example, the entry in OperatorList.dat that defines the integer addition operator is

<blank line>
<blank line>

The operator for a conditional branch based on values of two references is defined by

branch | conditional
<blank line>
<blank line>

Additionally, the machine-specific OperatorList.dat file contains another line of information for use by the assembler. See the file for details.

Instruction Formats

Every IR instruction fits one of the pre-defined Instruction Formats. The Java package defines roughly 75 architecture-independent instruction formats. For each instruction format, the package includes a class that defines a set of static methods by which optimizing compiler code can access an instruction of that format.

For example, INT_MOVE instructions conform to the Move instruction format. The following code fragment shows code that uses the Operators interface and the Move instruction format:

class X {
  void foo(Instruction s) {
    if (Move.conforms(s)) {     // if this instruction fits the Move format
      RegisterOperand r1 = Move.getResult(s);
      Operand r2 = Move.getVal(s);
      System.out.println("Found a move instruction: " + r1 + " := " + r2);
    } else {
      System.out.println(s + " is not a MOVE");

This example shows just a subset of the access functions defined for the Move format. Other static access functions can set each operand (in this case, Result and Val), query each operand for nullness, clear operands, create Move instructions, mutate other instructions into Move instructions, and check the index of a particular operand field in the instruction. See the Javadoc reference for a complete description of the API.

Each fixed-length instruction format is defined in the text file $RVM_ROOT/rvm/src-generated/opt-ir/InstructionFormatList.dat. Each record in this file has four lines:

  • NAME: the name of the instruction format
  • SIZES: the number of operands defined, defined and used, and used
  • SIG: a description of each operand, each description given by
    • D/DU/U: Is this operand a def, use, or both?
    • NAME: the unique name to identify the operand
    • TYPE: the type of the operand (a subclass of Operand)
    • [opt]: is this operand optional?
  • VARSIG: a description of repeating operands, used for variable-length instructions.

So for example, the record that defines the Move instruction format is

1 0 1
"D Result RegisterOperand" "U Val Operand"
<blank line>

This specifies that the Move format has two operands, one def and one use. The def is called Result and must be of type RegisterOperand. The use is called Val and must be of type Operand.

A few instruction formats have variable number of operands. The format for these records is given at the top of InstructionFormatList.dat. For example, the record for the variable-length Call instruction format is:

1 0 3 1 U 4
"D Result RegisterOperand" \
"U Address Operand" "U Method MethodOperand" "U Guard Operand opt"
"Param Operand"

This record defines the Call instruction format. The second line indicates that this format always has at least 4 operands (1 def and 3 uses), plus a variable number of uses of one other type. The trailing 4 on line 2 tells the template generator to generate special constructors for cases of having 1, 2, 3, or 4 of the extra operands. Finally, the record names the Call instruction operands and constrains the types. The final line specifies the name and types of the variable-numbered operands. In this case, a Call instruction has a variable number of (use) operands called Param. Client code can access the ith parameter operand of a Call instruction s by calling Call.getParam(s,i).

A number of instruction formats share operands of the same semantic meaning and name. For convenience in accessing like instruction formats, the template generator supports four common operand access types:

  • ResultCarrier: provides access to an operand of type RegisterOperand named Result.
  • GuardResultCarrier: provides access to an operand of type RegisterOperand named GuardResult.
  • LocationCarrier: provides access to an operand of type LocationOperand named Location.
  • GuardCarrier: provides access to an operand of type Operand named Guard.

For example, for any instruction s that carries a Result operand (eg. Move, Binary, and Unary formats), client code can call ResultCarrier.conforms(s) and ResultCarrier.getResult(s) to access the Result operand.

Finally, a note on rationale. Religious object-oriented philosophers will cringe at the InstructionFormats. Instead, all this functionality could be implemented more cleanly with a hierarchy of instruction types exploiting (multiple) inheritance. We rejected the class hierarchy approach due to efficiency concerns of frequent virtual/interface method dispatch and type checks. Recent improvements in our interface invocation sequence and dynamic type checking algorithms may alleviate some of this concern.

  • No labels