Core Concepts in Compiler Design and Language Runtime

Classified in Computers

Written on in English with a size of 10.45 KB

Core Concepts in Compiler Design

Compiler

Compiler: Translates entire source code to target code before execution. It requires a full parse and upfront error checking, then executes the generated target code.

Interpreter

Interpreter: Executes source code incrementally (line-by-line or statement-by-statement). It translates and executes on the fly, and may partially execute ill-formed programs until an error is encountered.

LVar vs. x86 Architecture

LVar: Features nested expressions, implicit control flow (represented by an Abstract Syntax Tree - AST), and an unlimited number of logical variables.

x86: Characterized by flat instructions, atomic operands (registers/memory), explicit control flow (jumps), and a limited set of registers. Compilation passes (e.g., Remove Complex Operands - RCO, Explicate Control) bridge the gap between LVar and x86 representations.

Lexical Scope

Lexical Scope: A function can access variables from its definition (enclosing) context, not its call context.

Closure

Closure: A pair consisting of a function code pointer and an environment pointer. The environment stores the values of free variables captured from the lexical scope, enabling lexically scoped functions to operate correctly when passed around or returned from other functions.

Scala Expression Parsing Example

def parseExpr(s: SExpr): Expr = s match {
  case SNum(n) => Num(n)
  case SList(SSym("+") :: e1 :: e2 :: Nil) =>
    Prim(Plus(), List(parseExpr(e1), parseExpr(e2)))
  case SList(SSym("-") :: e1 :: e2 :: Nil) =>
    Prim(Minus(), List(parseExpr(e1), parseExpr(e2)))
  case SList(SSym("switch") :: SNum(n) :: rest) if rest.nonEmpty =>
    // Last element should be default
    val defaultExpr = rest.last match {
      case SList(SSym("default") :: expr :: Nil) => parseExpr(expr)
      case _ => throw new ParseException("Invalid default case format", s)
    }
    // Parse cases (all elements except the last default)
    val cases = rest.dropRight(1).map {
      case SList(SNum(caseVal) :: expr :: Nil) => (Num(caseVal), parseExpr(expr))
      case _ => throw new ParseException("Invalid case format", s)
    }
    // Check for duplicates
    val caseValues = cases.map(_._1)
    if (caseValues.distinct.length != caseValues.length) {
      throw new ParseException("Duplicate case clauses are not allowed", s)
    }
    Switch(Num(n), cases, defaultExpr)
  // Invalid cases
  case SList(Nil) => throw new ParseException("Empty list is not a valid expression", s)
  case SList(_) => throw new ParseException("Invalid expression format", s)
  case _ => throw new ParseException("Invalid expression type", s)
}

Advanced Compiler Concepts and Runtime Details

Liveness Analysis Formula

Liveness: The set of variables live before a program point k is defined by:

L_before(k) = (L_after(k) - W(k)) ∪ R(k)

Where L_after(k) are variables live after k, W(k) are variables written (defined) at k, and R(k) are variables read (used) at k.

Interference Graph Construction

Interference Graph: Used in register allocation to represent variables that cannot reside in the same register.

  • For a move instruction (movq s, d): For every variable v in L_after(k), if v ≠ d and v ≠ s, add an edge (d, v).
  • For other instructions: For every variable d defined (written) at k (d ∈ W(k)) and every variable v live after k (v ∈ L_after(k)), if v ≠ d, add an edge (d, v).

Indirect Jumps and Function Calls

Indirect Jump: An instruction like leaq label(%rip), dest stores the instruction-pointer-relative address of label at dest, which can then be used with call *dest for an indirect function call.

Before a Function Call (Caller Setup)

  • Caller pushes live caller-saved registers onto the stack.
  • Parameters are passed via registers or the stack.
  • The Stack Pointer (SP) is set to point to the last data item on the stack.

Call Prelude (Callee Setup)

  • The callee pushes the old Base Pointer (BP) onto the stack.
  • The new BP is set to the current SP.
  • SP is incremented to allocate space for spills (local variables or temporary values).
  • Callee saves any used callee-saved registers onto the stack.
  • Control jumps to the function body label.

After a Function Call (Caller Cleanup)

  • The Stack Pointer (SP) is adjusted to deallocate parameters and local stack space.
  • Callee-saved registers are restored from the stack.
  • The caller's Base Pointer (BP) is restored.
  • Control returns via retq.
  • The caller reinstates caller-saved registers from the stack.

Intra-procedural vs. Inter-procedural Analysis

Intra-procedural Analysis: Analysis performed within the scope of a single function or procedure.

Inter-procedural Analysis: Analysis performed across multiple functions or procedures, considering their interactions.

Example of inter-procedural data flow (phi function for merging values): i₂ = ⌀/phi (i₁, i₃);

Dominator Nodes in Control Flow Graphs

Dominator: A node d dominates node n if every path from the start node s to n must pass through d.

Switch Tail Explication (Scala Example)

Switch Tail Explication: A compilation pass that transforms switch expressions into a sequence of conditional jumps (IfStmt) and unconditional jumps (Goto), effectively flattening the control flow.

def explicateTail(e: Expr, blocks: ListBuffer[(String, Tail)]): Tail = e match {
  case Num(i) => Return(Num(i))
  case Prim(op, es) => Return(Prim(op, es))
  case Switch(c, branches, default) => {
    val defaultLabel = gensym("block")
    val defaultTail = explicateTail(default, blocks)
    blocks += ((defaultLabel, defaultTail))
    var currentTail: Tail = Goto(defaultLabel)
    branches.reverse.foreach { case (caseNum, expr) =>
      val trueLabel = gensym("block")
      val trueTail = explicateTail(expr, blocks)
      blocks += ((trueLabel, trueTail))
      currentTail = IfStmt(EqC(), c, caseNum, Goto(trueLabel), currentTail)
    }
    currentTail
  }
}
def explicateProgram(p: Lswitch): Cswitch = p match {
  case Lswitch.Program(e) =>
    val blocks = ListBuffer[(String, Tail)]()
    CProgram(List(("start", explicateTail(e, blocks))) ++ blocks)
}

Register Usage Conventions

Caller-saved Registers: Registers whose values must be preserved by the caller across a function call if the caller needs them after the call. These include %rax, %rcx, %rdx, %rsi, %rdi, %r8, %r9, %r10, %r11.

Callee-saved Registers: Registers whose values must be preserved by the callee across a function call. If the callee uses them, it must save and restore their original values. These include %rsp, %rbp, %rbx, %r12, %r13, %r14, %r15.

Compilation Passes Pipeline

A typical sequence of compilation passes:

  1. Uniquify: Ensures all variable names are unique.
  2. Remove Complex Operands (RCO): Simplifies expressions to atomic operands.
  3. Explicate Control: Transforms high-level control structures (like switch) into explicit jumps and conditional branches.
  4. Select Instructions: Maps intermediate representation operations to target machine instructions.
  5. Assign Home: Determines where variables will reside (registers or stack).
  6. Patch Instructions: Fills in addresses and other details after initial instruction selection.
  7. Prelude & Conclusion: Adds function entry and exit code (e.g., stack frame setup/teardown, register saving/restoring).

Dataflow Analysis Techniques

Reaching Definitions Analysis (Forward)

  • Goal: Identify which assignments (e.g., x := ...) might reach a specific program point.
  • Dataflow Value: A set of definitions (e.g., {(x, l1), (y, l2)}, where l1 and l2 are labels of definition sites).
  • Equations:
    • IN(l) = ∪ {OUT(p) | for all predecessors p of l}
    • OUT(l) = GEN(l) ∪ (IN(l) \ KILL(l))
  • GEN/KILL: For an assignment x := ... at label l:
    • GEN(l) = {(x, l)}
    • KILL(l) = all other definitions of variable x in the program.
  • Algorithm: Initialize OUT(entry) = {} and iterate until no further changes occur (fixed point).

Live Variable Analysis (Backward)

  • Goal: Determine which variables might be used later (are "live") at a specific program point.
  • Dataflow Value: A set of variables (e.g., {x, y}).
  • Equations:
    • OUT(l) = ∪ {IN(s) | for all successors s of l}
    • IN(l) = USE(l) ∪ (OUT(l) \ DEF(l))
  • DEF/USE: For a program statement at label l:
    • DEF(l) = variables defined (written) in l.
    • USE(l) = variables read (used) in l.
  • Algorithm: Initialize IN(exit) = {} and iterate (typically in reverse control flow order) until reaching a fixed point.

Related entries: