Core Concepts in Compiler Design and Language Runtime
Classified in Computers
Written on in English with a size of 10.45 KB
Core Concepts in Compiler Design
Compiler
Compiler: Translates entire source code to target code before execution. It requires a full parse and upfront error checking, then executes the generated target code.
Interpreter
Interpreter: Executes source code incrementally (line-by-line or statement-by-statement). It translates and executes on the fly, and may partially execute ill-formed programs until an error is encountered.
LVar vs. x86 Architecture
LVar: Features nested expressions, implicit control flow (represented by an Abstract Syntax Tree - AST), and an unlimited number of logical variables.
x86: Characterized by flat instructions, atomic operands (registers/memory), explicit control flow (jumps), and a limited set of registers. Compilation passes (e.g., Remove Complex Operands - RCO, Explicate Control) bridge the gap between LVar and x86 representations.
Lexical Scope
Lexical Scope: A function can access variables from its definition (enclosing) context, not its call context.
Closure
Closure: A pair consisting of a function code pointer and an environment pointer. The environment stores the values of free variables captured from the lexical scope, enabling lexically scoped functions to operate correctly when passed around or returned from other functions.
Scala Expression Parsing Example
def parseExpr(s: SExpr): Expr = s match {
case SNum(n) => Num(n)
case SList(SSym("+") :: e1 :: e2 :: Nil) =>
Prim(Plus(), List(parseExpr(e1), parseExpr(e2)))
case SList(SSym("-") :: e1 :: e2 :: Nil) =>
Prim(Minus(), List(parseExpr(e1), parseExpr(e2)))
case SList(SSym("switch") :: SNum(n) :: rest) if rest.nonEmpty =>
// Last element should be default
val defaultExpr = rest.last match {
case SList(SSym("default") :: expr :: Nil) => parseExpr(expr)
case _ => throw new ParseException("Invalid default case format", s)
}
// Parse cases (all elements except the last default)
val cases = rest.dropRight(1).map {
case SList(SNum(caseVal) :: expr :: Nil) => (Num(caseVal), parseExpr(expr))
case _ => throw new ParseException("Invalid case format", s)
}
// Check for duplicates
val caseValues = cases.map(_._1)
if (caseValues.distinct.length != caseValues.length) {
throw new ParseException("Duplicate case clauses are not allowed", s)
}
Switch(Num(n), cases, defaultExpr)
// Invalid cases
case SList(Nil) => throw new ParseException("Empty list is not a valid expression", s)
case SList(_) => throw new ParseException("Invalid expression format", s)
case _ => throw new ParseException("Invalid expression type", s)
}
Advanced Compiler Concepts and Runtime Details
Liveness Analysis Formula
Liveness: The set of variables live before a program point k
is defined by:
L_before(k) = (L_after(k) - W(k)) ∪ R(k)
Where L_after(k)
are variables live after k
, W(k)
are variables written (defined) at k
, and R(k)
are variables read (used) at k
.
Interference Graph Construction
Interference Graph: Used in register allocation to represent variables that cannot reside in the same register.
- For a move instruction (
movq s, d
): For every variablev
inL_after(k)
, ifv ≠ d
andv ≠ s
, add an edge(d, v)
. - For other instructions: For every variable
d
defined (written) atk
(d ∈ W(k)
) and every variablev
live afterk
(v ∈ L_after(k)
), ifv ≠ d
, add an edge(d, v)
.
Indirect Jumps and Function Calls
Indirect Jump: An instruction like leaq label(%rip), dest
stores the instruction-pointer-relative address of label
at dest
, which can then be used with call *dest
for an indirect function call.
Before a Function Call (Caller Setup)
- Caller pushes live caller-saved registers onto the stack.
- Parameters are passed via registers or the stack.
- The Stack Pointer (SP) is set to point to the last data item on the stack.
Call Prelude (Callee Setup)
- The callee pushes the old Base Pointer (BP) onto the stack.
- The new BP is set to the current SP.
- SP is incremented to allocate space for spills (local variables or temporary values).
- Callee saves any used callee-saved registers onto the stack.
- Control jumps to the function body label.
After a Function Call (Caller Cleanup)
- The Stack Pointer (SP) is adjusted to deallocate parameters and local stack space.
- Callee-saved registers are restored from the stack.
- The caller's Base Pointer (BP) is restored.
- Control returns via
retq
. - The caller reinstates caller-saved registers from the stack.
Intra-procedural vs. Inter-procedural Analysis
Intra-procedural Analysis: Analysis performed within the scope of a single function or procedure.
Inter-procedural Analysis: Analysis performed across multiple functions or procedures, considering their interactions.
Example of inter-procedural data flow (phi function for merging values): i₂ = ⌀/phi (i₁, i₃);
Dominator Nodes in Control Flow Graphs
Dominator: A node d
dominates node n
if every path from the start node s
to n
must pass through d
.
Switch Tail Explication (Scala Example)
Switch Tail Explication: A compilation pass that transforms switch expressions into a sequence of conditional jumps (IfStmt
) and unconditional jumps (Goto
), effectively flattening the control flow.
def explicateTail(e: Expr, blocks: ListBuffer[(String, Tail)]): Tail = e match {
case Num(i) => Return(Num(i))
case Prim(op, es) => Return(Prim(op, es))
case Switch(c, branches, default) => {
val defaultLabel = gensym("block")
val defaultTail = explicateTail(default, blocks)
blocks += ((defaultLabel, defaultTail))
var currentTail: Tail = Goto(defaultLabel)
branches.reverse.foreach { case (caseNum, expr) =>
val trueLabel = gensym("block")
val trueTail = explicateTail(expr, blocks)
blocks += ((trueLabel, trueTail))
currentTail = IfStmt(EqC(), c, caseNum, Goto(trueLabel), currentTail)
}
currentTail
}
}
def explicateProgram(p: Lswitch): Cswitch = p match {
case Lswitch.Program(e) =>
val blocks = ListBuffer[(String, Tail)]()
CProgram(List(("start", explicateTail(e, blocks))) ++ blocks)
}
Register Usage Conventions
Caller-saved Registers: Registers whose values must be preserved by the caller across a function call if the caller needs them after the call. These include %rax
, %rcx
, %rdx
, %rsi
, %rdi
, %r8
, %r9
, %r10
, %r11
.
Callee-saved Registers: Registers whose values must be preserved by the callee across a function call. If the callee uses them, it must save and restore their original values. These include %rsp
, %rbp
, %rbx
, %r12
, %r13
, %r14
, %r15
.
Compilation Passes Pipeline
A typical sequence of compilation passes:
- Uniquify: Ensures all variable names are unique.
- Remove Complex Operands (RCO): Simplifies expressions to atomic operands.
- Explicate Control: Transforms high-level control structures (like
switch
) into explicit jumps and conditional branches. - Select Instructions: Maps intermediate representation operations to target machine instructions.
- Assign Home: Determines where variables will reside (registers or stack).
- Patch Instructions: Fills in addresses and other details after initial instruction selection.
- Prelude & Conclusion: Adds function entry and exit code (e.g., stack frame setup/teardown, register saving/restoring).
Dataflow Analysis Techniques
Reaching Definitions Analysis (Forward)
- Goal: Identify which assignments (e.g.,
x := ...
) might reach a specific program point. - Dataflow Value: A set of definitions (e.g.,
{(x, l1), (y, l2)}
, wherel1
andl2
are labels of definition sites). - Equations:
IN(l) = ∪ {OUT(p) | for all predecessors p of l}
OUT(l) = GEN(l) ∪ (IN(l) \ KILL(l))
- GEN/KILL: For an assignment
x := ...
at labell
:GEN(l) = {(x, l)}
KILL(l) =
all other definitions of variablex
in the program.
- Algorithm: Initialize
OUT(entry) = {}
and iterate until no further changes occur (fixed point).
Live Variable Analysis (Backward)
- Goal: Determine which variables might be used later (are "live") at a specific program point.
- Dataflow Value: A set of variables (e.g.,
{x, y}
). - Equations:
OUT(l) = ∪ {IN(s) | for all successors s of l}
IN(l) = USE(l) ∪ (OUT(l) \ DEF(l))
- DEF/USE: For a program statement at label
l
:DEF(l) =
variables defined (written) inl
.USE(l) =
variables read (used) inl
.
- Algorithm: Initialize
IN(exit) = {}
and iterate (typically in reverse control flow order) until reaching a fixed point.