编译原理深度理解

用生活化的比喻，让你彻底理解编译器的核心流程：词法分析 → 语法分析 → 语义分析 → IR → 优化 → 代码生成

前置知识：2_1_v8Learn（V8 编译流水线基础）

阅读指南（初学者必看）

为什么你需要学习编译原理？

你每天都在用编译器的产物——V8 引擎把 JavaScript 编译成机器码，TypeScript 编译器把 TS 转成 JS，Babel 把 ES6+ 转成 ES5。但你不知道它们内部是怎么工作的：

V8 的 Scanner、Parser、Ignition、TurboFan 各自做了什么？
TypeScript 的类型检查是怎么实现的？
Babel 的 AST 转换是什么意思？

学完本章，你能回答：

编译器把源代码变成机器码经过了哪些步骤？
词法分析和语法分析的区别是什么？
V8 的编译流水线和编译原理是怎么对应的？
优化 Pass 做了什么？常量折叠、死代码消除是什么？

本文结构

第一部分：为什么要学编译原理（建立学习动机）
第二部分：编译器全景图（理解编译流程的宏观结构）
第三部分：词法分析（字符流 → Token 流）
第四部分：语法分析（Token 流 → AST）
第五部分：语义分析与类型检查（AST → 语义信息）
第六部分：中间表示 IR（平台无关的中间代码）
第七部分：优化 Pass（精简和改进代码）
第八部分：目标代码生成与寄存器分配（IR → 机器码）

1.1 为什么要学编译原理？

生活类比：编译原理就像汽车发动机原理。你不需要会造发动机才能开车，但发动机坏了，懂原理的修理工10分钟找到问题，不懂的只能一个个零件换着试。

直接关联你已学的知识：

编译原理概念	你已经见过的应用
词法分析	V8 的 Scanner（`src/parsing/scanner.cc`）
语法分析	V8 的 Parser（`src/parsing/parser.cc`）
中间表示（IR）	V8 的 Bytecode（Ignition）、Turboshaft IR
优化 Pass	V8 的 TurboFan 优化流水线
代码生成	V8 的 CodeGenerator（生成机器码）
类型推断	TypeScript 编译器
树遍历	Babel 的 AST 转换

学完编译原理，再看 V8 编译流水线，就像看一幅拼图的完整图案。

1.2 编译器全景图

源代码（JavaScript / Java / C++）
  │
  ▼
┌─────────── 词法分析（Lexing）──────────┐
│  输入：字符流                            │
│  输出：Token 流                          │
│  "let x = 1 + 2" → [let, x, =, 1, +, 2] │
└────────────────┬────────────────────────┘
                 │
                 ▼
┌─────────── 语法分析（Parsing）──────────┐
│  输入：Token 流                          │
│  输出：抽象语法树（AST）                  │
│  构建树形结构，表达代码的语法关系          │
└────────────────┬────────────────────────┘
                 │
                 ▼
┌─────────── 语义分析（Semantic）─────────┐
│  类型检查、作用域解析、符号表构建          │
│  "x 未声明就使用" → 报错                  │
└────────────────┬────────────────────────┘
                 │
                 ▼
┌─────────── 中间表示生成（IR）───────────┐
│  AST → IR（三地址码 / SSA / 字节码）      │
│  平台无关的中间表示                        │
└────────────────┬────────────────────────┘
                 │
                 ▼
┌─────────── 优化 Pass ──────────────────┐
│  常量折叠、死代码消除、内联、循环优化      │
│  多遍扫描，逐步优化                       │
└────────────────┬────────────────────────┘
                 │
                 ▼
┌─────────── 目标代码生成 ───────────────┐
│  IR → 机器码（x86 / ARM / Wasm）        │
│  寄存器分配、指令选择、指令调度            │
└─────────────────────────────────────────┘

1.3 词法分析（Lexical Analysis）

生活类比：词法分析就像"切词"。你读英文句子 "I love cats"，你的大脑会自动切成三个词：I / love / cats。词法分析器做的就是这件事——把字符流切成有意义的Token。

Token 是什么

Token = 词法单元，是源代码的最小有意义单位

例如：
- 关键字：let, const, function, if, else
- 标识符：变量名、函数名
- 字面量：数字、字符串
- 运算符：+, -, *, /, =, ==
- 分隔符：(, ), {, }, ;, ,

手写词法分析器

// 简易 JavaScript 词法分析器（完整版）
class Lexer {
  constructor(source) {
    this.source = source;
    this.pos = 0;
    this.line = 1;
    this.column = 1;
    this.tokens = [];
  }

  currentChar() {
    if (this.pos >= this.source.length) return null;
    return this.source[this.pos];
  }

  advance() {
    if (this.currentChar() === '\n') {
      this.line++;
      this.column = 1;
    } else {
      this.column++;
    }
    this.pos++;
  }

  skipWhitespace() {
    while (this.currentChar() && /\s/.test(this.currentChar())) {
      this.advance();
    }
  }

  skipComment() {
    if (this.currentChar() === '/' && this.peek() === '/') {
      while (this.currentChar() && this.currentChar() !== '\n') {
        this.advance();
      }
    }
  }

  peek() {
    if (this.pos + 1 >= this.source.length) return null;
    return this.source[this.pos + 1];
  }

  readNumber() {
    let start = this.column;
    let numStr = '';
    while (this.currentChar() && /\d/.test(this.currentChar())) {
      numStr += this.currentChar();
      this.advance();
    }
    // 小数部分
    if (this.currentChar() === '.' && /\d/.test(this.peek())) {
      numStr += '.';
      this.advance();
      while (this.currentChar() && /\d/.test(this.currentChar())) {
        numStr += this.currentChar();
        this.advance();
      }
    }
    return { type: 'NUMBER', value: parseFloat(numStr), line: this.line, column: start };
  }

  readIdentifier() {
    let start = this.column;
    let idStr = '';
    while (this.currentChar() && /[a-zA-Z0-9_]/.test(this.currentChar())) {
      idStr += this.currentChar();
      this.advance();
    }
    const keywords = {
      'let': 'LET', 'const': 'CONST', 'var': 'VAR',
      'function': 'FUNCTION', 'return': 'RETURN',
      'if': 'IF', 'else': 'ELSE', 'while': 'WHILE', 'for': 'FOR'
    };
    let type = keywords[idStr] || 'IDENTIFIER';
    return { type, value: idStr, line: this.line, column: start };
  }

  readString() {
    let start = this.column;
    let quote = this.currentChar();
    this.advance();
    let str = '';
    while (this.currentChar() && this.currentChar() !== quote) {
      if (this.currentChar() === '\\') {
        this.advance();
        switch (this.currentChar()) {
          case 'n': str += '\n'; break;
          case 't': str += '\t'; break;
          case '\\': str += '\\'; break;
          case '"': str += '"'; break;
          case "'": str += "'"; break;
          default: str += this.currentChar();
        }
      } else {
        str += this.currentChar();
      }
      this.advance();
    }
    this.advance(); // 跳过结束引号
    return { type: 'STRING', value: str, line: this.line, column: start };
  }

  tokenize() {
    while (this.pos < this.source.length) {
      this.skipWhitespace();
      this.skipComment();
      if (!this.currentChar()) break;

      let char = this.currentChar();
      let start = this.column;

      if (/\d/.test(char)) {
        this.tokens.push(this.readNumber());
        continue;
      }

      if (/[a-zA-Z_]/.test(char)) {
        this.tokens.push(this.readIdentifier());
        continue;
      }

      if (char === '"' || char === "'") {
        this.tokens.push(this.readString());
        continue;
      }

      this.advance();
      switch (char) {
        case '+': this.tokens.push({ type: 'PLUS', value: '+', line: this.line, column: start }); break;
        case '-': this.tokens.push({ type: 'MINUS', value: '-', line: this.line, column: start }); break;
        case '*': this.tokens.push({ type: 'STAR', value: '*', line: this.line, column: start }); break;
        case '/': this.tokens.push({ type: 'SLASH', value: '/', line: this.line, column: start }); break;
        case '=':
          if (this.currentChar() === '=') {
            this.advance();
            this.tokens.push({ type: 'EQUAL_EQUAL', value: '==', line: this.line, column: start });
          } else {
            this.tokens.push({ type: 'EQUAL', value: '=', line: this.line, column: start });
          }
          break;
        case '(': this.tokens.push({ type: 'LPAREN', value: '(', line: this.line, column: start }); break;
        case ')': this.tokens.push({ type: 'RPAREN', value: ')', line: this.line, column: start }); break;
        case '{': this.tokens.push({ type: 'LBRACE', value: '{', line: this.line, column: start }); break;
        case '}': this.tokens.push({ type: 'RBRACE', value: '}', line: this.line, column: start }); break;
        case ';': this.tokens.push({ type: 'SEMICOLON', value: ';', line: this.line, column: start }); break;
        case ',': this.tokens.push({ type: 'COMMA', value: ',', line: this.line, column: start }); break;
        default:
          throw new Error(`Unexpected character: ${char} at ${this.line}:${start}`);
      }
    }
    this.tokens.push({ type: 'EOF', value: null, line: this.line, column: this.column });
    return this.tokens;
  }
}

// 测试
const lexer = new Lexer('let x = 1 + 2');
console.log(lexer.tokenize());
// [
//   { type: 'LET', value: 'let', line: 1, column: 1 },
//   { type: 'IDENTIFIER', value: 'x', line: 1, column: 5 },
//   { type: 'EQUAL', value: '=', line: 1, column: 7 },
//   { type: 'NUMBER', value: 1, line: 1, column: 9 },
//   { type: 'PLUS', value: '+', line: 1, column: 11 },
//   { type: 'NUMBER', value: 2, line: 1, column: 13 },
//   { type: 'EOF', value: null, line: 1, column: 14 }
// ]

正则表达式方式（更实用的做法）：

// 用正则表达式定义 Token 规则
const tokenSpecs = [
  ['NUMBER',   /\d+(\.\d+)?/],     // 数字
  ['IDENT',    /[a-zA-Z_]\w*/],     // 标识符
  ['OP',       /[+\-*/=]/],         // 运算符
  ['LPAREN',   /\(/],               // 左括号
  ['RPAREN',   /\)/],               // 右括号
  ['SKIP',     /[ \t\n]+/],         // 空白（跳过）
  ['MISMATCH', /./],                // 其他（报错）
];

function tokenize(code) {
  const tokens = [];
  let pos = 0;
  while (pos < code.length) {
    let match = null;
    for (const [type, regex] of tokenSpecs) {
      const m = code.slice(pos).match(new RegExp('^' + regex.source));
      if (m) {
        match = { type, value: m[0] };
        break;
      }
    }
    if (!match) throw new Error(`Unexpected at ${pos}`);
    if (match.type !== 'SKIP') tokens.push(match);
    pos += match.value.length;
  }
  return tokens;
}

和 V8 的关系：V8 的 Scanner（src/parsing/scanner.cc）做的就是这件事，只不过它支持完整的 ECMAScript 语法（模板字符串、正则、BigInt 等），代码量大约 3000 行。

1.4 语法分析（Syntax Analysis / Parsing）

生活类比：语法分析就像"理解句子结构"。切词只告诉你有哪些词，语法分析告诉你词之间的关系——"主语 + 谓语 + 宾语"。

两种分析方法

自顶向下（Top-Down）：从根节点开始，逐步展开
  - 递归下降（Recursive Descent）── 最直观，V8 用的就是这种
  - 预测分析（Predictive Parsing）

自底向上（Bottom-Up）：从叶子节点开始，逐步归约
  - 移进-归约（Shift-Reduce）
  - LR / LALR / SLR

AST 节点类型

const NodeType = {
  Program: 'Program',
  VariableDeclaration: 'VariableDeclaration',
  VariableDeclarator: 'VariableDeclarator',
  FunctionDeclaration: 'FunctionDeclaration',
  Identifier: 'Identifier',
  Literal: 'Literal',
  BinaryExpression: 'BinaryExpression',
  UnaryExpression: 'UnaryExpression',
  CallExpression: 'CallExpression',
  IfStatement: 'IfStatement',
  WhileStatement: 'WhileStatement',
  ReturnStatement: 'ReturnStatement',
  BlockStatement: 'BlockStatement'
};

手写递归下降解析器

// 简易表达式解析器：支持 +, -, *, / 和括号
// 语法规则（文法）：
//   expr   → term (('+' | '-') term)*
//   term   → factor (('*' | '/') factor)*
//   factor → NUMBER | '(' expr ')'

class Parser {
  constructor(tokens) {
    this.tokens = tokens;
    this.pos = 0;
  }

  peek() { return this.tokens[this.pos]; }
  consume() { return this.tokens[this.pos++]; }

  expect(type) {
    const token = this.consume();
    if (token.type !== type) throw new Error(`Expected ${type}, got ${token.type}`);
    return token;
  }

  match(...types) {
    return types.includes(this.peek()?.type);
  }

  // expr → term (('+' | '-') term)*
  parseExpr() {
    let left = this.parseTerm();
    while (this.peek() && this.match('PLUS', 'MINUS')) {
      const op = this.consume().value;
      const right = this.parseTerm();
      left = { type: 'BinaryExpression', op, left, right };
    }
    return left;
  }

  // term → factor (('*' | '/') factor)*
  parseTerm() {
    let left = this.parseFactor();
    while (this.peek() && this.match('STAR', 'SLASH')) {
      const op = this.consume().value;
      const right = this.parseFactor();
      left = { type: 'BinaryExpression', op, left, right };
    }
    return left;
  }

  // factor → NUMBER | '(' expr ')'
  parseFactor() {
    const token = this.peek();
    if (token.type === 'NUMBER') {
      this.consume();
      return { type: 'Literal', value: token.value };
    }
    if (token.type === 'LPAREN') {
      this.consume(); // 吃掉 '('
      const expr = this.parseExpr();
      this.expect('RPAREN'); // 吃掉 ')'
      return expr;
    }
    throw new Error(`Unexpected token: ${JSON.stringify(token)}`);
  }

  parse() {
    return this.parseExpr();
  }
}

// 测试：解析 "1 + 2 * 3"
const tokens = [
  { type: 'NUMBER', value: 1 },
  { type: 'PLUS', value: '+' },
  { type: 'NUMBER', value: 2 },
  { type: 'STAR', value: '*' },
  { type: 'NUMBER', value: 3 },
  { type: 'EOF', value: null }
];
const parser = new Parser(tokens);
const ast = parser.parse();
console.log(JSON.stringify(ast, null, 2));
// {
//   "type": "BinaryExpression",
//   "op": "+",
//   "left": { "type": "Literal", "value": 1 },
//   "right": {
//     "type": "BinaryExpression",
//     "op": "*",
//     "left": { "type": "Literal", "value": 2 },
//     "right": { "type": "Literal", "value": 3 }
//   }
// }
// 注意：2 * 3 先结合，因为 parseTerm 的优先级高于 parseExpr
// 这就是运算符优先级在解析器中的实现方式！

关键理解：运算符优先级不是硬编码的规则，而是文法规则的层次结构决定的。term 的层次比 expr 深，所以 * 和 / 比 + 和 - 先结合。

Pratt 解析器（处理复杂运算符优先级）

const Precedence = {
  LOWEST: 1,
  OR: 2,           // ||
  AND: 3,          // &&
  EQUALS: 4,       // == !=
  LESSGREATER: 5,  // < > <= >=
  SUM: 6,          // + -
  PRODUCT: 7,      // * /
  PREFIX: 8,       // -X !X
  CALL: 9          // myFunction(X)
};

自问自答：

Q：为什么 V8 选择手写递归下降而不是用解析器生成器？ A：手写解析器有几个优势：1）错误信息更友好（可以精确定位和给出修改建议）；2）性能更好（可以手动优化热点路径）；3）更灵活（JavaScript 的语法有很多歧义需要特殊处理，如 / 既是除号也是正则开头）。

Q：TypeScript 编译器也是手写的吗？ A：是的，TypeScript 编译器（src/compiler/parser.ts）也是手写递归下降。整个 TS 编译器的代码大约 5 万行，其中解析器约 3000 行。

1.5 语义分析与类型检查

生活类比：语法分析检查"句子结构是否正确"，语义分析检查"这句话有没有意义"。就像"无色的绿色想法在疯狂睡觉"——语法正确，但语义不对。

符号表

// 符号表（Symbol Table）── 记录所有声明的变量和函数
class Symbol {
  constructor(name, type, scope) {
    this.name = name;
    this.type = type;
    this.scope = scope;
    this.references = [];
    this.definition = null;
  }
}

class SymbolTable {
  constructor(parent = null) {
    this.symbols = new Map(); // name → Symbol
    this.parent = parent;     // 作用域链（和 JS 的作用域链一样！）
    this.children = [];
  }

  define(name, type) {
    if (this.symbols.has(name)) {
      throw new Error(`Symbol ${name} already defined`);
    }
    let symbol = new Symbol(name, type, this);
    this.symbols.set(name, symbol);
    return symbol;
  }

  lookup(name) {
    if (this.symbols.has(name)) return this.symbols.get(name);
    if (this.parent) return this.parent.lookup(name); // 向上查找
    return null; // 未找到 → ReferenceError
  }

  lookupLocal(name) {
    return this.symbols.get(name) || null;
  }

  pushScope() {
    let child = new SymbolTable(this);
    this.children.push(child);
    return child;
  }

  popScope() {
    return this.parent;
  }
}

类型检查器

const TypeKind = {
  NUMBER: 'number', STRING: 'string', BOOLEAN: 'boolean',
  NULL: 'null', UNDEFINED: 'undefined', ANY: 'any',
  ARRAY: 'array', OBJECT: 'object', FUNCTION: 'function'
};

class Type {
  constructor(kind, properties = {}) {
    this.kind = kind;
    this.properties = properties;
  }
  isAssignableTo(other) {
    if (this.kind === TypeKind.ANY || other.kind === TypeKind.ANY) return true;
    return this.kind === other.kind;
  }
}

class TypeChecker {
  constructor() {
    this.currentScope = new SymbolTable();
    this.errors = [];
  }

  check(ast) {
    this.visitNode(ast);
    return this.errors;
  }

  visitNode(node) {
    let methodName = `check${node.type}`;
    if (this[methodName]) {
      return this[methodName](node);
    }
    return new Type(TypeKind.ANY);
  }

  checkLiteral(node) {
    if (typeof node.value === 'number') return new Type(TypeKind.NUMBER);
    if (typeof node.value === 'string') return new Type(TypeKind.STRING);
    if (typeof node.value === 'boolean') return new Type(TypeKind.BOOLEAN);
    return new Type(TypeKind.ANY);
  }

  checkIdentifier(node) {
    let symbol = this.currentScope.lookup(node.name);
    if (!symbol) {
      this.errors.push({
        type: 'UNDEFINED_VARIABLE',
        message: `Variable '${node.name}' is not defined`,
        node
      });
      return new Type(TypeKind.ANY);
    }
    return new Type(symbol.type);
  }

  checkBinaryExpression(node) {
    let leftType = this.visitNode(node.left);
    let rightType = this.visitNode(node.right);

    switch (node.op) {
      case '+':
        if (leftType.kind === TypeKind.STRING || rightType.kind === TypeKind.STRING) {
          return new Type(TypeKind.STRING);
        }
        if (leftType.kind === TypeKind.NUMBER && rightType.kind === TypeKind.NUMBER) {
          return new Type(TypeKind.NUMBER);
        }
        this.errors.push({
          type: 'TYPE_MISMATCH',
          message: `Operator '+' cannot be applied to types '${leftType.kind}' and '${rightType.kind}'`,
          node
        });
        return new Type(TypeKind.ANY);

      case '-': case '*': case '/': case '%':
        if (leftType.kind !== TypeKind.NUMBER || rightType.kind !== TypeKind.NUMBER) {
          this.errors.push({
            type: 'TYPE_MISMATCH',
            message: `Operator '${node.op}' requires numeric operands`,
            node
          });
        }
        return new Type(TypeKind.NUMBER);

      case '==': case '===': case '!=': case '!==':
        return new Type(TypeKind.BOOLEAN);

      case '<': case '>': case '<=': case '>=':
        if (leftType.kind !== TypeKind.NUMBER || rightType.kind !== TypeKind.NUMBER) {
          this.errors.push({ type: 'TYPE_MISMATCH', message: `Comparison requires numeric operands`, node });
        }
        return new Type(TypeKind.BOOLEAN);

      case '&&': case '||':
        return new Type(TypeKind.BOOLEAN);

      default:
        return new Type(TypeKind.ANY);
    }
  }

  checkCallExpression(node) {
    let calleeType = this.visitNode(node.callee);
    if (calleeType.kind !== TypeKind.FUNCTION) {
      this.errors.push({
        type: 'NOT_CALLABLE',
        message: `Expression of type '${calleeType.kind}' is not callable`,
        node
      });
      return new Type(TypeKind.ANY);
    }
    return calleeType.returnType || new Type(TypeKind.ANY);
  }
}

TypeScript 的类型系统就是语义分析的典型应用。TS 编译器的 checker.ts 有约 2 万行代码，负责类型推断、类型兼容性检查等。

1.6 中间表示（IR）

生活类比：IR 就像"世界语"。源语言有很多种（JS、Java、C++），目标平台也有很多种（x86、ARM、Wasm）。如果直接从源语言翻译到目标平台，需要 M×N 种翻译器。引入 IR 后，只需要 M+N 种翻译器。

JavaScript ─┐
            ├─→ IR ─┬─→ x86 机器码
TypeScript ─┤       ├─→ ARM 机器码
            │       ├─→ Wasm
Java ───────┘       └─→ ...

常见 IR 形式：

IR 类型	示例	使用者
三地址码	`t1 = a + b`	大部分编译器教材
SSA（静态单赋值）	`x1 = a + b; x2 = x1 * c`	LLVM IR、V8 TurboFan
字节码	`LDA a; ADD b; STAR t1`	V8 Ignition、JVM
CPS（续延传递）	`add(a, b, λt1. mul(t1, c, λx. ...))`	函数式语言编译器
Sea of Nodes	图结构，节点 = 操作，边 = 数据流	V8 TurboFan（旧版）
Turboshaft IR	基本块 + 操作列表	V8 Turboshaft（新版）

中间代码生成器

const OpCode = {
  LOAD: 'LOAD', STORE: 'STORE',
  ADD: 'ADD', SUB: 'SUB', MUL: 'MUL', DIV: 'DIV', MOD: 'MOD',
  NEG: 'NEG', NOT: 'NOT',
  CMP: 'CMP', JMP: 'JMP', JZ: 'JZ', JNZ: 'JNZ',
  CALL: 'CALL', RET: 'RET',
  PARAM: 'PARAM', ARG: 'ARG',
  LABEL: 'LABEL'
};

class IntermediateCodeGenerator {
  constructor() {
    this.instructions = [];
    this.tempCount = 0;
    this.labelCount = 0;
  }

  newTemp() { return `t${this.tempCount++}`; }
  newLabel() { return `L${this.labelCount++}`; }

  emit(op, ...args) {
    this.instructions.push({ op, args });
  }

  generate(ast) {
    this.visitNode(ast);
    return this.instructions;
  }

  visitNode(node) {
    let methodName = `visit${node.type}`;
    if (this[methodName]) {
      return this[methodName](node);
    }
    return null;
  }

  visitLiteral(node) {
    let temp = this.newTemp();
    this.emit(OpCode.LOAD, temp, node.value);
    return temp;
  }

  visitIdentifier(node) {
    return node.name;
  }

  visitBinaryExpression(node) {
    let left = this.visitNode(node.left);
    let right = this.visitNode(node.right);
    let result = this.newTemp();

    switch (node.op) {
      case '+': this.emit(OpCode.ADD, result, left, right); break;
      case '-': this.emit(OpCode.SUB, result, left, right); break;
      case '*': this.emit(OpCode.MUL, result, left, right); break;
      case '/': this.emit(OpCode.DIV, result, left, right); break;
      case '%': this.emit(OpCode.MOD, result, left, right); break;
    }
    return result;
  }

  visitIfStatement(node) {
    let test = this.visitNode(node.test);
    let elseLabel = this.newLabel();
    let endLabel = this.newLabel();

    this.emit(OpCode.JZ, test, elseLabel);
    this.visitNode(node.consequent);
    this.emit(OpCode.JMP, endLabel);
    this.emit(OpCode.LABEL, elseLabel);

    if (node.alternate) {
      this.visitNode(node.alternate);
    }
    this.emit(OpCode.LABEL, endLabel);
  }

  visitWhileStatement(node) {
    let startLabel = this.newLabel();
    let endLabel = this.newLabel();

    this.emit(OpCode.LABEL, startLabel);
    let test = this.visitNode(node.test);
    this.emit(OpCode.JZ, test, endLabel);
    this.visitNode(node.body);
    this.emit(OpCode.JMP, startLabel);
    this.emit(OpCode.LABEL, endLabel);
  }
}

// 测试：1 + 2 * 3
// AST:  BinaryExpression(+, 1, BinaryExpression(*, 2, 3))
// const irGen = new IntermediateCodeGenerator();
// const result = irGen.generate(ast);
// console.log(irGen.instructions);
// [
//   { op: 'MUL', args: ['t0', 2, 3] },   // t0 = 2 * 3
//   { op: 'ADD', args: ['t1', 1, 't0'] } // t1 = 1 + t0
// ]

1.7 优化 Pass

生活类比：优化就像"精简文章"。初稿可能啰嗦，优化 Pass 逐个检查，删掉废话、合并同类、提前计算。

优化名称	做了什么	示例
常量折叠	编译期计算常量表达式	`3 + 5` → `8`
常量传播	已知常量直接替换	`x=5; y=x+1` → `y=6`
死代码消除	删除不可能执行或结果未使用的代码	`if(false){...}` → 删除
内联（Inlining）	把函数调用替换为函数体	`add(1,2)` → `1+2`
循环不变量外提	把循环内不变的计算移到循环外	`for(i){a=bc}` → `t=bc;for(i){a=t}`
公共子表达式消除	相同计算只做一次	`a=bc; d=bc` → `t=b*c; a=t; d=t`
尾调用优化	尾递归改为循环	递归 → 迭代，不增加栈

// 常量折叠 + 常量传播示例
// 优化前
function compute() {
  const PI = 3.14159;
  const r = 10;
  const area = PI * r * r;  // 每次运行都算
  return area;
}

// 优化后（常量折叠 + 常量传播）
function compute() {
  return 314.159;  // 编译期就算好了！
}

// 死代码消除示例
// 优化前
function debug(x) {
  console.log(x);  // 生产环境不需要
  return x * 2;
}

// 优化后（如果 DEBUG=false）
function debug(x) {
  return x * 2;    // console.log 被消除
}

// 循环不变量外提
// 优化前
for (let i = 0; i < n; i++) {
  let x = a * b;  // a*b 不依赖 i，每次循环都算
  arr[i] = x + i;
}

// 优化后
let t = a * b;    // 提到循环外
for (let i = 0; i < n; i++) {
  arr[i] = t + i;
}

和 V8 的关系：V8 的 TurboFan 就是一个多 Pass 优化编译器。它把字节码转成 Turboshaft IR，然后跑一系列优化 Pass（内联、逃逸分析、类型简化等），最后生成机器码。你在 2_1 项目中学的"投机性优化"和"去优化"，本质上就是优化 Pass + 回退机制。

1.8 目标代码生成与寄存器分配

生活类比：寄存器分配就像"厨房台面"。厨房台面（寄存器）有限，你需要在有限的台面上放当前要用的食材。放不下时就得放到冰箱（内存）里，需要时再拿出来。

寄存器分配问题：
- CPU 寄存器有限（x86-64 只有 16 个通用寄存器）
- 变量可能比寄存器多
- 需要决定哪些变量放在寄存器，哪些放在内存

解决方法：图着色算法
- 每个变量是一个节点
- 如果两个变量同时活跃，用边连接
- 用 K 种颜色给图着色（K = 寄存器数）
- 相邻节点颜色不同 → 不冲突
- 着色失败 → 需要溢出（spill）到内存

简单汇编生成器

class AssemblyGenerator {
  constructor() {
    this.registers = ['eax', 'ebx', 'ecx', 'edx', 'esi', 'edi'];
    this.registerMap = new Map();
    this.currentReg = 0;
  }

  generate(instructions) {
    let asm = [];
    asm.push('.section .text');
    asm.push('.globl _main');
    asm.push('_main:');

    for (let inst of instructions) {
      asm.push(this.generateInstruction(inst));
    }
    asm.push('    ret');
    return asm.join('\n');
  }

  getRegister(varName) {
    if (!this.registerMap.has(varName)) {
      this.registerMap.set(varName, this.registers[this.currentReg++]);
    }
    return this.registerMap.get(varName);
  }

  generateInstruction(inst) {
    switch (inst.op) {
      case 'LOAD':
        return `    movl $${inst.args[1]}, %${this.getRegister(inst.args[0])}`;
      case 'STORE':
        return `    movl %${this.getRegister(inst.args[1])}, %${this.getRegister(inst.args[0])}`;
      case 'ADD':
        let addDest = this.getRegister(inst.args[0]);
        let addLeft = this.getRegister(inst.args[1]);
        let addRight = this.getRegister(inst.args[2]);
        if (addDest !== addLeft) {
          return `    movl %${addLeft}, %${addDest}\n    addl %${addRight}, %${addDest}`;
        }
        return `    addl %${addRight}, %${addDest}`;
      case 'JMP':
        return `    jmp ${inst.args[0]}`;
      case 'LABEL':
        return `${inst.args[0]}:`;
      default:
        return `    # ${inst.op}`;
    }
  }
}

WebAssembly 生成器

class WasmGenerator {
  constructor() {
    this.functions = [];
    this.locals = [];
    this.code = [];
  }

  generate(ast) {
    this.functions = [];
    for (let node of ast.body) {
      if (node.type === 'FunctionDeclaration') {
        this.generateFunction(node);
      }
    }
    return this.buildModule();
  }

  generateFunction(node) {
    this.locals = [];
    this.code = [];
    let params = node.params.map(p => {
      this.locals.push(p.name);
      return 'i32';
    });
    this.visitNode(node.body);
    this.functions.push({
      name: node.id.name,
      params,
      result: ['i32'],
      locals: this.locals.slice(params.length),
      code: this.code
    });
  }

  visitNode(node) {
    let methodName = `visit${node.type}`;
    if (this[methodName]) {
      return this[methodName](node);
    }
  }

  visitLiteral(node) {
    this.code.push(`i32.const ${node.value}`);
  }

  visitBinaryExpression(node) {
    this.visitNode(node.left);
    this.visitNode(node.right);
    switch (node.operator) {
      case '+': this.code.push('i32.add'); break;
      case '-': this.code.push('i32.sub'); break;
      case '*': this.code.push('i32.mul'); break;
      case '/': this.code.push('i32.div_s'); break;
    }
  }

  buildModule() {
    let lines = ['(module'];
    for (let func of this.functions) {
      lines.push(`  (func $${func.name}`);
      if (func.params.length > 0) {
        lines.push(`    (param ${func.params.join(' ')})`);
      }
      lines.push(`    (result ${func.result.join(' ')})`);
      for (let line of func.code) {
        lines.push(`    ${line}`);
      }
      lines.push('  )');
    }
    lines.push(')');
    return lines.join('\n');
  }
}

自问自答

Q：词法分析和语法分析的区别是什么？ A：词法分析把字符流切成 Token（"切词"），语法分析把 Token 组装成 AST（"理解句子结构"）。词法分析不关心 Token 之间的关系，语法分析才关心。

Q：为什么需要 IR？直接从 AST 生成机器码不行吗？ A：理论上可以，但 IR 有两个关键优势：1）解耦前端和后端，支持 M 种源语言 × N 种目标平台只需 M+N 个翻译器；2）IR 更适合做优化，很多优化在 AST 上很难做（如常量传播需要在指令层面分析）。

Q：V8 的 TurboFan 和 LLVM 的优化有什么区别？ A：核心思想相同，都是多 Pass 优化编译器。区别在于：TurboFan 是 JIT 编译器，需要权衡编译时间和运行性能；LLVM 是 AOT 编译器，可以花更多时间优化。TurboFan 有"投机性优化"——假设某种类型，运行时发现假设错误就去优化。

Q：编译器前端的"前端"和后端的"后端"分别指什么？ A：前端 = 和源语言相关的部分（词法分析、语法分析、语义分析），后端 = 和目标平台相关的部分（代码生成、寄存器分配）。IR 是前端的输出、后端的输入，是两者的分界线。

Q：Babel 是编译器吗？ A：是的！Babel 是一个"源到源"编译器（transpiler）。它的前端（parser）把 JS 解析成 AST，中间（transform plugins）对 AST 做转换，后端（generator）把 AST 转回 JS 代码。只是它没有 IR 和优化 Pass，目标语言也是 JS。

实践任务

任务1：手写一个支持变量赋值和四则运算的词法分析器（扩展本章的 Lexer，支持 let、=、标识符）
任务2：手写一个递归下降语法分析器，支持带括号的四则运算表达式，输出 AST
任务3：实现三地址码 IR 生成器，从 AST 生成三地址码指令列表
任务4：实现常量折叠优化 Pass，对生成的 IR 做常量折叠和死代码消除
任务5：用 tsc --emitDeclarationOnly 编译一个 TS 项目，对比源码和编译产物，理解 TS 编译流程

与其他章节的关联

本章内容	关联章节	关联点
词法分析+语法分析	第02章链接器与加载器	编译器产出目标文件，链接器把目标文件链接成可执行文件
优化 Pass	第03章进程与线程	编译器优化的线程模型影响运行时调度
IR + 代码生成	2_1_v8Learn	V8 的 Ignition/TurboFan 就是编译原理的工业级实现
语义分析	第09章网络安全	类型安全是安全的基础

上一章：学习路线图 | 下一章：02-链接器与加载器