操作系统核心原理 — I/O 与文件系统

用生活化的比喻，让你理解五种 I/O 模型、epoll、inode、日志文件系统——这些是理解 Node.js/Netty 事件循环的底层机制

前置知识：第03章进程与线程 + 第04章内存管理（理解进程和虚拟内存）

阅读指南（初学者必看）

为什么你需要学习 I/O 与文件系统？

你每天都在用 I/O 模型——Node.js 的事件循环基于 epoll，Netty 基于 I/O 多路复用，但你不理解底层机制：

Node.js 怎么做到"单线程处理万级并发"？epoll 是什么？
阻塞 I/O 和非阻塞 I/O 的区别是什么？
为什么游戏服务器用 epoll 而不是 select？

学完本章，你能回答：

五种 I/O 模型的区别是什么？为什么 I/O 多路复用最重要？
select、poll、epoll 的区别是什么？为什么 epoll 性能最好？
文件系统的 inode 是什么？日志文件系统为什么能防断电损坏？

本文结构

第一部分：五种 I/O 模型（理解 I/O 的本质）
第二部分：多路复用详解（select/poll/epoll）
第三部分：异步 I/O 与完成端口
第四部分：文件系统原理（理解文件是怎么存储的）
第五部分：缓存与优化

5.1 五种 I/O 模型

生活类比：去餐厅点餐。

1. 阻塞 I/O（Blocking I/O）
   点餐后站在柜台等 → 啥也不干直到餐好了
   → 简单但浪费等待时间

2. 非阻塞 I/O（Non-blocking I/O）
   点餐后不断问"好了吗？" → 反复轮询
   → 不阻塞但浪费 CPU

3. I/O 多路复用（I/O Multiplexing）⭐ 最重要
   点餐后拿个叫号器，同时可以干别的 → 叫号器响了再去取
   → select/poll/epoll，一个线程管理多个连接
   → Netty、Node.js、Nginx 都用这种方式

4. 信号驱动 I/O（Signal-driven I/O）
   点餐后留电话 → 餐好了打电话通知
   → 内核发信号通知，适合 UDP

5. 异步 I/O（Asynchronous I/O）
   点外卖 → 骑手直接送到家 → 你全程不用管
   → 真正的异步，Windows IOCP 支持，Linux 不完善

完整的 I/O 操作分两步

1. 等待数据准备好（从磁盘/网络读取到内核缓冲区）
2. 将数据从内核拷贝到用户空间

不同 I/O 模型的区别主要在这两步的处理方式上

5.2 I/O 多路复用详解

select

class SelectModel {
  constructor(maxFds = 1024) {
    this.maxFds = maxFds;
    this.readFds = new Set();
    this.writeFds = new Set();
    this.callbacks = new Map();
  }

  addReadFd(fd, callback) {
    this.readFds.add(fd);
    this.callbacks.set(`read_${fd}`, callback);
  }

  select(timeout = null) {
    let readyRead = [];
    for (let fd of this.readFds) {
      if (this.isReady(fd, 'read')) {
        readyRead.push(fd);
      }
    }
    return { read: readyRead };
  }
}

poll

class PollModel {
  constructor() {
    this.fds = new Map();
  }
  add(fd, events, callback) {
    this.fds.set(fd, { events, callback, revents: 0 });
  }
  poll(timeout = 0) {
    let ready = [];
    for (let [fd, info] of this.fds) {
      if (this.checkRead(fd)) {
        ready.push({ fd, callback: info.callback });
      }
    }
    return ready;
  }
}

epoll（Linux 最高效的 I/O 多路复用）

// epoll（Linux 最高效的 I/O 多路复用）

// 1. 创建 epoll 实例
int epfd = epoll_create1(0);

// 2. 注册感兴趣的 fd
struct epoll_event ev;
ev.events = EPOLLIN;  // 关注可读事件
ev.data.fd = server_fd;
epoll_ctl(epfd, EPOLL_CTL_ADD, server_fd, &ev);

// 3. 等待事件
struct epoll_event events[MAX_EVENTS];
int n = epoll_wait(epfd, events, MAX_EVENTS, timeout);

// 4. 处理就绪的 fd
for (int i = 0; i < n; i++) {
  if (events[i].data.fd == server_fd) {
    // 新连接
    int client_fd = accept(server_fd, ...);
    // 注册 client_fd
  } else {
    // 已有连接的数据
    read(events[i].data.fd, buffer, ...);
  }
}

select vs poll vs epoll：

| | select | poll | epoll |
|---|---|---|---|
| 最大连接数 | 1024 | 无限制 | 无限制 |
| 性能 | O(n) | O(n) | O(1) |
| 每次调用 | 传入所有 fd | 传入所有 fd | 只返回就绪的 fd |
| 适用场景 | 少量连接 | 中等连接 | 大量连接（游戏服务器）|

游戏服务器为什么用 epoll？
- 10万连接，select 每次要遍历10万个fd
- epoll 只返回就绪的fd，不遍历
- Netty 底层在 Linux 上用 epoll

epoll 的 O(1) 是怎么实现的？ epoll 用红黑树管理所有注册的 fd，用双向链表管理就绪的 fd。当 fd 有事件时，内核通过回调函数把 fd 加入就绪链表。epoll_wait 只需返回就绪链表中的 fd，不需要遍历所有 fd。所以性能和连接数无关，只和活跃连接数有关。

5.3 异步 I/O 与完成端口

Proactor 模式

class Proactor {
  constructor() {
    this.operations = new Map();
    this.opId = 0;
  }

  asyncRead(fd, buffer, callback) {
    let opId = this.opId++;
    this.operations.set(opId, {
      type: 'read', fd, buffer, callback, status: 'pending'
    });
    this.simulateAsyncRead(opId);
    return opId;
  }

  simulateAsyncRead(opId) {
    setTimeout(() => {
      let op = this.operations.get(opId);
      if (op) {
        op.status = 'completed';
        op.bytesRead = Math.floor(Math.random() * 1024) + 1;
        op.callback(op);
        this.operations.delete(opId);
      }
    }, Math.random() * 500);
  }
}

完成端口（Windows IOCP）

class CompletionPort {
  constructor(numWorkers = 4) {
    this.workers = [];
    this.taskQueue = [];
    this.completionQueue = [];
    this.running = true;
    for (let i = 0; i < numWorkers; i++) {
      this.workers.push(this.createWorker(i));
    }
  }

  postTask(task) {
    return new Promise((resolve, reject) => {
      this.taskQueue.push({ ...task, resolve, reject });
      this.dispatchTasks();
    });
  }

  dispatchTasks() {
    while (this.taskQueue.length > 0) {
      let worker = this.workers.find(w => !w.busy);
      if (!worker) break;
      let task = this.taskQueue.shift();
      worker.busy = true;
      this.executeTask(worker, task);
    }
  }

  async executeTask(worker, task) {
    try {
      let result = await task.execute();
      task.resolve(result);
    } catch (error) {
      task.reject(error);
    } finally {
      worker.busy = false;
      this.dispatchTasks();
    }
  }
}

5.4 文件系统原理

inode（索引节点）── 文件的"身份证"
├── 文件大小
├── 所有者（uid/gid）
├── 权限（rwx）
├── 时间戳（创建/修改/访问）
├── 数据块位置指针（直接 + 间接）
└── 链接计数

目录 = inode 号 → 文件名的映射表

读取文件的流程：
文件名 → 查目录项 → 得到 inode 号 → 读 inode → 找到数据块 → 读取数据

文件分配方式

// 连续分配
class ContiguousAllocation {
  constructor(totalBlocks) {
    this.disk = new Array(totalBlocks).fill(null);
    this.files = new Map();
  }
  createFile(name, size) {
    let startBlock = this.findFreeBlocks(size);
    if (startBlock === -1) throw new Error('Not enough space');
    this.files.set(name, { name, startBlock, size });
    for (let i = startBlock; i < startBlock + size; i++) {
      this.disk[i] = name;
    }
    return startBlock;
  }
}

// 索引分配（现代文件系统常用）
class IndexedAllocation {
  constructor(totalBlocks, blockSize = 1024) {
    this.disk = new Array(totalBlocks).fill(null);
    this.files = new Map();
    this.freeBlocks = new Set();
    for (let i = 0; i < totalBlocks; i++) this.freeBlocks.add(i);
  }
  createFile(name, data) {
    let numBlocks = Math.ceil(data.length / this.blockSize);
    let blocks = this.allocateBlocks(numBlocks);
    this.files.set(name, { name, indexBlock: blocks, size: data.length });
    return blocks;
  }
}

日志文件系统

为什么需要日志？
- 突然断电可能导致文件系统不一致
- 比如删除文件：1) 删除目录项 2) 释放 inode 3) 释放数据块
- 如果在步骤2断电，inode 被删但数据块没释放 → 空间泄漏

日志文件系统：
- 先把要做的操作写入日志（Write-Ahead Log）
- 然后执行实际操作
- 崩溃后重放日志恢复一致性

→ 这和数据库的 WAL（Write-Ahead Log）原理完全一样！
→ MySQL 的 InnoDB redo log 也是这个原理

5.5 缓存与优化

文件缓存

class FileCache {
  constructor(maxSize = 100 * 1024 * 1024) {
    this.cache = new Map();
    this.maxSize = maxSize;
    this.currentSize = 0;
    this.lruList = [];
  }

  get(path) {
    if (!this.cache.has(path)) return null;
    this.updateLRU(path);
    return this.cache.get(path).data;
  }

  set(path, data) {
    let size = this.getSize(data);
    while (this.currentSize + size > this.maxSize && this.lruList.length > 0) {
      this.evict();
    }
    if (this.cache.has(path)) {
      this.currentSize -= this.cache.get(path).size;
    }
    this.cache.set(path, { data, size, timestamp: Date.now() });
    this.currentSize += size;
    this.updateLRU(path);
  }

  updateLRU(path) {
    let index = this.lruList.indexOf(path);
    if (index !== -1) this.lruList.splice(index, 1);
    this.lruList.push(path);
  }

  evict() {
    let oldest = this.lruList.shift();
    if (oldest && this.cache.has(oldest)) {
      this.currentSize -= this.cache.get(oldest).size;
      this.cache.delete(oldest);
    }
  }
}

mmap

Q：mmap 是什么？为什么能提升 I/O 性能？ A：mmap 把文件映射到进程的虚拟地址空间，读写内存 = 读写文件。优势：1）避免 read/write 的内核空间↔用户空间拷贝（零拷贝）；2）由操作系统的页面置换管理文件缓存，不需要应用层自己管理。游戏常用 mmap 加载大文件（如资源包、地图数据）。

自问自答

Q：为什么 I/O 多路复用比阻塞 I/O 高效？ A：阻塞 I/O 的问题在于"一个线程只能等一个 I/O"——如果有 1 万个连接，就需要 1 万个线程，每个线程占 8MB 栈 = 80GB 内存！I/O 多路复用一个线程就能管理所有连接，epoll_wait 只返回就绪的 fd，不需要遍历。核心区别：阻塞 I/O 是"一客一服务员"，I/O 多路复用是"一个服务员管所有桌"。

Q：为什么 Linux 的异步 I/O（AIO）不完善？ A：Linux 的 AIO 只支持 O_DIRECT 模式（绕过页缓存），普通文件 I/O 不支持。而且 AIO 的 API 设计不够优雅（需要用 io_submit/io_getevents）。Windows 的 IOCP 是真正的完成端口模型，更成熟。所以 Linux 上通常用 epoll + 非阻塞 I/O 模拟异步。

Q：日志文件系统和数据库的 WAL 有什么关系？ A：原理完全一样！都是"先写日志，再执行操作，崩溃后重放日志恢复"。ext4/XFS 的日志保护文件系统一致性，MySQL InnoDB 的 redo log 保护数据库事务的持久性。本质上都是 Write-Ahead Log 思想。

实践任务

任务1：用 strace 跟踪一个 Node.js 程序的系统调用，观察 epoll_wait 的调用模式
任务2：编写一个简单的 epoll 服务器，处理多客户端连接
任务3：对比阻塞 I/O 和 epoll 的并发性能（用 ab 或 wrk 压测）
任务4：用 iostat 观察 I/O 性能，理解 IOPS 和吞吐量的关系
任务5：用 stat 命令查看文件的 inode 信息，理解 inode 的结构

与其他章节的关联

本章内容	关联章节	关联点
I/O 多路复用	第03章进程与线程	事件循环基于 I/O 多路复用，避免多线程
内存映射文件	第04章内存管理	mmap 基于分页机制，页面置换管理文件缓存
epoll	第06章 TCP/IP	epoll 管理 TCP 连接的读写事件
日志文件系统	3_1_java-backend-deep-dive	WAL 在数据库和文件系统中的原理相同

上一章：04-操作系统内存管理 | 下一章：06-TCPIP协议栈深度