File System Journaling: Mechanisms, ext3, and NTFS Recovery
Journaling Motivation and Necessity
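The file system checker (fsck) restores metadata consistency after a crash, but it is slow because it must scan the entire file system, and it requires deep knowledge of the file system's structures. Ideally, recovery time should depend on the number of recent writes rather than on the size of the file system.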
File System Transactions and ACID Properties
Transactions provide ACID guarantees:
- Atomicity
- Consistency
- Isolation
- Durability
Journaling file systems borrow this idea and treat each file system operation (such as creating a file) as a transaction. After a crash, recovery ensures that committed transactions are fully applied and uncommitted ones are discarded.
ext3 Journaling File System
ext3 is a journaling file system using physical redo logging, adding journaling to existing ext2 structures.
Redo Logging Mechanism in ext3
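Updates are written to the journal first, the transaction is then committed, and only afterwards are the in-place writes performed. Replay during crash recovery is idempotent: because each record holds the complete new contents of a block, applying it more than once yields the same result.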
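The redo sequence can be sketched with a small in-memory model. This is only an illustration, not ext3's on-disk format: the record tuples, `log_transaction`, and `replay` are hypothetical names, and the "disk" is a plain dictionary from block number to contents.

```python
# Hypothetical model: the journal is a list of records that are written
# before any in-place update; "disk" maps block numbers to block contents.

def log_transaction(journal, txid, updates):
    """Append TxBegin, one full-block record per update, then TxEnd (commit)."""
    journal.append(("TxBegin", txid))
    for block_no, new_contents in updates.items():
        journal.append(("Block", txid, block_no, new_contents))
    journal.append(("TxEnd", txid))


def replay(journal, disk):
    """Redo every committed transaction; ignore those without a TxEnd."""
    committed = {rec[1] for rec in journal if rec[0] == "TxEnd"}
    for rec in journal:
        if rec[0] == "Block" and rec[1] in committed:
            _, _, block_no, contents = rec
            disk[block_no] = contents  # rewriting the same contents is harmless


disk = {5: b"old inode", 9: b"old bitmap"}
journal = []
log_transaction(journal, 1, {5: b"new inode", 9: b"new bitmap"})

# Simulated crash in the middle of transaction 2: no TxEnd was written.
journal.append(("TxBegin", 2))
journal.append(("Block", 2, 5, b"half-done inode"))

replay(journal, disk)
first_pass = dict(disk)
replay(journal, disk)  # a second replay leaves the disk unchanged
```

Transaction 2 never committed, so its block record is skipped, and running `replay` twice produces the same disk state as running it once.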
Physical Block Logging Details
Entire physical blocks are logged even for small updates (e.g., inode updates).
ext3 Write Strategies and Protocol
Common strategies include:
- Serial writes: Safe but slow.
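- Simultaneous writes: Fast but risky, because the disk may reorder them and make TxEnd durable before the blocks it covers.
ext3 uses a staged protocol: write everything except Transaction End (TxEnd) and wait for those writes to complete, then write TxEnd, then checkpoint.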
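To see why TxEnd is held back, one can enumerate the journal states a crash could leave behind under each strategy. The sketch below is a simplified model with assumed record names (`inode`, `bitmap`); it treats every record as a single atomic write and counts "torn" states, where TxEnd is on disk but part of the transaction is missing.

```python
from itertools import combinations

RECORDS = ["TxBegin", "inode", "bitmap", "TxEnd"]


def crash_states_simultaneous():
    """All records issued at once: any subset may be durable at crash time."""
    return [set(c) for n in range(len(RECORDS) + 1)
            for c in combinations(RECORDS, n)]


def crash_states_staged():
    """TxEnd is issued only after every other record is durable."""
    body = RECORDS[:-1]
    states = [set(c) for n in range(len(body) + 1)
              for c in combinations(body, n)]
    states.append(set(RECORDS))  # TxEnd can only land on a complete body
    return states


def is_torn(state):
    """Looks committed (TxEnd present) but some records never reached disk."""
    return "TxEnd" in state and state != set(RECORDS)


torn_simultaneous = sum(is_torn(s) for s in crash_states_simultaneous())
torn_staged = sum(is_torn(s) for s in crash_states_staged())
```

In this model the simultaneous strategy admits 7 torn states and the staged protocol admits none, which is the property recovery relies on when it trusts a TxEnd record.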
Journal Structure and Management
The journal is a circular buffer with a journal superblock that records where the live portion of the log starts and ends. Entries are freed once their transaction has been checkpointed.
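A minimal sketch of such a circular log follows. The class name `CircularJournal` and its fields are assumptions made for illustration; a real journal stores `start` and `end` in the on-disk journal superblock and frees whole transactions rather than single records.

```python
class CircularJournal:
    """Toy circular log; `start` and `end` stand in for the journal superblock."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.start = 0  # oldest record that has not been checkpointed
        self.end = 0    # next free slot
        self.used = 0

    def append(self, record):
        if self.used == len(self.slots):
            raise RuntimeError("journal full: checkpoint before logging more")
        self.slots[self.end] = record
        self.end = (self.end + 1) % len(self.slots)  # wrap around
        self.used += 1

    def checkpoint(self, count):
        """Free the `count` oldest records once their in-place writes are done."""
        count = min(count, self.used)
        for _ in range(count):
            self.slots[self.start] = None
            self.start = (self.start + 1) % len(self.slots)
        self.used -= count

    def live_records(self):
        """Records between start and end, oldest first."""
        n = len(self.slots)
        return [self.slots[(self.start + i) % n] for i in range(self.used)]


j = CircularJournal(4)
for rec in ["A", "B", "C"]:
    j.append(rec)
j.checkpoint(2)   # A and B are now safely in place
j.append("D")
j.append("E")     # wraps around into the slot A used to occupy
```

After these calls the live portion of the log is C, D, E, with `start` at slot 2 and `end` at slot 1.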
ext3 Journaling Modes
- Data Mode: Logs both data and metadata (most consistent, highest cost).
- Ordered Mode (Default): Logs metadata only; data is written in place before the metadata is journaled.
- Writeback Mode: Logs metadata only, with no ordering between data and metadata writes; fastest, but files may contain junk data after a crash.
Transaction Batching
ext3 collects multiple updates into one transaction, tracking them in an in-memory list of dirty blocks, so that a block modified several times is logged only once.
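The effect of batching can be illustrated with a dictionary keyed by block number. `TransactionBatch` is a hypothetical name, and the block numbers and contents are made up for the example.

```python
class TransactionBatch:
    """Collects dirty blocks in memory; each block is logged once per commit."""

    def __init__(self):
        self.dirty = {}  # block number -> latest contents

    def update(self, block_no, contents):
        self.dirty[block_no] = contents  # a later update replaces an earlier one

    def commit(self, journal):
        journal.append("TxBegin")
        for block_no, contents in sorted(self.dirty.items()):
            journal.append((block_no, contents))
        journal.append("TxEnd")
        self.dirty.clear()


batch = TransactionBatch()
# Creating two files in the same directory touches the same blocks twice.
batch.update(7, "bitmap v1")
batch.update(3, "dir inode v1")
batch.update(7, "bitmap v2")
batch.update(3, "dir inode v2")

journal = []
batch.commit(journal)
```

Four updates result in only two logged blocks, each carrying its final contents.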
Summary of ext3 Operation (Ordered Mode)
- Write data.
- Journal Transaction Begin (TxBegin) plus metadata.
- Write Transaction End (TxEnd).
- Checkpoint metadata.
- Update journal superblock.
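The five steps above can be exercised with a toy crash simulation. Everything here is an assumption for illustration: a single-transaction journal, a dictionary as the disk, and `ordered_write` stopping after `crash_after` steps to imitate a crash. Clearing the journal stands in for updating the journal superblock.

```python
def ordered_write(disk, journal, data, metadata, crash_after=5):
    """Run the first `crash_after` steps of the ordered-mode sequence."""
    steps = [
        lambda: disk.update(data),                       # 1. write data in place
        lambda: journal.extend(                          # 2. TxBegin + metadata
            [("TxBegin",)] + [("Block", b, c) for b, c in metadata.items()]),
        lambda: journal.append(("TxEnd",)),              # 3. commit
        lambda: disk.update(metadata),                   # 4. checkpoint metadata
        lambda: journal.clear(),                         # 5. free journal space
    ]
    for step in steps[:crash_after]:
        step()


def recover(disk, journal):
    """Replay the journaled metadata only if the transaction committed."""
    if ("TxEnd",) in journal:
        for rec in journal:
            if rec[0] == "Block":
                disk[rec[1]] = rec[2]
    journal.clear()


def run(crash_after):
    disk = {"inode": "size=0", "data": ""}
    journal = []
    ordered_write(disk, journal, {"data": "hello"}, {"inode": "size=5"}, crash_after)
    recover(disk, journal)
    return disk


results = [run(n) for n in range(6)]  # crash after 0, 1, ..., 5 steps
```

In every outcome where the inode reports the new size, the data block already holds the new contents, because the data write precedes the journal commit. A crash before step 3 leaves the old inode in place, so the file simply does not grow.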
Advanced Logging Techniques
Redo Logging
Logs new values first, commits, then updates in-place. Replay redoes committed transactions.
Undo Logging
Logs undo instructions first, then updates in-place, then commits. Recovery undoes uncommitted transactions.
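As a rough sketch of undo logging, the model below logs the old value of a block before overwriting it in place. The helper names `write` and `undo_recover` and the record layout are hypothetical.

```python
disk = {1: "a0", 2: "b0"}
log = []


def write(txid, block_no, new_value):
    """Log the old value first, then update the block in place."""
    log.append(("Undo", txid, block_no, disk[block_no]))
    disk[block_no] = new_value


def undo_recover(log, disk):
    """Roll back every transaction that has no Commit record."""
    committed = {rec[1] for rec in log if rec[0] == "Commit"}
    # Walk backward so the oldest logged value is the one restored last.
    for rec in reversed(log):
        if rec[0] == "Undo" and rec[1] not in committed:
            _, _, block_no, old_value = rec
            disk[block_no] = old_value


write(1, 1, "a1")
log.append(("Commit", 1))   # transaction 1 commits after its in-place write

write(2, 2, "b1")
write(2, 2, "b2")           # crash happens here, before transaction 2 commits

undo_recover(log, disk)
```

Transaction 1 keeps its update, while block 2 is restored to the value it had before transaction 2 started.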
Combined Redo and Undo Logging
This combines benefits: redo lets commits happen before in-place updates; undo allows flushing dirty blocks early. Recovery involves a forward redo pass and a backward undo pass.
NTFS Journaling and Recovery
NTFS, the Windows file system, uses redo plus undo logging, journaling metadata only. It supports file compression and encryption. Special files like $MFT, $LogFile, and $Bitmap contain critical metadata.
NTFS Operation Logging
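NTFS uses operation logging (e.g., “set bit in bitmap”), which produces smaller log entries than ext3's physical block logging. Each file system operation gets its own transaction. The log record for each sub-operation contains redo information, undo information, and a link to the previous sub-operation of the same transaction.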
NTFS Crash Recovery Process
First, redo all sub-operations (even for uncommitted transactions), then undo only those from uncommitted transactions.
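The two passes can be sketched on a toy allocation bitmap. This is not the NTFS $LogFile format: the record fields (`tx`, `redo`, `undo`, `prev`) and the helper names are assumptions, and the bitmap is modeled as a Python set of allocated bit numbers.

```python
def apply(bitmap, op):
    """Apply a logged operation; set and clear are both idempotent."""
    kind, bit = op
    if kind == "set":
        bitmap.add(bit)
    else:
        bitmap.discard(bit)


def recover(log, committed, bitmap):
    # Pass 1: redo every sub-operation, committed or not, in log order.
    for rec in log:
        apply(bitmap, rec["redo"])
    # Pass 2: for each uncommitted transaction, follow the prev links
    # backward from its last record and apply the undo operations.
    last = {}
    for i, rec in enumerate(log):
        last[rec["tx"]] = i
    for tx, i in last.items():
        if tx in committed:
            continue
        while i is not None:
            apply(bitmap, log[i]["undo"])
            i = log[i]["prev"]


log = [
    {"tx": 1, "redo": ("set", 3), "undo": ("clear", 3), "prev": None},
    {"tx": 2, "redo": ("set", 5), "undo": ("clear", 5), "prev": None},
    {"tx": 1, "redo": ("set", 4), "undo": ("clear", 4), "prev": 0},
    {"tx": 2, "redo": ("set", 6), "undo": ("clear", 6), "prev": 1},
]
committed = {1}

# State found on disk after the crash: bit 3 (committed) and bit 5
# (from uncommitted transaction 2) were flushed; bits 4 and 6 were not.
bitmap = {3, 5}
recover(log, committed, bitmap)
```

After recovery the bitmap contains only bits 3 and 4: the redo pass supplies the missing committed bit 4, and the undo pass removes bits 5 and 6 belonging to the uncommitted transaction.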
Rationale for Two Recovery Passes
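Why both passes? The redo pass is needed because updates from committed transactions may not have reached their in-place locations before the crash. The undo pass is needed because updates from a transaction that never committed may already have hit the disk, and those must be undone.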