Cache Mapping, Virtual Memory and I/O Techniques in Computer Architecture
Q1. Cache Mapping Techniques
Cache memory is a small, fast memory placed between the CPU and main memory to speed up memory access. Mapping techniques determine how blocks from main memory are mapped to cache lines.
Three Types of Cache Mapping
1. Direct Mapping
In direct mapping, each block of main memory maps to exactly one specific cache line. The cache line number is determined by the formula:
Cache Line Number = (Block Address) mod (Number of Cache Lines)
Address format: The memory address is divided into three fields:
- Tag: Identifies which block is currently stored.
- Index: Identifies the cache line number.
- Block offset: Identifies the word within the block.
Advantages:
- Simple and easy to implement
- Fast access time
- Low-cost hardware
Disadvantages:
- High miss rate due to conflicts
- If two frequently used blocks map to the same line, they repeatedly replace each other (thrashing)
- Poor cache utilization in worst cases
Example: If cache has 8 lines, then block 0, 8, 16, 24... all map to cache line 0.
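A minimal Python sketch of the address split, assuming a hypothetical direct-mapped cache with 8 lines and 16-byte blocks (the constants and function name are illustrative only):

```python
# Minimal sketch: splitting a byte address for a hypothetical direct-mapped
# cache with 8 lines and 16-byte blocks (3 index bits, 4 offset bits).
NUM_LINES = 8
BLOCK_SIZE = 16

def direct_map(address: int):
    offset = address % BLOCK_SIZE     # word/byte within the block
    block = address // BLOCK_SIZE     # block address in main memory
    index = block % NUM_LINES         # cache line = block mod number of lines
    tag = block // NUM_LINES          # remaining high-order bits
    return tag, index, offset

# Blocks 0, 8, 16, 24 all land in line 0, which is exactly where conflicts arise.
for block in (0, 8, 16, 24):
    print(block, "->", direct_map(block * BLOCK_SIZE))
```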
2. Fully Associative Mapping
In fully associative mapping, any block from main memory can be placed in any cache line. There is no restriction on placement.
Address format: The memory address is divided into two fields:
- Tag: Contains the block address.
- Block offset: Identifies the word within the block.
Advantages:
- Lowest miss rate
- No thrashing problem
- Best cache utilization and flexibility
Disadvantages:
- Most expensive to implement
- Slowest search time (must search all cache lines)
- Complex hardware required (comparators for each line)
3. Set-Associative Mapping
Set-associative mapping is a compromise between direct and fully associative mapping. The cache is divided into sets; each block maps to a specific set but can be placed in any line within that set.
Set Number = (Block Address) mod (Number of Sets)
In a k-way set-associative cache, each set contains k cache lines.
Address format:
- Tag: Identifies which block currently occupies a line within the set.
- Set: Identifies which set.
- Block offset: Identifies the word within the block.
Advantages:
- Better hit rate than direct mapping
- Less expensive than fully associative
- Balanced performance and cost; reduced thrashing
Disadvantages:
- More complex than direct mapping
- Higher cost than direct mapping
- Requires a replacement algorithm within each set
Example: In 2-way set-associative mapping, each set has two lines. A block can be placed in either of the two lines within its designated set.
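The placement rule can be checked with a small Python sketch of a hypothetical 2-way set-associative cache with LRU replacement (the geometry and names are made up for illustration):

```python
# Minimal sketch of a 2-way set-associative cache with LRU replacement.
# Geometry (4 sets, 2 ways) is illustrative only.
NUM_SETS, WAYS = 4, 2
sets = [[] for _ in range(NUM_SETS)]   # each set holds up to WAYS block tags

def access(block_address: int) -> bool:
    """Return True on hit, False on miss; most recently used tag sits at the end."""
    set_index = block_address % NUM_SETS
    tag = block_address // NUM_SETS
    ways = sets[set_index]
    if tag in ways:                    # hit: refresh LRU position
        ways.remove(tag)
        ways.append(tag)
        return True
    if len(ways) == WAYS:              # miss with full set: evict least recently used
        ways.pop(0)
    ways.append(tag)
    return False

# Blocks 0, 4, 8 all map to set 0; any two of them can coexist in the 2-way set.
for blk in (0, 4, 0, 4, 8):
    print(blk, "hit" if access(blk) else "miss")
```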
Comparison Summary
| Feature | Direct | Associative | Set-Associative |
|---|---|---|---|
| Flexibility | Low | High | Medium |
| Cost | Low | High | Medium |
| Speed | Fast | Slow | Medium |
| Hit Rate | Low | High | Medium |
Q2. Virtual Memory and Page Address Mapping
Definition: Virtual memory is a memory management technique that allows execution of programs that are not completely loaded in the physical main memory. It creates an illusion of a very large main memory by using disk storage as an extension of RAM.
Basic concept: The logical (virtual) address space is divided into fixed-size blocks called pages, and physical memory is divided into blocks of the same size called frames. The OS maintains a page table to map virtual addresses to physical addresses.
Address Mapping Using Pages
Components:
- Virtual address: Generated by CPU (larger address space).
- Physical address: Actual location in main memory (smaller address space).
- Page table: Contains mapping information.
Address formats:
- Virtual address = Page number + Page offset.
- Physical address = Frame number + Page offset (page offset unchanged).
Mapping process:
- CPU generates a virtual address.
- Page number is extracted from the virtual address.
- Page table is accessed using the page number as an index.
- If valid bit = 1: the page is in memory and the frame number is retrieved.
- If valid bit = 0: a page fault occurs and the page must be loaded from disk.
- Physical address is formed by concatenating the frame number and page offset.
Page table entry contains: frame number, valid bit, dirty bit, reference bit, and protection bits (read/write/execute).
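A minimal Python sketch of the translation step, assuming 4 KB pages and a made-up page table (the names translate and PageFault are illustrative):

```python
# Minimal sketch of virtual-to-physical translation with 4 KB pages.
# The page table contents are invented for illustration.
PAGE_SIZE = 4096

# page number -> (valid bit, frame number)
page_table = {0: (1, 5), 1: (1, 2), 2: (0, None)}

class PageFault(Exception):
    pass

def translate(virtual_address: int) -> int:
    page = virtual_address // PAGE_SIZE      # page number field
    offset = virtual_address % PAGE_SIZE     # offset is copied unchanged
    valid, frame = page_table.get(page, (0, None))
    if not valid:
        raise PageFault(f"page {page} not in memory")  # OS would load it from disk
    return frame * PAGE_SIZE + offset        # concatenate frame number and offset

print(hex(translate(0x1234)))                # page 1 -> frame 2 -> 0x2234
```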
Benefits of Virtual Memory
- Larger address space: Programs can be larger than physical memory.
- Efficient memory utilization: Only required pages are loaded into RAM.
- Multiprogramming: Multiple programs run simultaneously even if combined size exceeds physical memory.
- Protection and security: Each process has isolated address space.
- Simplified memory management: Programmers need not manage physical memory placement.
- Relocation: Programs can be loaded anywhere without modification.
- Sharing: Processes can map common pages to the same frames.
- Demand paging: Pages loaded only when needed, reducing initial load time.
Page Fault Handling
- CPU generates page fault interrupt.
- OS suspends the current process.
- OS locates the page on disk.
- OS finds a free frame or evicts a page if memory is full.
- Page is loaded from disk to the frame.
- Page table is updated and the process resumes execution.
Disadvantages: Page fault handling is time-consuming, thrashing can occur with excessive faults, disk space is required for swap, and additional hardware (MMU, TLB) is needed.
Q3. Memory Hierarchy: Speed, Size, and Cost
Introduction to Memory Hierarchy
Memory hierarchy organizes memory in a pyramid structure based on speed, cost, and capacity. Its goal is to provide the processor with fast data access while keeping cost reasonable and maintaining large storage capacity. Two guiding principles are:
- Principle of locality: Programs access a small part of their address space at any time.
- Cost-performance trade-off: Faster memories are more expensive per byte.
Levels of Memory Hierarchy (Top to Bottom)
1. CPU Registers
- Location: Inside the CPU
- Speed: Fastest (sub-nanosecond to nanoseconds)
- Size: Smallest (tens of registers)
- Cost: Most expensive per bit
- Technology: Flip-flops
- Managed by: Compiler/programmer
- Purpose: Store immediate operands and intermediate results
2. Cache Memory
- Location: Between CPU and main memory
- Speed: Very fast (1–10 ns)
- Size: Small (L1: 32–64KB, L2: 256KB–2MB, L3: 4–32MB)
- Cost: Very expensive
- Technology: SRAM
- Managed by: Hardware
- Purpose: Store frequently accessed data and instructions
3. Main Memory (RAM)
- Location: On motherboard
- Speed: Moderate (tens of ns)
- Size: Medium (GBs)
- Cost: Moderate
- Technology: DRAM
- Volatile: Yes
- Managed by: Operating system
4. Secondary Storage (HDD, SSD)
- Location: Peripheral devices
- Speed: Slow (ms for HDD, sub-ms for SSD)
- Size: Large (hundreds of GB to TB)
- Cost: Cheaper per GB
- Non-volatile: Yes
5. Tertiary Storage (Tape, Optical)
- Location: Offline/removable storage
- Speed: Very slow (seconds to minutes)
- Size: Very large (archival)
- Cost: Lowest per GB
- Use: Backup and archival
| Level | Speed | Size | Cost/Bit | Volatility |
|---|---|---|---|---|
| Registers | Fastest | Smallest | Highest | Volatile |
| Cache | Very Fast | Small | Very High | Volatile |
| Main Memory | Fast | Medium | Moderate | Volatile |
| Secondary | Slow | Large | Low | Non-volatile |
| Tertiary | Slowest | Largest | Lowest | Non-volatile |
Working Principle
- Inclusion property: Data present at level i is also present at level i+1 (the lower, larger level).
- Data movement: Data moves between adjacent levels in blocks; smaller blocks at higher levels, larger blocks at lower levels.
- Access pattern: CPU requests data from the highest level; on miss, request moves down and data is copied up when found, producing hits or misses.
Advantages: Balanced performance, cost-effectiveness, scalability, transparency to programmers, and efficiency by exploiting locality.
Performance metric example: Average access time = (Hit rate × Cache access time) + (Miss rate × Main memory access time). If hit rate = 90%, cache time = 10 ns, memory time = 100 ns, average = (0.9×10) + (0.1×100) = 19 ns.
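The same arithmetic, written as a tiny Python check:

```python
# The average-access-time calculation from the worked example above.
hit_rate = 0.90
cache_time_ns = 10
memory_time_ns = 100

average_ns = hit_rate * cache_time_ns + (1 - hit_rate) * memory_time_ns
print(average_ns)   # 19.0 ns, matching the worked example
```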
Q4. RAM vs ROM and SRAM vs DRAM
RAM vs ROM Comparison
| Feature | RAM (Random Access Memory) | ROM (Read-Only Memory) |
|---|---|---|
| Full form | Random Access Memory | Read-Only Memory |
| Volatility | Volatile (data lost when power off) | Non-volatile (data retained) |
| Operations | Read and write | Read only (usually written once) |
| Speed | Faster | Slower than RAM |
| Cost | More expensive | Less expensive |
| Usage | Temporary storage for running programs | Permanent storage for firmware/BIOS |
Explanation: RAM stores data and programs currently being executed and loses data on power off. ROM contains permanent instructions needed to start the computer and retains data without power.
SRAM vs DRAM Comparison
| Feature | SRAM (Static RAM) | DRAM (Dynamic RAM) |
|---|---|---|
| Storage element | Flip-flops (6 transistors) | Capacitor + transistor (1T1C) |
| Speed | Faster (1–10 ns) | Slower (50–100 ns) |
| Refresh | No refresh required | Requires periodic refresh |
| Density | Lower | Higher |
| Cost | More expensive | Less expensive |
| Usage | Cache memory (L1, L2, L3) | Main memory (RAM) |
Why SRAM for cache and DRAM for main memory? Cache needs speed, so SRAM is used. Main memory needs capacity, so DRAM is used because it is denser and cheaper, providing an optimal cost-performance balance.
Q5. Cache Coherence and Solutions
Definition of Cache Coherence
Cache coherence is a problem in multiprocessor systems where each processor has its own cache. Inconsistency arises when one processor modifies a cached copy of a memory location while other processors still hold stale copies.
Problem Scenario
Processor P1 reads variable X (X = 5) and caches it. Processor P2 also reads and caches X = 5. If P1 modifies X to 10 in its cache, P2's cache still contains X = 5, and main memory may also show 5. This inconsistency can cause incorrect program behavior.
Solutions to Cache Coherence
- Write-through policy: Every write to cache is immediately written to main memory. Advantage: simple. Disadvantage: slow due to frequent memory writes.
- Write-back with invalidation protocol: When a processor writes, the cache line is marked Modified and other caches with copies are sent an invalidation signal so they invalidate their copies. This reduces memory traffic.
- Write-back with update protocol: Updated values are broadcast to all other caches so they update their copies. Advantage: avoids refetching; disadvantage: high bus traffic.
- MESI protocol: Each cache line is in one of four states: Modified, Exclusive, Shared, or Invalid. State transitions maintain coherence. MESI with bus snooping is common in modern processors.
- Snooping protocol: All caches monitor the shared bus and take action when writes occur. Disadvantage: does not scale well to many processors.
- Directory-based protocol: A central directory keeps track of which caches have copies and notifies only those caches on changes. Advantage: scalable; disadvantage: directory overhead.
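A toy Python sketch of the write-invalidate idea (a simplification using write-through, not the full MESI state machine; processor names and values are made up):

```python
# Toy sketch of write-invalidate snooping: on a write, other caches holding
# the line drop their copies, so no stale value survives.
memory = {"X": 5}
caches = {"P1": {}, "P2": {}}          # per-processor cache: address -> value

def read(cpu: str, addr: str) -> int:
    if addr not in caches[cpu]:
        caches[cpu][addr] = memory[addr]   # miss: fetch from memory
    return caches[cpu][addr]

def write(cpu: str, addr: str, value: int) -> None:
    for other, cache in caches.items():    # snoop: invalidate other copies
        if other != cpu:
            cache.pop(addr, None)
    caches[cpu][addr] = value
    memory[addr] = value                   # write-through, for simplicity

read("P1", "X"); read("P2", "X")
write("P1", "X", 10)
print(caches)            # P2 no longer holds a stale X = 5
print(read("P2", "X"))   # P2 refetches the new value, 10
```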
Conclusion: Cache coherence is essential for correct multiprocessor operation. Hardware solutions like MESI (with snooping) or directory-based protocols are commonly used to balance correctness and performance.
Q6. DMA: Definition and Operation
Definition
DMA (Direct Memory Access) is a method that allows an I/O device to transfer data directly to or from main memory without continuous CPU intervention. A DMA controller manages transfers so the CPU can perform other tasks.
Need for DMA
In programmed I/O and interrupt-driven I/O, the CPU is involved in every data transfer, which wastes CPU time for high-speed devices. DMA transfers data directly between memory and device, freeing the CPU.
DMA Controller Components
- Address Register (AR): Holds the memory address for transfer and auto-increments.
- Word Count Register (WC): Holds the number of words to transfer and decrements each time; transfer completes when WC = 0.
- Control Register (CR): Stores mode information (read/write, enable, interrupt enable, transfer mode).
- Data Register (DR): Temporary buffer for data.
- Control logic: Manages bus request/acknowledge, read/write signals, addressing and counting.
DMA Transfer Process
- Initialization (CPU): CPU writes starting address, count, and operation type to DMA registers.
- Data transfer (DMA controller): I/O device requests DMA; DMA issues Bus Request; CPU grants the bus; DMA controls the bus and transfers data between memory and device while updating AR and WC.
- Completion: DMA releases the bus and interrupts the CPU if enabled; CPU resumes control.
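A minimal Python sketch of a burst-mode transfer, showing only how the address register auto-increments and the word count decrements to zero (the memory layout and device data are made up):

```python
# Minimal sketch of a DMA burst transfer: the address register (AR)
# auto-increments and the word-count register (WC) decrements to zero.
memory = [0] * 16
device_data = [11, 22, 33, 44]       # made-up words arriving from a device

def dma_burst(ar: int, wc: int, data) -> None:
    i = 0
    while wc > 0:                    # transfer completes when WC reaches 0
        memory[ar] = data[i]         # one word moved per bus cycle
        ar += 1                      # AR auto-increments
        wc -= 1                      # WC decrements
        i += 1
    # on completion the controller would release the bus and raise an interrupt
    print("transfer done, memory[4:8] =", memory[4:8])

dma_burst(ar=4, wc=len(device_data), data=device_data)
```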
Why Read/Write Lines Are Bidirectional
DMA can perform both memory-to-device (memory read then device write) and device-to-memory (device read then memory write) transfers. In addition, the DMA controller acts as a bus slave while the CPU programs its registers (the read/write lines are inputs) and as a bus master during the transfer itself (it drives the same lines as outputs to memory and the device), so the read/write lines must be bidirectional.
DMA Transfer Modes
- Burst mode: DMA transfers the entire block in one burst; CPU blocked during transfer; fastest.
- Cycle stealing: DMA takes one bus cycle at a time; CPU slows slightly but remains operational.
- Transparent mode: DMA transfers only when CPU is not using the bus; no CPU delay but slowest for DMA.
Advantages and Disadvantages
- Advantages: High-speed transfer, CPU freed for other tasks, efficient block transfers.
- Disadvantages: Additional hardware cost, bus contention, greater complexity, limited DMA channels.
Q7. Programmed I/O vs Interrupt-Driven I/O
Programmed I/O (Polling)
Definition: CPU continuously polls the device status to determine readiness for data transfer. It executes a loop checking the status flag.
Working: CPU issues I/O command, polls the status register, and transfers data when the device is ready.
Interrupt-Driven I/O
Definition: CPU initiates an I/O operation and continues other tasks. The device sends an interrupt when ready; the CPU invokes an interrupt service routine (ISR) to handle transfer.
Working: CPU issues I/O command, continues work; device signals interrupt when ready; CPU saves context, executes ISR, transfers data, restores context, and resumes.
Detailed Comparison
| Aspect | Programmed I/O | Interrupt-Driven I/O |
|---|---|---|
| CPU involvement | CPU continuously polls device | CPU works on other tasks until interrupted |
| CPU efficiency | Very inefficient | Highly efficient |
| Speed | Slow | Fast (multitasking) |
| Complexity | Simple | More complex (interrupt hardware and software) |
| Hardware required | Minimal | Interrupt controller, priority circuits |
| Software complexity | Simple polling loop | Requires ISRs and context switching |
| Power consumption | High (polling) | Lower (idle between interrupts) |
| Suitable for | Simple/slow devices (keyboard, basic sensors) | Fast devices and multitasking systems (disks, network cards) |
Interrupt Handling Process
- Device sends interrupt request.
- CPU acknowledges after completing current instruction.
- CPU saves context (PC, registers) on stack.
- Interrupt source identified.
- ISR executes and services the device.
- ISR clears interrupt and restores context; CPU resumes previous task.
Q8. Daisy Chaining and Parallel Priority Interrupts
When multiple devices generate interrupts simultaneously, a priority mechanism decides which device is serviced first. Two common hardware methods are daisy chaining and parallel priority (priority encoder).
Daisy Chain Priority
Daisy chaining connects devices in series. Priority is determined by device position in the chain: the device closest to the CPU has the highest priority. This method is simple and inexpensive but has serial propagation delay and single-point failure risk.
Parallel Priority Interrupt
In parallel priority, each device has an independent interrupt line and all lines are fed to a priority encoder. The encoder determines the highest-priority active interrupt in parallel. This is faster and more flexible (maskable priorities) but requires more hardware.
| Feature | Daisy Chain | Parallel Priority |
|---|---|---|
| Hardware complexity | Simple (serial) | Complex (encoder) |
| Speed | Slower (serial propagation) | Faster (parallel detection) |
| Number of lines | One common + INTA chain | One line per device |
| Flexibility | Fixed by position | Programmable via masks |
| Scalability | Easy to add devices | Limited by encoder size |
| Reliability | Low (chain break affects all) | High (independent lines) |
Q9. Modes of Data Transfer
Introduction
Data transfer modes manage movement between CPU, memory, and I/O devices to optimize CPU utilization, transfer speed, and efficiency. Common modes include programmed I/O, interrupt-driven I/O, DMA, and I/O processors (IOP).
Mode 1: Programmed I/O
CPU actively controls data transfer and polls the device status. It is simple but inefficient and suitable only for slow or dedicated systems.
Mode 2: Interrupt-Driven I/O
CPU initiates I/O and continues other work. The device interrupts when ready and CPU services the interrupt via ISR. This improves CPU efficiency and is suitable for moderate-speed devices.
Mode 3: Direct Memory Access (DMA)
DMA controller transfers data directly between device and memory with minimal CPU involvement (initialization only). DMA supports burst, cycle stealing, and transparent modes and is ideal for high-speed block transfers (disks, network cards).
Mode 4: I/O Processor (IOP) / Channel
An IOP is a dedicated processor that executes I/O programs independently. It provides the highest offload of CPU and supports complex, concurrent I/O operations (used in mainframes and high-performance servers).
Comparison of Modes
| Aspect | Programmed I/O | Interrupt I/O | DMA | IOP |
|---|---|---|---|---|
| CPU utilization | Very poor | Moderate | Good | Excellent |
| Transfer speed | Slowest | Moderate | Fast | Very fast |
| Hardware complexity | Lowest | Simple | Moderate | Highest |
| Suitable for | Slow devices | Moderate devices | Fast devices | Many fast devices |
Synchronous vs Asynchronous Transfer
Synchronous: Uses a common clock for sender and receiver; data transferred at fixed intervals; high speed; used on buses and DMA links.
Asynchronous: No common clock; uses handshaking (ready/acknowledge or strobe); slower but flexible; used for serial communication (UART).
Q10. Addressing Modes
Definition
An addressing mode specifies how an instruction accesses its operand or computes the effective address. Addressing modes provide flexibility, reduce code size, and enable diverse programming techniques.
Common Addressing Modes
- Immediate: Operand is part of the instruction. Example: MOV A, #50. Fast but limited range.
- Direct (Absolute): Instruction contains the memory address. Example: MOV A, 500.
- Indirect: Instruction points to a memory location that contains the effective address. Example: MOV A, @500.
- Register: Operand is in a processor register. Example: MOV A, R1.
- Register indirect: A register contains the memory address. Example: MOV A, @R1.
- Indexed: Effective address = base + index register. Example: MOV A, 100(R1). Useful for arrays (see the sketch after this list).
- Relative (PC-relative): Effective address = PC + offset. Example: JMP +10. Good for position-independent code.
- Auto-increment: Use the register as the address, then increment it automatically. Example: MOV A, (R1)+.
- Auto-decrement: Decrement the register, then use it as the address. Useful for stack operations.
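A small Python sketch of effective-address computation for a few of these modes (register and memory contents are made up for illustration):

```python
# Minimal sketch of effective-address computation for a few addressing modes.
# Register and memory contents are invented for illustration.
registers = {"R1": 100, "PC": 2000}
memory = {100: 7, 200: 42, 500: 200}

def register_indirect(reg: str) -> int:
    return memory[registers[reg]]            # address comes from the register

def indexed(base: int, reg: str) -> int:
    return memory[base + registers[reg]]     # EA = base + index register

def memory_indirect(addr: int) -> int:
    return memory[memory[addr]]              # memory holds the effective address

def pc_relative(offset: int) -> int:
    return registers["PC"] + offset          # EA = PC + offset (a branch target)

print(register_indirect("R1"))   # MOV A, @R1     -> memory[100] = 7
print(indexed(100, "R1"))        # MOV A, 100(R1) -> memory[200] = 42
print(memory_indirect(500))      # MOV A, @500    -> memory[memory[500]] = 42
print(pc_relative(10))           # JMP +10        -> 2010
```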
RISC vs CISC Comparison
| Feature | RISC (Reduced Instruction Set Computer) | CISC (Complex Instruction Set Computer) |
|---|---|---|
| Instruction set | Small, simple instructions | Large, complex instructions |
| Instruction format | Fixed length | Variable length |
| Memory access | Load–store architecture (memory via LOAD/STORE) | Instructions can directly access memory |
| Examples | ARM, MIPS, RISC-V | Intel x86, VAX |
Booth's Algorithm for Multiplication
Booth's algorithm multiplies signed binary numbers (2's complement) efficiently by examining pairs of bits and reducing operations when the multiplier has consecutive 1s.
Basic idea: Examine (Q0, Q-1) pairs and perform add/subtract or no operation, then arithmetic right-shift the accumulator and multiplier. Repeat for n bits.
Algorithm steps: initialize A = 0 and Q−1 = 0, load the multiplier into Q and the multiplicand into M; for each of the n bits, if (Q0, Q−1) = (1, 0) compute A = A − M, if (Q0, Q−1) = (0, 1) compute A = A + M, otherwise do nothing; then arithmetic right-shift the combined (A, Q, Q−1). After n iterations, (A, Q) holds the 2n-bit product. For example, +5 × −3 with 4-bit operands produces 1111 0001 = −15, using fewer additions/subtractions than a plain add-and-shift multiplier.
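A compact Python implementation of these steps, kept to n-bit two's-complement arithmetic as a teaching sketch (checked here only on small cases such as +5 × −3):

```python
def booth_multiply(m: int, q: int, n: int = 4) -> int:
    """Booth's algorithm on n-bit two's-complement operands (teaching sketch)."""
    mask = (1 << n) - 1
    M = m & mask                         # multiplicand as an n-bit pattern
    A, Q, q_minus1 = 0, q & mask, 0
    for _ in range(n):
        pair = (Q & 1, q_minus1)
        if pair == (1, 0):
            A = (A - M) & mask           # A = A - M
        elif pair == (0, 1):
            A = (A + M) & mask           # A = A + M
        # arithmetic right shift of the combined (A, Q, Q-1)
        q_minus1 = Q & 1
        Q = ((Q >> 1) | ((A & 1) << (n - 1))) & mask
        A = ((A >> 1) | (A & (1 << (n - 1)))) & mask   # replicate A's sign bit
    product = (A << n) | Q
    if product & (1 << (2 * n - 1)):     # reinterpret as a signed 2n-bit value
        product -= 1 << (2 * n)
    return product

print(booth_multiply(5, -3))   # -15
print(booth_multiply(-4, 7))   # -28
```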
IEEE 754 Floating Point Representation
IEEE-754 defines single (32-bit) and double (64-bit) precision formats using Sign, Exponent, and Fraction (mantissa) fields. The value for normalized numbers is:
Value = (−1)^Sign × 1.Fraction × 2^(Exponent − Bias)
| Field | Meaning | Single (32-bit) | Double (64-bit) |
|---|---|---|---|
| Sign | 0 = positive, 1 = negative | 1 bit | 1 bit |
| Exponent | Determines range | 8 bits | 11 bits |
| Fraction (mantissa) | Determines precision | 23 bits | 52 bits |
| Bias | Used to store exponent as unsigned | 127 | 1023 |
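A short Python sketch that extracts the single-precision fields of a float and rebuilds its value from the formula above (it uses the standard struct module; the sample value −6.25 is illustrative and assumed to be normalized):

```python
import struct

# Extract the IEEE-754 single-precision fields of a float and rebuild its
# value from (-1)^Sign x 1.Fraction x 2^(Exponent - 127) for normalized numbers.
def decode_single(x: float):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    value = (-1) ** sign * (1 + fraction / 2**23) * 2 ** (exponent - 127)
    return sign, exponent, fraction, value

print(decode_single(-6.25))   # sign=1, exponent=129 (bias 127 + 2), value=-6.25
```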
Carry Look-Ahead Adder (CLA)
A Carry Look-Ahead Adder computes carries in parallel using generate (G_i) and propagate (P_i) signals to eliminate ripple delay. The carry equations for a 4-bit CLA are:
Carry Look-Ahead Equations
C1 = G0 + (P0 · C0)
C2 = G1 + (P1 · G0) + (P1 · P0 · C0)
C3 = G2 + (P2 · G1) + (P2 · P1 · G0) + (P2 · P1 · P0 · C0)
C4 = G3 + (P3 · G2) + (P3 · P2 · G1) + (P3 · P2 · P1 · G0) + (P3 · P2 · P1 · P0 · C0)
These carries are computed simultaneously, reducing addition delay compared with ripple-carry adders.
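A small Python sketch that evaluates these carry equations for two 4-bit operands (the loop uses the recursive form Ci+1 = Gi + Pi·Ci, which expands to the parallel equations above; it is only for checking values, not for modeling gate delay):

```python
# Compute the 4-bit carry look-ahead carries from generate/propagate signals.
def cla_carries(a: int, b: int, c0: int = 0):
    g = [(a >> i & 1) & (b >> i & 1) for i in range(4)]   # Gi = Ai AND Bi
    p = [(a >> i & 1) ^ (b >> i & 1) for i in range(4)]   # Pi = Ai XOR Bi
    carries = [c0]
    for i in range(4):                     # Ci+1 = Gi + Pi*Ci (expanded in hardware)
        carries.append(g[i] | (p[i] & carries[i]))
    return carries                         # [C0, C1, C2, C3, C4]

print(cla_carries(0b1011, 0b0110))         # carries for 11 + 6; carry-out C4 = 1
```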
Synchronous Data Transfer
Definition
Synchronous data transfer uses a common clock signal for sender and receiver and transfers data in a continuous, clock-synchronized stream.
Key Features
- Common clock: Ensures precise coordination.
- Continuous data flow: Data sent in frames or blocks without start/stop bits.
- Higher speed: No overhead bits; efficient for high-speed links.
- Well-suited for: Computer buses, networks, and DMA transfers.
How it works
- Clock is generated by transmitter or via a separate clock line.
- Both sides sample data on clock edges.
- Data valid only during specified clock intervals.
Advantages and Disadvantages
- Advantages: High throughput, no start/stop bits, accurate timing.
- Disadvantages: Requires clock synchronization; additional clock line cost; alignment errors if clock drifts.
Applications
- System buses, DMA transfers
- Ethernet and other high-speed links (when synchronous modes are used)
- Modems and dedicated synchronous links