Cache Mapping, Virtual Memory and I/O Techniques in Computer Architecture
Q1. Cache Mapping Techniques
Cache memory is a small, fast memory placed between the CPU and main memory to speed up memory access. Mapping techniques determine how blocks from main memory are mapped to cache lines.
Three Types of Cache Mapping
1. Direct Mapping
In direct mapping, each block of main memory maps to exactly one specific cache line. The cache line number is determined by the formula:
Cache Line Number = (Block Address) mod (Number of Cache Lines)
Address format: The memory address is divided into three fields:
- Tag: Identifies which block is currently stored.
- Index: Identifies the cache line number.
- Block offset: Identifies the word within the block.
Advantages:
- Simple and easy to implement
- Fast access time
- Low-cost hardware
Disadvantages:
- High miss rate due to conflicts
- If two frequently used blocks map to the same line, they repeatedly replace each other (thrashing)
- Poor cache utilization in worst cases
Example: If cache has 8 lines, then block 0, 8, 16, 24... all map to cache line 0.
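A minimal Python sketch of the address split, assuming a hypothetical direct-mapped cache with 8 lines and 16-byte blocks (the constants and function name are illustrative only):

```python
# Minimal sketch: splitting a byte address for a hypothetical direct-mapped
# cache with 8 lines and 16-byte blocks (3 index bits, 4 offset bits).
NUM_LINES = 8
BLOCK_SIZE = 16

def direct_map(address: int):
    offset = address % BLOCK_SIZE     # word/byte within the block
    block = address // BLOCK_SIZE     # block address in main memory
    index = block % NUM_LINES         # cache line = block mod number of lines
    tag = block // NUM_LINES          # remaining high-order bits
    return tag, index, offset

# Blocks 0, 8, 16, 24 all land in line 0, which is exactly where conflicts arise.
for block in (0, 8, 16, 24):
    print(block, "->", direct_map(block * BLOCK_SIZE))
```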
2. Fully Associative Mapping
In fully associative mapping, any block from main memory can be placed in any cache line. There is no restriction on placement.
Address format: The memory address is divided into two fields:
- Tag: Contains the block address.
- Block offset: Identifies the word within the block.
Advantages:
- Lowest miss rate
- No thrashing problem
- Best cache utilization and flexibility
Disadvantages:
- Most expensive to implement
- Slowest search time (must search all cache lines)
- Complex hardware required (comparators for each line)
3. Set-Associative Mapping
Set-associative mapping is a compromise between direct and fully associative mapping. The cache is divided into sets; each block maps to a specific set but can be placed in any line within that set.
Set Number = (Block Address) mod (Number of Sets)
In a k-way set-associative cache, each set contains k cache lines.
Address format:
- Tag: Identifies which block currently occupies a line within the set.
- Set: Identifies which set.
- Block offset: Identifies the word within the block.
Advantages:
- Better hit rate than direct mapping
- Less expensive than fully associative
- Balanced performance and cost; reduced thrashing
Disadvantages:
- More complex than direct mapping
- Higher cost than direct mapping
- Requires a replacement algorithm within each set
Example: In 2-way set-associative mapping, each set has two lines. A block can be placed in either of the two lines within its designated set.
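The placement rule can be checked with a small Python sketch of a hypothetical 2-way set-associative cache with LRU replacement (the geometry and names are made up for illustration):

```python
# Minimal sketch of a 2-way set-associative cache with LRU replacement.
# Geometry (4 sets, 2 ways) is illustrative only.
NUM_SETS, WAYS = 4, 2
sets = [[] for _ in range(NUM_SETS)]   # each set holds up to WAYS block tags

def access(block_address: int) -> bool:
    """Return True on hit, False on miss; most recently used tag sits at the end."""
    set_index = block_address % NUM_SETS
    tag = block_address // NUM_SETS
    ways = sets[set_index]
    if tag in ways:                    # hit: refresh LRU position
        ways.remove(tag)
        ways.append(tag)
        return True
    if len(ways) == WAYS:              # miss with full set: evict least recently used
        ways.pop(0)
    ways.append(tag)
    return False

# Blocks 0, 4, 8 all map to set 0; any two of them can coexist in the 2-way set.
for blk in (0, 4, 0, 4, 8):
    print(blk, "hit" if access(blk) else "miss")
```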
Comparison Summary
| Feature | Direct | Associative | Set-Associative |
|---|---|---|---|
| Flexibility | Low | High | Medium |
| Cost | Low | High | Medium |
| Speed | Fast | Slow | Medium |
| Hit Rate | Low | High | Medium |
Q2. Virtual Memory and Page Address Mapping
Definition: Virtual memory is a memory management technique that allows execution of programs that are not completely loaded in the physical main memory. It creates an illusion of a very large main memory by using disk storage as an extension of RAM.
Basic concept: The logical (virtual) address space is divided into fixed-size blocks called pages, and physical memory is divided into blocks of the same size called frames. The OS maintains a page table to map virtual addresses to physical addresses.
Address Mapping Using Pages
Components:
- Virtual address: Generated by CPU (larger address space).
- Physical address: Actual location in main memory (smaller address space).
- Page table: Contains mapping information.
Address formats:
- Virtual address = Page number + Page offset.
- Physical address = Frame number + Page offset (page offset unchanged).
Mapping process:
- CPU generates a virtual address.
- Page number is extracted from the virtual address.
- Page table is accessed using the page number as an index.
- If valid bit = 1: the page is in memory and the frame number is retrieved.
- If valid bit = 0: a page fault occurs and the page must be loaded from disk.
- Physical address is formed by concatenating the frame number and page offset.
Page table entry contains: frame number, valid bit, dirty bit, reference bit, and protection bits (read/write/execute).
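A minimal Python sketch of the translation step, assuming 4 KB pages and a made-up page table (the names translate and PageFault are illustrative):

```python
# Minimal sketch of virtual-to-physical translation with 4 KB pages.
# The page table contents are invented for illustration.
PAGE_SIZE = 4096

# page number -> (valid bit, frame number)
page_table = {0: (1, 5), 1: (1, 2), 2: (0, None)}

class PageFault(Exception):
    pass

def translate(virtual_address: int) -> int:
    page = virtual_address // PAGE_SIZE      # page number field
    offset = virtual_address % PAGE_SIZE     # offset is copied unchanged
    valid, frame = page_table.get(page, (0, None))
    if not valid:
        raise PageFault(f"page {page} not in memory")  # OS would load it from disk
    return frame * PAGE_SIZE + offset        # concatenate frame number and offset

print(hex(translate(0x1234)))                # page 1 -> frame 2 -> 0x2234
```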
Benefits of Virtual Memory
- Larger address space: Programs can be larger than physical memory.
- Efficient memory utilization: Only required pages are loaded into RAM.
- Multiprogramming: Multiple programs run simultaneously even if combined size exceeds physical memory.
- Protection and security: Each process has isolated address space.
- Simplified memory management: Programmers need not manage physical memory placement.
- Relocation: Programs can be loaded anywhere without modification.
- Sharing: Processes can map common pages to the same frames.
- Demand paging: Pages loaded only when needed, reducing initial load time.
Page Fault Handling
- CPU generates page fault interrupt.
- OS suspends the current process.
- OS locates the page on disk.
- OS finds a free frame or evicts a page if memory is full.
- Page is loaded from disk to the frame.
- Page table is updated and the process resumes execution.
Disadvantages: Page fault handling is time-consuming, thrashing can occur with excessive faults, disk space is required for swap, and additional hardware (MMU, TLB) is needed.
Q3. Memory Hierarchy: Speed, Size, and Cost
Introduction to Memory Hierarchy
Memory hierarchy organizes memory in a pyramid structure based on speed, cost, and capacity. Its goal is to provide the processor with fast data access while keeping cost reasonable and maintaining large storage capacity. Two guiding principles are:
- Principle of locality: Programs access a small part of their address space at any time.
- Cost-performance trade-off: Faster memories are more expensive per byte.
Levels of Memory Hierarchy (Top to Bottom)
1. CPU Registers
- Location: Inside the CPU
- Speed: Fastest (sub-nanosecond to nanoseconds)
- Size: Smallest (tens of registers)
- Cost: Most expensive per bit
- Technology: Flip-flops
- Managed by: Compiler/programmer
- Purpose: Store immediate operands and intermediate results
2. Cache Memory
- Location: Between CPU and main memory
- Speed: Very fast (1–10 ns)
- Size: Small (L1: 32–64KB, L2: 256KB–2MB, L3: 4–32MB)
- Cost: Very expensive
- Technology: SRAM
- Managed by: Hardware
- Purpose: Store frequently accessed data and instructions
3. Main Memory (RAM)
- Location: On motherboard
- Speed: Moderate (tens of ns)
- Size: Medium (GBs)
- Cost: Moderate
- Technology: DRAM
- Volatile: Yes
- Managed by: Operating system
4. Secondary Storage (HDD, SSD)
- Location: Peripheral devices
- Speed: Slow (ms for HDD, sub-ms for SSD)
- Size: Large (hundreds of GB to TB)
- Cost: Cheaper per GB
- Non-volatile: Yes
5. Tertiary Storage (Tape, Optical)
- Location: Offline/removable storage
- Speed: Very slow (seconds to minutes)
- Size: Very large (archival)
- Cost: Lowest per GB
- Use: Backup and archival
| Level | Speed | Size | Cost/Bit | Volatility |
|---|---|---|---|---|
| Registers | Fastest | Smallest | Highest | Volatile |
| Cache | Very Fast | Small | Very High | Volatile |
| Main Memory | Fast | Medium | Moderate | Volatile |
| Secondary | Slow | Large | Low | Non-volatile |
| Tertiary | Slowest | Largest | Lowest | Non-volatile |
Working Principle
- Inclusion property: Data present at level i is also present at level i+1 (the lower, larger level).
- Data movement: Data moves between adjacent levels in blocks; smaller blocks at higher levels, larger blocks at lower levels.
- Access pattern: CPU requests data from the highest level; on miss, request moves down and data is copied up when found, producing hits or misses.
Advantages: Balanced performance, cost-effectiveness, scalability, transparency to programmers, and efficiency by exploiting locality.
Performance metric example: Average access time = (Hit rate × Cache access time) + (Miss rate × Main memory access time). If hit rate = 90%, cache time = 10 ns, memory time = 100 ns, average = (0.9×10) + (0.1×100) = 19 ns.
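The same arithmetic, written as a tiny Python check:

```python
# The average-access-time calculation from the worked example above.
hit_rate = 0.90
cache_time_ns = 10
memory_time_ns = 100

average_ns = hit_rate * cache_time_ns + (1 - hit_rate) * memory_time_ns
print(average_ns)   # 19.0 ns, matching the worked example
```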
Q4. RAM vs ROM and SRAM vs DRAM
RAM vs ROM Comparison
| Feature | RAM (Random Access Memory) | ROM (Read-Only Memory) |
|---|---|---|
| Full form | Random Access Memory | Read-Only Memory |
| Volatility | Volatile (data lost when power off) | Non-volatile (data retained) |
| Operations | Read and write | Read only (usually written once) |
| Speed | Faster | Slower than RAM |
| Cost | More expensive | Less expensive |
| Usage | Temporary storage for running programs | Permanent storage for firmware/BIOS |
Explanation: RAM stores data and programs currently being executed and loses data on power off. ROM contains permanent instructions needed to start the computer and retains data without power.
SRAM vs DRAM Comparison
| Feature | SRAM (Static RAM) | DRAM (Dynamic RAM) |
|---|---|---|
| Storage element | Flip-flops (6 transistors) | Capacitor + transistor (1T1C) |
| Speed | Faster (1–10 ns) | Slower (50–100 ns) |
| Refresh | No refresh required | Requires periodic refresh |
| Density | Lower | Higher |
| Cost | More expensive | Less expensive |
| Usage | Cache memory (L1, L2, L3) | Main memory (RAM) |
Why SRAM for cache and DRAM for main memory? Cache needs speed, so SRAM is used. Main memory needs capacity, so DRAM is used because it is denser and cheaper, providing an optimal cost-performance balance.
Q5. Cache Coherence and Solutions
Definition of Cache Coherence
Cache coherence is a problem in multiprocessor systems where each processor has its own cache. Inconsistency arises when one processor modifies a cached copy of a memory location while other processors still hold stale copies.
Problem Scenario
Processor P1 reads variable X (X = 5) and caches it. Processor P2 also reads and caches X = 5. If P1 modifies X to 10 in its cache, P2's cache still contains X = 5, and main memory may also show 5. This inconsistency can cause incorrect program behavior.
Solutions to Cache Coherence
- Write-through policy: Every write to cache is immediately written to main memory. Advantage: simple. Disadvantage: slow due to frequent memory writes.
- Write-back with invalidation protocol: When a processor writes, the cache line is marked Modified and other caches with copies are sent an invalidation signal so they invalidate their copies. This reduces memory traffic.
- Write-back with update protocol: Updated values are broadcast to all other caches so they update their copies. Advantage: avoids refetching; disadvantage: high bus traffic.
- MESI protocol: Each cache line is in one of four states: Modified, Exclusive, Shared, or Invalid. State transitions maintain coherence. MESI with bus snooping is common in modern processors.
- Snooping protocol: All caches monitor the shared bus and take action when writes occur. Disadvantage: does not scale well to many processors.
- Directory-based protocol: A central directory keeps track of which caches have copies and notifies only those caches on changes. Advantage: scalable; disadvantage: directory overhead.
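A toy Python sketch of the write-invalidate idea (a simplification using write-through, not the full MESI state machine; processor names and values are made up):

```python
# Toy sketch of write-invalidate snooping: on a write, other caches holding
# the line drop their copies, so no stale value survives.
memory = {"X": 5}
caches = {"P1": {}, "P2": {}}          # per-processor cache: address -> value

def read(cpu: str, addr: str) -> int:
    if addr not in caches[cpu]:
        caches[cpu][addr] = memory[addr]   # miss: fetch from memory
    return caches[cpu][addr]

def write(cpu: str, addr: str, value: int) -> None:
    for other, cache in caches.items():    # snoop: invalidate other copies
        if other != cpu:
            cache.pop(addr, None)
    caches[cpu][addr] = value
    memory[addr] = value                   # write-through, for simplicity

read("P1", "X"); read("P2", "X")
write("P1", "X", 10)
print(caches)            # P2 no longer holds a stale X = 5
print(read("P2", "X"))   # P2 refetches the new value, 10
```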
Conclusion: Cache coherence is essential for correct multiprocessor operation. Hardware solutions like MESI (with snooping) or directory-based protocols are commonly used to balance correctness and performance.
Q6. DMA: Definition and Operation
Definition
DMA (Direct Memory Access) is a method that allows an I/O device to transfer data directly to or from main memory without continuous CPU intervention. A DMA controller manages transfers so the CPU can perform other tasks.
Need for DMA
In programmed I/O and interrupt-driven I/O, the CPU is involved in every data transfer, which wastes CPU time for high-speed devices. DMA transfers data directly between memory and device, freeing the CPU.
DMA Controller Components
- Address Register (AR): Holds the memory address for transfer and auto-increments.
- Word Count Register (WC): Holds the number of words to transfer and decrements each time; transfer completes when WC = 0.
- Control Register (CR): Stores mode information (read/write, enable, interrupt enable, transfer mode).
- Data Register (DR): Temporary buffer for data.
- Control logic: Manages bus request/acknowledge, read/write signals, addressing and counting.
DMA Transfer Process
- Initialization (CPU): CPU writes starting address, count, and operation type to DMA registers.
- Data transfer (DMA controller): I/O device requests DMA; DMA issues Bus Request; CPU grants the bus; DMA controls the bus and transfers data between memory and device while updating AR and WC.
- Completion: DMA releases the bus and interrupts the CPU if enabled; CPU resumes control.
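A minimal Python sketch of a burst-mode transfer, showing only how the address register auto-increments and the word count decrements to zero (the memory layout and device data are made up):

```python
# Minimal sketch of a DMA burst transfer: the address register (AR)
# auto-increments and the word-count register (WC) decrements to zero.
memory = [0] * 16
device_data = [11, 22, 33, 44]       # made-up words arriving from a device

def dma_burst(ar: int, wc: int, data) -> None:
    i = 0
    while wc > 0:                    # transfer completes when WC reaches 0
        memory[ar] = data[i]         # one word moved per bus cycle
        ar += 1                      # AR auto-increments
        wc -= 1                      # WC decrements
        i += 1
    # on completion the controller would release the bus and raise an interrupt
    print("transfer done, memory[4:8] =", memory[4:8])

dma_burst(ar=4, wc=len(device_data), data=device_data)
```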
Why Read/Write Lines Are Bidirectional
DMA can perform both memory-to-device (memory read then device write) and device-to-memory (device read then memory write) transfers. In addition, the DMA controller acts as a bus slave while the CPU programs its registers (the read/write lines are inputs) and as a bus master during the transfer itself (it drives the same lines as outputs to memory and the device), so the read/write lines must be bidirectional.
DMA Transfer Modes
- Burst mode: DMA transfers the entire block in one burst; CPU blocked during transfer; fastest.
- Cycle stealing: DMA takes one bus cycle at a time; CPU slows slightly but remains operational.
- Transparent mode: DMA transfers only when CPU is not using the bus; no CPU delay but slowest for DMA.
Advantages and Disadvantages
- Advantages: High-speed transfer, CPU freed for other tasks, efficient block transfers.
- Disadvantages: Additional hardware cost, bus contention, greater complexity, limited DMA channels.
Q7. Programmed I/O vs Interrupt-Driven I/O
Programmed I/O (Polling)
Definition: CPU continuously polls the device status to determine readiness for data transfer. It executes a loop checking the status flag.
Working: CPU issues I/O command, polls the status register, and transfers data when the device is ready.
Interrupt-Driven I/O
Definition: CPU initiates an I/O operation and continues other tasks. The device sends an interrupt when ready; the CPU invokes an interrupt service routine (ISR) to handle transfer.
Working: CPU issues I/O command, continues work; device signals interrupt when ready; CPU saves context, executes ISR, transfers data, restores context, and resumes.
Detailed Comparison
| Aspect | Programmed I/O | Interrupt-Driven I/O |
|---|---|---|
| CPU involvement | CPU continuously polls device | CPU works on other tasks until interrupted |
| CPU efficiency | Very inefficient | Highly efficient |
| Speed | Slow | Fast (multitasking) |
| Complexity | Simple | More complex (interrupt hardware and software) |
| Hardware required | Minimal | Interrupt controller, priority circuits |
| Software complexity | Simple polling loop | Requires ISRs and context switching |
| Power consumption | High (polling) | Lower (idle between interrupts) |
| Suitable for | Simple/slow devices (keyboard, basic sensors) | Fast devices and multitasking systems (disks, network cards) |
Interrupt Handling Process
- Device sends interrupt request.
- CPU acknowledges after completing current instruction.
- CPU saves context (PC, registers) on stack.
- Interrupt source identified.
- ISR executes and services the device.
- ISR clears interrupt and restores context; CPU resumes previous task.
Q8. Daisy Chaining and Parallel Priority Interrupts
When multiple devices generate interrupts simultaneously, a priority mechanism decides which device is serviced first. Two common hardware methods are daisy chaining and parallel priority (priority encoder).
Daisy Chain Priority
Daisy chaining connects devices in series. Priority is determined by device position in the chain: the device closest to the CPU has the highest priority. This method is simple and inexpensive but has serial propagation delay and single-point failure risk.
Parallel Priority Interrupt
In parallel priority, each device has an independent interrupt line and all lines are fed to a priority encoder. The encoder determines the highest-priority active interrupt in parallel. This is faster and more flexible (maskable priorities) but requires more hardware.
| Feature | Daisy Chain | Parallel Priority |
|---|---|---|
| Hardware complexity | Simple (serial) | Complex (encoder) |
| Speed | Slower (serial propagation) | Faster (parallel detection) |
| Number of lines | One common + INTA chain | One line per device |
| Flexibility | Fixed by position | Programmable via masks |
| Scalability | Easy to add devices | Limited by encoder size |
| Reliability | Low (chain break affects all) | High (independent lines) |
Q9. Modes of Data Transfer
Introduction
Data transfer modes manage movement between CPU, memory, and I/O devices to optimize CPU utilization, transfer speed, and efficiency. Common modes include programmed I/O, interrupt-driven I/O, DMA, and I/O processors (IOP).
Mode 1: Programmed I/O
CPU actively controls data transfer and polls the device status. It is simple but inefficient and suitable only for slow or dedicated systems.
Mode 2: Interrupt-Driven I/O
CPU initiates I/O and continues other work. The device interrupts when ready and CPU services the interrupt via ISR. This improves CPU efficiency and is suitable for moderate-speed devices.
Mode 3: Direct Memory Access (DMA)
DMA controller transfers data directly between device and memory with minimal CPU involvement (initialization only). DMA supports burst, cycle stealing, and transparent modes and is ideal for high-speed block transfers (disks, network cards).
Mode 4: I/O Processor (IOP) / Channel
An IOP is a dedicated processor that executes I/O programs independently. It provides the highest offload of CPU and supports complex, concurrent I/O operations (used in mainframes and high-performance servers).
Comparison of Modes
| Aspect | Programmed I/O | Interrupt I/O | DMA | IOP |
|---|---|---|---|---|
| CPU utilization | Very poor | Moderate | Good | Excellent |
| Transfer speed | Slowest | Moderate | Fast | Very fast |
| Hardware complexity | Lowest | Simple | Moderate | Highest |
| Suitable for | Slow devices | Moderate devices | Fast devices | Many fast devices |
Synchronous vs Asynchronous Transfer
Synchronous: Uses a common clock for sender and receiver; data transferred at fixed intervals; high speed; used on buses and DMA links.
Asynchronous: No common clock; uses handshaking (ready/acknowledge or strobe); slower but flexible; used for serial communication (UART).
Q10. Addressing Modes
Definition
An addressing mode specifies how an instruction accesses its operand or computes the effective address. Addressing modes provide flexibility, reduce code size, and enable diverse programming techniques.
Common Addressing Modes
- Immediate: Operand is part of the instruction. Example: MOV A, #50. Fast but limited range.
- Direct (Absolute): Instruction contains the memory address. Example: MOV A, 500.
- Indirect: Instruction points to a memory location that contains the effective address. Example: MOV A, @500.
- Register: Operand is in a processor register. Example: MOV A, R1.
- Register indirect: A register contains the memory address. Example: MOV A, @R1.
- Indexed: Effective address = base + index register. Example: MOV A, 100(R1). Useful for arrays (see the sketch after this list).
- Relative (PC-relative): Effective address = PC + offset. Example: JMP +10. Good for position-independent code.
- Auto-increment: Use the register as the address, then increment it automatically. Example: MOV A, (R1)+.
- Auto-decrement: Decrement the register, then use it as the address. Useful for stack operations.
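A small Python sketch of effective-address computation for a few of these modes (register and memory contents are made up for illustration):

```python
# Minimal sketch of effective-address computation for a few addressing modes.
# Register and memory contents are invented for illustration.
registers = {"R1": 100, "PC": 2000}
memory = {100: 7, 200: 42, 500: 200}

def register_indirect(reg: str) -> int:
    return memory[registers[reg]]            # address comes from the register

def indexed(base: int, reg: str) -> int:
    return memory[base + registers[reg]]     # EA = base + index register

def memory_indirect(addr: int) -> int:
    return memory[memory[addr]]              # memory holds the effective address

def pc_relative(offset: int) -> int:
    return registers["PC"] + offset          # EA = PC + offset (a branch target)

print(register_indirect("R1"))   # MOV A, @R1     -> memory[100] = 7
print(indexed(100, "R1"))        # MOV A, 100(R1) -> memory[200] = 42
print(memory_indirect(500))      # MOV A, @500    -> memory[memory[500]] = 42
print(pc_relative(10))           # JMP +10        -> 2010
```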
RISC vs CISC Comparison
| Feature | RISC (Reduced Instruction Set Computer) | CISC (Complex Instruction Set Computer) |
|---|---|---|
| Instruction set | Small, simple instructions | Large, complex instructions |
| Instruction format | Fixed length | Variable length |
| Memory access | Load–store architecture (memory via LOAD/STORE) | Instructions can directly access memory |
| Examples | ARM, MIPS, RISC-V | Intel x86, VAX |
Booth's Algorithm for Multiplication
Booth's algorithm multiplies signed binary numbers (2's complement) efficiently by examining pairs of bits and reducing operations when the multiplier has consecutive 1s.
Basic idea: Examine (Q0, Q-1) pairs and perform add/subtract or no operation, then arithmetic right-shift the accumulator and multiplier. Repeat for n bits.
Algorithm steps: initialize A = 0 and Q−1 = 0, load the multiplier into Q and the multiplicand into M; for each of the n bits, if (Q0, Q−1) = (1, 0) compute A = A − M, if (Q0, Q−1) = (0, 1) compute A = A + M, otherwise do nothing; then arithmetic right-shift the combined (A, Q, Q−1). After n iterations, (A, Q) holds the 2n-bit product. For example, +5 × −3 with 4-bit operands produces 1111 0001 = −15, using fewer additions/subtractions than a plain add-and-shift multiplier.
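A compact Python implementation of these steps, kept to n-bit two's-complement arithmetic as a teaching sketch (checked here only on small cases such as +5 × −3):

```python
def booth_multiply(m: int, q: int, n: int = 4) -> int:
    """Booth's algorithm on n-bit two's-complement operands (teaching sketch)."""
    mask = (1 << n) - 1
    M = m & mask                         # multiplicand as an n-bit pattern
    A, Q, q_minus1 = 0, q & mask, 0
    for _ in range(n):
        pair = (Q & 1, q_minus1)
        if pair == (1, 0):
            A = (A - M) & mask           # A = A - M
        elif pair == (0, 1):
            A = (A + M) & mask           # A = A + M
        # arithmetic right shift of the combined (A, Q, Q-1)
        q_minus1 = Q & 1
        Q = ((Q >> 1) | ((A & 1) << (n - 1))) & mask
        A = ((A >> 1) | (A & (1 << (n - 1)))) & mask   # replicate A's sign bit
    product = (A << n) | Q
    if product & (1 << (2 * n - 1)):     # reinterpret as a signed 2n-bit value
        product -= 1 << (2 * n)
    return product

print(booth_multiply(5, -3))   # -15
print(booth_multiply(-4, 7))   # -28
```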
IEEE 754 Floating Point Representation
IEEE-754 defines single (32-bit) and double (64-bit) precision formats using Sign, Exponent, and Fraction (mantissa) fields. The value for normalized numbers is:
Value = (−1)^Sign × 1.Fraction × 2^(Exponent − Bias)
| Field | Meaning | Single (32-bit) | Double (64-bit) |
|---|---|---|---|
| Sign | 0 = positive, 1 = negative | 1 bit | 1 bit |
| Exponent | Determines range | 8 bits | 11 bits |
| Fraction (mantissa) | Determines precision | 23 bits | 52 bits |
| Bias | Used to store exponent as unsigned | 127 | 1023 |
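A short Python sketch that extracts the single-precision fields of a float and rebuilds its value from the formula above (it uses the standard struct module; the sample value −6.25 is illustrative and assumed to be normalized):

```python
import struct

# Extract the IEEE-754 single-precision fields of a float and rebuild its
# value from (-1)^Sign x 1.Fraction x 2^(Exponent - 127) for normalized numbers.
def decode_single(x: float):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    value = (-1) ** sign * (1 + fraction / 2**23) * 2 ** (exponent - 127)
    return sign, exponent, fraction, value

print(decode_single(-6.25))   # sign=1, exponent=129 (bias 127 + 2), value=-6.25
```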
Carry Look-Ahead Adder (CLA)
A Carry Look-Ahead Adder computes carries in parallel using generate (G_i) and propagate (P_i) signals to eliminate ripple delay. The carry equations for a 4-bit CLA are:
Carry Look-Ahead Equations
C1 = G0 + (P0 · C0)
C2 = G1 + (P1 · G0) + (P1 · P0 · C0)
C3 = G2 + (P2 · G1) + (P2 · P1 · G0) + (P2 · P1 · P0 · C0)
C4 = G3 + (P3 · G2) + (P3 · P2 · G1) + (P3 · P2 · P1 · G0) + (P3 · P2 · P1 · P0 · C0)
These carries are computed simultaneously, reducing addition delay compared with ripple-carry adders.
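A small Python sketch that evaluates these carry equations for two 4-bit operands (the loop uses the recursive form Ci+1 = Gi + Pi·Ci, which expands to the parallel equations above; it is only for checking values, not for modeling gate delay):

```python
# Compute the 4-bit carry look-ahead carries from generate/propagate signals.
def cla_carries(a: int, b: int, c0: int = 0):
    g = [(a >> i & 1) & (b >> i & 1) for i in range(4)]   # Gi = Ai AND Bi
    p = [(a >> i & 1) ^ (b >> i & 1) for i in range(4)]   # Pi = Ai XOR Bi
    carries = [c0]
    for i in range(4):                     # Ci+1 = Gi + Pi*Ci (expanded in hardware)
        carries.append(g[i] | (p[i] & carries[i]))
    return carries                         # [C0, C1, C2, C3, C4]

print(cla_carries(0b1011, 0b0110))         # carries for 11 + 6; carry-out C4 = 1
```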
Synchronous Data Transfer
Definition
Synchronous data transfer uses a common clock signal for sender and receiver and transfers data in a continuous, clock-synchronized stream.
Key Features
- Common clock: Ensures precise coordination.
- Continuous data flow: Data sent in frames or blocks without start/stop bits.
- Higher speed: No overhead bits; efficient for high-speed links.
- Well-suited for: Computer buses, networks, and DMA transfers.
How it works
- Clock is generated by transmitter or via a separate clock line.
- Both sides sample data on clock edges.
- Data valid only during specified clock intervals.
Advantages and Disadvantages
- Advantages: High throughput, no start/stop bits, accurate timing.
- Disadvantages: Requires clock synchronization; additional clock line cost; alignment errors if clock drifts.
Applications
- System buses, DMA transfers
- Ethernet and other high-speed links (when synchronous modes are used)
- Modems and dedicated synchronous links