Digital Image Processing: Transforms, Compression, and Filters
Posted by Anonymous and classified in Technology
Fourier Transform
1. Definition
The Fourier Transform (FT) is used to convert an image from the spatial domain to the frequency domain. It expresses the image as a sum of sinusoidal functions of varying frequencies, amplitudes, and phases.
2. Intuition
In the spatial domain, we deal with pixel intensity at each location. In the frequency domain, we analyze how intensity varies—i.e., how fast brightness changes across pixels. This helps in analyzing patterns, removing noise, and performing filtering.
3. Mathematical Forms
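- A. 1D Continuous FT (for signals):
$$F(u) = \int_{-\infty}^{\infty} f(x)\, e^{-j 2\pi u x}\, dx$$
- B. 2D Continuous FT (for images):
$$F(u,v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\, e^{-j 2\pi (ux + vy)}\, dx\, dy$$
- C. 2D Discrete FT (DFT):
Used in DIP because images are digital (normalization conventions vary; some texts place the 1/MN factor on the forward transform instead of the inverse):
$$F(u,v) = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\, e^{-j 2\pi \left(\frac{ux}{M} + \frac{vy}{N}\right)}$$
- f(x,y): Input image
- F(u,v): Frequency domain representation
- M×N: Size of image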
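To make the 2D DFT concrete, here is a minimal sketch that evaluates the double sum directly for a tiny image. It assumes the unnormalized forward convention and uses an illustrative function name, `dft2`; real code would use an FFT routine, since this direct form costs O(M²N²).

```python
import cmath

def dft2(image):
    """Naive 2D DFT of an M x N image given as a list of rows.

    Assumes the forward transform carries no normalization factor.
    """
    M, N = len(image), len(image[0])
    F = [[0j] * N for _ in range(M)]
    for u in range(M):
        for v in range(N):
            total = 0j
            for x in range(M):
                for y in range(N):
                    angle = -2 * cmath.pi * (u * x / M + v * y / N)
                    total += image[x][y] * cmath.exp(1j * angle)
            F[u][v] = total
    return F

# A constant 3 x 4 image: no intensity variation at all.
image = [[10, 10, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10]]
spectrum = dft2(image)
```

For a constant image, all the energy lands in the zero-frequency term F(0,0), which equals the sum of the pixel values; every other coefficient is zero up to rounding error.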
4. Inverse Fourier Transform
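Used to go back from the frequency domain to the spatial domain. For the 2D DFT of an M×N image, with the 1/MN factor placed on the inverse:
$$f(x,y) = \frac{1}{MN} \sum_{u=0}^{M-1}\sum_{v=0}^{N-1} F(u,v)\, e^{j 2\pi \left(\frac{ux}{M} + \frac{vy}{N}\right)}$$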
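A quick way to see that the inverse transform undoes the forward one is a round trip on a small image. The sketch below is self-contained and assumes the 1/MN factor sits on the inverse; the helper names (`_transform2`, `dft2`, `idft2`) are illustrative only.

```python
import cmath

def _transform2(data, sign):
    """Shared double sum; sign=-1 gives the forward kernel, sign=+1 the inverse."""
    M, N = len(data), len(data[0])
    out = [[0j] * N for _ in range(M)]
    for u in range(M):
        for v in range(N):
            out[u][v] = sum(
                data[x][y] * cmath.exp(sign * 2j * cmath.pi * (u * x / M + v * y / N))
                for x in range(M) for y in range(N)
            )
    return out

def dft2(image):
    """Forward 2D DFT (no normalization)."""
    return _transform2(image, -1)

def idft2(spectrum):
    """Inverse 2D DFT, scaled by 1/(M*N)."""
    M, N = len(spectrum), len(spectrum[0])
    return [[value / (M * N) for value in row] for row in _transform2(spectrum, +1)]

image = [[1, 2, 3],
         [4, 5, 6]]
restored = idft2(dft2(image))
# Largest deviation between the restored and the original pixels.
max_error = max(abs(restored[x][y] - image[x][y]) for x in range(2) for y in range(3))
```

The restored values are complex numbers whose imaginary parts are only rounding noise, so `max_error` should be on the order of floating-point precision.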
5. Properties
| Property | Description |
|---|---|
| Linearity | FT of a sum = sum of the FTs |
| Translation | Shifting in space → phase shift in frequency (magnitude unchanged) |
| Scaling | Compress in space → stretch in frequency, and vice versa |
| Rotation | Rotating the image rotates the spectrum by the same angle |
| Convolution | Convolution in space ↔ multiplication in frequency |
| Correlation | Correlation in space ↔ multiplication by the complex conjugate in frequency; used in pattern matching |
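The convolution property can be checked numerically. The sketch below works in 1D for brevity and relies on one assumption worth stating: for the discrete transform, the property holds for circular (wrap-around) convolution. The function names are illustrative.

```python
import cmath

def dft(signal):
    """Naive 1D DFT."""
    N = len(signal)
    return [sum(signal[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def circular_convolve(f, h):
    """Circular convolution of two equal-length sequences."""
    N = len(f)
    return [sum(f[m] * h[(n - m) % N] for m in range(N)) for n in range(N)]

f = [1, 2, 3, 4]
h = [1, 0, -1, 0]

# Left side: transform of the convolution.
lhs = dft(circular_convolve(f, h))
# Right side: element-wise product of the two transforms.
rhs = [a * b for a, b in zip(dft(f), dft(h))]
max_diff = max(abs(a - b) for a, b in zip(lhs, rhs))
```

Both sides agree up to rounding error, which is why filtering is often done by multiplying spectra instead of sliding a kernel over the image.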
6. Applications in DIP
- Image filtering (low-pass, high-pass)
- Image compression (JPEG uses the DCT, a real-valued relative of the DFT)
- Edge detection
- Image enhancement
- Noise removal
Discrete Cosine Transform (DCT)
Definition
The Discrete Cosine Transform converts an image from the spatial domain to the frequency domain using only cosine basis functions. Unlike the Fourier Transform, which uses both sines and cosines (complex exponentials), the DCT produces real-valued output with no imaginary components. It also packs most of the signal energy into a few low-frequency coefficients, which is what makes it well suited to image compression.
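1D DCT Equation
For a sequence f(x) of length N (the DCT-II form, the variant used in JPEG):
$$C(u) = \alpha(u) \sum_{x=0}^{N-1} f(x) \cos\left[\frac{(2x+1)u\pi}{2N}\right], \quad u = 0, 1, \dots, N-1$$
where $\alpha(0) = \sqrt{1/N}$ and $\alpha(u) = \sqrt{2/N}$ for $u > 0$.
2D DCT for Images
For an N×N image block f(x,y):
$$C(u,v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right]$$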
Properties of DCT
| Property | Description |
|---|---|
| Real-valued | Output contains no imaginary values |
| Energy compaction | Most information is concentrated in fewer coefficients |
| Orthogonality | Basis functions are orthogonal |
| Separability | 2D DCT = 1D DCT applied to every row, followed by 1D DCT applied to every column |
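The separability and energy-compaction properties can be seen together in a short sketch. It assumes the orthonormal DCT-II (the variant used in JPEG); the function names `dct1` and `dct2` and the sample block are illustrative.

```python
import math

def dct1(signal):
    """Orthonormal DCT-II of a 1D sequence."""
    N = len(signal)
    out = []
    for u in range(N):
        alpha = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
        out.append(alpha * sum(
            signal[x] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
            for x in range(N)))
    return out

def dct2(block):
    """2D DCT via separability: 1D DCT on each row, then on each column."""
    rows = [dct1(row) for row in block]
    cols = [dct1(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to (u, v) order

# A smooth 4 x 4 block, typical of natural image content.
block = [[52, 55, 61, 66],
         [54, 58, 63, 68],
         [57, 60, 66, 70],
         [59, 63, 68, 72]]
coeffs = dct2(block)

energy_total = sum(c * c for row in coeffs for c in row)
energy_dc = coeffs[0][0] ** 2
dc_share = energy_dc / energy_total
```

Because the transform is orthonormal, `energy_total` equals the sum of squared pixel values, and for a smooth block like this one nearly all of it sits in the single DC coefficient `coeffs[0][0]`.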
Applications
- Image and video compression (JPEG, MPEG)
- Image denoising
- Feature extraction (e.g., in face recognition)
- Watermarking and image hiding
DCT vs FFT
| Feature | DCT | FFT |
|---|---|---|
| Functions used | Cosines only | Sines and cosines (complex) |
| Output | Real numbers | Complex numbers |
| Energy | More compact | Spread across frequencies |
| Use-case | Compression | Frequency analysis |
Wavelet Transform
Definition
The Wavelet Transform represents an image in terms of both space and frequency, unlike FT or DCT which focus on global frequency. It uses small, localized waveforms called wavelets instead of sine or cosine functions and offers multi-resolution analysis.
Key Concept
Fourier and DCT use global basis functions: good for frequency, bad for localization. The Wavelet transform uses short basis functions that are scaled and shifted—ideal for localized changes, like edges and textures.
Types
- 1. Continuous Wavelet Transform (CWT): Infinite number of scales and positions; mainly used in theoretical analysis.
- 2. Discrete Wavelet Transform (DWT): Used in practical applications (e.g., image processing); decomposes image into approximations and details at different scales.
DWT Process
At each level, the image is divided into 4 parts:
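- LL (Approximation): Low-frequency components (smooth areas)
- LH (Horizontal detail): Horizontal edges (intensity changes in the vertical direction)
- HL (Vertical detail): Vertical edges (intensity changes in the horizontal direction)
- HH (Diagonal detail): Diagonal edges, corners, and much of the noise
Note that the LH/HL labels are swapped in some texts, depending on whether the first letter refers to filtering along rows or along columns.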
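The four-way split can be sketched with the Haar wavelet, the simplest choice, which works on 2×2 pixel blocks. This is an illustrative sketch, not a full DWT implementation: the function name `haar_dwt2` is made up, the image is assumed to have even height and width, and the labels assume the first letter refers to filtering along each row and the second along each column (other texts swap LH and HL).

```python
def haar_dwt2(image):
    """One level of the orthonormal 2D Haar DWT (even height and width assumed)."""
    rows, cols = len(image), len(image[0])
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for r in range(0, rows, 2):
        out = {name: [] for name in bands}
        for c in range(0, cols, 2):
            tl, tr = image[r][c], image[r][c + 1]          # top-left, top-right
            bl, br = image[r + 1][c], image[r + 1][c + 1]  # bottom-left, bottom-right
            out["LL"].append((tl + tr + bl + br) / 2)  # local average
            out["LH"].append((tl + tr - bl - br) / 2)  # top minus bottom
            out["HL"].append((tl - tr + bl - br) / 2)  # left minus right
            out["HH"].append((tl - tr - bl + br) / 2)  # diagonal difference
        for name in bands:
            bands[name].append(out[name])
    return bands

# A 4 x 4 image with a vertical step edge between the first two columns.
image = [[0, 8, 8, 8] for _ in range(4)]
bands = haar_dwt2(image)
```

Each sub-band is half the size of the input in both directions. For this image, only LL and the left-minus-right band are nonzero, because the intensity changes only in the horizontal direction. Applying `haar_dwt2` again to `bands["LL"]` would give the next, coarser level.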
Properties of Wavelet Transform
| Property | Description |
|---|---|
| Multi-resolution | Can analyze image at different scales/resolutions |
| Localization | Good spatial and frequency localization |
| Energy efficient | Stores edge and texture information compactly |
| Time-frequency | Combines advantages of both domains |
Run Length Coding (RLC)
Definition
Run Length Coding (RLC) is a lossless compression technique used to reduce the size of data by encoding repeated values (runs) as a single value and count. It is simple, effective, and fast—especially when data has many repeating elements.
Working Principle
Instead of storing repeated values individually, RLC stores: (Value, Run Length).
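As a small illustration of the (Value, Run Length) idea, the sketch below encodes and decodes one image row. The function names are illustrative, and real formats add details (such as a maximum run length) that are left out here.

```python
def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [tuple(run) for run in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out

row = [255, 255, 255, 255, 0, 0, 255, 255, 255]
encoded = rle_encode(row)   # [(255, 4), (0, 2), (255, 3)]
decoded = rle_decode(encoded)
```

Nine pixels shrink to three pairs, and decoding restores the row exactly. On a row with no repeats, every pixel becomes its own pair, so the "compressed" form is larger than the input.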
Advantages and Limitations
Advantages:

| Feature | Explanation |
|---|---|
| Simple algorithm | Easy to implement and decode |
| Lossless | No loss of image quality |
| Efficient for sparse images | Works well on binary and fax images, which contain long runs |

Limitations:

| Issue | Why it matters |
|---|---|
| Not good for complex images | High variation leads to poor performance |
| Inefficient for noisy images | Random pixel changes break runs |
| Dependent on scanning order | Different orders give different results |
Lempel-Ziv Coding (LZ Coding)
A lossless compression algorithm that reduces file size by replacing repeated patterns with pointers to previous occurrences. It uses a dictionary-based approach.
Variants
- LZ77: Uses a sliding window to find repeated sequences.
- LZ78: Builds a dictionary dynamically.
- LZW: Refines LZ78 by pre-loading the dictionary with all single symbols, so the output consists only of dictionary codes (used in GIF and TIFF).
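To show how a dictionary is built on the fly, here is a minimal LZW sketch over bytes. It is an illustration under simplifying assumptions: the dictionary grows without limit, and the codes are returned as plain integers with no bit packing. The function names are illustrative.

```python
def lzw_encode(data: bytes):
    """Return a list of integer codes for the input bytes."""
    dictionary = {bytes([i]): i for i in range(256)}  # all single bytes
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                      # keep extending the match
        else:
            codes.append(dictionary[current])        # emit the longest match
            dictionary[candidate] = len(dictionary)  # learn the new pattern
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes

def lzw_decode(codes):
    """Rebuild the bytes; the decoder reconstructs the same dictionary."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = previous + previous[:1]  # code defined by the step that used it
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        dictionary[len(dictionary)] = previous + entry[:1]
        previous = entry
    return b"".join(out)

data = b"ABABABABABAB"
codes = lzw_encode(data)
restored = lzw_decode(codes)
```

The 12 input bytes come out as fewer codes because patterns such as `AB` and `ABA` are added to the dictionary and reused. The dictionary itself is never transmitted; the decoder rebuilds it from the code stream.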
Image Filtering Techniques
Median Filter
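A non-linear filter that replaces each pixel with the median value of the pixels in its neighborhood. It is highly effective at removing impulse noise (salt-and-pepper noise) and preserves edges far better than a linear averaging filter of the same size, although very fine details and thin lines can still be lost.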
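A minimal sketch of the median filter follows. The border handling (replicating edge pixels) is an assumption made for this example, as are the function name and the 3×3 default window.

```python
from statistics import median

def median_filter(image, size=3):
    """Apply a size x size median filter; borders use edge replication."""
    rows, cols = len(image), len(image[0])
    half = size // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Gather the neighborhood, clamping indices at the image border.
            window = [
                image[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
                for dr in range(-half, half + 1)
                for dc in range(-half, half + 1)
            ]
            out[r][c] = median(window)
    return out

# A flat region with one "salt" pixel (255) and one "pepper" pixel (0).
noisy = [[10, 10, 10, 10],
         [10, 255, 10, 10],
         [10, 10, 0, 10],
         [10, 10, 10, 10]]
cleaned = median_filter(noisy)
```

Both outliers disappear: in every 3×3 window they are outnumbered by the value 10, so they never reach the middle of the sorted list. An averaging filter would instead smear the 255 into its neighbors.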
Geometric Mean Filter
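A non-linear filter that replaces each pixel with the geometric mean of the pixel values in its neighborhood, that is, the product of the n values raised to the power 1/n. It smooths about as much as an arithmetic mean filter but tends to lose less image detail.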
Harmonic Mean Filter
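A non-linear filter that replaces each pixel with the harmonic mean of the pixel values in its neighborhood, that is, n divided by the sum of the reciprocals of the n values. It works well for salt noise and Gaussian-like noise but fails for pepper noise.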
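Both mean filters can be sketched in a few lines. This example assumes strictly positive pixel values (a zero would break the logarithm and the reciprocal) and edge replication at the borders; the function names are illustrative.

```python
import math

def neighborhood(image, r, c, size):
    """Values in the size x size window around (r, c), with edge replication."""
    rows, cols = len(image), len(image[0])
    half = size // 2
    return [
        image[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
        for dr in range(-half, half + 1)
        for dc in range(-half, half + 1)
    ]

def geometric_mean_filter(image, size=3):
    """Geometric mean, computed as exp(mean(log(v))) to avoid huge products."""
    return [
        [math.exp(sum(math.log(v) for v in neighborhood(image, r, c, size)) / size ** 2)
         for c in range(len(image[0]))]
        for r in range(len(image))
    ]

def harmonic_mean_filter(image, size=3):
    """Harmonic mean: window size divided by the sum of reciprocals."""
    return [
        [size ** 2 / sum(1 / v for v in neighborhood(image, r, c, size))
         for c in range(len(image[0]))]
        for r in range(len(image))
    ]

# A flat region with one bright "salt" pixel in the center.
image = [[100, 100, 100],
         [100, 255, 100],
         [100, 100, 100]]
geo = geometric_mean_filter(image)
har = harmonic_mean_filter(image)
arithmetic_center = sum(v for row in image for v in row) / 9
```

At the center pixel, the harmonic mean is the smallest, the geometric mean sits in the middle, and the arithmetic mean is the largest. Both filters therefore pull a bright outlier down more strongly than plain averaging does.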
Lossy Compression Techniques
1. Transform Coding
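Converts an image, usually block by block, from the spatial domain to a frequency domain. Low-frequency coefficients, which carry most of the visible information, are kept at fine precision, while high-frequency coefficients are quantized coarsely or discarded.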
2. K-L Transform (Karhunen-Loève Transform)
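A data-dependent transform, closely related to principal component analysis (PCA), used for dimensionality reduction. Its basis vectors are the eigenvectors of the data's covariance matrix, so the resulting coefficients are decorrelated. Keeping the components with the largest eigenvalues gives the smallest mean-squared error achievable by any linear transform with the same number of coefficients.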
3. Discrete Cosine Transform (DCT)
Separates the image into a sum of cosine functions. It is the standard for JPEG image compression.
4. Block Truncation Coding (BTC)
Divides the image into small blocks (typically 4×4) and represents each block with a 1-bit-per-pixel bitmap plus two representative gray levels. The two levels are chosen so that the block's mean and variance are preserved.
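The sketch below shows one common formulation of BTC for a single block: pixels at or above the block mean get bit 1, and the two output levels are computed so that the decoded block keeps the original mean and standard deviation. The function names and the sample block are illustrative, and the thresholding rule (at or above the mean) is an assumption of this sketch.

```python
import math

def btc_encode(block):
    """Encode one block as (bitmap, low level, high level)."""
    pixels = [p for row in block for p in row]
    m = len(pixels)
    mean = sum(pixels) / m
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / m)
    bitmap = [[1 if p >= mean else 0 for p in row] for row in block]
    q = sum(sum(row) for row in bitmap)  # number of pixels at or above the mean
    if q == m:
        low = high = mean                # flat block: a single level is enough
    else:
        low = mean - std * math.sqrt(q / (m - q))
        high = mean + std * math.sqrt((m - q) / q)
    return bitmap, low, high

def btc_decode(bitmap, low, high):
    """Rebuild the block from the bitmap and the two levels."""
    return [[high if bit else low for bit in row] for row in bitmap]

block = [[2, 9, 12, 15],
         [2, 11, 11, 9],
         [2, 3, 12, 15],
         [3, 3, 4, 14]]
bitmap, low, high = btc_encode(block)
decoded = btc_decode(bitmap, low, high)
```

Per 4×4 block, storage drops from 16 pixel values to 16 bits plus two levels. The decoded block contains only two distinct values, which is where the loss comes from, yet its mean and variance match those of the original.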