MPEG Audio Compression: Understanding the Fundamentals


Written on June 13, 2024 in English with a size of 2.71 KB

Introduction

MPEG audio compression is built on quantization. The values being quantized, however, are not the audio samples themselves but numbers (called signals) derived from the frequency-domain representation of the sound.
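As a rough illustration of why the coarseness of quantization matters, the sketch below applies a plain uniform quantizer to a few made-up frequency-domain values. The step sizes, the sample values, and the function names are assumptions chosen for the example; the quantizers in the MPEG standard are more elaborate.

```python
import numpy as np

def quantize(values, step):
    """Uniform quantizer: index of the nearest multiple of `step`."""
    return np.round(np.asarray(values, dtype=float) / step).astype(int)

def dequantize(indices, step):
    """Map quantizer indices back to reconstructed values."""
    return np.asarray(indices, dtype=float) * step

# Hypothetical frequency-domain "signals" (not taken from real audio).
signals = np.array([0.91, -0.37, 0.08, 0.62])

fine = dequantize(quantize(signals, 0.1), 0.1)    # small step, more bits needed
coarse = dequantize(quantize(signals, 0.5), 0.5)  # large step, fewer bits needed

# Quantization noise is the difference between original and reconstruction.
noise_fine = np.max(np.abs(signals - fine))
noise_coarse = np.max(np.abs(signals - coarse))
print(noise_fine, noise_coarse)
```

The error of a uniform quantizer never exceeds half the step size, so a larger step saves bits at the cost of more noise. The rest of the encoder exists to decide where that extra noise will not be heard.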

Encoding Process

Bit Allocation: The encoder knows the compression ratio (or bit rate), allowing it to determine how many bits to allocate to the quantized signals. The adaptive bit allocation algorithm uses the bitrate and frequency spectrum of recent audio samples to minimize audible quantization noise (the difference between the original and quantized signal).
Discrete Fourier Transform: Psychoacoustic models, which determine the quantization coarseness, operate on sound frequencies. Since the input consists of audio samples, the first step is a Discrete Fourier Transform (DFT). This transforms blocks of consecutive audio samples (e.g., 512 or 1,024) into the frequency domain.
Frequency Subbands: The DFT can produce a large number of frequency components. To keep this manageable, frequencies are grouped into equal-width subbands (32 in all three MPEG-1 audio layers). Each subband's intensity is represented by a signal, which is then quantized.
Masking Threshold: The coarseness of quantization in each subband depends on its masking threshold and available bits. Psychoacoustic models calculate this threshold, simulating how loud sounds mask nearby frequencies (frequency masking) or sounds close in time (temporal masking).
Psychoacoustic Models: MPEG uses psychoacoustic models to exploit frequency and temporal masking. These models divide the frequency range into critical bands (about 24) and define masking effects within each. The models consider the frequency and amplitude of sounds, the playback amplitude (which the encoder cannot know, so a worst case is assumed), and the nature of the sound source (tonal or noise-like).
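The encoding steps above can be sketched in a few lines of Python. This is a simplified illustration, not the procedure defined by the standard: the function names are hypothetical, the masking threshold is a flat placeholder rather than the output of a real psychoacoustic model, and the rule of thumb that each extra bit lowers quantization noise by about 6.02 dB is assumed.

```python
import numpy as np

def subband_energies(samples, n_subbands=32):
    """DFT one block of samples and sum the power in equal-width subbands.

    Assumes len(samples) / 2 is a multiple of n_subbands.
    """
    windowed = np.asarray(samples, dtype=float) * np.hanning(len(samples))
    spectrum = np.fft.rfft(windowed)
    power = np.abs(spectrum[:-1]) ** 2  # drop the Nyquist bin so bins split evenly
    return power.reshape(n_subbands, -1).sum(axis=1)

def allocate_bits(signal_db, mask_db, bit_pool, max_bits=15):
    """Greedy allocation: keep giving one bit to the subband whose
    quantization noise is furthest above its masking threshold."""
    smr = np.asarray(signal_db, dtype=float) - np.asarray(mask_db, dtype=float)
    bits = np.zeros(len(smr), dtype=int)
    for _ in range(bit_pool):
        nmr = smr - 6.02 * bits           # noise-to-mask ratio in dB
        nmr[bits >= max_bits] = -np.inf   # subband already at its bit limit
        worst = int(np.argmax(nmr))
        if nmr[worst] <= 0:
            break                         # all noise is below the thresholds
        bits[worst] += 1
    return bits

# Demo input: a 1 kHz tone sampled at 44.1 kHz, one block of 512 samples.
rate, n = 44100, 512
t = np.arange(n) / rate
block = np.sin(2 * np.pi * 1000.0 * t)

energies = subband_energies(block)
signal_db = 10 * np.log10(energies + 1e-12)

# Placeholder threshold: 40 dB below the loudest subband in every subband.
# A real encoder would obtain this from its psychoacoustic model.
mask_db = np.full(len(signal_db), signal_db.max() - 40.0)

bits = allocate_bits(signal_db, mask_db, bit_pool=64)
print(int(np.argmax(energies)), bits)
```

With 32 subbands spanning 0 to 22.05 kHz, each subband is about 689 Hz wide, so the 1 kHz tone lands in subband 1 and that subband receives the bits. On hand-picked inputs, `allocate_bits([60, 30, 10], [20, 20, 20], bit_pool=10)` returns 7, 2, and 0 bits: the loop stops one bit short of the pool once the noise in every subband is below its threshold.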

Decoding Process

Simplified Decoding: Unlike the encoder, the decoder is designed for speed and simplicity to enable real-time playback. It does not use psychoacoustic models or bit allocation algorithms.
Information in Compressed Stream: The compressed stream carries everything the decoder needs to dequantize the signals (in Layers I and II, for example, the bit allocation and scale factor of each subband), along with ancillary data for specific applications.
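To show how little work the decoder has to do, here is a minimal dequantization sketch. The frame layout (a list of dictionaries with `bits`, `scale`, and `codes` entries) and the symmetric code range are assumptions made for the example, not the actual MPEG bitstream format. The sketch also assumes each subband uses either 0 bits or at least 2.

```python
def decode_subband(entry, n_samples):
    """Rebuild one subband's signals from its side information and codes."""
    bits = entry["bits"]
    if bits == 0:
        # The encoder spent no bits here, so the subband decodes to silence.
        return [0.0] * n_samples
    half = 2 ** (bits - 1) - 1  # largest code magnitude for this bit count
    return [entry["scale"] * code / half for code in entry["codes"]]

def decode_frame(frame, n_samples):
    """Dequantize every subband in a frame; no psychoacoustic model needed."""
    return [decode_subband(entry, n_samples) for entry in frame]

# Hypothetical frame with two subbands and three samples per subband.
frame = [
    {"bits": 4, "scale": 0.8, "codes": [7, -3, 0]},
    {"bits": 0, "scale": 0.0, "codes": []},
]
decoded = decode_frame(frame, n_samples=3)
print(decoded)
```

All the decisions about where to spend bits were made by the encoder and travel with the data, so the decoder reduces to a table lookup and a multiplication per sample.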

Conclusion

MPEG audio compression relies on a complex interplay of signal processing, psychoacoustic models, and efficient encoding/decoding schemes. By understanding these fundamentals, we can appreciate the technology that allows us to enjoy high-quality audio while minimizing storage and bandwidth requirements.
