Unleashing the Power of Audio Processing: A Guide to Enhancing Embedded Media
Audio processing is a crucial aspect of embedded media processing. Although
audio data requires less memory and processing power than video, it plays a
significant role in delivering a high-quality listening experience. This
article covers the fundamentals of audio processing: the anatomy of the human
ear from an audio processing perspective, the digital filter model of the
basilar membrane, human auditory masking, and wide band audio coding.
Understanding Audio Processing Fundamentals: Audio signals are captured in
digital form and then compressed so that high-fidelity audio can be stored or
streamed efficiently. The human ear acts as a spectrum analyzer. The outer ear
directs sound waves onto the eardrum, and the resulting vibrations are passed
through the middle ear to the cochlea in the inner ear. The cochlea can be
modeled mathematically as a bank of digital filters, each responding to a
different range of audio frequencies.
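To make the "spectrum analyzer" idea concrete, here is a minimal sketch in Python that splits a short signal into its frequency components with a naive discrete Fourier transform. The function names, the sample rate, and the two test tones are assumptions chosen for illustration and do not come from any particular codec.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the magnitude of each DFT bin from 0 up to the Nyquist bin."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def dominant_frequencies(samples, sample_rate, count):
    """Return the `count` strongest bin frequencies in Hz, lowest first."""
    mags = dft_magnitudes(samples)
    strongest = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:count]
    return sorted(k * sample_rate / len(samples) for k in strongest)


# Hypothetical test signal: a 500 Hz tone plus a quieter 1200 Hz tone.
SAMPLE_RATE = 6400
N = 64
signal = [
    math.sin(2 * math.pi * 500 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 1200 * t / SAMPLE_RATE)
    for t in range(N)
]
peaks = dominant_frequencies(signal, SAMPLE_RATE, 2)
print(peaks)
```

The naive transform costs O(N^2) operations, which is fine for a 64-sample illustration. A real embedded system would more likely use an FFT or a filter bank.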
Digital Filter Model of the Basilar Membrane: The basilar membrane, which runs
along the length of the cochlea, responds to different frequencies at different
points along its length, so it can be modeled as a set of digital band-pass
filters, each covering a specific frequency band. Running 128 such filters one
after another would be impractical because of the delay and processing power
required. To achieve real-time capability, a parallel filter bank model is used
instead, in which 32 filters process the audio signal simultaneously.
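The filter bank idea can be sketched as follows. This is a simplified illustration, not the filter bank of any specific codec: it assumes 32 equal-width bands between 0 Hz and the Nyquist frequency, uses one second-order (biquad) band-pass filter per band, and all function names are hypothetical. Each band depends only on the input signal, which is what allows the bands to be computed in parallel on suitable hardware; the loop here simply visits them in turn.

```python
import math


def bandpass_coefficients(center_hz, bandwidth_hz, sample_rate):
    """Biquad band-pass coefficients (unity gain at the center frequency)."""
    w0 = 2 * math.pi * center_hz / sample_rate
    q = center_hz / bandwidth_hz
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def band_energy(samples, b, a):
    """Run one biquad over the samples and return the output energy."""
    x1 = x2 = y1 = y2 = 0.0
    energy = 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        energy += y * y
    return energy


def filter_bank_energies(samples, sample_rate, num_bands=32):
    """Split 0..Nyquist into equal-width bands and measure each band's energy."""
    bandwidth = (sample_rate / 2) / num_bands
    energies = []
    for k in range(num_bands):
        # Bands are independent of each other, so this loop could run in parallel.
        b, a = bandpass_coefficients((k + 0.5) * bandwidth, bandwidth, sample_rate)
        energies.append(band_energy(samples, b, a))
    return energies


# Hypothetical input: a 1125 Hz tone sampled at 16 kHz (bands are 250 Hz wide).
SAMPLE_RATE = 16000
tone = [math.sin(2 * math.pi * 1125 * t / SAMPLE_RATE) for t in range(2048)]
energies = filter_bank_energies(tone, SAMPLE_RATE)
loudest_band = max(range(len(energies)), key=lambda k: energies[k])
print(loudest_band)
```

With these assumed parameters the tone sits at the center of the band covering 1000 to 1250 Hz (index 4, counting from 0), so that band should report the most energy.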
Human Auditory Masking: The human ear cannot hear certain audio components
when stronger ones are present, and audio coding systems depend on this effect.
Simultaneous masking occurs when frequencies with lower sound pressure levels
are rendered inaudible by louder frequencies nearby, while non-simultaneous
(temporal) masking occurs when a loud sound hides quieter sounds within a short
time interval before or after it. These masking effects are taken into account
when designing audio coding systems, such as the popular MP3 and AAC codecs.
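A rough sketch of simultaneous masking is shown below. It converts frequencies to the Bark (critical-band) scale with the Zwicker-Terhardt approximation and then applies a simple triangular masking curve around a single masker. The offset and slope constants are illustrative assumptions, not values from any codec standard, and the absolute threshold of hearing is ignored.

```python
import math


def hz_to_bark(freq_hz):
    """Zwicker-Terhardt approximation of the Bark (critical-band) scale."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)


# Illustrative values only, not taken from any codec standard.
MASKER_OFFSET_DB = 10.0  # threshold at the masker sits this far below its level
LOWER_SLOPE_DB = 25.0    # dB per Bark for probes below the masker frequency
UPPER_SLOPE_DB = 10.0    # dB per Bark for probes above the masker frequency


def masking_threshold_db(masker_hz, masker_db, probe_hz):
    """Level below which a probe tone is assumed to be hidden by the masker."""
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = UPPER_SLOPE_DB if dz >= 0 else LOWER_SLOPE_DB
    return masker_db - MASKER_OFFSET_DB - slope * abs(dz)


def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    """True if the probe falls under the masker's threshold curve."""
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)


# A loud 1 kHz masker hides a quiet tone close to it, but not one far away.
near = is_masked(1000, 80, 1100, 40)
far = is_masked(1000, 80, 4000, 40)
print(near, far)
```

In this model the shallower upper slope reflects the common observation that a masker hides higher frequencies more readily than lower ones. Production psychoacoustic models are considerably more detailed.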
Wide Band Audio Coding: To achieve low bit rates while maintaining high audio
quality, the psychoacoustic properties of the human ear are exploited. The
frequency range is divided into bands, and masked components, which the
listener cannot hear, are discarded to reduce the output bit rate. This lossy
compression technique is applied in codecs like MP3, AAC, WMA, Vorbis, and
AC-3. FLAC, by contrast, is a lossless codec and does not discard masked
components.
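The bit-saving step can be sketched as a per-band bit allocation. The example assumes that a signal level and a masking threshold (both in dB) are already known for each band, and it relies on the rule of thumb that each quantizer bit adds about 6.02 dB of signal-to-noise ratio. The function name and the sample numbers are hypothetical, and real codecs use more elaborate allocation loops.

```python
import math

DB_PER_BIT = 6.02  # each extra quantizer bit improves SNR by about 6.02 dB


def allocate_bits(band_levels_db, mask_thresholds_db):
    """Give each band just enough bits to keep quantization noise masked."""
    bits = []
    for level, mask in zip(band_levels_db, mask_thresholds_db):
        smr = level - mask  # signal-to-mask ratio in dB
        # A band at or below its masking threshold is inaudible: spend no bits.
        bits.append(0 if smr <= 0 else math.ceil(smr / DB_PER_BIT))
    return bits


# Hypothetical band levels and masking thresholds, in dB.
levels = [60.0, 30.0, 45.0, 20.0]
masks = [40.0, 35.0, 44.0, 20.0]
bits = allocate_bits(levels, masks)
print(bits)
```

Bands whose level does not exceed the masking threshold receive zero bits, which is where the reduction in bit rate comes from.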