lectures.alex.balgavy.eu

Lecture notes from university.

Sound processing.html (6138B)


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head><link rel="stylesheet" href="sitewide.css" type="text/css"/><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><meta name="exporter-version" content="Evernote Mac 7.5.2 (457164)"/><meta name="altitude" content="1.400665044784546"/><meta name="author" content="Alex Balgavy"/><meta name="created" content="2017-11-13 12:50:06 +0000"/><meta name="latitude" content="52.33301919689172"/><meta name="longitude" content="4.865518015192567"/><meta name="source" content="desktop.mac"/><meta name="updated" content="2017-11-13 19:31:10 +0000"/><title>Sound processing</title></head><body><div><span style="font-weight: bold;">Problems with sound capture:</span></div><div><ol><li>Acquired signals are very noisy</li><li>Context information is hidden</li></ol><div><br/></div></div><div>How do we process sound to classify and extract information?</div><div><br/></div><div><span style="font-weight: bold;">Basic features of sound</span></div><div><ul><li>Volume (air pressure) or loudness (dB): the amplitude of the wave</li><li>Frequency (Hz) or pitch: the frequency of the wave</li></ul><div><br/></div></div><div><span style="font-weight: bold;">Periodic signal: Fourier</span></div><div>When the signal is sinusoidal, the frequency follows directly from the period T: f = 1/T.</div><div>But if it’s not sinusoidal, what do you do? Analyse its frequency spectrum. 
Enter Fourier.</div><div><br/></div><div>Fourier: almost every periodic signal can be broken down into a sum of sinusoidal waves with different frequencies and amplitudes.</div><div><br/></div><div>Instead of expressing signal amplitude as a function of time, represent it as a function of frequency.</div><div><br/></div><div><img src="Sound%20processing.resources/screenshot_4.png" height="320" width="429"/><br/></div><div><br/></div><div>Then you end up with a Fourier series, a sum of simple sinusoidal waves with frequencies kf₀, amplitudes A<span style="vertical-align: sub;">k</span> and phase shifts 𝜑<span style="vertical-align: sub;">k</span>:</div><div><img src="Sound%20processing.resources/screenshot_10.png" height="88" width="407"/><br/></div><div><br/></div><div>The periodic signal has a frequency spectrum of various harmonics:</div><div><br/></div><div><img src="Sound%20processing.resources/screenshot_2.png" height="242" width="514"/><br/></div><div><br/></div><div>The component frequencies are integer multiples of the fundamental frequency f₀; these components are called harmonics.</div><div><br/></div><div>You can calculate the amplitudes A<span style="vertical-align: sub;">k</span> with an algorithm called FFT (Fast Fourier Transform), which returns them in a vector.</div><div>You put in the vector of samples and the number of samples N, and you get out a vector of amplitudes of length N+1.</div><div><ul><li>The first element is the DC component, with frequency 0</li><li>Only the first half of the vector is useful: for real-valued samples, the second half mirrors the first</li></ul><div><br/></div></div><div>Formulas:</div><div><table style="border-collapse: collapse; min-width: 100%;"><colgroup><col style="width: 130px;"/><col style="width: 130px;"/><col style="width: 130px;"/><col style="width: 130px;"/></colgroup><tbody><tr><td style="border: 1px solid rgb(219, 219, 219); width: 130px; padding: 8px;"><div>Frequency step</div></td><td style="border: 1px solid rgb(219, 219, 219); width: 130px; padding: 8px;"><div>Frequency at amplitude</div></td><td style="border: 1px solid rgb(219, 
219, 219); width: 130px; padding: 8px;"><div>Nyquist frequency</div></td><td style="border: 1px solid rgb(219, 219, 219); width: 130px; padding: 8px;"><div>Last useful amplitude</div></td></tr><tr><td style="border: 1px solid rgb(219, 219, 219); width: 130px; padding: 8px;"><div><img src="Sound%20processing.resources/screenshot_6.png" height="107" width="141"/></div></td><td style="border: 1px solid rgb(219, 219, 219); width: 130px; padding: 8px;"><div><img src="Sound%20processing.resources/screenshot_3.png" height="92" width="248"/></div></td><td style="border: 1px solid rgb(219, 219, 219); width: 130px; padding: 8px;"><div><img src="Sound%20processing.resources/screenshot_9.png" height="37" width="67"/></div></td><td style="border: 1px solid rgb(219, 219, 219); width: 130px; padding: 8px;"><div><img src="Sound%20processing.resources/screenshot_1.png" height="40" width="206"/></div></td></tr></tbody></table><div><br/></div></div><div><br/></div><div>Nyquist frequency (fc): the maximum frequency that can be detected using FFT; it equals half the sampling rate Fs.</div><div><br/></div><div><span style="font-weight: bold;">Not periodic: short-time analysis</span></div><div>Some sound signals, such as speech, are periodic only for a very short time.</div><div><img src="Sound%20processing.resources/screenshot_8.png" height="337" width="488"/><img src="Sound%20processing.resources/screenshot.png" height="318" width="534"/><br/></div><div><br/></div><div>Cut the speech into short segments (frames), then apply FFT to each frame.</div><div>This is called segmentation or windowing.</div><div><br/></div><div><span style="font-weight: bold;">Spectrogram</span></div><div>The frequency 
spectrum varies over time.</div><div>A spectrogram is a graph with time on the x-axis, frequency on the y-axis, and colour showing the amplitude of each frequency.</div><div><br/></div><div><img src="Sound%20processing.resources/screenshot_5.png" height="151" width="509"/><br/></div><div><br/></div><div><img src="Sound%20processing.resources/screenshot_7.png" height="401" width="525"/><br/></div><div><br/></div><div><br/></div><div><span style="font-weight: bold;">Digital filtering</span></div><div>Time domain: moving average filter</div><div>Frequency domain:</div><div><ul><li>Low-pass: allow only frequencies below a cutoff</li><li>High-pass: allow only frequencies above a cutoff</li><li>Band-pass: allow only a certain frequency band</li><li>Band-reject (notch filter): allow everything but a certain frequency band<ul><li>Sample the signal, compute its spectrum using FFT, set to zero the portions of the spectrum that are just noise, then apply the inverse FFT to synthesise the improved signal</li></ul></li></ul></div><div><br/></div><div><br/></div></body></html>