Fast Fourier Transform (FFT)

Introduction

 

The Fast Fourier Transform (FFT) is a process in which a snapshot (window) of time is taken from any given sound and represented as a sum of sine and cosine waves. This moves the information into what is known as the frequency domain.

Production

Within the frequency domain the FFT creates bins that are spaced linearly (unlike our naturally logarithmic way of hearing). Their width is decided by the window size of the FFT analysis. The range the bins cover follows from the Nyquist theorem: the sample rate must be at least twice the highest frequency to be captured in order to gather all the necessary Fourier components. For the examples in this project a sample rate of 44.1 kHz has been used, although this can always be changed; it is the window size, not the sample rate, that is normally a power of two. The bin width is calculated by a simple division: bin width = sample rate / window size.

With a window size of 512, the calculation is the sample rate divided by 512. As the sample rate is 44.1 kHz this gives a bin width of 86.1328125 Hz. If the same sample rate is used but the window size is changed to 1024, the bins are 43.06640625 Hz wide. Carrying on higher to 4096 gives a width of 10.7666015625 Hz. If the user needs a tight bin width during processing, it would seem sensible to use the 4096 window size or even higher to create even smaller distances between the frequencies. However, a trade-off takes place with each window size and has to be considered when deciding on the appropriate window. Large window sizes give very good frequency resolution, with bin widths of approximately 10 Hz, but poor time resolution, because each 4096-sample window spans about 93 ms of audio. If the window size is changed to a lower value such as 512, the time resolution improves (about 12 ms per window) but the frequency resolution gets worse. At a window size of 512 the bins are approximately 86 Hz apart; at the bottom of the piano that is almost two octaves (A0 at 27.5 Hz to A2 at 110 Hz), and as we move up the bins the same spacing becomes less and less of a distance on a piano scale. A Hamming window is also applied to each snapshot, although other types are available, such as Bartlett, Hanning, Blackman or rectangular; audio example five shows how the window shape could effectively be anything.
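
The trade-off described above can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic, not code taken from the project; the function names are my own and the 44.1 kHz rate is the one used for the examples.

```python
SAMPLE_RATE = 44_100  # Hz, the rate used for the examples in this project


def bin_width_hz(window_size, sample_rate=SAMPLE_RATE):
    """Spacing between adjacent FFT bins, in Hz (sample rate / window size)."""
    return sample_rate / window_size


def window_duration_ms(window_size, sample_rate=SAMPLE_RATE):
    """Length of audio covered by one analysis window, in milliseconds."""
    return 1000.0 * window_size / sample_rate


# Larger windows give narrower bins but cover a longer stretch of time.
for size in (512, 1024, 4096):
    print(size, bin_width_hz(size), round(window_duration_ms(size), 1))
```

Reading the printed rows side by side shows the two quantities pulling against each other: every doubling of the window size halves the bin width and doubles the time each snapshot covers.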

Audio examples of FFT

Using the plug-in PVACCU.DLL, this example shows how the incoming signal has its spectral components echoed. The method is similar to the idea of delay, but the interesting side of this approach is that the actual sound either seems to mould itself into the spectral delay or remains audible as the delay effect is increased. With a normal delay patch the original audio signal would reappear every so often, so that all frequencies would be used; this kind of delay instead performs a type of morphing effect as it moves from the original sound of a piano into a more distinctly pitched sound. The plug-in also allows the user to perform a glissando on the spectral delay; if there is no delay, this control has no effect on the sound. It can, however, produce extravagant effects on the delayed spectral data, as demonstrated in this example.

This example uses the PVEXAG.DLL plug-in to either increase or decrease the peaks of the spectrum. As the spectral peaks are increased more audio pitches become present, creating what is best described as chorusing. When the opposite effect is applied (as at the start of this sample) a more granulated sound becomes apparent.

Example three uses both the plug-ins from examples one and two, and continues to use the piano loop as the source audio. What is created is a spectral extravaganza that sounds nothing like a piano. The overlapping gives a water-bubble timbre to the noise-like frequencies created by the FFT process within the plug-ins. The only downside to these plug-ins is the inability to change the FFT window size, which is fixed at 1024. This example and examples one and two were recorded in real time.

Using PVEXAG.DLL, this example takes a continuous drum loop, aiming to show the difference between instrument timbres and rhythm-based timbres. Of all the sources that were used, percussion seemed to suit this type of process best, adding a perception of further rhythms within the original rhythm.

In the information available on this subject, the Hamming window was often referred to as the best window to use. It was interesting to experiment in Pure Data (PD) with drawing my own windows in real time, mainly to see the effects that could happen. Freehand drawing unfortunately always gives a harsher sound than a smooth curve such as the Hamming window. The example starts with a line that would suggest an oscillation of around 1-12 Hz; the sound is very quiet. When the Hamming window is applied, its upper curve gives a greater amplitude, which is instantly noticeable. The drawing is created at an FFT window setting of 256. The recording has been normalized to emphasize the lower amplitudes that start the example.
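
For comparison with a hand-drawn shape, the smooth windows mentioned in this project can be generated from their standard raised-cosine formulas. The sketch below is plain Python rather than the Pure Data patch used for the recording; the function names and the flat test frame are hypothetical and only there for illustration.

```python
import math


def hamming(size):
    """Hamming window: a raised cosine that stays slightly above zero at the edges."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (size - 1)) for n in range(size)]


def hann(size):
    """Hann ("Hanning") window: a raised cosine that reaches zero at the edges."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1)) for n in range(size)]


def apply_window(frame, window):
    """Multiply one snapshot of audio by the window, sample by sample."""
    if len(frame) != len(window):
        raise ValueError("frame and window must be the same length")
    return [s * w for s, w in zip(frame, window)]


SIZE = 256                      # FFT window setting used for this example
frame = [1.0] * SIZE            # hypothetical flat input, so the output is the window itself
shaped = apply_window(frame, hamming(SIZE))
print(round(shaped[0], 3), round(shaped[SIZE // 2], 3))  # edge value, then centre value
```

A freehand window would simply be another list of 256 values passed to `apply_window`; any sharp corners in that list are what I would expect to produce the harsher sound.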

Using a sine wave generator as the audio source, this example takes the FFT and demonstrates the tight control that can be obtained by isolating an individual bin and rising up the frequency scale in steps of 1 Hz. This effect can give the impression of pitch, and it could become a Theremin-like oscillator in a live performance. The second part of this example shows how badly a small FFT size affects the bin widths and how a large window size reverses this effect.
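
To see why a small FFT size makes it hard to isolate a pitch, it helps to work out which bin a frequency lands in. The following is a rough sketch; the 440 Hz target and the two FFT sizes are my own choices for illustration and are not the settings used in the recording.

```python
SAMPLE_RATE = 44_100  # Hz, the rate used for the examples in this project


def bin_center_hz(index, fft_size, sample_rate=SAMPLE_RATE):
    """Centre frequency of a given bin index."""
    return index * sample_rate / fft_size


def nearest_bin(freq_hz, fft_size, sample_rate=SAMPLE_RATE):
    """Index of the bin whose centre is closest to freq_hz."""
    return round(freq_hz * fft_size / sample_rate)


target = 440.0  # hypothetical pitch to isolate
for size in (64, 4096):
    k = nearest_bin(target, size)
    centre = bin_center_hz(k, size)
    # The error shows how far the isolated bin sits from the pitch we wanted.
    print(size, k, centre, abs(centre - target))
```

With the small size the closest bin is hundreds of hertz away from the target, while the large size lands within a couple of hertz.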

This example takes the spectral graph of an incoming sound and applies that spectral data to the sample being used; in this case a rap vocal is used, with the piano loop from the first three examples as the incoming signal to be analyzed. The FFT size is set to 512, which seemed to give the best results for showing this process.
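
One way to picture this process is to keep the phases of the sample and replace its bin magnitudes with those of the analyzed sound. The sketch below is my own assumption about how such a patch could work, not the actual patch: it handles a single 32-sample frame with a slow textbook DFT, and the sine tone and noise are hypothetical stand-ins for the piano loop and the rap vocal.

```python
import cmath
import math
import random


def dft(x):
    """Plain O(n^2) discrete Fourier transform; fine for a short demo frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft() above."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def cross_synthesize(analysed, sample):
    """Give `sample` the bin magnitudes of `analysed`, keeping the phases of `sample`."""
    shape = dft(analysed)
    source = dft(sample)
    mixed = [cmath.rect(abs(a), cmath.phase(s)) for a, s in zip(shape, source)]
    return [v.real for v in idft(mixed)]


N = 32
rng = random.Random(0)
sample = [rng.uniform(-1.0, 1.0) for _ in range(N)]                # stand-in for the vocal
analysed = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]   # stand-in for the piano
out = cross_synthesize(analysed, sample)
print([round(abs(v), 2) for v in dft(out)[:6]])  # magnitudes now follow the analysed tone
```

A real-time version would repeat this on overlapping windowed frames (512 samples in the recording) and use a proper FFT rather than the slow DFT shown here.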

This example uses a spectral pitch shifter, with the FFT window size deciding the way the pitch rises. The first part uses a very small window of 32; the result is very grainy and almost unusable in music. When the window size is increased to a more common 2048 the results are very impressive, giving a smooth transition across two octaves.

Using convolution with an FFT window of 2048, this example takes two vocal sounds and then changes to a percussive sound, showing the difference that two different timbres can make to the processed sound.
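
Convolution through the frequency domain rests on one idea: multiplying two spectra bin by bin gives the same result as convolving the two signals in time. The sketch below shows that idea on tiny lists with a slow textbook DFT; it is an illustration under my own assumptions (function names, zero-padding scheme, input values), not the plug-in's implementation, which would use a real FFT on 2048-sample windows.

```python
import cmath


def dft(x):
    """Plain O(n^2) discrete Fourier transform; fine for a short demo."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft() above."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectral_convolve(a, b):
    """Convolve two real signals by multiplying their spectra."""
    size = len(a) + len(b) - 1                    # zero-pad so the result does not wrap round
    pa = list(a) + [0.0] * (size - len(a))
    pb = list(b) + [0.0] * (size - len(b))
    product = [x * y for x, y in zip(dft(pa), dft(pb))]
    return [v.real for v in idft(product)]


def direct_convolve(a, b):
    """Time-domain convolution, used here only as a cross-check."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


a = [1.0, 2.0, 3.0]   # hypothetical input signal
b = [0.0, 1.0, 0.5]   # hypothetical second signal (a delayed echo shape)
print([round(v, 6) for v in spectral_convolve(a, b)])
print(direct_convolve(a, b))
```

The two printed lists agree, which is why the timbre of the second sound ends up imprinted on the first.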

Using the delay dots spectral shaper, this example shows how an incoming sound file can be shaped by its spectral components, either with a preset pattern built in to the plug-in or with one drawn by hand, as I have done to finish the example. The second part of this example takes a vocal input to show the difference between the two approaches. The percussive sound gives a very usable effect, whereas the vocals seem too grainy to be considered usable. This approach also allows the plug-in to act as a filter pass/reject tool; the end of this example does just that.

When mixing there is always a time when a closer look is needed than the normal EQ meters in sequencers can give. In my work I have always used a spectrum analyzer as a way to put a visual finishing touch on mastering. It is also a useful way to determine whether the speakers playing the audio are set up correctly or are good enough for the job. It can also determine whether the mix is out of context and in need of EQ. Although the results were not perfect, the PAZ spectrum analysis program did offer presets along this line of thinking. The 3D analyzer (Wavelab) is also good for situations such as mastering, and as the example shows there are obvious overpowering amplitudes coming from the top end of the frequency scale, created by deliberately over-pushing those frequencies with the EQ applied. You would therefore expect to hear loud, overpowering sounds from that end of the frequency scale.

Useful Uses Of FFT

The telephone keypad produces two simultaneous pure sine tones for each key. These two fundamental frequencies determine which key was pressed. In the normal time domain calculating these frequencies would be difficult; transferring the signal into the frequency domain makes the two fundamentals easier to determine. By moving a signal from the time domain into the frequency domain, the components (sine waves) that produced the time signal can be seen as peaks at their fundamentals. A Hanning window (or any of the others available) helps prevent spectral leakage. Spectral leakage takes place when the signal inside an FFT window does not repeat exactly, which introduces displaced sinusoids into the analysis; these have the sound of noise in the time domain.
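
The telephone case can be sketched in a few lines. The row and column frequencies below are the standard DTMF keypad tones; everything else (the 8 kHz sample rate, the 400-sample tone length, the function names) is my own assumption for illustration. Instead of a full FFT, the sketch measures the signal at just the eight frequencies of interest, which is the same frequency-domain idea on a smaller scale.

```python
import cmath
import math

SAMPLE_RATE = 8_000                  # Hz; assumed telephone-style rate, not from the text
ROWS = (697, 770, 852, 941)          # low-group DTMF frequencies in Hz
COLS = (1209, 1336, 1477, 1633)      # high-group DTMF frequencies in Hz
KEYS = ("123A", "456B", "789C", "*0#D")


def dtmf_tone(key, n_samples=400):
    """Sum of the two sine waves a keypad produces for one key."""
    for r, row in enumerate(KEYS):
        if len(key) == 1 and key in row:
            low, high = ROWS[r], COLS[row.index(key)]
            return [math.sin(2 * math.pi * low * t / SAMPLE_RATE)
                    + math.sin(2 * math.pi * high * t / SAMPLE_RATE)
                    for t in range(n_samples)]
    raise ValueError("not a keypad key: %r" % (key,))


def magnitude_at(signal, freq_hz):
    """Strength of one frequency in the signal (a single DFT-style correlation)."""
    return abs(sum(s * cmath.exp(-2j * math.pi * freq_hz * t / SAMPLE_RATE)
                   for t, s in enumerate(signal)))


def detect_key(signal):
    """Pick the strongest row tone and the strongest column tone."""
    r = max(range(4), key=lambda i: magnitude_at(signal, ROWS[i]))
    c = max(range(4), key=lambda i: magnitude_at(signal, COLS[i]))
    return KEYS[r][c]


print(detect_key(dtmf_tone("5")))
```

No window is applied here, so some leakage is present, but the two true tones still stand well clear of their neighbours; a Hanning window would be one way to tidy the measurement further on noisier input.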

 

© 2017 http://www.ntmusic.org.uk/ All Rights Reserved. All Trademarks Recognised.