Sound in surveillance

Sound advice

Let’s look at some basic facts about the nature of sound.

Sound consists of compression waves in the air caused by mechanical vibrations, for example, in a loudspeaker or a person’s vocal cords. These compression waves spread through the air, much like ripples on water. The human ear picks up these vibrations and interprets them as sound.

The frequency of the vibrations determines the perceived pitch of the sound. Low frequencies come across as bass tones, and higher frequencies as treble, or high tones. The human ear is capable of detecting frequencies in the range 20 hertz (vibrations per second) to 20 kilohertz, although the upper limit drops slightly as a person gets older. The sounds made by a human voice are normally in the range 150 hertz to 5 kilohertz.

To record sound we need a microphone. A thin membrane in the microphone reacts to the incoming pressure variations and starts to vibrate at the same frequencies. These vibrations produce a varying electric voltage that can be amplified and transferred over a conductor (wires). When fed into a loudspeaker, the varying voltage transfers the original vibrations back into the air, and the sound is reproduced.

During recording, the electrical signal is sampled (measured) many thousands of times per second. The sample rate must be at least double the highest frequency you want to record, so to capture sounds up to 11 kilohertz the sample rate must be at least 22 kilohertz, that is, 22,000 samples per second.

Figure 1: Low versus high sample rate.
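
To make the sampling rule concrete, here is a minimal Python sketch. The function names are made up for this illustration and are not part of any audio library. The second function goes one step beyond the text above: it assumes ideal sampling with no filtering before the signal is sampled, and estimates the false, lower frequency at which a tone shows up when the sample rate is too low.

```python
def min_sample_rate(max_frequency_hz: float) -> float:
    """Lowest sample rate (samples per second) that can capture max_frequency_hz."""
    # The rule from the text: at least double the highest frequency of interest.
    return 2 * max_frequency_hz


def alias_frequency(signal_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a pure tone appears after sampling.

    Illustrative assumption: ideal sampling, no filter in front of the sampler.
    Tones above half the sample rate "fold back" to a lower frequency.
    """
    half_rate = sample_rate_hz / 2
    folded = signal_hz % sample_rate_hz
    return folded if folded <= half_rate else sample_rate_hz - folded


if __name__ == "__main__":
    # Sounds up to 11 kHz need at least 22,000 samples per second.
    print(min_sample_rate(11_000))
    # A 5 kHz tone is below half the sample rate and is recorded correctly.
    print(alias_frequency(5_000, 22_000))
    # A 15 kHz tone is too high for a 22 kHz sample rate and is recorded wrongly.
    print(alias_frequency(15_000, 22_000))
```

In this sketch, a 15 kilohertz tone sampled 22,000 times per second would be recorded as a 7 kilohertz tone, which is why the sample rate has to be chosen from the highest frequency you want to keep.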

The bit depth controls the resolution of each sample. At a higher bit depth, the signal level in each sample can be stored with greater precision.

Figure 2: Low versus high bit depth.
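
The effect of bit depth can be sketched in a few lines of Python. This is an illustration only: it assumes samples are signal levels between -1.0 and 1.0 stored as signed whole numbers, and the function names are invented for the example.

```python
def quantization_levels(bit_depth: int) -> int:
    """Number of distinct signal levels a sample can take at this bit depth."""
    return 2 ** bit_depth


def quantize(sample: float, bit_depth: int) -> float:
    """Round a sample in the range -1.0..1.0 to the nearest storable level.

    Assumption for this sketch: signed integer storage, so half of the
    levels are used for negative values and half for zero and positive values.
    """
    steps = 2 ** (bit_depth - 1)
    code = round(sample * steps)
    # Clamp to the range a signed integer of this size can actually hold.
    code = max(-steps, min(steps - 1, code))
    return code / steps


if __name__ == "__main__":
    original = 0.3
    for depth in (3, 8, 16):
        stored = quantize(original, depth)
        print(depth, quantization_levels(depth), stored, abs(stored - original))
```

At a bit depth of 3 the level 0.3 is stored as 0.25, while at 16 bits the stored value differs from the original only in the fifth decimal place. That difference is the greater precision described above.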

If you multiply the sample rate by the bit depth you get the bit rate, that is, the data bandwidth required to convey or store the uncompressed sound digitally. Encoding the digital data can compress it, thus lowering the bit rate. Several audio codecs are available for this purpose, such as G.711 or Advanced Audio Coding (AAC).
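
The bit rate arithmetic is easy to check with a short Python sketch. The function names and the `channels` parameter are assumptions added for the example (the text above describes a single audio channel), and the figures are for uncompressed audio before any codec is applied.

```python
def bit_rate(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
    """Uncompressed bit rate in bits per second.

    `channels` is an assumption added for this sketch; 1 means mono.
    """
    return sample_rate_hz * bit_depth * channels


def storage_bytes(bit_rate_bps: int, seconds: int) -> float:
    """Storage needed for a recording of the given length, in bytes."""
    return bit_rate_bps * seconds / 8  # 8 bits per byte


if __name__ == "__main__":
    # 22,000 samples per second at 16 bits per sample.
    rate = bit_rate(22_000, 16)
    print(rate)                       # bits per second
    print(storage_bytes(rate, 3600))  # bytes for one hour of audio

    # 8,000 samples per second at 8 bits per sample.
    print(bit_rate(8_000, 8))
```

With these numbers, 22,000 samples per second at 16 bits gives 352,000 bits per second, or roughly 158 megabytes per hour of uncompressed audio, while 8,000 samples per second at 8 bits gives 64,000 bits per second.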

Get closer