Audio analytics: Enhancing IP cameras with intelligent ears
When perceiving threats, our brain relies on all our faculties to assess the danger. Was that a car backfiring or a gunshot? Was that a window shattering or did somebody just drop a glass? Was that voice shouting in anger or yelling for joy? Adding a visual context might make it easier to discern the difference. But oftentimes we only have an instant to make those judgment calls.
In the physical security world, surveillance cameras augment that perception using onboard analytics to mimic the human process of intelligently weighing sensory data and concluding whether a threat is imminent.
Most of us are familiar with video analytics. These applications operate in the visual realm—capturing, analyzing and presenting situational awareness of things seen. They enable cameras to measure crowd size, alert us to someone loitering or detect intruders crossing into areas they shouldn’t.
On the other hand, audio analytics operate in the acoustic realm—detecting and identifying different sounds even when the camera is pointing in the opposite direction. These applications can alert security personnel to specific sounds like verbal aggression, a window pane being smashed or even a gun being fired.
Understanding the basics of audio analytics
Sound detection software listens for a complex combination of characteristics from decibel level to the energy in different frequencies over time. Programmed to ignore background noise like traffic, conversations, music, etc., these applications trigger an alert when they hear a very specific acoustic pattern. This helps minimize the risk of false positives (or false negatives) even in challenging environments like train platforms and prison cellblocks.
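To make "decibel level" and "energy in different frequencies" concrete, here is a minimal sketch of how one short frame of audio could be reduced to those two kinds of measurements. It is an illustration only, not how any particular analytics product works: the function names, the band edges and the 8 kHz sample rate are all assumptions, and a real detector would use an optimized FFT and a trained classifier on top of features like these.

```python
import cmath
import math


def rms_dbfs(frame):
    """Level of one frame in dB relative to full scale (samples in [-1, 1])."""
    mean_square = sum(s * s for s in frame) / len(frame)
    return 10 * math.log10(max(mean_square, 1e-12))  # floor avoids log(0) on silence


def band_energies(frame, sample_rate, bands):
    """Sum spectral power inside each (low_hz, high_hz) band using a naive DFT."""
    n = len(frame)
    energies = [0.0] * len(bands)
    for k in range(n // 2 + 1):  # bins from 0 Hz up to the Nyquist frequency
        freq = k * sample_rate / n
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(frame))
        power = abs(coeff) ** 2
        for b, (low, high) in enumerate(bands):
            if low <= freq < high:
                energies[b] += power
    return energies


# Hypothetical example: a 1 kHz tone at half of full scale.
SAMPLE_RATE = 8000
frame = [0.5 * math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(256)]
bands = [(0, 500), (500, 2000), (2000, 4001)]  # illustrative band edges in Hz
level_db = rms_dbfs(frame)
energies = band_energies(frame, SAMPLE_RATE, bands)
```

Computing these values for successive frames gives the "over time" dimension: a sequence of level and band-energy readings that a detector can compare against the pattern it is listening for.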
Unlike the separate audio streaming feature of a video camera, audio analytics do not continuously record sound. The program only operates in buffer mode, recording a few seconds before and after the detection to allow security to verify the sound and preserve it for forensic evidence.
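The buffer mode described above can be sketched with a fixed-size ring buffer: audio frames are held only briefly and discarded unless a detection occurs. This is a simplified illustration under stated assumptions, not a vendor implementation; the class name, the frame-based sizes and the `detected` flag are hypothetical, and the sketch assumes at least one frame is kept after the detection.

```python
from collections import deque


class EventClipRecorder:
    """Keep a short rolling window of audio; emit a clip only around a detection."""

    def __init__(self, pre_frames, post_frames):
        self._pre = deque(maxlen=pre_frames)  # rolling pre-event window
        self._post_frames = post_frames       # assumed to be >= 1
        self._clip = None                     # clip being assembled, if any
        self._remaining = 0                   # post-event frames still needed

    def push(self, frame, detected=False):
        """Feed one frame; return a finished clip, or None if nothing is ready."""
        if self._clip is None:
            if not detected:
                self._pre.append(frame)  # oldest frame is dropped automatically
                return None
            # Detection: start a clip from the buffered frames plus this one.
            self._clip = list(self._pre) + [frame]
            self._pre.clear()
            self._remaining = self._post_frames
            return None
        self._clip.append(frame)
        self._remaining -= 1
        if self._remaining == 0:
            clip, self._clip = self._clip, None
            return clip
        return None


# Hypothetical run: frames are just numbers, with a detection at frame 5.
recorder = EventClipRecorder(pre_frames=2, post_frames=2)
clips = []
for i in range(10):
    clip = recorder.push(i, detected=(i == 5))
    if clip is not None:
        clips.append(clip)
```

In this run only frames 3 through 7 are kept; everything else passes through the buffer and is discarded, which is the sense in which the analytics do not continuously record.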
Embedding audio analytics on edge devices like network cameras, network door stations, network speakers and network audio bridges affords several advantages.
- Audio processing takes place inside the device, eliminating the need for a central server.
- Video feeds stream only when triggered by an audio event, reducing bandwidth consumption and storage needs.
- It improves real-time surveillance because the alert immediately directs security’s attention to the potential incident.
Augmenting video surveillance systems with audio analytics not only provides an extra layer of protection to the premises; the immediacy of its alerts also enables security to be more proactive in identifying and responding to threats.
Making a sound decision: What you hear informs what you do
Given modern machine-learning techniques, the potential number of cataloged sounds in an analytics library could be nearly infinite, limited only by the processing capacity onboard the camera. In a security context, some of the most common sound signatures that analytics listen for include aggression, car alarms, gunshots and breaking glass.
- Aggression detection identifies verbal aggression before it escalates into assault. The analytics software listens for sound patterns associated with duress, anger or fear. Interestingly enough, the American Psychological Association found that 90 percent of all aggressive incidents are preceded by anger. When aggression detection technology recognizes any of these unique sound signatures, the system automatically triggers an alarm and/or streams real-time video footage to a control room or mobile device. With early warning, personnel can quickly intervene and manage the incident before hostilities turn physical. By embedding the application in the cameras, security staff can watch events unfold and provide additional situational awareness to responders en route to the scene.
- Car alarm detection listens for the specific sound pattern produced by the most common car alarm systems on the market. The analytics can detect the sound pattern in a sizable radius surrounding the camera, making the application particularly useful in parking garages and expansive parking lots. By sending real-time alerts with location information, the analytics helps security staff react more quickly to incidents and prevent theft or vandalism.
- Gunshot detection recognizes the sound of gunfire from a variety of firearms: handguns, shotguns, rifles and automatic weapons. Once the software detects a weapon being discharged, it triggers an immediate alert to security personnel who can instantly replay the sound for verification and use the video cameras to further assess the threat so they can quickly and safely respond to the event.
- Breaking glass detection recognizes the sound of breaking glass whether the pane is laminated, single or double plate, tempered or wired. Once detected, the software sends an alert to security to investigate the breach. Having analytics reside inside the camera saves the expense of installing motion sensors on every window.
Augmenting video surveillance solutions with these and other custom audio analytics elevates situational awareness to a whole new level. Equipping systems with intelligent analysis of what’s being heard and seen helps security quickly ascertain the nature of a threat and act appropriately.
But just to clarify, when it comes to “listening,” audio analytics only detects and identifies certain acoustic patterns of sound, not actual speech. This is an important distinction because in most states there are strict regulations for recording devices.
From the drawing board to the real world: two user stories
It’s one thing to talk about the technology in the abstract, but how it works in the real world demonstrates its actual potential. So let’s look at (or should I say hear) how two different entities are integrating audio analytics into their video surveillance solutions.
Billerica Police Department: Creating Safer Cellblocks
In the Massachusetts town of Billerica, audio analytics plays a big role in maintaining prison security. The Police Department installed Axis video cameras on the ceiling of each cellblock and embedded Sound Intelligence’s Aggression Detection analytics in the cameras to listen for hostile sounds, such as a person yelling. When the analytics detects this acoustic signature, it sends an alert to AXIS Camera Station in the police dispatch center as well as to the mobile phone of the duty officer. The notification shows the number of the cellblock where the disturbance is occurring and streams video from the camera for visual verification of the ruckus.
When asked about the value of the technology, Lieutenant Greg Katz, Accreditation and Technology Manager for the Billerica Police Department, remarked, “The good thing about sound analytics is that it allows officers to be proactive. The earlier they know something is going on, the quicker they can respond before any damage might happen.”
Lt. Katz also noted that being IP-based technology means audio analytics can easily integrate with other network-based security technology. The Police Department’s next step is to install an Axis network speaker in the station’s dispatch center to report any commotion. For instance, if the analytics picks up aggression, it can trigger a verbal alert, such as “Disturbance in cellblock two. Send officer immediately.”
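The alert flow in this story (a detection on one camera becoming a notification that names the cellblock and points to the right video feed) can be sketched as a small routing function. Everything here is hypothetical and for illustration only: the camera IDs, the camera-to-cellblock mapping, the recipient names and the dictionary layout are assumptions, not the Axis or Sound Intelligence interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    camera_id: str    # which camera heard the sound (hypothetical ID scheme)
    sound_class: str  # e.g. "aggression"


# Hypothetical mapping from camera to the cellblock it covers.
CAMERA_TO_CELLBLOCK = {"cam-01": "one", "cam-02": "two"}


def build_notifications(detection, recipients):
    """Create one notification per recipient for a single detection."""
    cellblock = CAMERA_TO_CELLBLOCK[detection.camera_id]
    message = f"Disturbance in cellblock {cellblock}. Send officer immediately."
    return [
        {
            "to": recipient,
            "message": message,
            "video_source": detection.camera_id,  # feed to open for verification
        }
        for recipient in recipients
    ]


# Hypothetical example: aggression heard by the camera in cellblock two.
notifications = build_notifications(
    Detection(camera_id="cam-02", sound_class="aggression"),
    recipients=["dispatch-center", "duty-officer-phone"],
)
```

Because the message is plain text, the same string could in principle be shown on a screen, pushed to a phone or read aloud by a network speaker.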

Rock Hill Schools: Keeping quarrels from getting physical
Rock Hill Schools decided to install Sound Intelligence’s Aggression Detection at Rock Hill High School to help resource officers and staff curb student fights. An integrator embedded the application on the school’s existing Axis video cameras to listen for aggressive voices and trigger alerts to administrators.
“The thing about high schoolers is that they can be joking and goofing around one minute and the next thing you know, one kid pushes another a little too hard and you’ve got a full blown physical altercation on your hands,” explained Kevin Wren, Director of Risk Security Emergency Management for Rock Hill Schools.
In the past, administrators relied on someone pushing a button to call for help. That could delay the response by more than two minutes. “Now the audio analytics automatically notifies the administrator who immediately dispatches a security officer,” said Wren. “This gets our response time down to seconds instead of minutes.”
Because the microphones are mapped to the cameras, an alert arrives with both the audio clip and the live video feed, so the administrator can determine whether the incident is real.
While it’s early days yet, Wren is optimistic. “We have 2,300 kids just in this one high school. If we can cut the number of fights by 10 percent or more, it’ll be a big win for us.”
Why pair intelligent ears with intelligent eyes?
In their infancy, surveillance cameras were merely dumb recording devices. As onboard processing power increased over the years, they gained the capacity to be a whole lot smarter. Today, video analytics provide cameras with intelligent eyes to help security personnel interpret what the camera sees. When you endow the cameras with audio analytics, you add intelligent ears that hear and discern the importance of what you might otherwise be missing.
To learn more about integrating audio analytics into video surveillance solutions, see “Listen Up,” James Marcella, Security Industry Association Magazine, April 2018.