The importance of testing analytics algorithms in the real world

Timo Sachse

Anything developed and tested in ideal ‘lab’ conditions can find its performance compromised in real-world environments. The perfect conditions in a wind tunnel might, on paper, have a designer believe she’s engineered the world’s best Formula One racing car. But on a blustery April day on the Portuguese coast the testing models can, quite literally, be blown out of the window.

It’s the same with video analytics, which is why testing in real-life user environments is essential to ensuring that machine learning algorithms are effective for specific use cases and scenarios.

Before testing, training

Before testing in the real world, machine learning algorithms – such as those used for object detection and recognition in video analytics – need to be developed and trained.

To develop a deep or machine learning-based analytics application you need to collect large amounts of data. In video surveillance, this typically consists of images and video clips of humans and vehicles or other objects of interest. Given the breadth of these objects of interest in video analytics, the data sets required can be huge.

To make the data recognizable for a machine or computer, a data annotation process is necessary through which relevant objects are categorized and labeled. The annotated data needs to cover a large enough variety of samples that are relevant for the context where the analytics application will be used. Data annotation is, unfortunately, largely a manual and labor-intensive task.

The algorithmic model is fed annotated data and a training framework is used to iteratively modify and improve the model until the desired quality is reached. In other words, the model is optimized to solve the defined task.
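
The iterative loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the training framework used for any real analytics application: the model is a single-feature logistic regression, the feature (a normalized object height) and the six annotated samples are invented, and `target_loss` stands in for "the desired quality".

```python
import math

# Hypothetical annotated data: (normalized object height, label),
# where label 1 = "human" and 0 = "not human". Invented for illustration.
SAMPLES = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]


def train(samples, target_loss=0.1, learning_rate=0.5, max_epochs=10_000):
    """Adjust weight and bias step by step until the mean loss reaches target_loss."""
    weight, bias = 0.0, 0.0
    n = len(samples)
    for epoch in range(1, max_epochs + 1):
        grad_w = grad_b = loss = 0.0
        for x, y in samples:
            p = 1.0 / (1.0 + math.exp(-(weight * x + bias)))  # predicted probability
            loss += -(y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12))
            grad_w += (p - y) * x
            grad_b += p - y
        loss /= n
        if loss <= target_loss:  # desired quality reached: stop modifying the model
            break
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias, loss, epoch


def predict(x, weight, bias):
    """Return 1 ("human") if the model's score is positive, else 0."""
    return 1 if weight * x + bias > 0 else 0


weight, bias, final_loss, epochs_used = train(SAMPLES)
```

Real object detectors use far larger models and datasets, but the shape of the loop is the same: predict, measure the error against the annotations, adjust, and repeat.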

Machine learning training methods

Training of the machine learning-based algorithm can be undertaken in three main ways:

  1. Supervised learning: The model learns to make accurate predictions
  2. Unsupervised learning: The model learns to identify clusters
  3. Reinforcement learning: The model learns from mistakes

Supervised learning is the most common method in machine learning today and can broadly be described as ‘learning by example’. The training data is clearly annotated, meaning that the input data is already paired with the desired output result. Supervised learning generally requires a very large amount of annotated data and the performance of the trained algorithm is directly dependent on the quality of that training data.

The most important aspect of supervised learning is to use a dataset that represents all potential input data from a real deployment situation. For object detectors, the developer must make sure to train the algorithm with a wide variety of images, with different object instances, orientations, scales, light situations, backgrounds, and distractions. Only if the training data is representative of the planned use case will the final analytics application be able to make accurate predictions when processing new data.
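
One simple way to check how representative a dataset is, sketched here as an assumption rather than a description of any vendor's tooling, is to count how the annotated samples are spread across the conditions expected in deployment. The metadata field `lighting`, its values, and the `min_share` threshold are all hypothetical.

```python
from collections import Counter


def underrepresented(annotations, attribute, expected_values, min_share=0.2):
    """Return the expected values whose share of the dataset is below min_share."""
    total = len(annotations)
    if total == 0:
        return sorted(expected_values)
    counts = Counter(a[attribute] for a in annotations)
    return sorted(v for v in expected_values if counts[v] / total < min_share)


# Hypothetical annotation metadata: nine daytime images and one night image.
annotations = [{"lighting": "day"}] * 9 + [{"lighting": "night"}]
gaps = underrepresented(annotations, "lighting", ["day", "dusk", "night"])
# gaps == ['dusk', 'night']
```

A result like this would suggest collecting and annotating more dusk and night images before training a detector intended for round-the-clock use.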

Unsupervised learning uses algorithms to analyze and group unlabeled datasets. This is not a common training method in the surveillance industry, because the model requires a lot of calibration and testing while the quality can still be unpredictable. The datasets must be relevant for the analytics application but do not have to be clearly labeled or marked.

The manual annotation work is eliminated in unsupervised learning, but the number of images or videos needed for the training must be increased by several orders of magnitude. During the training phase, the model being trained identifies common features in the datasets, supported by the training framework. During the deployment phase, this enables it to group data according to patterns and to detect anomalies that do not fit into any of the learned groups.
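
As a rough illustration of grouping unlabeled data and then flagging anomalies, the sketch below uses k-means clustering, which is one common unsupervised technique and not necessarily the one used in any given product. The two-dimensional feature points, the naive initialization, and the `max_distance` threshold are assumptions made for the example; a real system would need vastly more data.

```python
import math

# Hypothetical unlabeled feature vectors, e.g. (width, height) of detected objects.
POINTS = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.9, 1.1), (5.1, 4.9), (4.8, 5.2)]


def kmeans(points, k, iterations=20):
    """Group points into k clusters and return the cluster centroids."""
    centroids = list(points[:k])  # naive initialization: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def is_anomaly(point, centroids, max_distance):
    """A point is anomalous if it is far from every learned group."""
    return min(math.dist(point, c) for c in centroids) > max_distance


centroids = kmeans(POINTS, k=2)
fits_a_group = not is_anomaly((1.1, 1.0), centroids, max_distance=1.0)
is_outlier = is_anomaly((3.0, 9.0), centroids, max_distance=1.0)
```

No labels are supplied at any point: the groups emerge from the data, and anything that sits far from all of them is treated as unusual.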

Reinforcement learning is used in, for example, robotics, industrial automation, and business strategy planning, but due to the need for large amounts of feedback the method has limited use in surveillance today. Reinforcement learning is about taking suitable action to maximize the potential reward in a specific situation, a reward that gets larger when the model makes the right choices. The algorithm does not use data/label pairs for training, but is instead optimized by testing its decisions through interaction with the environment while measuring the reward. The goal of the algorithm is to learn a policy for actions that will help maximize the reward.
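
The idea of learning from rewards rather than from data/label pairs can be sketched with an epsilon-greedy agent, one of the simplest reinforcement learning strategies. Everything here is a simplifying assumption: there is only a single situation, the three actions and their average rewards are invented, and the environment is simulated rather than real.

```python
import random

# Hypothetical actions and their true average rewards, unknown to the learner.
ACTIONS = ["action_a", "action_b", "action_c"]
TRUE_MEAN_REWARD = {"action_a": 0.2, "action_b": 0.5, "action_c": 1.0}


def simulated_reward(action, rng):
    """Simulated environment: a noisy reward around the action's true mean."""
    return rng.gauss(TRUE_MEAN_REWARD[action], 0.05)


def learn_policy(reward_fn, actions, steps=500, epsilon=0.1, seed=0):
    """Estimate each action's reward by trial and error and return the best action."""
    rng = random.Random(seed)
    estimates = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for step in range(steps):
        if step < len(actions):
            action = actions[step]  # try every action once first
        elif rng.random() < epsilon:
            action = rng.choice(actions)  # explore: pick a random action
        else:
            action = max(actions, key=estimates.get)  # exploit the best estimate
        reward = reward_fn(action, rng)
        counts[action] += 1
        # Incremental running average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return max(actions, key=estimates.get), estimates


best_action, estimates = learn_policy(simulated_reward, ACTIONS)
```

Even this toy needs hundreds of interactions to settle on three options, which hints at why the amount of feedback required limits the method's use in surveillance.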

Testing before deployment, and in the real world

Once the model is trained, it needs to be thoroughly tested, typically combining an automated part with extensive testing in real-life deployment situations. In the automated part, the application is benchmarked with new datasets, unseen by the model during its training. If these benchmarks are not where they are expected to be, the process starts over again: new training data is collected, annotations are made or refined and the model is retrained.
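
The automated benchmarking step might look something like the sketch below. Precision and recall are standard measures for detectors, but the thresholds, the ten-sample test set, and the function names are assumptions chosen for illustration.

```python
def benchmark(predictions, ground_truth):
    """Compute precision and recall for binary detections (1 = object present)."""
    pairs = list(zip(predictions, ground_truth))
    true_pos = sum(1 for p, t in pairs if p and t)
    false_pos = sum(1 for p, t in pairs if p and not t)
    false_neg = sum(1 for p, t in pairs if not p and t)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


def needs_retraining(precision, recall, min_precision=0.9, min_recall=0.9):
    """The process starts over if either benchmark misses its target."""
    return precision < min_precision or recall < min_recall


# Hypothetical labels for ten clips the model never saw during training.
ground_truth = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
predictions = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

precision, recall = benchmark(predictions, ground_truth)  # 0.8 and 0.8
retrain = needs_retraining(precision, recall)  # True with the assumed 0.9 targets
```

Here one missed object and one false alarm are enough to fall short of the assumed targets, so new data would be collected and the model retrained.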

After reaching the desired level of quality, a field test starts. In this test, the application is exposed to real-world scenarios. The amount and variation depend on the scope of the application: the narrower the scope, the fewer variations need to be tested; the broader the scope, the more tests are needed. Results are once more compared and evaluated, and the process can start over if they are not where they need to be. Another potential outcome could be to define preconditions, explaining a known scenario in which the application is not or only partly recommended to be used.
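
Field-test results could be summarized per scenario to decide which ones need to be documented as preconditions. The sketch below is one possible approach; the scenario names, the pass/fail results, and the 90% threshold are invented for the example.

```python
from collections import defaultdict


def find_preconditions(field_results, min_pass_rate=0.9):
    """Return the scenarios whose pass rate falls below min_pass_rate."""
    totals = defaultdict(lambda: [0, 0])  # scenario -> [tests passed, tests run]
    for scenario, passed in field_results:
        totals[scenario][1] += 1
        if passed:
            totals[scenario][0] += 1
    return {
        scenario: passed / run
        for scenario, (passed, run) in totals.items()
        if passed / run < min_pass_rate
    }


# Hypothetical field-test log: (scenario, did the application behave as expected?)
field_results = (
    [("daylight", True)] * 10
    + [("heavy rain", True)] * 6
    + [("heavy rain", False)] * 4
)

limited_scenarios = find_preconditions(field_results)
# limited_scenarios maps "heavy rain" to a pass rate of 0.6
```

In this made-up log, heavy rain would either trigger another round of training or be listed as a scenario in which the application is only partly recommended.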

Exposing an analytics application to real-world scenarios, however, isn’t the same as running the application in the real world. Deployment – also called inference or prediction – is the process of executing a trained machine learning model on a surveillance system monitoring real-life scenes to test if the algorithm uses what it learned during the training phase to produce its desired output. It is only at this stage – when the ‘clean’ data used throughout development and testing is replaced with real data, which can deviate greatly in quality – that we discover whether the algorithm is fit for the purpose for which it has been designed.

Analytics testing: often overlooked

Perhaps the perception that processors and analytics are becoming ever more ‘intelligent’ leads customers to believe that analytics will perform perfectly ‘out of the box’, but this is not the case. Testing in the real world is vital, and it needs to be factored into analytics deployment costs.

Video surveillance analytics will continue to advance, gaining in accuracy and effectiveness, but it is often developed in perfect conditions. The imperfect world in which we live can undermine any algorithm, and testing against these imperfections is essential.

Imperfections are not automatically caused by the application itself – they tend to be even more often caused by a mismatch in expectations. This means testing will not only help in finding flaws or limitations, it will also greatly enhance the understanding of the general capabilities of the analytics. In turn, this will assist in preparing the installation of the detector (camera) in the best possible way, and even result in possible modifications to the scene itself (e.g. lighting).

You can read more about AI in video analytics in our whitepaper.
