Video compression

Video compression technologies reduce and remove redundant video data so that a digital video file can be sent efficiently over a network and stored on computer disks. With efficient compression techniques, file size can be reduced significantly with little or no adverse effect on visual quality. However, if the file size is lowered further by raising the compression level of a given compression technique, visual quality starts to suffer.
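The sketch below puts "significant reduction" into numbers. It computes the bit rate of uncompressed video and its ratio to a compressed stream. The resolution, color depth and frame rate are illustrative choices. The 4 Mbit/s compressed rate is a hypothetical figure, not a measured Axis value.

```python
def raw_bitrate_bps(width: int, height: int, bits_per_pixel: int, fps: float) -> float:
    """Bit rate of uncompressed video: every pixel of every frame is stored."""
    return width * height * bits_per_pixel * fps


def compression_ratio(raw_bps: float, compressed_bps: float) -> float:
    """How many times smaller the compressed stream is than the raw one."""
    return raw_bps / compressed_bps


# Illustrative input: 1080p, 24-bit color, 30 frames per second.
raw = raw_bitrate_bps(1920, 1080, 24, 30)
print(f"raw: {raw / 1e9:.2f} Gbit/s")                  # raw: 1.49 Gbit/s
# Hypothetical compressed stream of 4 Mbit/s (assumption, not a measurement).
print(f"ratio: {compression_ratio(raw, 4e6):.0f}:1")   # ratio: 373:1
```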

Different compression technologies, both proprietary and industry standards, are available. Most network video vendors today use standard compression techniques. Standards are important in ensuring compatibility and interoperability. They are particularly relevant to video compression since video may be used for different purposes and, in some video surveillance applications, needs to be viewable many years from the recording date. By deploying standards, end users are able to pick and choose from different vendors, rather than be tied to one supplier when designing a video surveillance system.

Axis uses three video compression standards: Motion JPEG, MPEG-4 Part 2 (usually referred to simply as MPEG-4) and H.264. H.264 is the newest and most efficient of the three.

Video codec

Compression applies an algorithm to the source video to create a compressed file that is ready for transmission or storage. To play the compressed file, an inverse algorithm is applied to produce a video that shows virtually the same content as the original source video. The time it takes to compress, send, decompress and display a file is called latency. More advanced compression algorithms generally require more processing and therefore tend to increase latency.

A pair of algorithms that work together is called a video codec (encoder/decoder). Video codecs of different standards are normally not compatible with each other. Video content compressed with one standard cannot be decompressed with a decoder for a different standard. For instance, an MPEG-4 decoder cannot decode the output of an H.264 encoder, simply because one algorithm cannot correctly decode the output of another. It is, however, possible to implement several different algorithms in the same software or hardware, which allows multiple formats to coexist.
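The sketch below illustrates both points. Each encoder only round-trips with its own decoder, and one program can still host several codecs side by side. As an assumption for illustration, it uses general-purpose lossless compressors from the Python standard library (zlib, bz2, lzma) as stand-ins for video codecs. The `CODECS` table and helper names are hypothetical.

```python
import bz2
import lzma
import zlib

# Hypothetical codec registry: stand-ins for video codecs, each pairing an
# encoder with its matching decoder, all living in the same program.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def encode(codec: str, data: bytes) -> bytes:
    return CODECS[codec][0](data)


def decode(codec: str, payload: bytes) -> bytes:
    return CODECS[codec][1](payload)


def try_decode(codec: str, payload: bytes):
    """Return the decoded bytes, or None if this decoder rejects the payload."""
    try:
        return decode(codec, payload)
    except (zlib.error, OSError, lzma.LZMAError, ValueError):
        return None


frame = bytes(range(256)) * 16
payload = encode("zlib", frame)
print(try_decode("zlib", payload) == frame)  # True: matching decoder
print(try_decode("bz2", payload))            # None: mismatched decoder
```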

Image compression vs. video compression

Different compression standards use different methods of reducing data, so their results differ in bit rate, quality and latency. Compression algorithms fall into two types: image compression and video compression.

Image compression uses intraframe coding. Data is reduced within a single image frame by removing information that is unlikely to be noticed by the human eye. Motion JPEG is an example of such a compression standard. The images in a Motion JPEG sequence are coded, or compressed, as individual JPEG images.

With the Motion JPEG format, the three images in the above sequence are coded and sent as separate unique images (I-frames) with no dependencies on each other.
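A minimal sketch of intra-only coding follows. Every frame is compressed on its own, so any frame can be decoded without the others. As a labeled simplification, zlib stands in for JPEG here. Real JPEG is lossy and transform-based, but the independence between frames is the same. The tiny 4x4 frames are hypothetical.

```python
import zlib

# Hypothetical 4x4 grey-level frames, stored as flat lists of 8-bit values.
frames = [
    [10] * 16,
    [10] * 8 + [200] * 8,
    [200] * 16,
]


def encode_intra(frame):
    """Compress one frame with no reference to any other frame."""
    return zlib.compress(bytes(frame))


def decode_intra(payload):
    """Decompress one frame; needs nothing but its own payload."""
    return list(zlib.decompress(payload))


stream = [encode_intra(f) for f in frames]  # every entry is an I-frame

# Any frame can be decoded in isolation, e.g. jump straight to the last one.
print(decode_intra(stream[2]) == frames[2])  # True
```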

Video compression algorithms such as MPEG-4 and H.264 use interframe prediction to reduce video data across a series of frames. This involves techniques such as difference coding, where one frame is compared with a reference frame and only the pixels that have changed with respect to the reference frame are coded. In this way, the number of pixel values that are coded and sent is reduced. When such an encoded sequence is displayed, the images appear as in the original video sequence.

With difference coding, only the first image (I-frame) is coded in its entirety. In the two following images (P-frames), references are made to the first picture for the static elements, i.e. the house. Only the moving parts, i.e. the running man, are coded using motion vectors, thus reducing the amount of information that is sent and stored.
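The difference-coding idea can be sketched in a few lines. Only the pixels that differ from the reference frame are coded, as (position, new value) pairs. The decoder copies the reference and patches in those pixels. The 1-D "frames" and function names are hypothetical, and real encoders work on 2-D blocks, as the following paragraph describes.

```python
def diff_encode(reference, frame):
    """Return (index, new_value) pairs only for pixels that changed.
    Assumes both frames have the same number of pixels."""
    return [(i, v) for i, (r, v) in enumerate(zip(reference, frame)) if r != v]


def diff_decode(reference, changes):
    """Rebuild a frame by applying the coded changes to a copy of the reference."""
    frame = list(reference)
    for i, v in changes:
        frame[i] = v
    return frame


# Hypothetical 12-pixel frames: static background (50) with a small bright
# object (255) that moves one pixel to the right.
i_frame = [50] * 12
i_frame[3:5] = [255, 255]
p_frame = [50] * 12
p_frame[4:6] = [255, 255]

changes = diff_encode(i_frame, p_frame)
print(changes)                                   # [(3, 50), (5, 255)]
print(diff_decode(i_frame, changes) == p_frame)  # True
```

Only 2 of the 12 pixels are coded for the second frame. The static background is taken from the reference.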

Other techniques, such as block-based motion compensation, can be applied to reduce the data further. Block-based motion compensation takes into account that much of what makes up a new frame in a video sequence can be found in an earlier frame, though perhaps in a different location. The technique divides a frame into a series of macroblocks (blocks of pixels). Block by block, a new frame can be composed, or ‘predicted’, by looking for a matching block in a reference frame. If a match is found, the encoder codes the position of the matching block in the reference frame. Coding this motion vector, as it is called, takes fewer bits than coding the actual content of the block.
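Below is a minimal sketch of block matching. It compares one block of the current frame against every candidate position within a small search window of the reference frame and keeps the offset with the lowest sum of absolute differences (SAD). Several choices here are assumptions for illustration. Exhaustive search is used, SAD serves as the match criterion, the block is 2x2 instead of a full macroblock, and the vector sign convention is arbitrary. Real encoders use faster searches and also code the remaining difference.

```python
def sad(ref, cur, ry, rx, cy, cx, size):
    """Sum of absolute differences between a block in cur and a block in ref."""
    return sum(
        abs(ref[ry + dy][rx + dx] - cur[cy + dy][cx + dx])
        for dy in range(size)
        for dx in range(size)
    )


def find_motion_vector(ref, cur, cy, cx, size, search):
    """Exhaustive block matching: return ((dy, dx), cost) where (dy, dx) is the
    offset into ref of the best match for the size x size block of cur at (cy, cx)."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            if not (0 <= ry <= h - size and 0 <= rx <= w - size):
                continue  # candidate block would fall outside the reference frame
            cost = sad(ref, cur, ry, rx, cy, cx, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost


# Hypothetical 8x8 frames: a 2x2 bright patch (value 9) moves one pixel
# down and two pixels right between the reference and the current frame.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in range(2):
    for x in range(2):
        ref[3 + y][2 + x] = 9
        cur[4 + y][4 + x] = 9

print(find_motion_vector(ref, cur, cy=4, cx=4, size=2, search=3))  # ((-1, -2), 0)
```

The encoder then needs to send only the vector (-1, -2) for this block, not its four pixel values.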

Illustration of block-based motion compensation.

With interframe prediction, each frame in a sequence of images is classified as a certain type of frame, such as an I-frame, P-frame or B-frame.

An I-frame, or intra frame, is a self-contained frame that can be decoded independently, without any reference to other images. The first image in a video sequence is always an I-frame. I-frames are needed as starting points for new viewers and as resynchronization points if the transmitted bit stream is damaged. I-frames can be used to implement fast-forward, rewind and other random access functions. An encoder automatically inserts I-frames at regular intervals, or on demand if new clients are expected to join in viewing a stream. The drawback of I-frames is that they consume many more bits. On the other hand, because they do not depend on other frames, they are less prone to artifacts caused by missing data.

A P-frame, which stands for predictive inter frame, references parts of earlier I- and/or P-frames to code the frame. P-frames usually require fewer bits than I-frames. Their drawback is that they are very sensitive to transmission errors, because of their complex dependency on earlier P- and/or I-frames.

A B-frame, or bi-predictive inter frame, is a frame that references both an earlier reference frame and a future one. Using B-frames increases latency. The encoder must wait for the future reference frame and send it before the B-frames that depend on it.

A typical sequence with I-, B- and P-frames. A P-frame may only reference preceding I- or P-frames, while a B-frame may reference both preceding and succeeding I- or P-frames.
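The reordering that B-frames force can be sketched as follows. Given a display-order pattern such as "IBBPBBP", each B-frame is held back until the next I- or P-frame it references has been sent. This shows where the extra latency comes from. For example, B1 cannot be sent until P3 has been captured and encoded. The function name and string notation are hypothetical. Trailing B-frames are simply appended at the end as a simplification, whereas a real encoder would send them after the next group's I-frame.

```python
def decode_order(gop: str) -> list[str]:
    """Reorder a display-order frame pattern into transmission/decoding order.
    Each B-frame is emitted only after the next reference (I or P) frame."""
    order, pending_b = [], []
    for index, kind in enumerate(gop):
        if kind == "B":
            pending_b.append((kind, index))  # must wait for the next reference
        else:                                # I or P: a reference frame
            order.append((kind, index))
            order.extend(pending_b)
            pending_b.clear()
    order.extend(pending_b)  # simplification for trailing B-frames
    return [f"{k}{i}" for k, i in order]


print(decode_order("IBBPBBP"))  # ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
print(decode_order("IPPPP"))    # ['I0', 'P1', 'P2', 'P3', 'P4']
```

Without B-frames, the decoding order equals the display order, so no frame has to wait for a later one.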

When a video decoder restores a video by decoding the bit stream frame by frame, decoding must always start with an I-frame. P-frames and B-frames, if used, must be decoded together with the reference frame(s).
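A minimal sketch of why decoding must start at an I-frame follows. A viewer who joins a stream mid-way has to discard frames until the next I-frame arrives. The stream is written as a string of frame types with no B-frames, and the helper name is hypothetical.

```python
def first_decodable(frame_types: str, join_at: int):
    """Index of the first frame a viewer joining at `join_at` can decode.

    Decoding must begin at an I-frame, so everything before the next I-frame
    is skipped. Returns None if no I-frame follows yet."""
    index = frame_types.find("I", join_at)
    return None if index == -1 else index


stream = "IPPPPPPPIPPPPPPPIPPP"  # hypothetical stream: an I-frame every 8 frames
print(first_decodable(stream, 3))   # 8
print(first_decodable(stream, 8))   # 8
print(first_decodable(stream, 17))  # None
```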

Axis network video products allow users to set the GOV (group of video) length, which determines how many P-frames are sent before the next I-frame. Decreasing the frequency of I-frames (a longer GOV) reduces the bit rate. The cost is a longer wait, for a new viewer or for recovery after a transmission error, until the next I-frame arrives. B-frames are not used, to keep latency low.
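The effect of GOV length on bit rate can be worked out with simple arithmetic. The sketch below assumes that a GOV of length N consists of one I-frame followed by N - 1 P-frames, with no B-frames. The frame sizes (50 kbit per I-frame, 10 kbit per P-frame) and the 30 frames/s rate are hypothetical, not Axis measurements.

```python
def average_bitrate_bps(i_frame_bits, p_frame_bits, gov_length, fps):
    """Average bit rate when every group of `gov_length` frames holds one
    I-frame followed by (gov_length - 1) P-frames and no B-frames."""
    bits_per_group = i_frame_bits + (gov_length - 1) * p_frame_bits
    return bits_per_group / gov_length * fps


# Hypothetical frame sizes: I-frame 50 kbit, P-frame 10 kbit, 30 frames/s.
for gov in (1, 10, 30, 60):
    kbps = average_bitrate_bps(50_000, 10_000, gov, 30) / 1000
    print(f"GOV {gov:>2}: {kbps:.0f} kbit/s, I-frame every {gov / 30:.2f} s")
```

Under these assumptions, going from GOV 10 to GOV 60 lowers the average from 420 to 320 kbit/s, while the interval between I-frames grows from about 0.33 s to 2 s.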

Besides difference coding and motion compensation, other advanced methods can be employed to further reduce data and improve video quality. H.264, for example, supports advanced techniques that include prediction schemes for encoding I-frames, improved motion compensation down to sub-pixel accuracy, and an in-loop deblocking filter to smooth block edges (artifacts).
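As one concrete example of sub-pixel motion compensation, H.264 computes luma values at half-pixel positions with the six-tap filter (1, -5, 20, 20, -5, 1), normalized by 32. Quarter-pixel values are then obtained by averaging neighboring samples. The sketch below covers only the half-pixel step along one row of 8-bit samples. It repeats the edge samples when the filter window runs past the row. The function name and sample data are illustrative.

```python
# H.264 luma half-sample filter taps; their sum is 32.
TAPS = (1, -5, 20, 20, -5, 1)


def half_pel(row, x):
    """Interpolate the luma value halfway between row[x] and row[x + 1].

    Uses the six integer samples row[x-2] .. row[x+3]; edge samples are
    repeated when the window runs past either end of the row."""
    n = len(row)
    window = [row[min(max(x + k, 0), n - 1)] for k in range(-2, 4)]
    acc = sum(t * s for t, s in zip(TAPS, window))
    return min(max((acc + 16) >> 5, 0), 255)  # round, divide by 32, clip to 8 bits


row = [0, 10, 20, 30, 40, 50, 60]
print(half_pel(row, 2))        # 25, midway between 20 and 30
print(half_pel([100] * 8, 3))  # 100 on a flat area
```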

Compression formats