Upper and lower clothing color has been added as an attribute to the human object class within the analytics scene description metadata stream in AXIS OS 11.5. This new capability enables video management systems and other applications that utilize metadata to filter results based on upper and lower clothing color. 

For example: an investigator looking for a person of interest wearing a green top could limit the search results to just people wearing green tops by filtering on the person's attributes, in this case the upper clothing color.
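Conceptually, this kind of attribute filtering amounts to selecting detections whose upper clothing color matches the search criterion. A minimal sketch in Python, using hypothetical detection records (not an actual VMS API):

```python
# Hypothetical detection records as a client application might hold them
# after parsing the metadata stream; field names are illustrative only.
detections = [
    {"object_id": 1, "type": "Human", "upper_color": "Green", "lower_color": "Black"},
    {"object_id": 2, "type": "Human", "upper_color": "Red",   "lower_color": "Blue"},
    {"object_id": 3, "type": "Human", "upper_color": "Green", "lower_color": "Gray"},
]

# Keep only people wearing green tops.
green_tops = [d for d in detections if d["upper_color"] == "Green"]
print([d["object_id"] for d in green_tops])  # [1, 3]
```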

Which products are supported?

The upper and lower clothing color attributes are available on ARTPEC-8 and Ambarella CV25 products, with the following exceptions:

How is the clothing color presented in the scene description analytics stream?

The data format is based on the ONVIF specification and Profile M. Attribute values are represented using the RGB color representation. A list of the supported colors for each attribute, along with their respective RGB values, is found below:

Upper (RGB value)     | Lower (RGB value)
White (255, 255, 255) | White (255, 255, 255)
Gray (128, 128, 128)  | Gray (128, 128, 128)
Black (0, 0, 0)       | Black (0, 0, 0)
Red (255, 0, 0)       | Red (255, 0, 0)
Blue (0, 0, 255)      | Blue (0, 0, 255)
Green (0, 128, 0)     | Green (0, 128, 0)
Yellow (255, 255, 0)  | Beige (245, 245, 220)
Beige (245, 245, 220) |

Some less frequent colors are remapped to one of the main color clusters listed above, since they can be difficult to distinguish from those clusters depending on the illumination conditions in the scene. Refer to the table below for additional information.


Color | Maps to
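The exact remapping is product-specific, but conceptually it resembles nearest-cluster matching: an observed color is assigned to the closest of the main clusters. A minimal sketch, assuming simple Euclidean distance in RGB space (the actual on-camera algorithm is not documented here and may differ):

```python
# Illustrative sketch only: map an arbitrary RGB value to the nearest of the
# main color clusters by Euclidean distance. Not the actual Axis algorithm.
CLUSTERS = {
    "White":  (255, 255, 255),
    "Gray":   (128, 128, 128),
    "Black":  (0, 0, 0),
    "Red":    (255, 0, 0),
    "Blue":   (0, 0, 255),
    "Green":  (0, 128, 0),
    "Yellow": (255, 255, 0),
    "Beige":  (245, 245, 220),
}

def nearest_cluster(rgb):
    """Return the name of the main cluster closest to the given (R, G, B)."""
    return min(
        CLUSTERS,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, CLUSTERS[name])),
    )

print(nearest_cluster((200, 30, 40)))    # a dark red maps to "Red"
print(nearest_cluster((250, 240, 200)))  # maps to "Beige"
```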

The upper and lower clothing attributes are elements within the human body descriptor of the appearance node of an object node. The sub-feature "Tops" maps to the upper clothing attribute, while the sub-feature "Bottoms" maps to the lower clothing attribute.

The following example describes a detected person that is likely wearing a blue top and gray bottoms:

<tt:MetadataStream xmlns:tt="http://www.onvif.org/ver10/schema">
  <tt:VideoAnalytics>
    <tt:Frame xmlns:bd="http://www.onvif.org/ver20/analytics/humanbody" UtcTime="2023-05-03T10:55:27.203945Z" Source="AnalyticsSceneDescription">
      <tt:Object ObjectId="1">
        <tt:Appearance>
          <tt:Shape>
            <tt:BoundingBox left="-0.360317" top="0.998612" right="0.413819" bottom="-0.937289" />
            <tt:CenterOfGravity x="0.0267509" y="0.0306613" />
            <tt:Polygon>
              <tt:Point x="-0.360317" y="0.998612" />
              <tt:Point x="-0.360317" y="-0.937289" />
              <tt:Point x="0.413819" y="-0.937289" />
              <tt:Point x="0.413819" y="0.998612" />
            </tt:Polygon>
          </tt:Shape>
          <tt:Class>
            <tt:Type Likelihood="0.7">Human</tt:Type>
          </tt:Class>
          <tt:HumanBody>
            <bd:Clothing>
              <bd:Tops>
                <bd:Color>
                  <tt:ColorCluster>
                    <tt:Color X="0" Y="0" Z="255" Likelihood="0.95" Colorspace="RGB" />
                  </tt:ColorCluster>
                </bd:Color>
              </bd:Tops>
              <bd:Bottoms>
                <bd:Color>
                  <tt:ColorCluster>
                    <tt:Color X="128" Y="128" Z="128" Likelihood="0.8" Colorspace="RGB" />
                  </tt:ColorCluster>
                </bd:Color>
              </bd:Bottoms>
            </bd:Clothing>
          </tt:HumanBody>
        </tt:Appearance>
      </tt:Object>
    </tt:Frame>
  </tt:VideoAnalytics>
</tt:MetadataStream>
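To consume such a frame from code, a client can parse it with a standard XML parser and read the Tops and Bottoms color clusters. A minimal sketch using Python's xml.etree.ElementTree against a trimmed version of the frame shown above (element structure per the ONVIF human body schema; not an official Axis client):

```python
import xml.etree.ElementTree as ET

# Namespace prefixes as used in the metadata stream.
NS = {
    "tt": "http://www.onvif.org/ver10/schema",
    "bd": "http://www.onvif.org/ver20/analytics/humanbody",
}

# Trimmed version of the example frame, keeping only the clothing elements.
frame_xml = """
<tt:MetadataStream xmlns:tt="http://www.onvif.org/ver10/schema"
                   xmlns:bd="http://www.onvif.org/ver20/analytics/humanbody">
  <tt:VideoAnalytics>
    <tt:Frame UtcTime="2023-05-03T10:55:27.203945Z">
      <tt:Object ObjectId="1">
        <tt:Appearance>
          <tt:HumanBody>
            <bd:Clothing>
              <bd:Tops>
                <bd:Color>
                  <tt:ColorCluster>
                    <tt:Color X="0" Y="0" Z="255" Likelihood="0.95" Colorspace="RGB"/>
                  </tt:ColorCluster>
                </bd:Color>
              </bd:Tops>
              <bd:Bottoms>
                <bd:Color>
                  <tt:ColorCluster>
                    <tt:Color X="128" Y="128" Z="128" Likelihood="0.8" Colorspace="RGB"/>
                  </tt:ColorCluster>
                </bd:Color>
              </bd:Bottoms>
            </bd:Clothing>
          </tt:HumanBody>
        </tt:Appearance>
      </tt:Object>
    </tt:Frame>
  </tt:VideoAnalytics>
</tt:MetadataStream>
"""

root = ET.fromstring(frame_xml)
for obj in root.iter("{http://www.onvif.org/ver10/schema}Object"):
    for part in ("Tops", "Bottoms"):
        color = obj.find(
            f".//bd:{part}/bd:Color/tt:ColorCluster/tt:Color", NS
        )
        if color is not None:
            rgb = tuple(int(color.get(axis)) for axis in ("X", "Y", "Z"))
            print(part, rgb, "likelihood:", color.get("Likelihood"))
```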