liblarod  @VERSION_STR@
Preprocessing

What is preprocessing?

Preprocessing in larod can be used to process input data so that it has the format, size and shape a neural network expects. For optimal performance the processing operations can be offloaded to specialized preprocessing hardware. Each supported preprocessing hardware accelerator is exposed to applications through a backend (larodChip) in larod.

Currently only image processing operations are supported.

Preprocessing job configuration

Preprocessing jobs are configured with key-value parameters in a larodMap. The configuration describes the data that you have and how you want the data to be. The selected backend will crop, scale and convert according to what the description requires.

Below is an example job configuration. It describes a job that takes a 1280x720 input image in NV12 format, crops out 200x200 from the center of the image (X offset 540 and Y offset 260), converts it to the RGB interleaved format and scales it down to 48x48. In this case the libyuv backend will be used to perform these operations. In the interest of brevity error handling has been omitted.

larodMap* modelParams = larodCreateMap(NULL);
larodMapSetStr(modelParams, "image.input.format", "nv12", NULL);
larodMapSetIntArr2(modelParams, "image.input.size", 1280, 720, NULL);
larodMapSetStr(modelParams, "image.output.format", "rgb-interleaved", NULL);
larodMapSetIntArr2(modelParams, "image.output.size", 48, 48, NULL);
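// Our modelParams larodMap replaces the model fd as a model description, so no
// model file descriptor is passed (-1). conn is an open larodConnection.
larodModel* model = larodLoadModel(conn, -1, LAROD_CHIP_LIBYUV,
                                   LAROD_ACCESS_PRIVATE, "", modelParams, NULL);
// inputTensors and outputTensors are assumed to have been set up for this model,
// e.g. with larodAllocModelInputs() and larodAllocModelOutputs().
larodJobRequest* jobReq = larodCreateJobRequest(model, inputTensors, numInputs,
                                                outputTensors, numOutputs, NULL, NULL);
larodMap* jobParams = larodCreateMap(NULL);
// We can change the value of "image.input.crop" in our map between running jobs
// on the same model if we wish.
larodMapSetIntArr4(jobParams, "image.input.crop", 540, 260, 200, 200, NULL);
larodSetJobRequestParams(jobReq, jobParams, NULL);
larodRunJob(conn, jobReq, NULL);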

Note that if you only want to scale (and convert) the whole original image from 1280x720 to 48x48, without cropping it first, you can simply leave out the larodMap from the larodJobRequest altogether.
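The crop offsets in the example follow from centering the crop window in the input image: (1280 - 200) / 2 = 540 and (720 - 200) / 2 = 260. A minimal C sketch of that arithmetic is shown below; the CropWindow type and the center_crop() helper are hypothetical and not part of the larod API.

```c
#include <stdint.h>

/* Crop window in the order used by "image.input.crop":
 * X offset, Y offset, width, height. */
typedef struct {
    int64_t x;
    int64_t y;
    int64_t width;
    int64_t height;
} CropWindow;

/* Centers a cropW x cropH window in an inW x inH image. Integer division
 * rounds an odd leftover margin down. Assumes the window fits the image. */
static CropWindow center_crop(int64_t inW, int64_t inH,
                              int64_t cropW, int64_t cropH) {
    CropWindow c = {(inW - cropW) / 2, (inH - cropH) / 2, cropW, cropH};
    return c;
}
```

The four fields could then be passed on as in the example, i.e. larodMapSetIntArr4(jobParams, "image.input.crop", c.x, c.y, c.width, c.height, NULL).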

Common operations

Image preprocessing backends may support the following common image processing operations. Backends are not required to support all operations and image formats.

Operation        Description
Image crop       Crop out a part of an image.
Image scale      Scale up or down an image.
Image convert    Convert an image between two color formats.

Common configuration parameters

Image preprocessing backends may support the common image processing parameters in the tables below to describe processing jobs. Backends are not required to support all parameters and values.

Load model parameters

The following are parameters that can be set on a larodMap provided when loading a model using e.g. larodLoadModel.

Key                       Value
image.input.format*       String, describing input image format.
image.input.size*         2-integer-tuple, describing input image width and height.
image.input.row-pitch     Integer, describing input image width including padding, in bytes. Inferred if not explicitly given.
image.output.format*      String, describing output image format.
image.output.size*        2-integer-tuple, describing output image width and height.
image.output.row-pitch    Integer, describing output image width including padding, in bytes. Inferred if not explicitly given.

*: This parameter is mandatory for all preprocessing backends outlined in this document.
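When a row pitch is not given, it is inferred. The sketch below shows what a tightly packed (unpadded) row pitch and the resulting total buffer size would be for each format. It assumes 8 bits per color sample, an even image height for nv12, and that the inferred pitch is the unpadded one; min_row_pitch() and image_size() are hypothetical helpers, not larod functions.

```c
#include <stddef.h>
#include <string.h>

/* Unpadded row pitch in bytes, assuming 8-bit samples.
 * Returns 0 for a format this sketch does not know. */
static size_t min_row_pitch(const char* format, size_t width) {
    if (strcmp(format, "nv12") == 0) return width;                /* 1 byte per Y sample */
    if (strcmp(format, "rgb-interleaved") == 0) return 3 * width; /* R, G, B per pixel */
    if (strcmp(format, "rgb-planar") == 0) return width;          /* one row of one plane */
    return 0;
}

/* Total buffer size in bytes for an image with the given row pitch. */
static size_t image_size(const char* format, size_t pitch, size_t height) {
    if (strcmp(format, "nv12") == 0) return pitch * height * 3 / 2; /* Y plane + half-height UV plane */
    if (strcmp(format, "rgb-interleaved") == 0) return pitch * height;
    if (strcmp(format, "rgb-planar") == 0) return pitch * height * 3; /* separate R, G, B planes */
    return 0;
}
```

For the example job above, the 1280x720 nv12 input would occupy 1280 * 720 * 3 / 2 = 1382400 bytes and the 48x48 rgb-interleaved output 144 * 48 = 6912 bytes.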

Job request parameters

The following are parameters that can be set on a larodMap of a larodJobRequest. The map can be attached to a job request upon its creation (larodCreateJobRequest) or later using larodSetJobRequestParams. The parameters of the map will then be used in a subsequent call to e.g. larodRunJob using this job request.

Since these parameters are not tied to the model, job requests for the same model can carry larodMaps with different values for them.

Key Value
image.input.crop 4-integer-tuple, describing the crop window. The elements in the tuple are: X offset in input image, Y offset in input image, crop window width, crop window height.
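A crop window is only meaningful if it has a positive size and lies entirely inside the input image. A hypothetical helper that checks this before a value is set for "image.input.crop" could look like the sketch below; how a backend reacts to an out-of-bounds window is not described here, so the check is purely on the application side.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the crop window (x, y, w, h), in the same order as
 * "image.input.crop", has a positive size and lies entirely inside an
 * inW x inH input image. */
static bool crop_is_valid(int64_t inW, int64_t inH,
                          int64_t x, int64_t y, int64_t w, int64_t h) {
    if (x < 0 || y < 0 || w <= 0 || h <= 0) return false;
    return x + w <= inW && y + h <= inH;
}
```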

Supported backends

Currently the following image preprocessing backends are supported by larod.

libyuv backend

The libyuv backend uses the open source library libyuv. It runs on most CPUs and in particular uses the SIMD technology Neon on Arm architectures to accelerate parallelizable computation. It supports image crop, scale and format conversion.

This backend is represented by the larodChip LAROD_CHIP_LIBYUV in larod.h.

libyuv backend constraints

  • The width, height and row pitch, for both the input and the output image, must be a multiple of 2.
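The libyuv constraint can be checked on the application side before loading a model. In this sketch ImageGeometry and libyuv_geometry_ok() are hypothetical names; the row pitch is in bytes, as for "image.input.row-pitch" and "image.output.row-pitch".

```c
#include <stdbool.h>
#include <stdint.h>

/* Image geometry as given by "image.*.size" and "image.*.row-pitch". */
typedef struct {
    int64_t width;
    int64_t height;
    int64_t rowPitch; /* bytes */
} ImageGeometry;

static bool is_even(int64_t v) { return v % 2 == 0; }

/* Width, height and row pitch of both input and output must be
 * multiples of 2 for the libyuv backend. */
static bool libyuv_geometry_ok(ImageGeometry in, ImageGeometry out) {
    return is_even(in.width) && is_even(in.height) && is_even(in.rowPitch) &&
           is_even(out.width) && is_even(out.height) && is_even(out.rowPitch);
}
```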

Supported buffer properties for running jobs

This backend only supports the fd access types LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE.

As the name indicates, the former fd prop allows the libyuv backend to map (using mmap) the tensor's file descriptor instead of reading from or writing to it. Combined with tensor tracking (e.g. using larodTrackTensor), the libyuv backend may be able to cache a tensor's mapping and thus allow for a very efficient zero-copy, map-once memory access pattern.

The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.

Allocation support

Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.

ACE backend

The ACE backend uses Axis Compute Engine in Axis ARTPEC series chips. It only supports image format conversion.

This backend is represented by the larodChip LAROD_CHIP_ACE in larod.h.

ACE backend constraints

  • The input image width must be a multiple of 8.
  • The input image height must be 4 or larger.
  • The input image row pitch must be equal to input image width.
  • The output image row pitch must be equal to output image width.
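The ACE constraints translate directly into a check like the hypothetical ace_geometry_ok() below. It compares the row pitch with the width exactly as the constraints are worded, i.e. it assumes that no row padding at all is allowed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Checks the ACE backend constraints: input width a multiple of 8,
 * input height at least 4, and row pitch equal to width for both
 * input and output (no row padding). */
static bool ace_geometry_ok(int64_t inWidth, int64_t inHeight, int64_t inPitch,
                            int64_t outWidth, int64_t outPitch) {
    return inWidth % 8 == 0 &&
           inHeight >= 4 &&
           inPitch == inWidth &&
           outPitch == outWidth;
}
```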

Supported buffer properties for running jobs

This backend only supports the fd access types LAROD_FD_PROP_MAP and LAROD_FD_PROP_READWRITE.

As the name indicates, the former fd prop allows the ACE backend to map (using mmap) the tensor's file descriptor instead of reading from or writing to it directly. Combined with tensor tracking (e.g. using larodTrackTensor), the ACE backend may be able to cache a tensor's mapping. The backend does not support zero copy, however, meaning that data will still be copied from the memory mapping to the actual buffer for the job.

The access type LAROD_FD_PROP_READWRITE will do a memory copy through read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.

Allocation support

Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.

VProc backend

The VProc backend uses VPROC in Ambarella CV series chips. It supports image crop, scale and format conversion.

This backend is represented by the larodChip LAROD_CHIP_CVFLOW_PROC in larod.h.

VProc backend constraints

  • The input image width must be a multiple of 2 when input format is nv12.
  • The input image height must be a multiple of 2 when input format is nv12.
  • The input image row pitch must be a multiple of 32.
  • The output image width must be a multiple of 2 when output format is nv12.
  • The output image height must be a multiple of 2 when output format is nv12.
  • The output image row pitch must be a multiple of 32.
  • For operations requiring both color format conversion and scaling, the scale factor must be at most 4.
  • For operations not requiring color format conversion, the scale factor must be at most 256.
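The VProc constraints can be sketched as the check below. The text does not define "scale factor", so this sketch assumes it is the largest per-dimension ratio between the source and output sizes, whether up- or downscaling; when a crop is applied, the crop window size would presumably be the source size. VprocImage, vproc_image_ok() and vproc_job_ok() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char* format; /* "nv12", "rgb-interleaved" or "rgb-planar" */
    int64_t width;
    int64_t height;
    int64_t rowPitch; /* bytes */
} VprocImage;

/* Ratio of the larger to the smaller of two dimensions. */
static double ratio(int64_t a, int64_t b) {
    return a > b ? (double)a / (double)b : (double)b / (double)a;
}

/* Per-image constraints: row pitch a multiple of 32, and even width and
 * height when the format is nv12. */
static bool vproc_image_ok(const VprocImage* img) {
    if (img->rowPitch % 32 != 0) return false;
    if (strcmp(img->format, "nv12") == 0 &&
        (img->width % 2 != 0 || img->height % 2 != 0)) return false;
    return true;
}

/* Scale factor at most 4 with color conversion, at most 256 without. */
static bool vproc_job_ok(const VprocImage* in, const VprocImage* out) {
    if (!vproc_image_ok(in) || !vproc_image_ok(out)) return false;
    double scale = ratio(in->width, out->width);
    double scaleY = ratio(in->height, out->height);
    if (scaleY > scale) scale = scaleY;
    bool converts = strcmp(in->format, out->format) != 0;
    return scale <= (converts ? 4.0 : 256.0);
}
```

With these assumptions, converting a 1280x720 nv12 image to 320x192 rgb-planar passes (factor 4), while converting it to 256x144 rgb-planar does not (factor 5).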

Supported buffer properties for running jobs

This backend supports the fd access types LAROD_FD_PROP_DMABUF and LAROD_FD_PROP_READWRITE.

The access type LAROD_FD_PROP_DMABUF has less overhead, since the buffer is passed directly to the underlying processing framework without extra copies in larod. When using LAROD_FD_PROP_DMABUF, the CPU cache of input tensor buffers is flushed before processing starts, and the cache of output buffers is invalidated before the results are delivered to the application. Note that the supplied dmabufs must be allocated by the Ambarella platform.

The access type LAROD_FD_PROP_READWRITE will introduce a memory copy and read()/write() calls for each input and output tensor buffer - these extra operations will degrade performance.

Allocation support

Two kinds of tensor buffers can be allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend. If LAROD_FD_PROP_READWRITE is not among the fd props required in the call, tensors with mappable file descriptors based on Cavalry Mem dma-bufs are returned; these tensors have the fd props LAROD_FD_PROP_MAP and LAROD_FD_PROP_DMABUF set. If LAROD_FD_PROP_READWRITE is required, tensors with readable, writable and mappable file descriptors are returned; these tensors have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.
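The VProc allocation rule can be summarized as a small function of the required props. The FD_PROP_* values below are hypothetical stand-ins chosen for this sketch only; real code must use the LAROD_FD_PROP_* definitions from larod.h, whose values may differ.

```c
#include <stdint.h>

/* Hypothetical stand-ins for LAROD_FD_PROP_READWRITE, _MAP and _DMABUF. */
enum {
    FD_PROP_READWRITE = 1u << 0,
    FD_PROP_MAP = 1u << 1,
    FD_PROP_DMABUF = 1u << 2,
};

/* Props of tensors allocated by the VProc backend, depending on whether
 * READWRITE was among the props required in the allocation call. */
static uint32_t vproc_allocated_props(uint32_t requiredProps) {
    if (requiredProps & FD_PROP_READWRITE) {
        return FD_PROP_READWRITE | FD_PROP_MAP; /* readable, writable, mappable fds */
    }
    return FD_PROP_MAP | FD_PROP_DMABUF; /* Cavalry Mem dma-bufs */
}
```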

OpenCL backend

OpenCL is a compute framework which enables programmers to write programs that execute across heterogeneous platforms such as CPUs, GPUs and more. larod contains predefined OpenCL programs that let larod users conveniently run image crop, scale and format conversion through the OpenCL backend.

The platform larod runs on may have several devices supporting the OpenCL framework. larod can run its operations on any of these devices but will choose the device with the lowest cl_device_id by default. See instructions below for how to choose a specific device for larod to run its OpenCL operations on.

This backend is represented by the larodChip LAROD_CHIP_OPENCL in larod.h.

Choosing a specific device

When loading a model onto the OpenCL backend there is an additional key that one can put in the larodMap:

Key Value
device String specifying which OpenCL device to use.

When a model is loaded, larod matches the device value string against the list of OpenCL devices available on the platform and uses the matched device for the model's OpenCL operations. Note that a partial match will suffice.
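One way to picture a partial match is a substring search over the available device names, returning the first hit. The sketch below assumes exactly that; larod's actual matching rule is not specified beyond "a partial match will suffice", and match_device() and the device names used with it are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Returns the index of the first device name that contains `wanted`
 * as a substring, or -1 if no name does. */
static int match_device(const char* const* names, size_t count, const char* wanted) {
    for (size_t i = 0; i < count; i++) {
        if (strstr(names[i], wanted) != NULL) return (int)i;
    }
    return -1;
}
```

With the hypothetical names "example-GC-device" and "example-VIP-device", the value "GC" would select the first device and "VIP" the second.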

OpenCL devices on Axis product platforms

Currently the OpenCL backend is supported on two of the SoCs used in Axis products, namely Artpec-7 and Artpec-8. On platforms using these SoCs a larod user can choose among the following values for "device" to run on either GPU or DLPU:

SoC        "device" value for GPU    "device" value for DLPU
Artpec-7   GC                        -
Artpec-8   GC                        VIP

For example, setting

larodMapSetStr(modelParams, "device", "GC", NULL);

will load the model onto the GPU of any product based on Artpec-7 or Artpec-8.

Supported buffer properties for running jobs

This backend only supports the fd access type LAROD_FD_PROP_READWRITE.

Allocation support

Tensors allocated using the calls larodAllocModelInputs() and larodAllocModelOutputs() with a model loaded to this backend will have file descriptors that are readable, writable and mappable. Accordingly the tensors will have the fd props LAROD_FD_PROP_READWRITE and LAROD_FD_PROP_MAP set.

Supported operations

The following table describes supported operations for each backend.

Backend   crop   convert   scale
libyuv    Yes    Yes       Yes
ACE       No     Yes       No
VProc     Yes    Yes       Yes
OpenCL    Yes    Yes       Yes

Supported formats

Operations requiring a color format conversion

The following table describes supported input formats for operations requiring a color format conversion.

Backend nv12 rgb-interleaved rgb-planar
libyuv Yes Yes
ACE Yes
VProc Yes
OpenCL Yes

The following table describes supported output formats for operations requiring a color format conversion.

Backend nv12 rgb-interleaved rgb-planar
libyuv Yes Yes
ACE Yes
VProc Yes
OpenCL Yes

Operations not requiring a color format conversion

The following table describes supported image formats for operations not requiring a color format conversion, i.e. the input and output formats are identical. This could be e.g. a pure scaling operation.

Backend nv12 rgb-interleaved rgb-planar
libyuv Yes Yes
VProc Yes Yes
OpenCL Yes

Supported buffer properties

This is an overview of which file descriptor properties are supported by the various preprocessing backends. Note that the LAROD_FD_PROP_ prefix has been omitted from the table headers in the interest of brevity. Please see larod.h for more info about the LAROD_FD_PROP_* flags.

When running jobs

Please note that though a backend may support several properties, a tensor buffer supplied for running a job only needs to have one of the backend's supported properties to be usable for the job. That said, each property has different implications for memory access performance.

Input tensors

Backend   READWRITE   MAP   DMABUF
libyuv    Yes         Yes   No
ACE       Yes         Yes   No
VProc     Yes         No    Yes
OpenCL    Yes         No    No

Output tensors

Backend   READWRITE   MAP   DMABUF
libyuv    Yes         Yes   No
ACE       Yes         Yes   No
VProc     Yes         No    Yes
OpenCL    Yes         No    No
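The rule that a tensor only needs one of the backend's supported properties amounts to a bitwise AND against the tables above. In this sketch the FD_PROP_* values and the Backend enum are hypothetical stand-ins; real code must use the LAROD_FD_PROP_* flags and larodChip values from larod.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for LAROD_FD_PROP_READWRITE, _MAP and _DMABUF. */
enum { FD_PROP_READWRITE = 1u << 0, FD_PROP_MAP = 1u << 1, FD_PROP_DMABUF = 1u << 2 };

typedef enum { BACKEND_LIBYUV, BACKEND_ACE, BACKEND_VPROC, BACKEND_OPENCL } Backend;

/* Props each backend accepts when running jobs (same for inputs and outputs). */
static uint32_t job_supported_props(Backend b) {
    switch (b) {
    case BACKEND_LIBYUV:
    case BACKEND_ACE:
        return FD_PROP_READWRITE | FD_PROP_MAP;
    case BACKEND_VPROC:
        return FD_PROP_READWRITE | FD_PROP_DMABUF;
    case BACKEND_OPENCL:
        return FD_PROP_READWRITE;
    }
    return 0;
}

/* A tensor is usable if it has at least one of the backend's props. */
static bool tensor_usable(Backend b, uint32_t tensorProps) {
    return (job_supported_props(b) & tensorProps) != 0;
}
```

For example, a tensor with only the MAP prop is usable with libyuv but not with VProc or OpenCL.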

When allocating tensors

Please note that though several properties may be supported by a backend, it may not be possible to allocate buffers having all the properties at the same time.

Input tensors

Backend   READWRITE   MAP   DMABUF
libyuv    Yes         Yes   No
ACE       Yes         Yes   No
VProc     Yes         Yes   Yes
OpenCL    Yes         Yes   No

Output tensors

Backend   READWRITE   MAP   DMABUF
libyuv    Yes         Yes   No
ACE       Yes         Yes   No
VProc     Yes         Yes   Yes
OpenCL    Yes         Yes   No