Extract VPR model to SNPE format

This repo contains the code to convert a pre-trained YOLOv5 model so that it runs on the DSP of a Qualcomm chip via SNPE. It is based on the YOLOv5 repo by Ultralytics (AGPL-3.0 license).

Steps to convert the model to SNPE format and run it on the DSP:

  • Train the detector using the train.py script
  • Export the model to ONNX format using the export.py script
  • Convert the model to a DLC file using Qualcomm's SNPE converter
  • Quantize the DLC using Qualcomm's SNPE quantizer
  • Run the model on the DSP using Qualcomm's SNPE SDK, and the provided C++ code

Train the detector on the COCO dataset

cd yolov5
pip install -r requirements.txt  # install
# single class
python train.py --data coco128.yaml --weights '' --cfg yolov5s-LeakyReLU_1class.yaml --img 320 --single-cls
# multi class
python train.py --data coco128.yaml --weights '' --cfg yolov5s-LeakyReLU.yaml --img 320

Export the ONNX model from the YOLOv5 repo

python export.py --weights runs/train/exp/weights/best.pt --include onnx --opset 11 --imgsz 256 320 --export-to-snpe --batch-size 2

Make sure the exported model looks good before converting it; a quick sanity check is sketched below.
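A quick way to check the export is to load the ONNX file and run a dummy inference on the CPU. This is a minimal sketch, assuming the onnx and onnxruntime packages are installed (they are not needed anywhere else in this flow) and that the export went to the default path used above:

import numpy as np
import onnx
import onnxruntime as ort

model_path = "runs/train/exp/weights/best.onnx"

# structural check of the exported graph
onnx.checker.check_model(onnx.load(model_path))

# run one dummy inference on CPU and print the output shapes
session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
inp = session.get_inputs()[0]
print("input:", inp.name, inp.shape)
shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # replace symbolic dims with 1
dummy = np.random.rand(*shape).astype(np.float32)
for out, meta in zip(session.run(None, {inp.name: dummy}), session.get_outputs()):
    print("output:", meta.name, out.shape)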

Convert the model to SNPE format

  • Download the SNPE converter from Qualcomm's SNPE Tools
    • This code was tested with the Qualcomm Neural Processing SDK for Linux v2.10.0
    • Driver version 460.73.01, CUDA version 11.3
    • Install the SNPE converter to a conda environment using
      conda env create -f snpe_environment_clean.yml
      conda activate snpe_conda2
  • Run SNPE env setup
    • The script sets up the following environment variables.
      • SNPE_ROOT: root directory of the SNPE SDK installation
      • ONNX_HOME: root directory of the ONNX installation (the path passed with -o below)
      • The script also updates PATH, LD_LIBRARY_PATH, and PYTHONPATH.
      cd $SNPE_ROOT
      source bin/envsetup.sh -o $ONNX_DIR
      where $ONNX_DIR is the path to the ONNX installation, for example:
      source snpe_2_10_0_4541/bin/envsetup.sh -o /home/ubuntu/anaconda3/envs/snpe_conda3/lib/python3.6/site-packages/onnx
      • This warning is OK:
        [WARNING] Can't find ANDROID_NDK_ROOT or ndk-build. SNPE needs android ndk to build the NativeCppExample
  • Run the SNPE converter on the ONNX model to produce a DLC file
python3.6 snpe_2_10_0_4541/bin/x86_64-linux-clang/snpe-onnx-to-dlc -i runs/train/exp/weights/best.onnx -d 'images' 1,3,256,320
  • If you get the following error: ImportError: libpython3.6m.so.1.0: cannot open shared object file: No such file or directory
    • try running: sudo apt-get install libpython3.6 and then pip3 install pyyaml
  • This warning is OK:
* WARNING_OP_NOT_SUPPORTED_BY_ONNX: Unable to register converter supported Operation [GridSample:Version 16] with your Onnx installation. Got: No schema registered for 'GridSample'!. Converter will bail if Model contains this Op.

Convert the DLC to a quantized DLC

  • Create a set of example inputs. Replace the random tensor below with real, representative inputs (see the sketch after this list)
import torch
import numpy as np
data = torch.rand(1, 3, 256, 320)
# Convert the tensor to a NumPy array and save it to a binary file
with open(r"float_data.bin", 'wb') as file:
    np.array(data.numpy(), dtype=np.float32).tofile(file)
  • Create a txt file with the paths to all the example inputs; see examples/target_raw_list.txt
  • Run the quantization script
    • The script creates a quantized DLC file
snpe_2_10_0_4541/bin/x86_64-linux-clang/snpe-dlc-quantize \
      --input_dlc=runs/train/exp/weights/best.dlc \
      --output_dlc=runs/train/exp/weights/best_quantized.dlc \
      --input_list=examples/target_raw_list.txt \
      --debug1
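As noted above, the random tensor is only a placeholder; quantization quality depends on feeding representative data. The sketch below dumps real images to raw float32 files and writes the input list in one pass. The folder names are hypothetical, and the preprocessing (resize to 320x256, 0-1 scaling, NCHW layout) simply mirrors the example snippet above; make sure it matches your inference-time preprocessing and the input layout your converted DLC actually expects.

import glob
import os
import cv2
import numpy as np

image_dir = "examples/calibration_images"   # hypothetical folder of representative images
raw_dir = "examples/raw_inputs"
list_path = "examples/target_raw_list.txt"
os.makedirs(raw_dir, exist_ok=True)

with open(list_path, "w") as list_file:
    for image_path in sorted(glob.glob(os.path.join(image_dir, "*.jpg"))):
        img = cv2.imread(image_path)
        img = cv2.cvtColor(cv2.resize(img, (320, 256)), cv2.COLOR_BGR2RGB)       # dsize is (width, height)
        tensor = np.transpose(img.astype(np.float32) / 255.0, (2, 0, 1))[None]   # HWC -> NCHW, batch of 1
        raw_path = os.path.join(raw_dir, os.path.basename(image_path) + ".raw")
        tensor.tofile(raw_path)
        list_file.write(raw_path + "\n")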

Run the model on the DSP

  • Update the C++ post-processing code following the instructions in the YOLOv5 issue referenced in the code below
Dnn::BBoxesSortedByConfidence DecodeOutput(tcb::span<Math::Scalar> dnn_output, const YoloV5Config &config)
{
  SCOPED_TRACE_ALGO("YoloV5::DecodeOutput");
  Dnn::BBoxesSortedByConfidence bboxes;
  // modified from https://github.com/ultralytics/yolov5/issues/4790
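  // each candidate box occupies 6 consecutive values: indices c-4..c-1 are cx, cy, w, h and index c is the confidence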
  for (size_t c = 4; c < dnn_output.size(); c += 6) {
    if (dnn_output[c] >= config.confidence_threshold_before_nms) {
      float cx = dnn_output[c - 4];
      float cy = dnn_output[c - 3];
      float w = dnn_output[c - 2];
      float h = dnn_output[c - 1];
      int gridX, gridY;
      int anchor_gridX, anchor_gridY;
      const auto &anchor_x = config.anchor_x;
      const auto &anchor_y = config.anchor_y;
      const auto &filter_size_y = config.filter_size_y;
      const auto &filter_size_x = config.filter_size_x;
      const auto &num_filters = config.num_filters;

      int stride;
      int ci = (int)(c / 6);
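      // the flat output concatenates three detection scales (strides 8, 16, 32); recover the grid cell and anchor from the box index ci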
      if (ci < num_filters[0]) {
        gridX = (ci % (filter_size_x[0] * filter_size_y[0])) % filter_size_x[0];
        gridY = (int)((ci % (filter_size_x[0] * filter_size_y[0])) / filter_size_x[0]);
        anchor_gridX = anchor_x[((int)(ci / (filter_size_x[0] * filter_size_y[0])))];
        anchor_gridY = anchor_y[((int)(ci / (filter_size_x[0] * filter_size_y[0])))];
        stride = 8;
      } else if (ci >= num_filters[0] && ci < num_filters[0] + num_filters[1]) {
        gridX = ((ci - num_filters[0]) % (filter_size_x[1] * filter_size_y[1])) % filter_size_x[1];
        gridY = (int)(((ci - num_filters[0]) % (filter_size_x[1] * filter_size_y[1])) / filter_size_x[1]);
        anchor_gridX = anchor_x[(int)((ci - num_filters[0]) / (filter_size_x[1] * filter_size_y[1])) + 3];
        anchor_gridY = anchor_y[(int)((ci - num_filters[0]) / (filter_size_x[1] * filter_size_y[1])) + 3];
        stride = 16;
      } else {
        gridX = ((ci - num_filters[1]) % (filter_size_x[2] * filter_size_y[2])) % filter_size_x[2];
        gridY = (int)(((ci - num_filters[1]) % (filter_size_x[2] * filter_size_y[2])) / filter_size_x[2]);
        anchor_gridX = anchor_x[(int)((ci - num_filters[1] - num_filters[0]) / (filter_size_x[2] * filter_size_y[2])) + 6];
        anchor_gridY = anchor_y[(int)((ci - num_filters[1] - num_filters[0]) / (filter_size_x[2] * filter_size_y[2])) + 6];
        stride = 32;
      }
      cx = (float)(cx * 2 - 0.5 + gridX) * stride;
      cy = (float)(cy * 2 - 0.5 + gridY) * stride;
      w = w * 2 * w * 2 * anchor_gridX;
      h = h * 2 * h * 2 * anchor_gridY;
      float left = cx - w / 2;
      float top = cy - h / 2;
      float right = cx + w / 2;
      float bottom = cy + h / 2;
      float confidence = dnn_output[c];
      cv::Rect2f bbox{ left, top, right - left, bottom - top };

      Dnn::DetectionResult result{ config.keyboard_class, confidence, bbox };
      bboxes.emplace(confidence, result);
    }
  }
  return bboxes;
}
  • Build the C++ code and run it on the device
/data/local/tmp/dnn-test/snpe2.10 # LD_LIBRARY_PATH=arm64-v8a ./run-dnn --native-library-dir arm64-v8a --dnn ../best_quantized.dlc   --input-layer images --width 320 --height 256 --channels 3 --iterations 100 --batch-size 1
  • The runtime should be around 13 ms for a single 256x320 image
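For host-side debugging it can help to mirror the decoding above in Python, e.g. to compare DSP outputs against the original PyTorch model. The following is a hedged sketch that re-derives the same logic; the 6-values-per-box layout, per-level grid sizes, flat 9-entry anchor lists, and confidence threshold are assumptions carried over from the C++ snippet, not an API of this repo.

import numpy as np

def decode_output(dnn_output, anchors_x, anchors_y, grid_w, grid_h, conf_thresh=0.25):
    """dnn_output: flat float32 array with 6 values (cx, cy, w, h, conf, cls) per box.
    grid_w/grid_h: per-level grid sizes, e.g. [40, 20, 10] and [32, 16, 8] for a 320x256 input.
    anchors_x/anchors_y: 9 anchor widths/heights in pixels (3 per level)."""
    strides = (8, 16, 32)
    counts = [grid_w[i] * grid_h[i] * 3 for i in range(3)]           # boxes per scale level
    offsets = np.cumsum([0] + counts)
    boxes = []
    for ci in range(dnn_output.size // 6):
        conf = dnn_output[ci * 6 + 4]
        if conf < conf_thresh:
            continue
        level = int(np.searchsorted(offsets[1:], ci, side="right"))  # which scale level this box belongs to
        local = ci - offsets[level]
        cells = grid_w[level] * grid_h[level]
        gx = (local % cells) % grid_w[level]
        gy = (local % cells) // grid_w[level]
        a = local // cells                                           # anchor index within the level
        cx, cy, w, h = dnn_output[ci * 6: ci * 6 + 4]
        cx = (cx * 2 - 0.5 + gx) * strides[level]
        cy = (cy * 2 - 0.5 + gy) * strides[level]
        w = (w * 2) ** 2 * anchors_x[level * 3 + a]
        h = (h * 2) ** 2 * anchors_y[level * 3 + a]
        boxes.append((cx - w / 2, cy - h / 2, w, h, float(conf)))    # (left, top, width, height, conf)
    return sorted(boxes, key=lambda b: b[-1], reverse=True)          # highest confidence first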
