
[QUESTION] Issue when translating output to bboxes with NMS. Using YoloV5L6.onnx #48

Closed
divineSix opened this issue Feb 5, 2024 · 4 comments

Comments

@divineSix

I've been looking through the code and applied the NMS code from here.

The outputs are a little strange: the boxes look like they've drifted upwards. How do I resolve this?

[attached image: output]

@divineSix
Author

For reference, this is the output with yolov5's detect.py using yolov5l6.pt.

[attached image: detect.py output]
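(That output came from the usual detect.py invocation — something along the lines of python detect.py --weights yolov5l6.pt --source <image> — in case it's useful for comparison.)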

@cyrusbehr
Owner

Closing this issue as it's not related to my inference code, but rather to the post-processing code you have written. That being said, if you provide your full code and model, I can take a look.

@divineSix
Author

divineSix commented Feb 5, 2024

I've written some very basic post-processing code to read the data from the featureVectors in your code. For replication, I'm using the yolov5l6 model, exported to onnx from the official ultralytics repository.
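(For reference, the ONNX export was the repo's standard one — something along the lines of python export.py --weights yolov5l6.pt --include onnx.)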
This snippet is in main.cpp, just after printing the feature vector.

// POST PROCESSING
    std::vector<cv::Rect> bboxes;
    std::vector<float> confs; // overall confidence = objectness * best class prob
    std::vector<int> classIds;

    // Flattened output: each row is (cx, cy, w, h, objectness, 80 class scores).
    std::vector<float> output = featureVectors[0][0];
    int dims = 85;
    int rows = output.size() / dims;

    for (int i = 0; i < rows; i++) {
        int index = i * dims;
        // Skip rows with low objectness.
        if (output[index+4] <= CONF_THRESH) continue;

        // Box centre/size are in the 640x640 network-input space.
        int centerX = (int)(output[index]);
        int centerY = (int)(output[index+1]);
        int width = (int)(output[index+2]);
        int height = (int)(output[index+3]);
        int left = centerX - width / 2;
        int top = centerY - height / 2;

        // Argmax over all 80 class scores (row offsets 5..84).
        float bestClassProb = 0;
        int bestClassId = 0;
        for (int j = index + 5; j < index + dims; j++) {
            if (output[j] > bestClassProb) {
                bestClassProb = output[j];
                bestClassId = j - (index + 5);
            }
        }

        // Overall confidence = objectness * best class probability.
        float conf = output[index+4] * bestClassProb;
        bboxes.emplace_back(left, top, width, height);
        confs.emplace_back(conf);
        classIds.emplace_back(bestClassId);
    }

    // Class-agnostic NMS on the network-input-space boxes.
    std::vector<int> indices;
    cv::dnn::NMSBoxes(bboxes, confs, CONF_THRESH, NMS_THRESH, indices);

    std::vector<Detection> dets;
    for (int idx : indices) {
        Detection det;
        det.box = cv::Rect(bboxes[idx]);
        // Map the box from the 640x640 network input back onto the original image.
        scaleCoords(cv::Size(640, 640), det.box, cpuImg.size());
        det.conf = confs[idx];
        det.classId = classIds[idx];
        dets.emplace_back(det);
    }

    visDet(cpuImg, dets);
    cv::imwrite("new_output.jpg", cpuImg);

The snippet below contains the related helper functions.

void visDet(cv::Mat& image, std::vector<Detection>& detections) {
    for (const Detection& det : detections) {
        // Box outline.
        cv::rectangle(image, det.box, cv::Scalar(229, 160, 21), 2);
        int x = det.box.x;
        int y = det.box.y;
        int conf = (int)std::round(det.conf * 100);
        int classId = det.classId;
        std::string label = "cls" + std::to_string(classId) + " " + std::to_string(conf) + "%";

        // Filled label background above the box, followed by the label text.
        int baseline = 0;
        cv::Size size = cv::getTextSize(label, cv::FONT_ITALIC, 0.8, 2, &baseline);
        cv::rectangle(image,
                      cv::Point(x, y - 25), cv::Point(x + size.width, y),
                      cv::Scalar(229, 160, 21), -1);

        cv::putText(image, label,
                    cv::Point(x, y - 3), cv::FONT_ITALIC,
                    0.8, cv::Scalar(255, 255, 255), 2);
    }
}

void scaleCoords(const cv::Size& imageShape, cv::Rect& coords, const cv::Size& imageOriginalShape) {
    // Undo the letterbox transform. Note: this assumes the original image was scaled by
    // `gain` and then padded symmetrically (centred) to reach imageShape.
    float gain = std::min((float)imageShape.height / (float)imageOriginalShape.height,
                          (float)imageShape.width / (float)imageOriginalShape.width);

    int pad[2] = {(int) (((float)imageShape.width - (float)imageOriginalShape.width * gain) / 2.0f),
                  (int) (((float)imageShape.height - (float)imageOriginalShape.height * gain) / 2.0f)};

    // Remove the padding, then rescale back to original-image coordinates.
    coords.x = (int) std::round((float)(coords.x - pad[0]) / gain);
    coords.y = (int) std::round((float)(coords.y - pad[1]) / gain);

    coords.width = (int) std::round((float)coords.width / gain);
    coords.height = (int) std::round((float)coords.height / gain);
}

It'd be great if you could share your own post-processing code as well; I can look into it and understand where I might be going wrong.

@cyrusbehr
Owner

I only have post-processing code for YOLOv8, which you can find here. I imagine, though, that it's different for YOLOv5.
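From what I recall, the main structural difference is that the YOLOv8 ONNX output has no objectness score and is transposed ([84 x num_anchors] rather than [num_anchors x 85]), so the decode loop changes. A rough sketch below (not code from this repo; decodeYoloV8 and its layout assumptions are just illustrative):

#include <opencv2/opencv.hpp>
#include <vector>

// Sketch: decode a YOLOv8-style output (4 box values + 80 class scores per anchor,
// no objectness, laid out channel-major as [84 x numAnchors]).
void decodeYoloV8(const std::vector<float>& output, int numAnchors, float confThresh,
                  std::vector<cv::Rect>& bboxes, std::vector<float>& confs,
                  std::vector<int>& classIds) {
    const int numClasses = 80;
    for (int i = 0; i < numAnchors; i++) {
        // Element (c, i) of the [84 x numAnchors] tensor sits at output[c * numAnchors + i].
        float cx = output[0 * numAnchors + i];
        float cy = output[1 * numAnchors + i];
        float w  = output[2 * numAnchors + i];
        float h  = output[3 * numAnchors + i];

        // No objectness term: the best class score is used directly as the confidence.
        float bestScore = 0.f;
        int bestClassId = 0;
        for (int c = 0; c < numClasses; c++) {
            float score = output[(4 + c) * numAnchors + i];
            if (score > bestScore) { bestScore = score; bestClassId = c; }
        }
        if (bestScore <= confThresh) continue;

        bboxes.emplace_back((int)(cx - w / 2), (int)(cy - h / 2), (int)w, (int)h);
        confs.push_back(bestScore);
        classIds.push_back(bestClassId);
    }
}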
