
Clip TTA Augmented Tails #5028

Merged
merged 5 commits into from
Oct 4, 2021

Conversation

glenn-jocher
Member

@glenn-jocher glenn-jocher commented Oct 2, 2021

Experimental TTA update for a new idea I came up with.

πŸ› οΈ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced YOLOv5 augmented inference by incorporating augmented tail clipping.

📊 Key Changes

  • Added a new _clip_augmented method to the YOLOv5 model code.
  • Modified _forward_augment to call _clip_augmented, clipping the augmented tails after predictions are made.

🎯 Purpose & Impact

  • Purpose: To ensure that the predictions made during augmented inference do not include augmented 'tails' that may lower the accuracy or quality of the model’s outputs.
  • Impact: This change is expected to increase the precision of the YOLOv5 model during inference with augmented data, resulting in more reliable and robust object detection. Users should benefit from better performance in their applications without making any adjustments to their existing workflows. 🛠️📈

🤖: Please note that these changes are primarily relevant to users leveraging data augmentation techniques during inference with YOLOv5. Regular users without such augmentation should not be affected.
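The augmented-inference flow this PR touches can be sketched as follows. This is a minimal, hypothetical illustration with placeholder names and dummy arrays (the real implementation lives in models/yolo.py): each scaled view of the image yields its own prediction set, the new clipping step trims the first (largest-scale) and last (smallest-scale) sets, and everything is concatenated before NMS.

```python
import numpy as np

def run_view(n_preds):
    # stand-in for one forward pass: n_preds predictions x 85 values each
    return np.zeros((1, n_preds, 85))

def clip_augmented_tails(y, nl=3, e=1):
    # same index arithmetic as the PR's _clip_augmented, on dummy arrays
    g = sum(4 ** x for x in range(nl))  # 21 grid "units" for nl=3
    i = (y[0].shape[1] // g) * sum(4 ** x for x in range(e))
    y[0] = y[0][:, :-i]  # largest scale: drop the coarsest-layer tail
    i = (y[-1].shape[1] // g) * sum(4 ** (nl - 1 - x) for x in range(e))
    y[-1] = y[-1][:, i:]  # smallest scale: drop the finest-layer head
    return y

# illustrative prediction counts for three views (full, 0.83x, 0.67x scale)
views = [run_view(25200), run_view(18207), run_view(10647)]
merged = np.concatenate(clip_augmented_tails(views), axis=1)
print(merged.shape[1])  # 24000 + 18207 + 2535 = 44742
```

The middle view is left untouched; only the two extreme scales lose a slice of their predictions.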

@glenn-jocher
Member Author

/rebase

@glenn-jocher glenn-jocher merged commit d133968 into master Oct 4, 2021
@glenn-jocher glenn-jocher deleted the update/clip_augmented_tails branch October 4, 2021 22:48
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022
* Clip TTA Augmented Tails

Experimental TTA update.

* Update yolo.py

* Update yolo.py

* Update yolo.py

* Update yolo.py
@timothylimyl

@glenn-jocher any explanation on this update?

I do not quite get what _clip_augmented does to the outputs:

    def _clip_augmented(self, y):
        # Clip YOLOv5 augmented inference tails
        nl = self.model[-1].nl  # number of detection layers (P3-P5)
        g = sum(4**x for x in range(nl))  # grid points
        e = 1  # exclude layer count
        i = (y[0].shape[1] // g) * sum(4**x for x in range(e))  # indices
        y[0] = y[0][:, :-i]  # large
        i = (y[-1].shape[1] // g) * sum(4 ** (nl - 1 - x) for x in range(e))  # indices
        y[-1] = y[-1][:, i:]  # small
        return y
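As a concrete illustration of the "large" branch (my own worked numbers, not from the PR): for a 640-pixel input with strides 8/16/32 and 3 anchors per grid cell, the per-layer prediction counts are in the ratio 16:4:1, which is where g = 21 comes from, and the slice y[0][:, :-i] removes exactly the P5 block from the full-scale output.

```python
# Worked example of the "large" branch, assuming a 640px input,
# strides (8, 16, 32) and 3 anchors per grid cell (standard YOLOv5 head).
import numpy as np

nl, na, imgsz = 3, 3, 640
per_layer = [(imgsz // s) ** 2 * na for s in (8, 16, 32)]  # [19200, 4800, 1200]
n = sum(per_layer)                                         # 25200 total

g = sum(4 ** x for x in range(nl))   # 1 + 4 + 16 = 21
e = 1                                # exclude one layer
y0 = np.zeros((1, n, 85))            # stand-in for the full-scale output

i = (y0.shape[1] // g) * sum(4 ** x for x in range(e))  # 25200 // 21 = 1200
y0 = y0[:, :-i]                      # drop the trailing P5 (coarse-grid) block

assert i == per_layer[-1]            # exactly the P5 predictions were removed
assert y0.shape[1] == per_layer[0] + per_layer[1]  # P3 + P4 remain
```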

@timothylimyl

Also, if you want to separate the scales into large and small, wouldn't it be better to set the scales to something like [1.3, 1, 0.67] instead of [1, 0.83, 0.67] and recommend users add +30% img size when running the scripts? Sorry if I missed anything.

@glenn-jocher
Member Author

glenn-jocher commented Dec 8, 2022

@timothylimyl large objects are clipped from the large augmentation and vice versa; this produced better empirical results in testing. You can disable it by simply commenting out the line.

Scaling up a low-res image is inferior to scaling down a high-res image. If you find any improvements to our TTA though please feel free to submit a PR.

@gavin-trendii

@glenn-jocher
Could you please further explain the _clip_augmented?
From what I understand, this function drops some predictions before NMS, but I cannot follow the logic here. Could you please use the large augmentation as an example and explain the meaning of each line?

 g = sum(4**x for x in range(nl))  # grid points
 e = 1  # exclude layer count
 i = (y[0].shape[1] // g) * sum(4**x for x in range(e))  # indices
 y[0] = y[0][:, :-i]  # large

Thanks a lot :)

@glenn-jocher
Member Author

@gavin-trendii The _clip_augmented function aims to improve augmented inference results by removing superfluous predictions before NMS. The logic is to drop predictions from the tails of the first and last outputs, which correspond to the largest and smallest augmentation scales.

  • g = sum(4**x for x in range(nl)) calculates the grid points based on the number of detection layers.
  • e = 1 sets the number of layers to exclude, and i calculates the indices of the predictions to drop for large objects.
  • y[0] = y[0][:, :-i] removes the tail of predictions for large objects.

This process is similar for small objects with adjustments for the indices. This helps to improve inference by filtering out redundant predictions. Let me know if you have further questions!
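The small-object side can be checked the same way (again my own worked numbers, assuming a 416px down-scaled view): the leading 16/21 of the rows is exactly the P3 block, so y[-1][:, i:] keeps only the P4 and P5 predictions from the smallest-scale view.

```python
# Worked example of the "small" branch, assuming a 416px down-scaled view,
# strides (8, 16, 32) and 3 anchors per grid cell (standard YOLOv5 head).
import numpy as np

nl, na, imgsz = 3, 3, 416
per_layer = [(imgsz // s) ** 2 * na for s in (8, 16, 32)]  # [8112, 2028, 507]
n = sum(per_layer)                                         # 10647 total

g = sum(4 ** x for x in range(nl))   # 21
e = 1                                # exclude one layer
y_last = np.zeros((1, n, 85))        # stand-in for the smallest-scale output

i = (y_last.shape[1] // g) * sum(4 ** (nl - 1 - x) for x in range(e))  # 507 * 16
y_last = y_last[:, i:]               # drop the leading P3 (fine-grid) block

assert i == per_layer[0]             # exactly the P3 predictions were removed
assert y_last.shape[1] == per_layer[1] + per_layer[2]  # P4 + P5 remain
```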
