
[Feature] A very simple but effective trick for deterministic issue in detection. #9831

Open
GeoffreyChen777 opened this issue Feb 24, 2023 · 2 comments
@GeoffreyChen777

GeoffreyChen777 commented Feb 24, 2023

What's the feature?

Deterministic issue in detection models.

We all know that it's hard to reproduce identical results across two runs in detection, even if we use the same random seed and set deterministic = True.

Recently, an issue in detectron2 pointed out that this is mainly caused by the use of atomicAdd in the backward pass, and that ROIAlign is the key module.

facebookresearch/detectron2#4723

The author of that issue also described a very simple trick that works around the problem:

  • Truncate the input to a smaller data type; this gives a starting point in which only a very small number of significand bits are used
  • Then cast to a larger data type just before doing the computations that involve atomicAdd
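A rough illustration of why the two steps above can help: floating-point addition is not associative, so the order in which atomicAdd accumulates values can change the result. If every input has few significand bits and the accumulator has many, each addition is exact and the order stops mattering. The sketch below is standard-library Python only and is not the detectron2 or mmdet code; `to_half` and `distinct_sums` are names made up for this example, and the shuffled summation order merely stands in for the unordered atomicAdd accumulation.

```python
import random
import struct


def to_half(x: float) -> float:
    """Round a float64 to the nearest float16 value, returned as a float64."""
    return struct.unpack("e", struct.pack("e", x))[0]


def distinct_sums(values, n_orders=20, seed=0):
    """Sum `values` in several random orders; return how many different totals appear."""
    rng = random.Random(seed)
    vals = list(values)
    totals = set()
    for _ in range(n_orders):
        rng.shuffle(vals)      # stand-in for a nondeterministic accumulation order
        acc = 0.0              # Python floats are float64
        for v in vals:
            acc += v
        totals.add(acc)
    return len(totals)


rng = random.Random(123)
raw = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
truncated = [to_half(v) for v in raw]   # step 1: truncate; step 2: accumulate in float64

print("distinct totals, raw float64 inputs:      ", distinct_sums(raw))
print("distinct totals, float16-truncated inputs:", distinct_sums(truncated))
```

Every float16 value is a multiple of 2^-24, so sums of a few thousand small truncated values fit exactly in the 53-bit significand of float64, and the truncated case always produces a single total. With larger values or far more terms the partial sums could eventually need rounding again, so this is an argument for why the trick tends to work, not a guarantee.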

They tried this and found training to be fully deterministic (loss values and evaluation results on COCO) up to tens of thousands of steps (using the same code as in facebookresearch/detectron2#4260) for:

  • MaskRCNN based on a ResNet-50 backbone
  • MaskRCNN based on a ResNeXt-101 backbone
  • Wide range of batch sizes
  • Mixed-precision training
  • Single and Multi-GPU training
  • A100s and V100s

I also tried this trick in mmdet by changing the self.roi_layers[0] call in the RoI extractor to:

return self.roi_layers[0](feats[0].half().double(), rois.half().double()).to(rois.dtype)

I found that it also solves the determinism issue in mmdet. The loss and mAP are exactly identical across runs once I use this trick, and it doesn't hurt the mAP (at least in my project).

I think it would be good to implement this trick in mmdet with a configurable parameter to turn it on or off. We could do this in the RoI extractor module, or in the ROIAlign layer of mmcv.
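One way such a switch could look, as a minimal sketch and not an existing mmdet or mmcv API: a small wrapper module that applies the half-then-double cast around any RoI layer. The class name `TruncatedInputRoILayer` and the `enabled` flag are hypothetical, and the wrapped layer is assumed to accept float64 inputs, as the one-line change above already requires.

```python
import torch
from torch import nn


class TruncatedInputRoILayer(nn.Module):
    """Hypothetical wrapper: truncate inputs to float16, then run the RoI layer in float64."""

    def __init__(self, roi_layer: nn.Module, enabled: bool = True):
        super().__init__()
        self.roi_layer = roi_layer
        self.enabled = enabled  # assumed config switch to turn the trick on or off

    def forward(self, feats: torch.Tensor, rois: torch.Tensor) -> torch.Tensor:
        if not self.enabled:
            return self.roi_layer(feats, rois)
        # Same casts as the one-line change above: half() drops significand bits,
        # double() gives the accumulation plenty of headroom.
        out = self.roi_layer(feats.half().double(), rois.half().double())
        # Cast back so downstream layers see the original dtype.
        return out.to(rois.dtype)
```

Usage would be something like `TruncatedInputRoILayer(existing_roi_layer, enabled=cfg_flag)`, where `cfg_flag` is whatever config entry is chosen. One thing to test before enabling it by default: float16 has an 11-bit significand, so RoI coordinates above 2048 are rounded to coarser steps by `rois.half()`.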

Any other context?

No response

@hhaAndroid
Collaborator

This is a very interesting trick.

@GeoffreyChen777
Author

Yes, and it is very easy to implement.

I have only tried it in my own project (a model based on Faster RCNN).

I haven't tested every model in mmdet with this trick yet. If you are going to implement and release this in mmdet, please test it first. (I don't have enough GPUs to do that 😥)
