Add tensor-based annotation storage to reduce DDP RAM usage with large COCO-format datasets #1885

Open
wants to merge 6 commits into base: master
Conversation

@spsancti (Contributor) commented Mar 5, 2024

Source of the bug: pytorch/pytorch#13246
Solution based on: facebookresearch/detectron2@0cd0e72
Fixes: #1214
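The linked PyTorch issue describes copy-on-write memory growth: a Python list of annotation dicts holds many small objects whose refcounts are written on every access, so fork-based DataLoader workers dirty the shared pages and RAM grows per worker. The detectron2-style fix serializes all annotations into one contiguous byte buffer plus a cumulative offset array. A minimal self-contained sketch of that idea (illustrative class and attribute names; the PR itself follows detectron2 and uses torch tensors, while this sketch uses numpy to stay standalone):

```python
import pickle
import numpy as np

class TensorBackedAnnotations:
    """Store per-sample annotations as one contiguous uint8 buffer.

    A flat byte array has no per-object refcounts, so fork-based workers
    keep the pages shared instead of copying them (pytorch/pytorch#13246).
    """

    def __init__(self, annotations):
        # Serialize each sample independently so samples stay addressable.
        blobs = [pickle.dumps(a, protocol=pickle.HIGHEST_PROTOCOL) for a in annotations]
        # Cumulative end offsets: sample i occupies [addr[i-1], addr[i]).
        self._addr = np.cumsum([len(b) for b in blobs]).astype(np.int64)
        self._annotations = np.frombuffer(b"".join(blobs), dtype=np.uint8)

    def __len__(self):
        return len(self._addr)

    def __getitem__(self, sample_id):
        start = 0 if sample_id == 0 else int(self._addr[sample_id - 1])
        end = int(self._addr[sample_id])
        return pickle.loads(self._annotations[start:end].tobytes())
```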

@spsancti spsancti changed the title Add tensor-based annotation storage to reduce DDP RAM usage with large datasets Add tensor-based annotation storage to reduce DDP RAM usage with large COCO-format datasets Mar 5, 2024
@NatanBagrov (Contributor) left a comment:

Thanks! See comments inline.

Comment on lines 150 to 152
start_addr = 0 if sample_id == 0 else self._addr[sample_id - 1].item()
end_addr = self._addr[sample_id].item()
annotation = pickle.loads(self._annotations[start_addr:end_addr].numpy().data)
@NatanBagrov (Contributor):
We saw a slowdown due to pickle load; do we want to make the serialize-parse fix optional?
I mean, eventually a memory leak is a memory leak, but for small datasets you get overhead, whereas without the fix "you'd be fine". @BloodAxe, thoughts?

Collaborator:
@NatanBagrov yes IMO
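The trade-off the reviewers agree on here (pickle overhead per `__getitem__` vs. lower multi-worker RAM) can be made opt-in with a flag. A hedged sketch of what that could look like; the function and flag names below are illustrative, not the PR's actual API, though the flag mirrors the `use_tensor_backed_storage` parameter discussed later in the review:

```python
import pickle
import numpy as np

def build_annotation_storage(annotations, use_tensor_backed_storage=False):
    """Return either the plain list (fastest access, small datasets) or a
    tensor-backed read-only view (lower RAM under fork-based workers)."""
    if not use_tensor_backed_storage:
        # Small datasets: skip the per-access pickle round-trip entirely.
        return annotations
    blobs = [pickle.dumps(a) for a in annotations]
    addr = np.cumsum([len(b) for b in blobs]).astype(np.int64)
    buf = np.frombuffer(b"".join(blobs), dtype=np.uint8)

    class _Reader:
        def __len__(self):
            return len(addr)

        def __getitem__(self, i):
            start = 0 if i == 0 else int(addr[i - 1])
            return pickle.loads(buf[start:int(addr[i])].tobytes())

    return _Reader()
```

Either storage yields the same samples, so the rest of the dataset code does not need to know which backend is active.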


@NatanBagrov NatanBagrov requested a review from shaydeci May 27, 2024 06:59
@shaydeci (Collaborator) left a comment:

To me this looks quite ready.
I do have one more request: please add a unit test that exercises this feature so we can see that nothing crashes. You can use our data in tests/data/coco2017 (see how we use it, for example, in tests/unit_tests/preprocessing_unit_test.py).
It can be a simple test that just iterates through the dataset with use_tensor_backed_storage set.
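A sketch of such a test. The real version would construct the COCO dataset from tests/data/coco2017 with use_tensor_backed_storage=True and iterate it; the stand-in below only exercises the storage round-trip so it stays self-contained (all names and the dummy samples are illustrative):

```python
import pickle
import numpy as np

def test_tensor_backed_storage_iteration():
    # Dummy annotations standing in for the COCO samples in tests/data/coco2017.
    samples = [{"image_id": i, "bbox": [i, i, i + 1, i + 1]} for i in range(5)]
    # Pack them the same way the tensor-backed storage does.
    blobs = [pickle.dumps(s) for s in samples]
    addr = np.cumsum([len(b) for b in blobs])
    buf = np.frombuffer(b"".join(blobs), dtype=np.uint8)
    # Iterating must reproduce every sample without crashing.
    for i, expected in enumerate(samples):
        start = 0 if i == 0 else int(addr[i - 1])
        assert pickle.loads(buf[start:int(addr[i])].tobytes()) == expected
    return True
```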

@@ -52,6 +56,7 @@ def __init__(
:param with_crowd: Add the crowd groundtruths to __getitem__
:param class_ids_to_ignore: List of class ids to ignore in the dataset. By default, doesnt ignore any class.
:param tight_box_rotation: This parameter is deprecated and will be removed in a SuperGradients 3.8.
:param use_tensor_backed_storage: Whether to use tensor-backed storage to mitigate the Python memory leak with large datasets ()
Collaborator:

Maybe give an estimate of what's considered "large" as a recommendation, from your experience.

Successfully merging this pull request may close these issues.

YOLO-NAS Training Halted with 'Killed' Error Message
4 participants