
Add examples for detection models finetuning #30422

Merged
merged 35 commits into from
May 8, 2024

Conversation

qubvel
Member

@qubvel qubvel commented Apr 23, 2024

What does this PR do?

Add examples showing how to fine-tune DETR, DETA, Deformable DETR, Conditional DETR, and YOLOS with Trainer and Accelerate.

Introduced evaluation in the Trainer API for detection models. It is now possible to train models with ongoing evaluation and metrics tracking (fixed in #30267), which unblocks selecting the best checkpoint based on a metric instead of the loss.
Simplified the metrics computation pipeline, reducing bounding box and output format conversions.
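As an illustration of the kind of box-format conversion the simplified pipeline reduces, here is a minimal sketch of converting between COCO-style and Pascal-VOC-style boxes (function names are illustrative, not the exact helpers from this PR):

```python
# Hypothetical helpers illustrating the bounding-box format conversions an
# object-detection evaluation pipeline performs; not the exact code from this PR.

def coco_to_pascal_voc(box):
    """Convert a COCO-style box [x, y, width, height] to
    Pascal VOC corners [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]

def pascal_voc_to_coco(box):
    """Inverse conversion: corners back to [x, y, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]
```

Each conversion is cheap on its own, but repeating them per batch and per metric call adds up, which is why the PR reduces the number of round-trips.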

Finetuned models can be found here.
W&B report can be found here.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@NielsRogge

@qubvel qubvel changed the title Add examples for detection models fintuning Add examples for detection models finetuning Apr 23, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qubvel qubvel force-pushed the detection-finetuning-example branch from 8d2dca0 to 824b883 Compare April 26, 2024 11:14
@qubvel
Member Author

qubvel commented Apr 26, 2024

@amyeroberts could you please review? Further improvements to the training pipeline can be made in a next PR.

Collaborator

@amyeroberts amyeroberts left a comment


Looks great! Thanks for all the work adding this and the additional PRs to make these models work well in our library ❤️

Just a few small comments - main one about not using the image processor to process the images for speed considerations. All comments for the trainer script obviously apply to the non-trainer one.

Three review threads on examples/pytorch/object-detection/run_object_detection.py (outdated, resolved)
Comment on lines +123 to +141
images = []
annotations = []
for image_id, image, objects in zip(examples["image_id"], examples["image"], examples["objects"]):
    image = np.array(image.convert("RGB"))

    # apply augmentations
    output = transform(image=image, bboxes=objects["bbox"], category=objects["category"])
    images.append(output["image"])

    # format annotations in COCO format
    formatted_annotations = format_image_annotations_as_coco(
        image_id, output["category"], objects["area"], output["bboxes"]
    )
    annotations.append(formatted_annotations)

# Apply the image processor transformations: resizing, rescaling, normalization
result = image_processor(images=images, annotations=annotations, return_tensors="pt")

return result
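For reference, a helper like format_image_annotations_as_coco used above might look like the following sketch (a hypothetical version; the actual implementation in the example script may differ):

```python
# Hypothetical sketch of a helper that packs per-image lists into the
# COCO-style dict a detection image processor expects; field names follow
# the COCO annotation format, the function body is an assumption.

def format_image_annotations_as_coco(image_id, categories, areas, bboxes):
    """Return {"image_id": ..., "annotations": [...]} in COCO format."""
    annotations = [
        {
            "image_id": image_id,
            "category_id": category,
            "area": area,
            "bbox": list(bbox),  # [x, y, width, height]
            "iscrowd": 0,
        }
        for category, area, bbox in zip(categories, areas, bboxes)
    ]
    return {"image_id": image_id, "annotations": annotations}
```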
Collaborator

I'd recommend not using the image processor at all. They're very slow, in part because the resizing is done in pillow (for historical reasons). This means every image is converted to numpy -> PIL.Image.Image -> numpy. Instead, I'd just do all of the transformations within the library of choice (albumentations, torchvision etc.) and use the image processor for values like size if needed

Member Author

@qubvel qubvel Apr 30, 2024


I suggest keeping image_processor for correct input formatting, and turning off its padding and resizing:

    image_processor = AutoImageProcessor.from_pretrained(
        model_args.image_processor_name or model_args.model_name_or_path,
        # At the moment we recommend using external transforms to pad and resize images.
        # It's faster and yields much better results for object-detection models.
        do_pad=False,
        do_resize=False,
        # We save the image size parameter in the config just for reference
        size={"longest_edge": data_args.image_square_size},
        **common_pretrained_args,
    )

For padding and resizing I suggest the following strategy:

  1. Resize the longest side of the image to image_square_size
  2. Pad the image to image_square_size x image_square_size

This strategy yields much better results in terms of mAP and also almost removes the batch-size dependency during evaluation.
Here are two models trained with both strategies and evaluated with batch sizes 8 and 1.

[Screenshot, 2024-04-30: mAP comparison of the two padding strategies at evaluation batch sizes 8 and 1]
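The two-step strategy above can be sketched in plain numpy (an illustrative assumption; the actual example uses albumentations transforms):

```python
import numpy as np

# Illustrative sketch of the suggested padding strategy, assuming numpy only:
# step 1 computes the resize target so the longest side equals square_size,
# step 2 zero-pads the resized image to square_size x square_size.

def longest_edge_resize_shape(height: int, width: int, square_size: int):
    """Target (height, width) after resizing so the longest side equals square_size."""
    scale = square_size / max(height, width)
    return round(height * scale), round(width * scale)

def pad_to_square(image: np.ndarray, square_size: int) -> np.ndarray:
    """Zero-pad an HxWxC image on the bottom/right to square_size x square_size."""
    height, width = image.shape[:2]
    return np.pad(image, ((0, square_size - height), (0, square_size - width), (0, 0)))
```

Because every image ends up the same square size, batches no longer need per-batch padding, which is what removes the batch-size dependency at evaluation time.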

Deformable DETR fine-tuned with such padding achieves mAP@0.5..0.95 = 0.5414, while on Papers with Code the top model, TridentNet, achieves a lower mAP@0.5..0.95 = 0.529 on the CPPE-5 dataset (this leaderboard is probably outdated, but it can still be used as a reference).

Please let me know what you think. Is it worth adding this strategy to the image processors too?

Collaborator


Thanks for the detailed explanation and runs. OK, sounds good to me!

Regarding adding this to the image processors, it's a bit tricky as we need to account for backwards compatibility: even though this produces better results, DETR is a commonly used model and we shouldn't change the default behaviour. One option would be to add a flag to the image processor, which allows the user to pick the padding strategy, falling back to the current one by default

Member Author


Ok, got it. I will update the Accelerate example and make sure the tests pass.

Regarding the image processors, I understand the backward-compatibility concerns; here are some options we could implement:

  1. Add a preserve_image_ratio flag (default False). In combination with size = {height: ...; width: ...} it can be used to implement the suggested strategy. It is flexible, so it works even for non-square sizes: an image would be resized to respect height/width depending on its longest side, then padded to the specified size. But I don't like that this is not self-evident, and preserve_image_ratio=False may be confusing alongside the size = {longest_edge: ...; shortest_edge: ...} option, because that size option preserves the image ratio by default.
  2. As you suggested, add a flag pad_strategy="batch". For "batch" it will follow the current behaviour; for pad_strategy="size" it will pad to size = {height: ...; width: ...}. The problem with size = {longest_edge: ...; shortest_edge: ...} is that we do not know which one is the height and which is the width, but we can raise an error for pad_strategy="size" unless a size dict with height and width is provided.

Both options will require changing the current resize and pad logic, but the second one seems better to me; let me know if you have any thoughts on that.
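A minimal sketch of the validation described in option 2 (the function name and behaviour are assumptions for illustration, not the final image-processor API):

```python
# Hypothetical resolution of the proposed pad_strategy flag: "batch" keeps the
# current per-batch padding, while "size" requires an explicit height/width,
# since a {"longest_edge": ..., "shortest_edge": ...} dict is orientation-ambiguous.

def resolve_pad_size(size: dict, pad_strategy: str = "batch"):
    """Return the (height, width) to pad to, or None for per-batch padding."""
    if pad_strategy == "batch":
        return None  # current behaviour: pad to the largest image in the batch
    if pad_strategy == "size":
        if "height" not in size or "width" not in size:
            raise ValueError(
                'pad_strategy="size" requires size={"height": ..., "width": ...}'
            )
        return size["height"], size["width"]
    raise ValueError(f"Unknown pad_strategy: {pad_strategy}")
```

Defaulting to "batch" preserves existing behaviour for all current users, which addresses the backward-compatibility concern.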

Collaborator


Agreed - let's go with the second option 🤝

Review thread on examples/pytorch/object-detection/run_object_detection.py (outdated, resolved)
@qubvel
Member Author

qubvel commented Apr 30, 2024

Something is wrong with the albumentations installation from the git+commit reference, which is a bit strange: the dependencies were not changed and it worked previously. Maybe it's a caching issue. I will wait for tomorrow's albumentations release, which should resolve the issue; otherwise I will investigate.

@qubvel
Member Author

qubvel commented May 6, 2024

@amyeroberts the comments are addressed and the tests pass; could you please approve if it looks OK now?

@NielsRogge NielsRogge added this to In progress in Computer vision May 6, 2024
@qubvel qubvel mentioned this pull request May 6, 2024
Collaborator

@amyeroberts amyeroberts left a comment


Thanks for adding this and iterating! Looks great 🤗

It would be good to have the updated image processing logic included, such that we bypass the expensive resizing in the image processors, but happy to merge as-is and include in a follow-up.

I suspect we might have to change the image_square_size argument to something more flexible, to account for models which accept non-square inputs. Let's leave for now and cross that bridge when we come to it.

@qubvel qubvel merged commit 998dbe0 into huggingface:main May 8, 2024
8 checks passed
@NielsRogge NielsRogge moved this from In progress to Done in Computer vision May 8, 2024
@muellerzr
Contributor

@qubvel BTW I'm seeing these fail during multi-GPU tests. Here's the trace from our nightly

FAILED examples/pytorch/test_pytorch_examples.py::ExamplesTests::test_run_object_detection - IndexError: Caught IndexError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 1611, in forward
    loss_dict = criterion(outputs_loss, labels)
  File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 2210, in forward
    indices = self.matcher(outputs_without_aux, targets)
  File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 2339, in forward
    indices = [linear_sum_assignment(c[i]) for i, c in enumerate(cost_matrix.split(sizes, -1))]
  File "/opt/conda/envs/accelerate/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 2339, in <listcomp>
    indices = [linear_sum_assignment(c[i]) for i, c in enumerate(cost_matrix.split(sizes, -1))]
IndexError: index 2 is out of bounds for dimension 0 with size 2
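One plausible reading of this trace, illustrated in plain Python (an assumption about the failure mode, not a verified diagnosis): the matcher computes its per-image `sizes` from the full, unsplit labels list, while the replica-local cost matrix only covers this replica's DataParallel slice of the batch:

```python
# Stand-in for the replica-local cost matrix: batch dimension of size 2,
# because DataParallel sent only 2 of the 3 images to this replica.
cost_rows = ["costs for image 0", "costs for image 1"]

# Per-image target counts computed from the FULL, unsplit labels (3 images).
sizes = [4, 2, 3]

# Mirrors the failing line:
#   [linear_sum_assignment(c[i]) for i, c in enumerate(cost_matrix.split(sizes, -1))]
error = None
try:
    _ = [cost_rows[i] for i, _ in enumerate(sizes)]  # i reaches 2 -> IndexError
except IndexError as exc:
    error = exc
```

Under this reading, the enumerate index runs over all three target lists while the batch dimension on the replica is only two, matching "index 2 is out of bounds for dimension 0 with size 2".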

@qubvel
Member Author

qubvel commented May 9, 2024

@muellerzr thanks for letting me know; I will try to figure out why that happens.

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 10, 2024
* Training script for object detection

* Evaluation script for object detection

* Training script for object detection with eval loop outside trainer

* Trainer DETR finetuning

* No trainer DETR finetuning

* Eval script

* Refine object detection example with trainer

* Remove commented code and enable telemetry

* No trainer example

* Add requirements for object detection examples

* Add test for trainer example

* Readme draft

* Fix uploading to HUB

* Readme improvements

* Update eval script

* Adding tests for object-detection examples

* Add object-detection example

* Add object-detection resources to docs

* Update README with custom dataset instructions

* Update year

* Replace valid with validation

* Update instructions for custom dataset

* Remove eval script

* Remove use_auth_token

* Add copied from and telemetry

* Fixup

* Update readme

* Fix id2label

* Fix links in docs

* Update examples/pytorch/object-detection/run_object_detection.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update examples/pytorch/object-detection/run_object_detection.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Move description to the top

* Fix Trainer example

* Update no trainer example

* Update albumentations version

---------

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
itazap pushed a commit that referenced this pull request May 14, 2024
5 participants