Verify interpolation of image processors #28180

Open
34 tasks
NielsRogge opened this issue Dec 21, 2023 · 7 comments

@NielsRogge
Contributor

NielsRogge commented Dec 21, 2023

Feature request

As pointed out in #27742, some image processors might need a correction on the default interpolation method being used (resampling in Pillow). We could check this on a per-model basis.

Motivation

Interpolation methods have a slight (often minimal) impact on performance. However, it would be great to verify this on a per-model basis.

For example, ViT's image processor defaults to BILINEAR but should use BICUBIC, as seen here. We can update the default values of the image processors, but we can't update the configs on the hub, as this would break people's fine-tuned models.
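Until a default is changed, the interpolation can be overridden on the user side. The snippet below is a minimal sketch, assuming a transformers 4.x release in which `ViTImageProcessor` is the Pillow-based processor and accepts a `resample` argument both at construction and per call. The random image is only a stand-in for a real input.

```python
import numpy as np
from PIL import Image
from transformers import ViTImageProcessor
from transformers.image_utils import PILImageResampling

# Random RGB image standing in for a real input (illustration only).
rng = np.random.default_rng(0)
image = Image.fromarray(rng.integers(0, 256, size=(256, 320, 3), dtype=np.uint8))

# Option 1: choose the interpolation when constructing the processor.
processor = ViTImageProcessor(resample=PILImageResampling.BICUBIC)
print("configured resample:", processor.resample)

# Option 2: keep the processor's defaults and override the filter for one call.
default_processor = ViTImageProcessor()
pixel_values = default_processor(
    images=image,
    resample=PILImageResampling.BICUBIC,
    return_tensors="np",
)["pixel_values"]
print("pixel_values shape:", pixel_values.shape)
```

Neither option touches a config stored on the hub, so existing fine-tuned checkpoints keep whatever value they were saved with.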

Your contribution

I could work on this, but this seems like a good first issue for first-time contributors.

To be checked (by comparing against original implementation):

  • beit
  • bit
  • clip
  • convnext
  • convnextv2
  • cvt
  • data2vec-vision
  • deit
  • dinat
  • dinov2
  • efficientformer
  • efficientnet
  • focalnet
  • imagegpt
  • levit
  • mobilenet_v1
  • mobilenet_v2
  • mobilevit
  • mobilevitv2
  • nat
  • perceiver
  • poolformer
  • pvt
  • regnet
  • resnet
  • segformer
  • siglip
  • swiftformer
  • swin
  • swinv2
  • van
  • vit
  • vit_hybrid
  • vit_msn
@NielsRogge NielsRogge mentioned this issue Dec 21, 2023
3 tasks
@amyeroberts
Collaborator

@NielsRogge Thanks for opening the issue!

It's fine to open this up to the community, but you'll need to add a checklist of the image processors so it's clear who is working on what and what's done. Ideally, also add some instructions on what it means for each one to be "done", e.g. making sure to run the slow tests for the models.

@huggingface huggingface deleted a comment from github-actions bot Jan 21, 2024
@amyeroberts amyeroberts mentioned this issue Jan 23, 2024
8 tasks
@nileshkokane01
Contributor

nileshkokane01 commented Jan 24, 2024

@NielsRogge,

If I understand correctly, we need to match the interpolation used by the original implementation.

For example, convnext should be changed to BICUBIC, as per timm/convnext.

If that's correct, I can take this up for all the models. Let me know.

@NielsRogge
Contributor Author

Yes, that is correct; see also the original implementation. Thanks for spotting that. Feel free to open a PR to update this, along with the image processor created in the conversion script. Ideally, we assert the pixel values created by it against the original implementation, as is done here for DINOv2.
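As a rough illustration of that kind of check, here is a minimal sketch using only Pillow and NumPy. The `preprocess` and `assert_pixel_values_match` helpers, the 224x224 size, and the 0.5 mean/std are all hypothetical stand-ins; in a real conversion script the reference would come from the original implementation's transforms and the candidate from the image processor.

```python
import numpy as np
from PIL import Image


def preprocess(image, resample, size=(224, 224), mean=0.5, std=0.5):
    """Resize, rescale to [0, 1], normalize, and return a CHW float32 array."""
    resized = image.resize(size, resample=resample)
    pixels = np.asarray(resized, dtype=np.float32) / 255.0
    pixels = (pixels - mean) / std
    return pixels.transpose(2, 0, 1)


def assert_pixel_values_match(candidate, reference, atol=1e-4):
    """Raise AssertionError if the two arrays differ by more than atol."""
    np.testing.assert_allclose(candidate, reference, atol=atol, rtol=0)


# Random RGB image standing in for a real test image (illustration only).
rng = np.random.default_rng(0)
image = Image.fromarray(rng.integers(0, 256, size=(256, 320, 3), dtype=np.uint8))

# Stand-in for the original implementation, assumed here to use bicubic.
reference = preprocess(image, Image.BICUBIC)

# Same interpolation: the check passes silently.
assert_pixel_values_match(preprocess(image, Image.BICUBIC), reference)

# Different interpolation: the check raises, which is the signal we want.
try:
    assert_pixel_values_match(preprocess(image, Image.BILINEAR), reference)
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True
print("bilinear mismatch detected:", mismatch_detected)
```

A tight absolute tolerance is used because a wrong interpolation filter typically shows up as a pixel-level difference far larger than floating-point noise.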

@nileshkokane01
Contributor

Sure! Thanks for the pointers, I will work on it.

@nileshkokane01
Contributor

nileshkokane01 commented Jan 25, 2024

The default interpolation for DeiT and DPT is BICUBIC, which matches the original implementations, as far as I can see. Let me know if I overlooked anything.

@nileshkokane01
Contributor

@NielsRogge,

would you have a look?

@amyeroberts
Collaborator

@NielsRogge Can you please complete the checklist here?

3 participants