
Allow FP16 or other precision inference for Pipelines #31342

Merged: 19 commits into huggingface:main on Jul 5, 2024

Conversation

@aliencaocao (Contributor) commented on Jun 10, 2024

What does this PR do?

Currently, if you pass torch_dtype=torch.float16 or set model=AutoModel.from_pretrained(..., torch_dtype=torch.float16) hoping to use FP16 for inference in a Pipeline, it fails: although the model is cast to FP16, inputs such as image features stay in FP32, the default torch dtype.

This PR converts them accordingly. It only converts tensors that come out of an image_processor and have dtype torch.float32, so as not to accidentally touch things like token ids or boxes, which may intentionally be torch.int.
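For illustration, here is a minimal sketch of the kind of cast this change performs (a hypothetical helper, not the actual diff):

```python
import torch


def cast_image_features(model_inputs: dict, torch_dtype: torch.dtype) -> dict:
    # Hypothetical helper mirroring the behaviour described above: only fp32
    # tensors (e.g. pixel_values from an image_processor) are cast to the
    # target dtype; integer tensors such as token ids or boxes stay untouched.
    return {
        key: value.to(torch_dtype)
        if isinstance(value, torch.Tensor) and value.dtype == torch.float32
        else value
        for key, value in model_inputs.items()
    }


# Example: fp32 pixel values become fp16, token ids stay torch.int64.
features = {"pixel_values": torch.rand(1, 3, 224, 224), "input_ids": torch.tensor([[101, 102]])}
features = cast_image_features(features, torch.float16)
```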

I have not checked pipelines involving audio inputs, but I would imagine some of them have the same issue.

I originally found this issue when using ZeroShotImageClassificationPipeline like this:

ZeroShotImageClassificationPipeline(model=AutoModelForZeroShotImageClassification.from_pretrained(clip_path, torch_dtype=torch.float16), tokenizer=AutoTokenizer.from_pretrained(clip_path), image_processor=AutoImageProcessor.from_pretrained(clip_path), device='cuda')

Note that I have yet to write tests for this, as I want to make sure it is a valid issue and that I am not just using Pipelines incorrectly.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@Narsil

@aliencaocao (Contributor, Author) commented on Jun 10, 2024

Does anyone know how I can fix the code quality errors from here instead of running ruff locally? I don't have it set up right now...
The strange thing is that I followed all of the existing import formats.

@amyeroberts (Collaborator) left a comment

Thanks for adding this feature!

You should be able to directly use .to on the image processor outputs.

All of the pipelines should have tests added to check that they can accept and run with fp16 inputs.

Review comment on src/transformers/pipelines/depth_estimation.py (outdated, resolved)
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
@aliencaocao (Contributor, Author):

And for tests, do you think I can just add a dtype=torch.float16 in the pipeline init method in existing tests, or must I keep the FP32 run and do a new FP16 run? I feel the latter is unnecessary.

@amyeroberts (Collaborator):

And for tests, do you think I can just add a dtype=torch.float16 in the pipeline init method in existing tests

Certainly not 😄 FP32 is the default for pipelines, so this should be tested by default. We'll need to add tests which set torch_dtype=torch.float16.
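As a rough sketch (checkpoint, task, and assertions are placeholders, not the exact tests added in this PR), such a test could look like:

```python
import unittest

import torch
from transformers import pipeline


class ImageClassificationPipelineFP16Tests(unittest.TestCase):
    def test_small_model_pt_fp16(self):
        # Placeholder tiny checkpoint chosen for speed; the point is only that
        # the pipeline accepts torch_dtype and inference runs end to end.
        classifier = pipeline(
            "image-classification",
            model="hf-internal-testing/tiny-random-vit",
            torch_dtype=torch.float16,
        )
        outputs = classifier("http://images.cocodataset.org/val2017/000000039769.jpg")
        # Check only that predictions come back, not numerical closeness.
        self.assertGreater(len(outputs), 0)
```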

@aliencaocao (Contributor, Author):

Sure, I'll add that.

@aliencaocao (Contributor, Author) commented on Jun 25, 2024

@amyeroberts do I need to test for numerical similarity in FP16, or just make sure the inference runs?

Testing for numerical similarity is quite a lot of work and slow to run, as I would have to download each model in each pipeline (I changed 14 of them), get the expected logits, and then check allclose. It can also break for individual models depending on their size and sensitivity to numerical precision, and the allclose threshold would vary between tasks and models too. Ultimately, even if a model does not work well in FP16 versus FP32, there is really nothing we can do about it here, and I think it should be up to users to evaluate that themselves.

@aliencaocao changed the title from "Allow FP16 or other precision inference for Pipelines involving image features" to "Allow FP16 or other precision inference for Pipelines" on Jun 25, 2024
@aliencaocao (Contributor, Author):

I have pushed tests that check inference only (no numerical-stability checks), reusing the existing code in the pipeline test mixin. For a few models the common mixin methods (get_test_pipeline, run_pipeline_test) were not declared, so I added the checks to the test_small_model_pt methods in the respective test scripts.

The image-to-image test is marked @slow, so the slow-test flag is needed to run it.

@amyeroberts (Collaborator):

@aliencaocao Great!

At the moment, there are a few pipeline tests failing which will need to be resolved. To run slow tests locally, you can set the RUN_SLOW flag: RUN_SLOW=1 pytest ...

@aliencaocao (Contributor, Author) commented on Jun 25, 2024

there are a few pipeline tests failing which will need to be resolved

Yes, I will be resolving them, but I see one error with owlvit (zero-shot-object-detection): value cannot be converted to type at::Half without overflow. This indicates the weights exceed the range of FP16, which we normally don't see in models. I think I have to skip this test unless we can switch to another model like OWLv2-base.

@amyeroberts (Collaborator):

@aliencaocao For this test, as it's testing the pipeline rather than the model itself, we can change the checkpoint/architecture used. Other values will likely need to be updated alongside it.

@aliencaocao (Contributor, Author) commented on Jun 25, 2024

The image-to-image slow test fails because the Swin2SR implementation has an issue where it does not cast an intermediate tensor to the dtype of the other model parameters.
Specifically, I have to modify

relative_position_bias_table = self.continuous_position_bias_mlp(self.relative_coords_table).view(

by adding

relative_coords_table = relative_coords_table.to(next(self.continuous_position_bias_mlp.parameters()).dtype)

before it, since relative_coords_table is passed into this MLP on that line.
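In other words (a minimal sketch with a hypothetical helper name, assuming the surrounding Swin2SR attention code), the cast looks like this:

```python
import torch
from torch import nn


def continuous_position_bias(cpb_mlp: nn.Module, relative_coords_table: torch.Tensor) -> torch.Tensor:
    # Cast the fp32 coords table to the MLP's parameter dtype (fp16/bf16/fp32)
    # so a half-precision model never receives an fp32 intermediate tensor.
    target_dtype = next(cpb_mlp.parameters()).dtype
    return cpb_mlp(relative_coords_table.to(target_dtype))
```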

Should I make a new PR for this change or add it here?

@amyeroberts (Collaborator):

Could you do this in a separate PR please? It'll be easier to track this way

@aliencaocao (Contributor, Author):

PR made #31589

@aliencaocao (Contributor, Author):

Four other failing tests can be fixed by #31590, a small QoL improvement.

@aliencaocao (Contributor, Author) commented on Jun 26, 2024

For the failing owlvit test, the cause is not weight overflow but the implementation:

pred_logits = torch.where(query_mask == 0, -1e6, pred_logits)

-1e6 is simply out of range for FP16, whose minimum is -65504.

Should we correct the implementation? I can check other models for similar issues.

The fix should be quite simple: just use torch.finfo(pred_logits.dtype).min. The only thing is that it affects all existing models' outputs, but those values are masked and meant to be ignored anyway.
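For reference, a minimal sketch of the proposed change (names follow the snippet above; illustrative, not the exact upstream diff):

```python
import torch


def mask_pred_logits(pred_logits: torch.Tensor, query_mask: torch.Tensor) -> torch.Tensor:
    # Replace the hard-coded -1e6 with the dtype's own minimum so the mask
    # value stays representable in fp16 (whose minimum is -65504).
    fill_value = torch.finfo(pred_logits.dtype).min
    return torch.where(query_mask == 0, fill_value, pred_logits)
```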

@amyeroberts (Collaborator):

@aliencaocao Yes, let's use torch.finfo(pred_logits.dtype).min 👍

The only thing is that it affects all existing models' outputs

Here, does "all models" refer to all owlvit checkpoints? If that's the case, then that's OK!

@aliencaocao (Contributor, Author):

Here, does "all models" refer to all owlvit checkpoints?

What I meant was all of the torch implementations of models in HF Transformers, as more of them may use a hard-coded out-of-range value for masking logits the way owlvit does. For those that end up changing, the outputs may be affected.

Existing tests already check the logits, so we will know if anything is affected. Ideally, none of them should be if the masking works as intended.

Do you want a new PR to update the code for owlvit and potentially other models that do the same, or should I change it here?

@amyeroberts (Collaborator):

Do you want a new PR to update the code for owlvit and potentially other models that do the same, or should I change it here?

Up to you. Having it correct across all models is of course the dream, but it can be a bit laborious to make sure it's correct everywhere (plus tests), and it might not be worth it for low-use models. I'm happy to just have the change made for owlvit here, and then we can think about other models' compatibility with FP16 if users raise it.

Merge commit resolving conflicts in:
tests/pipelines/test_pipelines_feature_extraction.py
tests/pipelines/test_pipelines_zero_shot_audio_classification.py
@aliencaocao (Contributor, Author):

@amyeroberts the failing TF and ONNX tests are due to some Keras changes in https://github.com/keras-team/keras/releases/tag/v3.4.1.

The failing torch pipeline test is due to a network timeout.

@amyeroberts (Collaborator):

@aliencaocao Yes, unfortunately the keras update has broken everything 😭

We're working on a fix. I'll ping once it's resolved, and hopefully we can then successfully re-run the CI for this PR.

@amyeroberts (Collaborator):

@aliencaocao There have been fixes for Keras and some of the timeout errors. Could you rebase to include these? That should make all the CIs green.

@aliencaocao (Contributor, Author):

@amyeroberts The CI is all green now.

@amyeroberts (Collaborator) left a comment

Great work - thanks for adding!

@amyeroberts merged commit ac26260 into huggingface:main on Jul 5, 2024
18 checks passed
@aliencaocao deleted the fix-pipeline-dtype branch on July 5, 2024 at 16:22