
[Bug Fix] fix qa pipeline tensor to numpy #31585

Merged — 2 commits merged into huggingface:main on Jul 11, 2024
Conversation

jiqing-feng
Contributor

@jiqing-feng jiqing-feng commented Jun 25, 2024

Hi @Narsil @amyeroberts

This PR fixes an error in the question-answering pipeline. The error can be reproduced with:

from transformers import pipeline
pipe = pipeline("question-answering", model="hf-internal-testing/tiny-random-bert")
question = "What's my name?"
context = "My Name is Sasha and I live in Lyon."
pipe(question, context)

Traceback:

Traceback (most recent call last):
  File "test_qa.py", line 5, in <module>
    pipe(question, context)
  File "/home/jiqingfe/miniconda3/envs/ccl/lib/python3.8/site-packages/transformers/pipelines/question_answering.py", line 393, in __call__
    return super().__call__(examples[0], **kwargs)
  File "/home/jiqingfe/miniconda3/envs/ccl/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1235, in __call__
    return next(
  File "/home/jiqingfe/miniconda3/envs/ccl/lib/python3.8/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "/home/jiqingfe/miniconda3/envs/ccl/lib/python3.8/site-packages/transformers/pipelines/question_answering.py", line 546, in postprocess
    starts, ends, scores, min_null_score = select_starts_ends(
  File "/home/jiqingfe/miniconda3/envs/ccl/lib/python3.8/site-packages/transformers/pipelines/question_answering.py", line 124, in select_starts_ends
    undesired_tokens = undesired_tokens & attention_mask
TypeError: ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
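The root cause is a dtype mismatch: once `undesired_tokens` ends up as a float array, numpy's bitwise `&` has no safe cast against the integer attention mask. A minimal sketch (plain numpy, with hypothetical mask values) reproducing the same TypeError:

```python
import numpy as np

# If np.array(p_mask) produces a float array (as reported in this issue),
# a bitwise AND against an int attention mask fails.
undesired_tokens = np.array([0.0, 1.0, 1.0])  # float dtype
attention_mask = np.array([1, 1, 0])          # int dtype

try:
    undesired_tokens & attention_mask
except TypeError as e:
    print(type(e).__name__)  # TypeError
```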

@jiqing-feng jiqing-feng marked this pull request as ready for review June 25, 2024 06:26
@jiqing-feng
Contributor Author

I found this problem comes from numpy: with Python 3.8, np.array() casts the int tensor to float.

So I suggest we use p_mask.numpy() instead of np.array(p_mask).
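For a torch tensor, `.numpy()` returns a view that keeps the tensor's own dtype, so an integer `p_mask` stays integer. A small sketch (assumes torch is installed; the values are hypothetical):

```python
import torch

# .numpy() shares the tensor's memory and preserves its dtype (int64 here),
# so the later bitwise AND remains int & int.
p_mask = torch.tensor([0, 1, 1, 0], dtype=torch.int64)
arr = p_mask.numpy()
print(arr.dtype)  # int64
```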

@jiqing-feng jiqing-feng changed the title fix qa pipeline [Bug Fix] fix qa pipeline tensor to numpy Jun 25, 2024
@@ -118,7 +118,7 @@ def select_starts_ends(
         max_answer_len (`int`): Maximum size of the answer to extract from the model's output.
     """
     # Ensure padded tokens & question tokens cannot belong to the set of candidate answers.
-    undesired_tokens = np.abs(np.array(p_mask) - 1)
+    undesired_tokens = np.abs(p_mask.numpy() - 1)
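The changed line can be checked in isolation. Here a plain numpy array stands in for `p_mask.numpy()`, with hypothetical mask values:

```python
import numpy as np

# Stand-in for p_mask.numpy(); p_mask uses 1 for tokens that cannot
# belong to the answer (padding and question tokens).
p_mask_np = np.array([1, 0, 0, 1], dtype=np.int64)

# Invert the mask, as the patched line does.
undesired_tokens = np.abs(p_mask_np - 1)
print(undesired_tokens)  # [0 1 1 0]

# With integer dtypes preserved, the bitwise AND with the attention
# mask no longer raises.
attention_mask = np.array([1, 1, 1, 0], dtype=np.int64)
print(undesired_tokens & attention_mask)  # [0 1 1 0]
```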
Collaborator

@amyeroberts amyeroberts Jun 25, 2024

Does this still work if you run the pipeline in jax?

from transformers import pipeline
pipe = pipeline("question-answering", model="hf-internal-testing/tiny-random-bert", framework="flax")
question = "What's my name?"
context = "My Name is Sasha and I live in Lyon."

Contributor Author

It will raise a value error.
ValueError: Pipeline cannot infer suitable model classes from hf-internal-testing/tiny-random-bert.

Contributor Author

@jiqing-feng jiqing-feng Jun 26, 2024

Besides, tensor.numpy() is already used in other pipelines like ASR.

Collaborator

OK, yes, looking into it we seem to assume either tf or pt everywhere in the pipeline, so even though I think this would break things for jax tensors, it's not something we need to account for at the moment. Thanks for testing!

@jiqing-feng
Contributor Author

Hi @amyeroberts, could you take a look at this PR? I am waiting for your response, thx!

@LysandreJik
Member

Hey @jiqing-feng! I'm trying to reproduce the issue but failing at doing so with python 3.8.18 and numpy 1.24.4.

>>> import torch
>>> import numpy as np
>>> a = torch.tensor([1,2,3], dtype=torch.int64)
>>> a
tensor([1, 2, 3])
>>> np.array(a)
array([1, 2, 3])
>>> import sys
>>> sys.version_info
sys.version_info(major=3, minor=8, micro=18, releaselevel='final', serial=0)

What's your torch version?

@jiqing-feng
Copy link
Contributor Author

> What's your torch version?

torch 2.3.0+cpu

@jiqing-feng
Contributor Author

I just checked that torch 2.3.1+cpu fixes this issue; you can close this PR if you think the change is unnecessary. BTW, I believe the change won't break anything, and tensor.numpy() is the more common pattern. Thx!

@amyeroberts
Collaborator

@jiqing-feng Thanks for investigating across the different pytorch versions. If the fix is only in later versions, then this is a change we'd still want, as we officially support torch >= 1.11.

Collaborator

@amyeroberts amyeroberts left a comment

Thanks for fixing!

@amyeroberts amyeroberts merged commit aec1ca3 into huggingface:main Jul 11, 2024
18 checks passed
amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request Jul 19, 2024
MHRDYN7 pushed a commit to MHRDYN7/transformers that referenced this pull request Jul 23, 2024
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Jul 24, 2024