
bus error on version 4.43.0 with pretrained community CLIP model - MacOS #33357

Open
1 of 4 tasks
pezafar opened this issue Sep 6, 2024 · 9 comments
Labels
bug PyTorch Anything PyTorch

Comments

@pezafar

pezafar commented Sep 6, 2024

System Info

  • transformers version: 4.43.0
  • Platform: macOS-13.0-arm64-arm-64bit
  • Python version: 3.10.9
  • Huggingface_hub version: 0.24.6
  • Safetensors version: 0.4.5
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import CLIPModel, CLIPTokenizerFast

tokenizer = CLIPTokenizerFast.from_pretrained("patrickjohncyh/fashion-clip")
model = CLIPModel.from_pretrained("patrickjohncyh/fashion-clip")

tokenized = tokenizer(["hello"], return_tensors="pt", padding=True)
print("tokenized", tokenized)

# bus error occurs here
embed = model.get_text_features(**tokenized).detach().cpu().numpy()
print("embedded", embed)


gives:

tokenized {'input_ids': tensor([[49406,  3497, 49407]]), 'attention_mask': tensor([[1, 1, 1]])}
zsh: bus error  python test_hf.py

I don't think this issue has been reported before.
After bisecting versions, it looks like 4.42.4 does not have the issue and 4.43.0 does.

I have little insight to provide beyond the bus error itself, other than that it does not occur with the clip-vit-base-patch32 model.
I saw some breaking changes in this version release, but only concerning the tokenizer.
I have not had time to test on a Linux distribution yet.

Thanks!

Expected behavior

Using the exact same script with the Hugging Face CLIP pretrained model, the embeddings are computed as they should be:

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizerFast.from_pretrained("openai/clip-vit-base-patch32")
@pezafar pezafar added the bug label Sep 6, 2024
@LysandreJik
Member

Hmmm, this is the first time I've seen this reported here. Would it be possible for you to try a different torch version to see if you still get the error?

@LysandreJik LysandreJik added the PyTorch Anything PyTorch label Sep 6, 2024
@pezafar
Author

pezafar commented Sep 9, 2024

Hey @LysandreJik, thanks for your answer.

I checked PyTorch versions, and it seems the issue first occurs at version 2.1.0:

  • transformers 4.43.0 and torch 2.0.1 is OK
  • transformers 4.43.0 and torch 2.1.0 gives me a bus error

(transformers 4.42.0 is still OK for all torch versions.)
I did not dig further into the torch changes though, let me know.

@LysandreJik
Member

Interesting, it might be good to open an issue on the PyTorch slack in that case

@pezafar
Author

pezafar commented Sep 9, 2024

I will check in that direction, thanks. (A bit off topic, but is the Slack invite-only?)

Also, in case this helps anyone at some point: it seems the hidden-layer outputs explode to all-NaN at some point before being passed into the projection layer, where the bus error occurs.
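(Not from the thread, but a generic way to localize this kind of NaN blow-up is to attach forward hooks and record which modules first emit NaN outputs. A minimal sketch with plain torch; the toy model here is illustrative, not the CLIP model from the report:)

```python
import torch
import torch.nn as nn

def install_nan_hooks(model):
    """Attach forward hooks that record every module whose output contains NaNs."""
    nan_modules = []
    def hook(module, inputs, output):
        if isinstance(output, torch.Tensor) and torch.isnan(output).any():
            nan_modules.append(type(module).__name__)
    handles = [m.register_forward_hook(hook) for m in model.modules()]
    return nan_modules, handles

# Toy demonstration: the second layer's weights are deliberately NaN.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))
with torch.no_grad():
    model[1].weight.fill_(float("nan"))

nan_modules, handles = install_nan_hooks(model)
model(torch.ones(1, 4))
print(nan_modules)  # first entry is the offending submodule
for h in handles:
    h.remove()  # clean up the hooks afterwards
```

Running the real model under such hooks would show which transformer block first produces NaNs before the projection layer is reached.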

@LysandreJik
Member

Sorry, I meant the PyTorch GitHub 🤦

@ArthurZucker
Collaborator

No issues on M3 on my side:
(screenshot of a successful run)

@chanind

chanind commented Oct 8, 2024

I have the same issue on an M1 Mac with transformers 4.43.2, using the gpt2-small model. If I use torch <= 2.0.1 everything is fine, but with 2.1.0 or greater I get a bus error when passing anything into model(input). This makes it hard to debug, since the entire Python process crashes. It also doesn't happen on Linux for me. I can also confirm that transformers 4.42.4 works fine with torch >= 2.1.0 (currently on 2.4.0).

@chanind

chanind commented Oct 8, 2024

I find that if I comment out the following line from GPT2LMHeadModel (https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_gpt2.py#L1232C1-L1233C28) then everything works. I don't know what changed after 4.42.4 / in torch 2.1.0 that caused things to break here, as this line has been there for a while.

    def get_output_embeddings(self):
        return self.lm_head

I find that the point where Python crashes is in the GPT2LMHeadModel.forward() method, at lm_logits = self.lm_head(hidden_states). After creating a GPT2LMHeadModel from pretrained, passing any tensor into model.lm_head() causes the interpreter to crash.

My best guess is that the get_output_embeddings() method is somehow getting called from another thread and the result can't be serialized, or something similar. The Python interpreter prints the following warning when it crashes, which hints that something multithreading-related is going wrong: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown — although this could be unrelated.

@chanind

chanind commented Oct 8, 2024

Tracking this down more, things broke in this PR: #31771

Specifically, local_metadata["assign_to_params_buffers"] = assign_to_params_buffers seems to break on M1 Macs with PyTorch >= 2.1.0. If I set assign_to_params_buffers = False, the bug goes away.

It also looks like this is related to tying the encoder and decoder weights together: if I comment out the line output_embeddings.weight = input_embeddings.weight in _tie_or_clone_weights() in modeling_utils.py (https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L2007), the bug doesn't arise.

So the bug arises when one param's weight is set to the value of another param's weight in the model, and the state dict is loaded with assign_to_params_buffers=True on an M1 Mac; the Python interpreter then explodes once any of the overwritten weights is actually used.

It seems that setting _supports_param_buffer_assignment = False on GPT2 (or any model that has tied embeddings), or always returning False from check_support_param_buffer_assignment() on M1 Macs, would also solve this. I can open a PR for this, but I'm not sure which approach is best.

chanind added a commit to chanind/transformers that referenced this issue Oct 8, 2024
There's a bug on M1 Macs with transformers >= 4.43.0 and torch >= 2.1.0: if a model has tied embeddings, the fast loading from huggingface#31771 causes a bus error when the model is actually run. This can be solved by disabling `_supports_param_buffer_assignment` for these models.

More info in comments in huggingface#33357