
bus error on version 4.43.0 with pretrained community CLIP model - MacOS #33357

Open
1 of 4 tasks
pezafar opened this issue Sep 6, 2024 · 9 comments
Labels
bug PyTorch Anything PyTorch

Comments

@pezafar

pezafar commented Sep 6, 2024

System Info

  • transformers version: 4.43.0
  • Platform: macOS-13.0-arm64-arm-64bit
  • Python version: 3.10.9
  • Huggingface_hub version: 0.24.6
  • Safetensors version: 0.4.5
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import CLIPModel, CLIPTokenizerFast

tokenizer = CLIPTokenizerFast.from_pretrained("patrickjohncyh/fashion-clip")
model = CLIPModel.from_pretrained("patrickjohncyh/fashion-clip")

tokenized = tokenizer(["hello"], return_tensors="pt", padding=True)
print("tokenized", tokenized)

# bus error occurs here
embed = model.get_text_features(**tokenized).detach().cpu().numpy()
print("embedded", embed)


gives:

tokenized {'input_ids': tensor([[49406,  3497, 49407]]), 'attention_mask': tensor([[1, 1, 1]])}
zsh: bus error  python test_hf.py

I don't think this issue has been reported before.
After bisecting versions, it looks like 4.42.4 does not have the issue and 4.43.0 does.

I have little insight to provide beyond the bus error itself, other than that it does not occur with the clip-vit-base-patch32 model.
I saw some breaking changes in this version release, but only concerning the tokenizer.
I have not had time to test on a Linux distribution yet.

Thanks!

Expected behavior

Using the exact same script with the Hugging Face CLIP pretrained model, the embeddings are computed as they should be:

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizerFast.from_pretrained("openai/clip-vit-base-patch32")
@pezafar pezafar added the bug label Sep 6, 2024
@LysandreJik
Member

Hmmm, this is the first time I've seen this reported here. Would it be possible for you to try a different torch version to see if you still get the error?

@LysandreJik LysandreJik added the PyTorch Anything PyTorch label Sep 6, 2024
@pezafar
Author

pezafar commented Sep 9, 2024

Hey @LysandreJik, thanks for your answer.

I checked PyTorch versions, and it seems the issue first occurs at version 2.1.0:

  • transformers 4.43.0 and torch 2.0.1 is OK
  • transformers 4.43.0 and torch 2.1.0 gives me a bus error

(transformers 4.42.0 is still OK for all torch versions.)
I did not dig further into the torch changes though, let me know.

@LysandreJik
Member

Interesting, it might be good to open an issue on the PyTorch slack in that case

@pezafar
Author

pezafar commented Sep 9, 2024

I will check in that direction, thanks. (A bit off topic, but is the Slack invite-only?)

Also, in case this helps anyone at some point: it seems the hidden-layer outputs explode to all-NaN at some point before being passed into the projection layer, where the bus error occurs.
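(Not from the thread, but a generic way to localize this kind of NaN blow-up is to attach forward hooks and record which modules first emit NaN outputs. A minimal sketch with plain torch; the toy model here is illustrative, not the CLIP model from the report:)

```python
import torch
import torch.nn as nn

def install_nan_hooks(model):
    """Attach forward hooks that record every module whose output contains NaNs."""
    nan_modules = []
    def hook(module, inputs, output):
        if isinstance(output, torch.Tensor) and torch.isnan(output).any():
            nan_modules.append(type(module).__name__)
    handles = [m.register_forward_hook(hook) for m in model.modules()]
    return nan_modules, handles

# Toy demonstration: the second layer's weights are deliberately NaN.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))
with torch.no_grad():
    model[1].weight.fill_(float("nan"))

nan_modules, handles = install_nan_hooks(model)
model(torch.ones(1, 4))
print(nan_modules)  # first entry is the offending submodule
for h in handles:
    h.remove()  # clean up the hooks afterwards
```

Running the real model under such hooks would show which transformer block first produces NaNs before the projection layer is reached.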

@LysandreJik
Member

Sorry, I meant the PyTorch GitHub 🤦

@ArthurZucker
Collaborator

No issues on M3 on my side:
(screenshot of a successful run)

@chanind

chanind commented Oct 8, 2024

I have the same issue on an M1 Mac with transformers 4.43.2, using the gpt2-small model. If I use torch <= 2.0.1 everything is fine, but with 2.1.0 or greater I get a bus error when passing anything into model(input). This makes it hard to debug, since the entire Python process crashes. It also doesn't happen on Linux for me. I can also confirm that transformers 4.42.4 works fine with torch >= 2.1.0 (currently on 2.4.0).

@chanind

chanind commented Oct 8, 2024

I find that if I comment out the following line from GPT2LMHeadModel (https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_gpt2.py#L1232C1-L1233C28) then everything works. I don't know what changed after 4.42.4 / in torch 2.1.0 that caused things to break here, as this line has been there for a while.

    def get_output_embeddings(self):
        return self.lm_head

I find that the point where Python crashes is in the GPT2LMHeadModel.forward() method, at lm_logits = self.lm_head(hidden_states). After creating a GPT2LMHeadModel from pretrained, passing any tensor into model.lm_head() causes the interpreter to crash.

My best guess is that the get_output_embeddings() method is somehow getting called from another thread and the result can't be serialized, or something similar. The Python interpreter prints the following warning when it crashes, which hints that something multithreading-related is going wrong: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown — although this could be unrelated.

@chanind

chanind commented Oct 8, 2024

Tracking this down more, things broke in this PR: #31771

Specifically, local_metadata["assign_to_params_buffers"] = assign_to_params_buffers seems to break on M1 Macs with PyTorch >= 2.1.0. If I set assign_to_params_buffers = False, the bug goes away.

It also looks like this is related to tying the encoder and decoder weights together: if I comment out the line output_embeddings.weight = input_embeddings.weight in _tie_or_clone_weights() in modeling_utils.py (https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L2007), the bug doesn't arise.

So the bug arises when one param's weight is set to the value of another param's weight in the model, and the state dict is loaded with assign_to_params_buffers=True on an M1 Mac; the Python interpreter then explodes once any of the overwritten weights is actually used.

It seems that setting _supports_param_buffer_assignment = False on GPT2 (or any model that has tied embeddings), or always returning False from check_support_param_buffer_assignment() on M1 Macs, would also solve this. I can open a PR for this, but I'm not sure which approach is best.

chanind added a commit to chanind/transformers that referenced this issue Oct 8, 2024
There's a bug on M1 Macs with transformers >= 4.43.0 and torch >= 2.1.0: if a model has tied embeddings, the fast loading from huggingface#31771 causes a bus error when the model is actually run. This can be solved by disabling `_supports_param_buffer_assignment` for these models.

More info in comments in huggingface#33357