
SeamlessM4Tv2ConformerEncoder does not behave as expected if gradient checkpointing is enabled #31028

Closed
anferico opened this issue May 25, 2024 · 8 comments · Fixed by #31945

@anferico (Contributor) commented May 25, 2024

System Info

  • transformers version: 4.42.0.dev0
  • Platform: Linux-5.4.0-172-generic-x86_64-with-glibc2.17
  • Python version: 3.8.19
  • Huggingface_hub version: 0.23.1
  • Safetensors version: 0.4.3
  • Accelerate version: 0.30.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Tensorflow version (GPU?): 2.13.1 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.7.0 (cpu)
  • Jax version: 0.4.13
  • JaxLib version: 0.4.13
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@ArthurZucker @sanchit

Proposed fix:

class SeamlessM4Tv2ConformerEncoder(...):
    [...]
    def forward(...):
        [...]
            if not skip_the_layer or deepspeed_zero3_is_enabled:
                # under deepspeed zero3 all gpus must run in sync
                if self.gradient_checkpointing and self.training:
                    layer_outputs = self._gradient_checkpointing_func(
                        layer.__call__,
                        hidden_states,
                        attention_mask,
                        output_attentions,    # <---------- Add this parameter
                        conv_attention_mask,  # <---------- Add this parameter
                    )
                else:
                    layer_outputs = layer(
                        hidden_states,
                        attention_mask=attention_mask,
                        output_attentions=output_attentions,
                        conv_attention_mask=conv_attention_mask,
                    )
                hidden_states = layer_outputs[0]
        [...]
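
For context, self._gradient_checkpointing_func is typically a functools.partial of torch.utils.checkpoint.checkpoint, which forwards only the arguments it is explicitly given. Anything omitted falls back to the wrapped layer's defaults (output_attentions=False here), which is why the attention weights silently disappear. A minimal standalone sketch of the failure mode (a toy layer, not the actual SeamlessM4Tv2 code):

# Toy illustration: checkpoint() re-invokes the wrapped callable with only
# the arguments it received, so flags that are not passed through revert
# to their default values.
import torch
from torch.utils.checkpoint import checkpoint

def layer(hidden_states, attention_mask=None, output_attentions=False):
    attn_weights = torch.ones(1) if output_attentions else None
    return hidden_states * 2, attn_weights

x = torch.ones(2, requires_grad=True)

# output_attentions never reaches the layer -> attention weights are lost
_, attn = checkpoint(layer, x, None, use_reentrant=False)
print(attn)  # None

# forwarding the flag positionally, as in the fix above, restores it
_, attn = checkpoint(layer, x, None, True, use_reentrant=False)
print(attn)  # tensor([1.])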

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Train a model that has transformers.models.seamless_m4t_v2.modeling_seamless_m4t_v2.SeamlessM4Tv2ConformerEncoder as a submodule
  2. Enable gradient checkpointing while training
  3. When calling SeamlessM4Tv2ConformerEncoder.forward(), pass output_attentions=True and return_dict=True. For example:
    encoder: SeamlessM4Tv2ConformerEncoder = ...
    output = encoder(..., output_attentions=True, return_dict=True)
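
A fuller, untested reproduction sketch. It builds a small, randomly initialized speech encoder; the config field names follow SeamlessM4Tv2Config, and the small sizes are arbitrary, chosen only to keep the example light:

import torch
from transformers import SeamlessM4Tv2Config
from transformers.models.seamless_m4t_v2.modeling_seamless_m4t_v2 import (
    SeamlessM4Tv2SpeechEncoder,
)

config = SeamlessM4Tv2Config(
    hidden_size=64,
    speech_encoder_layers=2,
    speech_encoder_attention_heads=4,
    speech_encoder_intermediate_size=128,
)
encoder = SeamlessM4Tv2SpeechEncoder(config)
encoder.gradient_checkpointing_enable()
encoder.train()  # checkpointing is only applied in training mode

# (batch, frames, feature_projection_input_dim): 80 mel bins stacked by 2
input_features = torch.randn(1, 50, config.feature_projection_input_dim)
output = encoder(input_features, output_attentions=True, return_dict=True)
print(output.attentions)  # (None, None) before the fix, real tensors after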

Expected behavior

output.attentions should be a tuple of non-None tensors, one per encoder layer. Instead, the actual behavior is that output.attentions = (None, None, ..., None).

@amyeroberts (Collaborator)

@anferico Thanks for raising! I think you pinged the wrong Sanchit - cc @sanchit-gandhi

@ArthurZucker (Collaborator)

cc @ylacombe as well! 🤗

@ylacombe (Contributor) commented Jun 5, 2024

Hey @anferico, nice catch, would you like to open a PR to fix this?

Note that Seamless training is not yet supported in transformers, though

@anferico (Contributor, Author) commented Jun 5, 2024

@ylacombe sure, I'll open a PR 👍 No worries about the support for training, as I actually have a use case where I just take the speech encoder out of SeamlessM4Tv2 and employ it in a larger model architecture
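
For reference, a hedged sketch of that kind of setup, not taken from the issue itself: the speech_encoder attribute name matches modeling_seamless_m4t_v2 and the checkpoint name is the public one on the Hub, but the wrapper module is purely illustrative.

import torch.nn as nn
from transformers import SeamlessM4Tv2Model

class SpeechBackbone(nn.Module):
    """Illustrative wrapper that reuses only the SeamlessM4Tv2 speech tower."""

    def __init__(self, checkpoint_name="facebook/seamless-m4t-v2-large"):
        super().__init__()
        full_model = SeamlessM4Tv2Model.from_pretrained(checkpoint_name)
        self.encoder = full_model.speech_encoder  # keep only the speech encoder

    def forward(self, input_features, attention_mask=None):
        out = self.encoder(input_features, attention_mask=attention_mask)
        return out.last_hidden_state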

github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@anferico (Contributor, Author) commented Jul 5, 2024

Will open a PR soon, sorry for the delay

@ylacombe (Contributor) commented Jul 8, 2024

No worries @anferico, don't hesitate to ping me once it's done!

@anferico (Contributor, Author)

@ylacombe PR opened (#31945)! I'd also like to draw your attention to another issue (#31946) I opened regarding the speech encoder of SeamlessM4Tv2. It would be great if you could check it out 🙏🏼
