Add language to word timestamps for Whisper #31572

robinderat · 2024-06-24T13:58:59Z

Add language to word timestamps for Whisper

This fix enables Whisper to return word-level timestamps and the predicted language at the same time. Previously, enabling word-level timestamps discarded the language prediction.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@kamilakesbi
@sanchit-gandhi

_collate_word_timestamps uses the return_language flag to determine whether the language of the chunk should be added to the word's information


kamilakesbi · 2024-06-27T13:12:54Z

Hi @robinderat, thanks for working on this!

Could you please add a test in test_tokenization_whisper.py to verify that we can indeed return both word-level timestamps and the predicted language at the same time?


kamilakesbi · 2024-07-03T12:17:49Z

LGTM! thanks for iterating on this :)

cc @amyeroberts for final review!

sanchit-gandhi

Thanks for the contribution @robinderat! 🤗

HuggingFaceDocBuilderDev · 2024-07-03T16:44:18Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

AvivSham · 2024-07-14T07:38:47Z

@amyeroberts gentle reminder 😄

amyeroberts

Thanks for the ping - just some questions to better understand the diff

amyeroberts · 2024-07-05T11:06:48Z

src/transformers/models/whisper/tokenization_whisper.py

@@ -1197,12 +1197,16 @@ def _find_longest_common_sequence(sequences, token_timestamp_sequences=None):
        return total_sequence, []


-def _collate_word_timestamps(tokenizer, tokens, token_timestamps, language):
+def _collate_word_timestamps(tokenizer, tokens, token_timestamps, language, return_language):


Is return_language always a bool, or could it be None too?

robinderat · 2024-07-16

It can be a bool or None. However, return_language=None behaves the same as return_language=False.
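The None-versus-False point can be illustrated with a minimal sketch. This is not the code from tokenization_whisper.py: `collate_words` is a hypothetical, simplified stand-in that takes already-grouped words, and it assumes a plain truthiness check on `return_language`, which is what would make None and False behave identically.

```python
from typing import Optional


def collate_words(words, token_indices, token_timestamps, language, return_language: Optional[bool]):
    """Hypothetical, simplified stand-in for _collate_word_timestamps.

    words: list of word strings
    token_indices: for each word, the indices of the tokens that form it
    token_timestamps: one (start, end) pair per token
    """
    # A truthiness check treats None and False the same: no "language" key is added.
    optional_language = {"language": language} if return_language else {}
    return [
        {
            "text": word,
            # A word spans from the start of its first token to the end of its last token.
            "timestamp": (token_timestamps[idx[0]][0], token_timestamps[idx[-1]][1]),
            **optional_language,
        }
        for word, idx in zip(words, token_indices)
    ]


# Illustrative inputs (made up for this sketch).
words = [" Hello", " world"]
token_indices = [[0], [1, 2]]
token_timestamps = [(0.0, 0.4), (0.4, 0.7), (0.7, 1.0)]

with_language = collate_words(words, token_indices, token_timestamps, "english", True)
without_flag = collate_words(words, token_indices, token_timestamps, "english", None)
```

With `return_language=True` every word dict carries a `language` key; with `None` or `False` the key is simply absent, so existing callers see the same output as before.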

amyeroberts · 2024-07-05T13:23:26Z

tests/pipelines/test_pipelines_automatic_speech_recognition.py

+        )
+        data = load_dataset("openslr/librispeech_asr", "clean", split="test", streaming=True, trust_remote_code=True)
+        sample = next(iter(data))
+        pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language="en", task="transcribe")


Is this modification necessary for people to be able to use return_language with the pipeline?

robinderat · 2024-07-16

I based my test on an existing one that included this modification. However, it does not seem to affect the test results, so I have now removed it.

Specifying return_language=True in the pipeline is all that is required for this to work.
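A minimal usage sketch of what this reply describes. The `openai/whisper-tiny` checkpoint and the audio path argument are illustrative assumptions, not taken from this PR. `languages_in` is a hypothetical helper, and the sample output dict is hand-written to show the assumed shape of word-level chunks; it is not captured from a real run.

```python
def transcribe_with_language(audio_path, model_name="openai/whisper-tiny"):
    """Run the ASR pipeline with word timestamps and language (downloads the model)."""
    # Imported lazily so the helper below can be used without loading transformers.
    from transformers import pipeline

    pipe = pipeline("automatic-speech-recognition", model=model_name)
    # No change to forced_decoder_ids is made here; only the two call-time flags are set.
    return pipe(audio_path, return_timestamps="word", return_language=True)


def languages_in(output):
    """Hypothetical helper: collect the distinct languages attached to the chunks."""
    return {chunk["language"] for chunk in output["chunks"] if "language" in chunk}


# Hand-written sample in the assumed output shape (not a real transcription).
sample_output = {
    "text": " Hello world",
    "chunks": [
        {"text": " Hello", "timestamp": (0.0, 0.4), "language": "english"},
        {"text": " world", "timestamp": (0.4, 1.0), "language": "english"},
    ],
}
detected = languages_in(sample_output)
```

Calling `transcribe_with_language("sample.wav")` would need network access and an audio file; the helper only inspects the returned dict, so it can be checked against the hand-written sample.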


amyeroberts

Thanks for adding and iterating on this!

* add language to words

_collate_word_timestamps uses the return_language flag to determine whether the language of the chunk should be added to the word's information

* ran style checks

added missing comma

* add new language test

test that the pipeline can return both the language and timestamp

* remove model configuration in test

Removed model configurations that do not influence test results

* remove model configuration in test

Removed model configurations that do not influence test results

robinderat added 3 commits June 24, 2024 14:27

add language to words

391cd45

_collate_word_timestamps uses the return_language flag to determine whether the language of the chunk should be added to the word's information

Merge branch 'huggingface:main' into whisper-language-with-word-timestamps

77f4a01

ran style checks

549c0f1

added missing comma

amyeroberts added the Audio label Jun 24, 2024

Merge branch 'whisper-language-with-word-timestamps' of https://github.com/robinderat/transformers into whisper-language-with-word-timestamps

88f0f0e

add new language test

0352e8e

test that the pipeline can return both the language and timestamp

kamilakesbi requested a review from amyeroberts July 3, 2024 12:17

sanchit-gandhi approved these changes Jul 3, 2024


kamilakesbi mentioned this pull request Jul 8, 2024

Whisper - get probability of detected language #29293 (open)

amyeroberts reviewed Jul 15, 2024


robinderat added 2 commits July 16, 2024 20:07

remove model configuration in test

e09de69

Removed model configurations that do not influence test results

remove model configuration in test

270dc94

Removed model configurations that do not influence test results

amyeroberts approved these changes Jul 17, 2024


amyeroberts merged commit b31d595 into huggingface:main Jul 17, 2024
18 checks passed

robinderat deleted the whisper-language-with-word-timestamps branch July 18, 2024 07:21
