fix(llama): fix LlamaTokenzier #22746

rockmagma02 · 2023-04-13T14:00:54Z

What does this PR do?

Bug in LlamaTokenizer when return_token_type_ids=True #22742

This PR removes an extra sep token in the sequence length calculation.

Ref:

transformers/src/transformers/models/llama/tokenization_llama.py

Lines 178 to 187 in 7df1343

    
    def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
        bos_token_id = [self.bos_token_id] if self.add_bos_token else []
        eos_token_id = [self.eos_token_id] if self.add_eos_token else []

        output = bos_token_id + token_ids_0 + eos_token_id

        if token_ids_1 is not None:
            output = output + bos_token_id + token_ids_1 + eos_token_id

        return output
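
To illustrate why the length calculation has to agree with the method referenced above, here is a minimal sketch. It is not the diff from this PR: `ToyLlamaTokenizer`, its default token IDs, and the helper `token_type_ids_for` are hypothetical names made up for illustration, and using segment id 1 for the second sequence is an assumption. The point is only that the token type IDs should count exactly the special tokens that `build_inputs_with_special_tokens` adds, and no extra separator.

```python
from typing import List, Optional


class ToyLlamaTokenizer:
    """Hypothetical stand-in for illustration; not the real LlamaTokenizer."""

    def __init__(self, bos_token_id: int = 1, eos_token_id: int = 2,
                 add_bos_token: bool = True, add_eos_token: bool = False):
        self.bos_token_id = bos_token_id
        self.eos_token_id = eos_token_id
        self.add_bos_token = add_bos_token
        self.add_eos_token = add_eos_token

    def build_inputs_with_special_tokens(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        # Same logic as the snippet referenced in the PR description.
        bos_token_id = [self.bos_token_id] if self.add_bos_token else []
        eos_token_id = [self.eos_token_id] if self.add_eos_token else []
        output = bos_token_id + token_ids_0 + eos_token_id
        if token_ids_1 is not None:
            output = output + bos_token_id + token_ids_1 + eos_token_id
        return output

    def token_type_ids_for(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        # Hypothetical helper: count only the special tokens that
        # build_inputs_with_special_tokens actually adds (no extra sep token).
        n_special = int(self.add_bos_token) + int(self.add_eos_token)
        type_ids = [0] * (len(token_ids_0) + n_special)
        if token_ids_1 is not None:
            # Assumption: the second sequence gets segment id 1.
            type_ids += [1] * (len(token_ids_1) + n_special)
        return type_ids


tok = ToyLlamaTokenizer()
input_ids = tok.build_inputs_with_special_tokens([10, 11], [20])
type_ids = tok.token_type_ids_for([10, 11], [20])
# Both lists must have the same length for the encoding to be consistent.
assert len(input_ids) == len(type_ids)
```

Deriving the count from the same `add_bos_token` and `add_eos_token` flags keeps the two lists in step regardless of how the tokenizer is configured.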

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev · 2023-04-13T14:16:16Z

The documentation is not available anymore as the PR was closed or merged.

ArthurZucker

Nice! Good job on the fix 🚀. Note that the sep and cls tokens were not even set, which was part of the problem.

amyeroberts

Thanks for the quick fix! 🚀

It seems there is an issue with your CircleCI permissions, as the tests won't run.
Could you try refreshing your permissions as shown here?

rockmagma02 · 2023-04-13T17:04:26Z

Thanks for your quick review. I have re-run the CI tests. 🤗 @amyeroberts @ArthurZucker

fix(llama): fix LlamaTokenzier

9d3f065

Merge branch 'huggingface:main' into fix-llama-tokenizer

4137f01

ArthurZucker approved these changes Apr 13, 2023

ArthurZucker requested a review from amyeroberts April 13, 2023 14:53

amyeroberts approved these changes Apr 13, 2023

rockmagma02 added 2 commits April 14, 2023 00:11

Merge branch 'huggingface:main' into fix-llama-tokenizer

aa3317b

Merge branch 'huggingface:main' into fix-llama-tokenizer

fac1378

amyeroberts merged commit 90ce374 into huggingface:main Apr 13, 2023

novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023

fix(llama): fix LlamaTokenzier (huggingface#22746)

62932ef
