AttributeError: 'CodeGen25Tokenizer' object has no attribute 'encoder' #82

Open
velocityCavalry opened this issue Oct 4, 2023 · 3 comments

Comments

@velocityCavalry

Hi! I am using transformers 4.34 and tiktoken 0.4.0. I am trying to download the tokenizer for CodeGen 2.5, but when I run the command from the tutorial, I get the following error:

>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen25-7b-mono", trust_remote_code=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "miniconda3/envs/scenario/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 738, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "miniconda3/envs/scenario/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2045, in from_pretrained
    return cls._from_pretrained(
  File "miniconda3/envs/scenario/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2256, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File ".cache/huggingface/modules/transformers_modules/Salesforce/codegen25-7b-mono/29854f8cbe3e588ff7c8d1d15e605b5f12bca8a7/tokenization_codegen25.py", line 136, in __init__
    super().__init__(
  File "miniconda3/envs/scenario/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 366, in __init__
    self._add_tokens(self.all_special_tokens_extended, special_tokens=True)
  File "/home/velocity/miniconda3/envs/scenario/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 462, in _add_tokens
    current_vocab = self.get_vocab().copy()
  File ".cache/huggingface/modules/transformers_modules/Salesforce/codegen25-7b-mono/29854f8cbe3e588ff7c8d1d15e605b5f12bca8a7/tokenization_codegen25.py", line 153, in get_vocab
    vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
  File ".cache/huggingface/modules/transformers_modules/Salesforce/codegen25-7b-mono/29854f8cbe3e588ff7c8d1d15e605b5f12bca8a7/tokenization_codegen25.py", line 149, in vocab_size
    return self.encoder.n_vocab
AttributeError: 'CodeGen25Tokenizer' object has no attribute 'encoder'

I tried deleting the cache, but that didn't help. Running tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen25-7b-mono") without trust_remote_code=True instead gives ValueError: Tokenizer class CodeGen25Tokenizer does not exist or is not currently imported.

Has anyone else encountered this issue, and if so, how can I solve it? Thank you so much!

@fernandalorena

fernandalorena commented Oct 10, 2023

I'm getting the same problem with XGen. I think it might be caused by a dependency on earlier versions of transformers and tiktoken.

UPDATE: Solved it. Something seems to be broken with the newer transformers release, so we need to pin the version:
pip install transformers==4.33.2
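
For anyone who wants to check their environment before pinning, here is a rough sketch. It assumes the breakage starts at transformers 4.34.0, which is only what this thread reports (4.34 fails, 4.33.2 works); the helper names `parse_version`, `is_affected`, and `installed_transformers_version` are made up for illustration.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption taken from this thread: 4.34 fails and 4.33.2 works.
FIRST_AFFECTED = (4, 34, 0)


def parse_version(version_string):
    """Turn '4.33.2' or '4.34.0.dev0' into a comparable (major, minor, patch) tuple."""
    numbers = [int(n) for n in re.findall(r"\d+", version_string)[:3]]
    numbers += [0] * (3 - len(numbers))  # pad short versions such as '4.34'
    return tuple(numbers)


def is_affected(version_string):
    """Return True if this version is at or above the first version reported as failing."""
    return parse_version(version_string) >= FIRST_AFFECTED


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is not installed."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif is_affected(found):
        print(f"transformers {found} may hit this error; try pinning to 4.33.2")
    else:
        print(f"transformers {found} predates the reported breakage")
```

This only compares version numbers; it does not load the tokenizer, so it cannot tell whether a later release of the model's remote code has already been fixed.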

@rooa
Contributor

rooa commented Oct 10, 2023

Correct, there was a breaking change in transformers that changed the order of variable declarations within PreTrainedTokenizer. On it.
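
To illustrate the failure mode, here is a self-contained sketch that does not use transformers at all. `Base`, `BrokenTokenizer`, and `FixedTokenizer` are hypothetical stand-ins, not the real classes. It mirrors what the traceback above shows: the base constructor ends up calling get_vocab(), which needs self.encoder, before the subclass has assigned it. Setting the attribute before calling super().__init__() is one plausible fix, offered here as an assumption rather than as the actual patch.

```python
class Base:
    """Stand-in for a base tokenizer class; NOT the real transformers code."""

    def __init__(self):
        # The base constructor reads the vocabulary while it is still
        # initializing, so subclasses must already be able to answer.
        self.known_tokens = self.get_vocab()

    def get_vocab(self):
        raise NotImplementedError


class BrokenTokenizer(Base):
    def __init__(self):
        super().__init__()                # get_vocab() runs in here ...
        self.encoder = {"a": 0, "b": 1}   # ... before encoder is assigned

    def get_vocab(self):
        return dict(self.encoder)


class FixedTokenizer(Base):
    def __init__(self):
        self.encoder = {"a": 0, "b": 1}   # assign state first
        super().__init__()                # get_vocab() now finds encoder

    def get_vocab(self):
        return dict(self.encoder)


# Constructing the broken variant raises AttributeError, as in the issue.
try:
    BrokenTokenizer()
except AttributeError as err:
    error_message = str(err)
else:
    error_message = None

fixed = FixedTokenizer()
print(error_message)
print(fixed.known_tokens)
```

The same ordering issue applies to any subclass that relies on attributes its own __init__ sets after the base constructor has run.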

@skye95git

> I'm getting the same problem with XGen. I think it might be caused by a dependency on earlier versions of transformers and tiktoken.
>
> UPDATE: Solved it. Something seems to be broken with the newer transformers release, so we need to pin the version: pip install transformers==4.33.2

Hi, I'm hitting the same error. Is 4.33.2 the only version that works?
