Export files required by OPT to load the tokenizer #1571

dbogunowicz · 2023-05-19T15:59:23Z

OPT models do not contain a "tokenizer.json". Instead, the following files are required to create the tokenizer with from_pretrained:
"special_tokens_map.json"
"vocab.json"
"merges.txt"

I.e., if those three files are present in the deployment directory, the following code will execute:

from transformers import AutoTokenizer

# directory that holds the exported tokenizer files
model_path = "deployment"
tokenizer = AutoTokenizer.from_pretrained(model_path)
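
For illustration only, here is a minimal sketch of what exporting these files could look like. It is not the code added to src/sparseml/transformers/export.py in this PR. The helper names copy_tokenizer_files and missing_tokenizer_files and the constant OPT_TOKENIZER_FILES are hypothetical; only the three file names come from the description above. It assumes the files, when present, sit at the top level of the model directory.

```python
import shutil
from pathlib import Path

# File names taken from the PR description; everything else here is hypothetical.
OPT_TOKENIZER_FILES = ("special_tokens_map.json", "vocab.json", "merges.txt")


def copy_tokenizer_files(model_dir, deployment_dir, file_names=OPT_TOKENIZER_FILES):
    """Copy each listed file that exists in model_dir into deployment_dir.

    Returns the names of the files that were copied, in the order listed.
    """
    source = Path(model_dir)
    target = Path(deployment_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in file_names:
        candidate = source / name
        if candidate.is_file():
            shutil.copy2(candidate, target / name)
            copied.append(name)
    return copied


def missing_tokenizer_files(deployment_dir, file_names=OPT_TOKENIZER_FILES):
    """Return the listed files that are absent from deployment_dir."""
    target = Path(deployment_dir)
    return [name for name in file_names if not (target / name).is_file()]
```

Calling missing_tokenizer_files("deployment") before AutoTokenizer.from_pretrained gives a quick way to see which of the three files still needs to be exported. Files that are absent from the model directory are skipped rather than treated as an error, since not every model ships all of them.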

src/sparseml/transformers/export.py

initial commit

ea73fb7

dbogunowicz requested review from bfineran, natuan, a team and rahul-tuli and removed request for a team May 19, 2023 15:59

natuan suggested changes May 19, 2023

src/sparseml/transformers/export.py (outdated, resolved)

include Tuans proposal

1fe7ad6

dbogunowicz requested a review from natuan May 22, 2023 08:31

dbogunowicz added 2 commits May 22, 2023 10:31

Merge branch 'main' into feature/damian/tokenizer_files

4747226

Merge branch 'main' into feature/damian/tokenizer_files

469cac7

anmarques reviewed May 22, 2023

src/sparseml/transformers/export.py (resolved)

dbogunowicz commented May 22, 2023

src/sparseml/transformers/export.py (outdated, resolved)

Update src/sparseml/transformers/export.py

7592b90

dbogunowicz requested a review from anmarques May 22, 2023 15:06

anmarques approved these changes May 22, 2023

natuan approved these changes May 22, 2023

natuan merged commit 3dd1f8d into main May 22, 2023
12 checks passed

natuan deleted the feature/damian/tokenizer_files branch May 22, 2023 15:32
