Minor Terminology and Documentation Updates for Local Tokenizer Loading #134

Merged
merged 2 commits into from
Mar 23, 2024

Conversation

justHungryMan
Contributor

@justHungryMan justHungryMan commented Mar 23, 2024

I've been closely monitoring the DataTrove project and utilizing it in my workflow due to its efficient pipelining capabilities. Thanks for all the hard work on this project. Really appreciate it.

Given our environment's restriction on external internet access, the ability to load tokenizers from local files rather than exclusively from Hugging Face (HF) is crucial for us. I was delighted to discover that recent updates have added the capability to load tokenizers locally.

I had prepared to contribute this feature myself, but since it is already implemented, I opted to make some supplementary updates instead. These rename tokenizer_name to tokenizer_name_or_path and refine the related documentation to match the new functionality.

I welcome any feedback or suggestions for further refinements. Thank you for your ongoing efforts to enhance DataTrove.

- Rename 'tokenizer_name' to 'tokenizer_name_or_path'.
- Update comments and documentation for clarity.
- Rename 'tokenizer_name' to 'tokenizer_name_or_path'.
- Adjust related documentation and usage accordingly.
@guipenedo
Collaborator

LGTM, thanks!

@guipenedo guipenedo merged commit 56aa210 into huggingface:main Mar 23, 2024
4 checks passed
2 participants