
Bff/token optimisation #268

Open · wants to merge 29 commits into master

Conversation

benjaminfh commented Nov 1, 2023

PR to fix: #266
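
For context, the core of the change is passing `replace_additional_special_tokens=False` when registering task tokens. A minimal sketch, not the exact diff: `self.tokenizer` and `self.model` stand in for the HuggingFace tokenizer and decoder that Donut holds, and the method layout here is illustrative.

```python
from typing import List

def add_special_tokens(self, list_of_tokens: List[str]) -> None:
    """Register task tokens (e.g. <s_key>/</s_key>) with the tokenizer."""
    newly_added_num = self.tokenizer.add_special_tokens(
        {"additional_special_tokens": sorted(set(list_of_tokens))},
        # In recent transformers versions the default (True) replaces the
        # previously registered additional_special_tokens on every call;
        # False keeps earlier task tokens registered.
        replace_additional_special_tokens=False,
    )
    if newly_added_num > 0:
        # Grow the embedding matrix only when the vocab actually changed.
        self.model.resize_token_embeddings(len(self.tokenizer))
```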

bugface commented Nov 14, 2023

This is very useful!

One small finding: add .DS_Store to .gitignore or remove it.

benjaminfh (Author) commented Nov 14, 2023

Ah yes, thanks for catching that - removed the file.

I should have put some context in the original PR. This fixes the issue I flagged in #266.
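
For anyone skimming: the hotspot is that `json2token` registers ground-truth keys as special tokens while converting every training sample, so the tokenizer update can run once per key per sample. Roughly this shape, as a simplified sketch of the upstream pattern rather than the patched code:

```python
def json2token(self, obj, update_special_tokens_for_json_key: bool = True):
    """Convert a JSON annotation into Donut's token sequence (simplified)."""
    if isinstance(obj, dict):
        output = ""
        for k in sorted(obj.keys()):
            if update_special_tokens_for_json_key:
                # Runs for every key of every sample; each call can trigger
                # an expensive tokenizer update unless known tokens are skipped.
                self.add_special_tokens([rf"<s_{k}>", rf"</s_{k}>"])
            output += (
                rf"<s_{k}>"
                + self.json2token(obj[k], update_special_tokens_for_json_key)
                + rf"</s_{k}>"
            )
        return output
    if isinstance(obj, list):
        return r"<sep/>".join(
            self.json2token(v, update_special_tokens_for_json_key) for v in obj
        )
    return str(obj)
```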

benjaminfh (Author) commented

@bugface – let me know if you're happy to merge :)

bugface commented Nov 22, 2023

> @bugface – let me know if you're happy to merge :)

I think it looks great.

benjaminfh marked this pull request as ready for review November 22, 2023 19:21
benjaminfh (Author) commented

I think we're good to merge then, @bugface :)

msciancalepore98 commented

Any news on this? @bugface

matteocacciola left a comment

If the default value of replace_additional_special_tokens is already False, why assign False explicitly when you call it?

@benjaminfh
Copy link
Author

Ah yeah, I guess that's bad style. At the time I wanted to be explicit because this bug was so opaque. Are you happy to merge if I fix that?

matteocacciola commented Jul 11, 2024 via email

```python
if len(set(list_of_tokens) - set(self.tokenizer.all_special_tokens)) > 0:
    newly_added_num = self.tokenizer.add_special_tokens(
        {"additional_special_tokens": sorted(set(list_of_tokens))},
        replace_additional_special_tokens=replace_additional_special_tokens,
    )
else:
    newly_added_num = 0
```

I guess this is the real solution to the detected bug, isn't it?
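
For what it's worth, the two pieces look complementary as far as I can tell: the set-difference guard skips the tokenizer update when every token is already known, and that only works if `replace_additional_special_tokens=False` keeps earlier tokens in `tokenizer.all_special_tokens`; with the replacing behaviour the difference would come back non-empty on later samples and the update would keep firing.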

matteocacciola added a commit to matteocacciola/donut that referenced this pull request Jul 11, 2024
Successfully merging this pull request may close these issues: json2token performance (#266)

4 participants