Faster bert #1412

Merged: AngledLuffa merged 4 commits into dev from faster_bert on Aug 24, 2024

AngledLuffa
Collaborator

A couple of updates to the usage of bert, especially in the constituency parser: it no longer tokenizes the input twice, and it sorts sentences by length so that the larger bert requests are batched together.
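
The length-sorting idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `encode_sorted_by_length`, `fake_encode_batch`, and the `batch_size` parameter are hypothetical names, and the stand-in encoder only mimics the shape of a transformer call. Sentences are sorted longest first, encoded in batches of similar length, and the results are put back in the original order.

```python
from typing import Callable, List, Sequence


def encode_sorted_by_length(
    sentences: Sequence[Sequence[str]],
    encode_batch: Callable[[List[Sequence[str]]], List[list]],
    batch_size: int = 32,
) -> List[list]:
    """Encode sentences in length-sorted batches, returning results in input order.

    Hypothetical helper: grouping sentences of similar length means each batch
    needs less padding, so the long sentences are processed together.
    """
    # Indices of the sentences, longest first.
    order = sorted(range(len(sentences)), key=lambda i: len(sentences[i]), reverse=True)
    results: List[list] = [None] * len(sentences)
    for start in range(0, len(order), batch_size):
        batch_indices = order[start:start + batch_size]
        batch = [sentences[i] for i in batch_indices]
        encoded = encode_batch(batch)
        # Scatter each result back to the position of its original sentence.
        for i, enc in zip(batch_indices, encoded):
            results[i] = enc
    return results


def fake_encode_batch(batch: List[Sequence[str]]) -> List[list]:
    """Stand-in for a real model call (assumption): one number per token."""
    return [[len(token) for token in sentence] for sentence in batch]


sentences = [["a"], ["this", "is", "longer"], ["two", "words"]]
encoded = encode_sorted_by_length(sentences, fake_encode_batch, batch_size=2)
```

With `batch_size=2`, the three-word and two-word sentences go through the encoder together and the one-word sentence follows in its own batch, while `encoded` stays aligned with `sentences`.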

…ecessary duplicate uses of the tokenizer when looking for tokens with entirely unknown characters
… speedup coming from better batching of the input encoding. Will need to see if we can better batch the later operations as well
AngledLuffa merged commit c896431 into dev on Aug 24, 2024
1 check passed
AngledLuffa deleted the faster_bert branch on August 24, 2024 04:10