Remove unnecessary array allocations in generation process and enable caching #308

Merged

brandonwillard
Contributor

@brandonwillard brandonwillard commented Oct 2, 2023

This PR removes some unnecessary array allocations in the generation process that hurt scaling with respect to the maximum number of tokens, and it adds key-value (KV) caching.
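The following is a minimal, pure-Python sketch of the general idea behind KV caching. It is not the code from this PR. It uses toy single-head causal attention, and the names `KVCache`, `cached_step`, and `full_recompute` are hypothetical. Without a cache, every step re-projects the whole prefix into keys and values. With a cache, each token's key and value are computed once and then reused. In the `transformers` integration this state presumably corresponds to the model's `past_key_values`, which the sketch does not use.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    # W is a list of rows; returns W @ x.
    return [dot(row, x) for row in W]


def attend(q, keys, values):
    # Scaled dot-product attention of a single query over all keys/values.
    scale = math.sqrt(len(q))
    scores = [dot(q, k) / scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return [
        sum(w * v[i] for w, v in zip(weights, values)) / total
        for i in range(len(values[0]))
    ]


def full_recompute(xs, Wq, Wk, Wv):
    # No cache: at step t, keys and values for the whole prefix are recomputed.
    outputs = []
    for t in range(len(xs)):
        prefix = xs[: t + 1]
        keys = [matvec(Wk, x) for x in prefix]
        values = [matvec(Wv, x) for x in prefix]
        outputs.append(attend(matvec(Wq, xs[t]), keys, values))
    return outputs


class KVCache:
    """Hypothetical toy cache holding one key and one value per past token."""

    def __init__(self):
        self.keys = []
        self.values = []


def cached_step(x, cache, Wq, Wk, Wv):
    # With a cache: only the newest token is projected; older entries are reused.
    cache.keys.append(matvec(Wk, x))
    cache.values.append(matvec(Wv, x))
    return attend(matvec(Wq, x), cache.keys, cache.values)


random.seed(0)
d = 4
Wq, Wk, Wv = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(d)] for _ in range(3))
xs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]

cache = KVCache()
cached = [cached_step(x, cache, Wq, Wk, Wv) for x in xs]
reference = full_recompute(xs, Wq, Wk, Wv)
```

Both paths yield the same attention outputs. The cached path performs one key/value projection per new token instead of one per prefix token at every step.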

Perhaps the biggest change unrelated to caching is the removal of the method `Sequence.update_token_ids`. In addition, the dimensions of the arrays returned by `Sequence.step` are now fixed (i.e. no squeezing). This makes the dimensions in `Sequence.__call__` clearer and lets us simplify the loop (e.g. the steps of the autoregressive loop no longer need to be duplicated before the loop starts).
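The following is a rough sketch of why fixed output dimensions simplify the loop. It is not the actual `Sequence` code. The function `generate` and the toy step function are hypothetical, and nested lists stand in for arrays. If every step returns next-token ids with shape `(batch, 1)` and never squeezes that axis, the first step's output has the same shape as every later one. The loop body can then be uniform, with no special first iteration peeled off before the loop.

```python
from typing import Callable, List

# Hypothetical step signature: token ids of shape (batch, seq_len) in,
# next-token ids of shape (batch, 1) out. The trailing axis is never squeezed.
Step = Callable[[List[List[int]]], List[List[int]]]


def generate(step: Step, token_ids: List[List[int]], max_tokens: int) -> List[List[int]]:
    # Copy the prompt so the caller's lists are not mutated.
    token_ids = [list(row) for row in token_ids]
    for _ in range(max_tokens):
        next_ids = step(token_ids)  # always shape (batch, 1), including the first step
        if len(next_ids) != len(token_ids) or any(len(r) != 1 for r in next_ids):
            raise ValueError("step must return shape (batch, 1)")
        # Same shape every iteration, so the same append logic applies throughout.
        for row, nxt in zip(token_ids, next_ids):
            row.extend(nxt)
    return token_ids


def toy_step(token_ids: List[List[int]]) -> List[List[int]]:
    # Toy "model": predicts the previous token plus one.
    return [[row[-1] + 1] for row in token_ids]


out = generate(toy_step, [[1, 2], [10, 20]], max_tokens=3)
# out == [[1, 2, 3, 4, 5], [10, 20, 21, 22, 23]]
```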

@brandonwillard brandonwillard added the text (Linked to text generation), enhancement, and optimization (Related to performance optimizations) labels Oct 2, 2023
@brandonwillard brandonwillard self-assigned this Oct 2, 2023
@brandonwillard brandonwillard force-pushed the fix-sequence-scaling branch 6 times, most recently from dc551ff to d55bbc1, October 3, 2023 23:04
@brandonwillard brandonwillard changed the title Remove unnecessary array allocations in generation process Remove unnecessary array allocations in generation process and enable caching Oct 3, 2023
@brandonwillard brandonwillard added the transformers (Linked to the `transformers` integration) label Oct 3, 2023
@brandonwillard brandonwillard merged commit a8429a3 into outlines-dev:main Oct 4, 2023
5 checks passed
@brandonwillard brandonwillard deleted the fix-sequence-scaling branch October 4, 2023 03:39