"Optimistic" is_in generation for openai. #378

Merged
merged 1 commit into outlines-dev:main on Nov 19, 2023

Conversation

HerrIvan
Contributor

Addresses #368.

It is quite verbose: it adds two helper functions, longest_common_prefix and get_choices_with_longest_common_prefix.

Maybe you would order the code differently.

To make the solution fully correct, the "prefixes" should be computed token-based rather than string-based. That would change just those two functions. Let me know what you think.

The algorithm follows the pseudo-code in the issue.
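
For readers following along, here is a minimal sketch of what those two helpers could look like once prefixes are compared over token IDs. The names come from the PR description; the exact signatures and the behaviour of get_choices_with_longest_common_prefix are assumptions, not the merged code.

```python
from typing import List


def longest_common_prefix(tokens1: List[int], tokens2: List[int]) -> List[int]:
    """Return the longest shared prefix of two token-ID sequences."""
    prefix = []
    for t1, t2 in zip(tokens1, tokens2):
        if t1 != t2:
            break
        prefix.append(t1)
    return prefix


def get_choices_with_longest_common_prefix(
    response: List[int], choices: List[List[int]]
) -> List[List[int]]:
    """Keep the choices that share the longest prefix with the model's response."""
    prefix_lengths = [
        len(longest_common_prefix(response, choice)) for choice in choices
    ]
    best = max(prefix_lengths)
    return [c for c, n in zip(choices, prefix_lengths) if n == best]
```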

except IndexError:
    pass

greedy = False # we try to generate the full response at each iteration
Member
We might as well remove the greedy case altogether

Contributor Author

@HerrIvan commented Nov 17, 2023

Say you have two tokenized choices [T1, TX] and [T2, TX]. You would unmask the three tokens and request 2 tokens from the API. If the model is very biased towards TX, it may answer [TX, TX], and you have gained nothing; repeating the same call, you would enter a loop. Thus, it makes sense to run a greedy step next with just T1 and T2 unmasked, so that you are guaranteed to move one token forward.
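
To make the scenario concrete, here is the same example with made-up token IDs (T1=11, T2=12, TX=99); the masks below only illustrate the reasoning, they are not code from the PR.

```python
# Hypothetical token IDs for the example above.
T1, T2, TX = 11, 12, 99
choices = [[T1, TX], [T2, TX]]

# Optimistic call: all three tokens unmasked, two tokens requested.
optimistic_mask = {T1, T2, TX}          # {11, 12, 99}
# A model heavily biased towards TX may return [TX, TX], which matches no
# choice, so nothing is gained and repeating the call could loop forever.

# Greedy fallback: only the valid first tokens unmasked, one token requested.
greedy_mask = {T1, T2}                  # {11, 12}
# Whatever comes back (T1 or T2) is guaranteed to advance the prefix by one token.
```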

Member

Ok, this makes sense.

@rlouf
Member

rlouf commented Nov 17, 2023

It might take me until Monday to be able to properly review the PR, thank you for contributing!

rlouf linked an issue Nov 18, 2023 that may be closed by this pull request
@HerrIvan
Contributor Author

I pushed changes such that it now computes the prefix matches as sequences of tokens (lists of ints) rather than as strings. That makes the algorithm 100% consistent with the output generated by the LLM. The code may require some refactoring for readability/testability, but as for the algorithmic solution, I think it is complete. I won't push anything else.
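
As a toy illustration of why token-based prefixes are the consistent choice (the token IDs and splits below are invented; real ones depend on the tokenizer):

```python
# Two choices that share a string prefix but, under a hypothetical BPE
# vocabulary, share no token prefix at all.
florida  = [48345, 4103]   # hypothetically ["Flor", "ida"]
florence = [7301, 594]     # hypothetically ["Flore", "nce"]

# "Florida" and "Florence" share the string prefix "Flor", yet their token
# sequences have no common prefix, so only the token-based comparison matches
# what the model can actually emit.
assert longest_common_prefix(florida, florence) == []  # using the sketch above
```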

The current approach is greedy, in the sense that it generates a single
token at each step, asking the API to only generate valid next tokens.
This means having to pay for the prompt tokens for every token generated.

This commit takes a more optimistic approach. It starts by allowing
all tokens present in the choice sequences and limiting the length of the
generation to the number of tokens in the longest sequence. If the
completion is not satisfactory, it then takes one greedy step before
switching back to the optimistic mode.

On average this new approach consumes fewer tokens than the current one.
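
Read as pseudocode, the commit message corresponds to a control flow along these lines. This is only a sketch of the algorithm as described, not the merged implementation; sample(allowed_tokens, max_tokens) stands in for the OpenAI call with a logit-bias mask restricted to allowed_tokens.

```python
from typing import Callable, List, Set


def choose(choices_tokens: List[List[int]],
           sample: Callable[[Set[int], int], List[int]]) -> List[int]:
    """Sketch of the optimistic/greedy loop described in the commit message."""
    prefix: List[int] = []               # tokens accepted so far
    greedy = False                       # start in optimistic mode
    while True:
        # choices still compatible with the accepted prefix, minus that prefix
        remaining = [c[len(prefix):] for c in choices_tokens
                     if c[:len(prefix)] == prefix]
        if [] in remaining:
            return prefix                # one choice has been fully generated
        if greedy:
            allowed = {r[0] for r in remaining}           # only valid next tokens
            completion = sample(allowed, 1)               # guaranteed one-token step
        else:
            allowed = {t for r in remaining for t in r}   # all tokens of all choices
            completion = sample(allowed, max(len(r) for r in remaining))
        # accept the longest part of the completion that still matches a choice
        accepted: List[int] = []
        for i, tok in enumerate(completion):
            if any(len(r) > i and r[i] == tok for r in remaining):
                remaining = [r for r in remaining if len(r) > i and r[i] == tok]
                accepted.append(tok)
            else:
                break
        prefix += accepted
        # take one greedy step only if the optimistic call made no progress
        greedy = (not greedy) and (not accepted)
```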
@rlouf
Member

rlouf commented Nov 19, 2023

This is a very good addition, thank you! I just rebased the commits into a single commit. I will refactor the entire OpenAI integration soon, so I will leave the code as is for now and merge.

rlouf marked this pull request as ready for review on November 19, 2023, 21:58
rlouf merged commit 76cfc61 into outlines-dev:main on Nov 19, 2023
4 checks passed
Development

Successfully merging this pull request may close these issues.

Reduce token consumption for "choice" generation with openAI.