"Optimistic" is_in generation for openai. #378

Merged
merged 1 commit into outlines-dev:main on Nov 19, 2023

Conversation

HerrIvan
Contributor

Addresses #368.

It is quite verbose: it adds two helper functions, longest_common_prefix and get_choices_with_longest_common_prefix.

Maybe you would order the code differently.

To make the solution fully correct, the "prefixes" should be computed token-based rather than string-based. That would change just those two functions. Let me know what you think.

The algorithm follows the pseudo-code in the issue.
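
For readers following along, here is a minimal sketch of what those two helpers could look like once prefixes are compared over token IDs. The names come from the PR description; the exact signatures and the behaviour of get_choices_with_longest_common_prefix are assumptions, not the merged code.

```python
from typing import List


def longest_common_prefix(tokens1: List[int], tokens2: List[int]) -> List[int]:
    """Return the longest shared prefix of two token-ID sequences."""
    prefix = []
    for t1, t2 in zip(tokens1, tokens2):
        if t1 != t2:
            break
        prefix.append(t1)
    return prefix


def get_choices_with_longest_common_prefix(
    response: List[int], choices: List[List[int]]
) -> List[List[int]]:
    """Keep the choices that share the longest prefix with the model's response."""
    prefix_lengths = [
        len(longest_common_prefix(response, choice)) for choice in choices
    ]
    best = max(prefix_lengths)
    return [c for c, n in zip(choices, prefix_lengths) if n == best]
```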

except IndexError:
    pass

greedy = False # we try to generate the full response at each iteration
Member
We might as well remove the greedy case altogether

Contributor Author

@HerrIvan commented Nov 17, 2023

Say you have two tokenized choices [T1, TX] and [T2, TX]. You would unmask the three tokens and request 2 tokens from the API. If the model is very biased towards TX, it may answer [TX, TX], and you have gained nothing; repeating the same call, you would enter a loop. Thus, it makes sense to run a greedy step next with just T1 and T2 unmasked, so that you are guaranteed to move one token forward.
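
To make the scenario concrete, here is the same example with made-up token IDs (T1=11, T2=12, TX=99); the masks below only illustrate the reasoning, they are not code from the PR.

```python
# Hypothetical token IDs for the example above.
T1, T2, TX = 11, 12, 99
choices = [[T1, TX], [T2, TX]]

# Optimistic call: all three tokens unmasked, two tokens requested.
optimistic_mask = {T1, T2, TX}          # {11, 12, 99}
# A model heavily biased towards TX may return [TX, TX], which matches no
# choice, so nothing is gained and repeating the call could loop forever.

# Greedy fallback: only the valid first tokens unmasked, one token requested.
greedy_mask = {T1, T2}                  # {11, 12}
# Whatever comes back (T1 or T2) is guaranteed to advance the prefix by one token.
```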

Member

Ok, this makes sense.

@rlouf
Member

rlouf commented Nov 17, 2023

It might take me until Monday to be able to properly review the PR, thank you for contributing!

rlouf linked an issue Nov 18, 2023 that may be closed by this pull request
@HerrIvan
Contributor Author

I pushed changes such that it now computes the prefix matches as sequences of tokens (lists of ints) rather than as strings. That makes the algorithm 100% consistent with the output generated by the LLM. The code may require some refactoring for readability/testability, but as for the algorithmic solution, I think it is complete. I won't push anything else.
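
As a toy illustration of why token-based prefixes are the consistent choice (the token IDs and splits below are invented; real ones depend on the tokenizer):

```python
# Two choices that share a string prefix but, under a hypothetical BPE
# vocabulary, share no token prefix at all.
florida  = [48345, 4103]   # hypothetically ["Flor", "ida"]
florence = [7301, 594]     # hypothetically ["Flore", "nce"]

# "Florida" and "Florence" share the string prefix "Flor", yet their token
# sequences have no common prefix, so only the token-based comparison matches
# what the model can actually emit.
assert longest_common_prefix(florida, florence) == []  # using the sketch above
```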

The current approach is greedy, in the sense that it generates a single
token at each step, asking the API to only generate valid next tokens.
This means having to pay for the prompt tokens for every token generated.

This commit takes a more optimistic approach. It starts by allowing
all tokens present in the choice sequences and limiting the length of the
generation to the number of tokens in the longest sequence. If the
completion is not satisfactory, it then takes one greedy step before
switching back to the optimistic mode.

On average this new approach consumes fewer tokens than the current one.
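
Read as pseudocode, the commit message corresponds to a control flow along these lines. This is only a sketch of the algorithm as described, not the merged implementation; sample(allowed_tokens, max_tokens) stands in for the OpenAI call with a logit-bias mask restricted to allowed_tokens.

```python
from typing import Callable, List, Set


def choose(choices_tokens: List[List[int]],
           sample: Callable[[Set[int], int], List[int]]) -> List[int]:
    """Sketch of the optimistic/greedy loop described in the commit message."""
    prefix: List[int] = []               # tokens accepted so far
    greedy = False                       # start in optimistic mode
    while True:
        # choices still compatible with the accepted prefix, minus that prefix
        remaining = [c[len(prefix):] for c in choices_tokens
                     if c[:len(prefix)] == prefix]
        if [] in remaining:
            return prefix                # one choice has been fully generated
        if greedy:
            allowed = {r[0] for r in remaining}           # only valid next tokens
            completion = sample(allowed, 1)               # guaranteed one-token step
        else:
            allowed = {t for r in remaining for t in r}   # all tokens of all choices
            completion = sample(allowed, max(len(r) for r in remaining))
        # accept the longest part of the completion that still matches a choice
        accepted: List[int] = []
        for i, tok in enumerate(completion):
            if any(len(r) > i and r[i] == tok for r in remaining):
                remaining = [r for r in remaining if len(r) > i and r[i] == tok]
                accepted.append(tok)
            else:
                break
        prefix += accepted
        # take one greedy step only if the optimistic call made no progress
        greedy = (not greedy) and (not accepted)
```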
@rlouf
Member

rlouf commented Nov 19, 2023

This is a very good addition, thank you! I just rebased the commits into a single commit. I will refactor the entire OpenAI integration soon, so I will leave the code as is for now and merge.

rlouf marked this pull request as ready for review on November 19, 2023, 21:58
rlouf merged commit 76cfc61 into outlines-dev:main on Nov 19, 2023
4 checks passed
Development

Successfully merging this pull request may close these issues.

Reduce token consumption for "choice" generation with openAI.