What behavior of the library made you think about the improvement?
The current solution calls the OpenAI API once per token. This can be costly enough to make the library impractical with OpenAI (at least for the "is_in" completion), especially for languages other than English, where even short choices require multiple tokens; each extra call resends the prompt, acting as a cost multiplier over the prompt tokens.
Indeed, in English this is not always a problem:
prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?
Review: This restaurant is just awesome!
"""
answer = model(prompt, is_in=["Positive", "Negative"])
results in:
Api response: 'Positive'
Current solution: ['Positive']
Answer: Positive
But see the same example in Spanish:
prompt = """Eres un asistente etiquetador de sentimientos.
La siguiente reseña, ¿es positiva o negativa?
Reseña: Este restaurante es increíble!
"""
answer = model(prompt, is_in=["Positiva", "Negativa"])
Results in:
Api response: 'Pos'
Solution: ['Pos']
Api response: 'it'
Solution: ['Pos', 'it']
Api response: 'iva'
Solution: ['Pos', 'it', 'iva']
Answer: Positiva
How would you like it to behave?
The proposed solution is heuristic. It is based on trying "optimistic" requests. Sketched algorithm:
We unmask all tokens that may construct a
valid answer and directly request max_tokens from the API.
After that:
- In case the response exactly matches one of the complete
choices, return it.
- If it does not completely match any:
* If it contains a valid prefix for at least one choice,
filter the choices by the given prefix, extend the prompt
with the current prefix, and iterate again.
* If there was no valid prefix, make a _greedy_ call to the
API just with the valid next tokens unmasked and max_tokens
set to 1, and iterate again.
Additionally, we can monitor when the number of still-valid
choices drops to 1, and then directly construct the answer
without querying the API any further.
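The loop above can be sketched as follows. Here `api_call(prompt, allowed, max_tokens)` is a hypothetical helper standing in for an OpenAI request with only the tokens of the still-valid suffixes unmasked; it is not part of the library, and character counts are used as a crude stand-in for token counts.

```python
def complete_choice(prompt, choices, api_call):
    """Heuristic sketch: optimistic multi-token requests with greedy fallback.

    `api_call(prompt, allowed, max_tokens)` is an assumed helper returning
    the model's continuation with only tokens of `allowed` unmasked.
    """
    prefix = ""
    remaining = list(choices)
    while len(remaining) > 1:
        suffixes = [c[len(prefix):] for c in remaining]
        # Optimistic request: ask for enough tokens to finish any choice
        # (character length used here as a rough upper bound).
        response = api_call(prompt + prefix, suffixes, max(len(s) for s in suffixes))
        if response in suffixes:
            return prefix + response   # exact match with a complete choice
        if any(s.startswith(response) for s in suffixes):
            prefix += response         # valid prefix: extend and iterate
        else:
            # No valid prefix: greedy call with max_tokens set to 1.
            prefix += api_call(prompt + prefix, suffixes, 1)
        remaining = [c for c in remaining if c.startswith(prefix)]
    # Only one consistent choice left: construct the answer directly.
    return remaining[0]
```

With a well-behaved model the first optimistic request already matches a full choice and the loop returns after a single call; the greedy branch only fires on the problematic cases discussed below.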
Pros:
Much reduced token consumption. The final response will in general coincide with that of the one-token-per-call approach.
Cons:
Sometimes it may involve retries, when the API responds with a string/token that is not a prefix of any valid choice. This can be resolved with a subsequent greedy call.
(Theoretical) worst-case: Inconsistent prefixes in string and token domains:
Say you have choices "No, I don't know" and "Isn't it obvious?". The API could return "I don't know". That response has no common token prefix with any choice, but it does have a string one, namely with "Isn't it obvious?". That would lead the algorithm to return "Isn't it obvious?" which has the opposite semantics to the LLM response.
Note that this could be solved by filtering the responses based on token prefixes instead of string ones.
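The mismatch, and the token-based fix, can be illustrated with hand-written tokenizations (illustrative only, not produced by a real tokenizer):

```python
def common_token_prefix_len(response_tokens, choice_tokens):
    """Length of the shared leading token run between response and choice."""
    n = 0
    for r, c in zip(response_tokens, choice_tokens):
        if r != c:
            break
        n += 1
    return n

# Hand-written tokenizations for the example above (illustrative only):
response = ["I", " don't", " know"]
choice_a = ["No", ",", " I", " don't", " know"]   # "No, I don't know"
choice_b = ["Is", "n't", " it", " obvious", "?"]  # "Isn't it obvious?"

# In the string domain, "I" is a prefix of "Isn't it obvious?" ...
print("Isn't it obvious?".startswith("I"))         # True
# ... but in the token domain neither choice shares a prefix with the
# response, so token-based filtering discards both and the algorithm
# falls back to a greedy call instead of returning the wrong choice.
print(common_token_prefix_len(response, choice_b))  # 0
print(common_token_prefix_len(response, choice_a))  # 0
```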
Examples
The proposed solution would, in the first case, change nothing:
requesting 1 token(s)
Api response: 'Positive'
Solution: ['Positive']
Answer: Positive
But it would resolve the second in just one call:
requesting 3 token(s)
Api response: 'Positiva'
Solution: ['Positiva']
Answer: Positiva
Problematic cases
In general, to get into trouble one needs to do some adversarial prompting: for instance, misleading the LLM into answering first with a token that is not at the beginning of any correct choice.
prompt = """You are a sentiment-labelling assistant.
Is the next review 'positive' or 'negative'?
Review: This restaurant is just awesome!
"""
answer = model(prompt, is_in=["the review is positive", "the review is negative"])
Output:
requesting 4 token(s)
Api response: ' positive is positive review'
Current solution: []
requesting 1 token(s) # <-- greedy call
Api response: 'the'
Current solution: ['the']
requesting 3 token(s)
Api response: ' positive is positive'
Current solution: ['the', ' ']
requesting 3 token(s)
Api response: 'review is positive'
Current solution: ['the', ' ', 'review is positive']
Answer: the review is positive
I'm on board with this change, would you like to open a PR? You'd probably want to tokenize the possible answers, set max_tokens to the length of the largest array, and bias the logits in the first request?
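That suggestion might look something like the sketch below. `encode` stands in for a tokenizer callable (e.g. tiktoken's `Encoding.encode`), and the helper name is made up here; OpenAI's `logit_bias` parameter does accept token-id-to-value maps with values in [-100, 100], where +100 effectively restricts sampling to the biased tokens.

```python
def first_request_params(choices, encode):
    """Build max_tokens and logit_bias for the first optimistic request.

    `encode` is an assumed tokenizer callable mapping a string to a list
    of token ids. Biasing every token appearing in any choice by +100
    restricts sampling to those tokens.
    """
    token_lists = [encode(choice) for choice in choices]
    max_tokens = max(len(tokens) for tokens in token_lists)
    logit_bias = {tok: 100 for tokens in token_lists for tok in tokens}
    return max_tokens, logit_bias

# Toy one-token-per-character tokenizer, just to exercise the helper:
toy_encode = lambda s: [ord(ch) for ch in s]
mt, bias = first_request_params(["Positiva", "Negativa"], toy_encode)
print(mt)  # 8
```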