
Reduce token consumption for "choice" generation with openAI. #368

Closed · HerrIvan opened this issue Nov 15, 2023 · 2 comments · Fixed by #378
@HerrIvan (Contributor)

What behavior of the library made you think about the improvement?

The current solution calls the OpenAI API once per generated token. This can be costly to the point of rendering the library impractical with OpenAI (at least for the "is_in" completion), especially for languages other than English, where even short choices require multiple tokens and the prompt is therefore re-sent and billed once per token. A rough sketch of this call pattern is shown below.
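For context, the one-token-per-call pattern looks roughly like the following sketch. This is an illustration only, not the library's actual code; `allowed_token_ids`, `is_complete_choice`, and the model name are assumptions.

import openai

client = openai.OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def generate_token_by_token(prompt: str, allowed_token_ids: list[int]) -> str:
    """Illustration of the current pattern: one API request per generated token."""
    generated = ""
    while not is_complete_choice(generated):  # hypothetical stopping check
        response = client.completions.create(
            model="gpt-3.5-turbo-instruct",
            prompt=prompt + generated,                            # prompt re-sent on every call
            max_tokens=1,                                         # one token per request
            logit_bias={str(t): 100 for t in allowed_token_ids},  # restrict to valid tokens
        )
        generated += response.choices[0].text
    return generated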

Indeed, in English this may not always be a problem:

prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?

Review: This restaurant is just awesome! 
"""

answer = model(prompt, is_in=["Positive", "Negative"])

results in:

Api response: 'Positive'
Current solution: ['Positive']
Answer: Positive

But see the same example in Spanish:

prompt = """Eres un asistente etiquetador de sentimientos.
La siguiente reseña, ¿es positiva o negativa?

Reseña: Este restaurante es increíble! 
"""

answer = model(prompt, is_in=["Positiva", "Negativa"])

Results in:

Api response: 'Pos'
Solution: ['Pos']

Api response: 'it'
Solution: ['Pos', 'it']

Api response: 'iva'
Solution: ['Pos', 'it', 'iva']

Answer: Positiva

How would you like it to behave?

The proposed solution is a heuristic based on trying "optimistic" requests. Sketched algorithm (a Python sketch follows the outline below):

We unmask all tokens that may construct a valid answer 
and directly request max_tokens from the API. 

After that:

- In case the response exactly matches one of the complete 
  choices, return it. 

- If it does not completely match any:
  * If it contains a valid prefix for at least one choice, 
    filter the choices by the given prefix, extend the prompt 
    with the current prefix, and iterate again.
  * If there was no valid prefix, make a _greedy_ call to the 
    API just with the valid next tokens unmasked and max_tokens 
    set to 1, and iterate again.

Additionally, we can monitor at which point the number of 
still-valid choices drops to 1, and then directly construct 
the answer without querying the API any further.
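A minimal Python sketch of this loop, assuming a tiktoken tokenizer and the OpenAI completions endpoint. The helper names (`openai_complete`, `choose`) are illustrative, not the library's API; logit_bias size limits and error handling are ignored for brevity.

import openai
import tiktoken

client = openai.OpenAI()
enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by gpt-3.5-turbo models


def openai_complete(prompt: str, allowed_ids: set[int], max_tokens: int) -> str:
    """Single completion request with only `allowed_ids` unmasked."""
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=prompt,
        max_tokens=max_tokens,
        logit_bias={str(i): 100 for i in allowed_ids},
    )
    return response.choices[0].text


def choose(prompt: str, choices: list[str]) -> str:
    prefix = ""
    while True:
        remaining = [c for c in choices if c.startswith(prefix)]
        if not remaining:
            raise ValueError("response diverged from every choice")
        if len(remaining) == 1:
            # Only one valid choice left: construct the answer without another call.
            return remaining[0]

        suffixes = [c[len(prefix):] for c in remaining]
        allowed_ids = {t for s in suffixes for t in enc.encode(s)}
        max_tokens = max(len(enc.encode(s)) for s in suffixes)

        # "Optimistic" request: every token of every remaining suffix is unmasked.
        text = openai_complete(prompt + prefix, allowed_ids, max_tokens)

        if text in suffixes:
            return prefix + text  # exact match with one of the choices

        # Otherwise keep the longest part of the response that is still a valid prefix.
        extension = next(
            (text[:k] for k in range(len(text), 0, -1)
             if any(s.startswith(text[:k]) for s in suffixes)),
            None,
        )
        if extension is not None:
            prefix += extension  # filter the choices by the prefix and iterate
        else:
            # No valid prefix at all: greedy call with only the valid next tokens unmasked.
            first_ids = {enc.encode(s)[0] for s in suffixes if s}
            prefix += openai_complete(prompt + prefix, first_ids, max_tokens=1)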

Pros:

  • Much reduced token consumption. The final response will in general coincide with that of the one-token-per-call approach.

Cons:

  • Sometimes it may involve retries, when the API responds with a string/token that is not a prefix of any valid choice. This can be resolved with a subsequent greedy call.
  • (Theoretical) worst case: inconsistent prefixes in the string and token domains:
    • Say you have the choices "No, I don't know" and "Isn't it obvious?". The API could return "I don't know". That response has no common token prefix with any choice, but it does have a string one, namely with "Isn't it obvious?". That would lead the algorithm to return "Isn't it obvious?", which has the opposite semantics to the LLM response.
    • Note that this could be solved by filtering the responses based on token prefixes instead of string ones (see the sketch below).
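For illustration, token-prefix filtering could look like the following sketch (it assumes a tiktoken tokenizer and is not a proposed final implementation):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")


def filter_by_token_prefix(response: str, choices: list[str]) -> list[str]:
    """Keep only the choices whose token sequence starts with the response's tokens."""
    response_ids = enc.encode(response)
    return [c for c in choices if enc.encode(c)[: len(response_ids)] == response_ids]


# Depending on the tokenizer, "I don't know" and "Isn't it obvious?" may share the
# leading character "I" but not a leading token, in which case the filter returns []
# and the algorithm falls back to a greedy call instead of picking the wrong choice.
print(filter_by_token_prefix("I don't know", ["No, I don't know", "Isn't it obvious?"]))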

Examples

The proposed solution would, in the first case, change nothing:

requesting 1 token(s)

Api response: 'Positive'
Solution: ['Positive']

Answer: Positive

But it would resolve the second one in just a single call:

requesting 3 token(s)

Api response: 'Positiva'
Solution: ['Positiva']

Answer: Positiva

Problematic cases

In general, to get into trouble one needs to do some adversarial prompting, for instance misleading the LLM into answering first with a token that is not at the beginning of any correct choice.

prompt = """You are a sentiment-labelling assistant.
Is the next review 'positive' or 'negative'?

Review: This restaurant is just awesome! 
"""

answer = model(prompt, is_in=["the review is positive", "the review is negative"])

Output:

requesting 4 token(s)
Api response: ' positive is positive review'
Current solution: []

requesting 1 token(s)   # <-- greedy call
Api response: 'the'
Current solution: ['the']

requesting 3 token(s)
Api response: ' positive is positive'
Current solution: ['the', ' ']

requesting 3 token(s)
Api response: 'review is positive'
Current solution: ['the', ' ', 'review is positive']

Answer: the review is positive
@rlouf (Member) commented Nov 17, 2023

I'm on board with this change, would you like to open a PR? You'd probably want to tokenize the possible answers, set max_tokens to the length of the largest array, and bias the logits in the first request?
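A rough sketch of what that first request could look like, assuming tiktoken for tokenizing the choices (illustrative only; the model name and the +100 bias value are assumptions, and the prompt is the Spanish example from above):

import openai
import tiktoken

client = openai.OpenAI()
enc = tiktoken.get_encoding("cl100k_base")

prompt = """Eres un asistente etiquetador de sentimientos.
La siguiente reseña, ¿es positiva o negativa?

Reseña: Este restaurante es increíble!
"""
choices = ["Positiva", "Negativa"]
tokenized = [enc.encode(c) for c in choices]

# max_tokens = length of the longest tokenized choice;
# logit_bias strongly favours every token that appears in any choice.
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=prompt,
    max_tokens=max(len(t) for t in tokenized),
    logit_bias={str(t): 100 for seq in tokenized for t in seq},
)
print(response.choices[0].text)  # ideally 'Positiva' in a single request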

@HerrIvan (Contributor, Author)

I did not see that you had reacted to it. I'll rebase and make the PR.
