What behavior of the library made you think about the improvement?
The current solution calls the OpenAI API once per token. This can be costly enough to make the library impractical with OpenAI (at least for the "is_in" completion), especially for languages other than English, where even short choices require multiple tokens; each extra call resends the prompt, acting as a cost multiplier over the prompt tokens.
Indeed, in English this is not always a problem:
prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?
Review: This restaurant is just awesome!
"""
answer = model(prompt, is_in=["Positive", "Negative"])
results in:
Api response: 'Positive'
Current solution: ['Positive']
Answer: Positive
But see the same example in Spanish:
prompt = """Eres un asistente etiquetador de sentimientos.
La siguiente reseña, ¿es positiva o negativa?
Reseña: Este restaurante es increíble!
"""
answer = model(prompt, is_in=["Positiva", "Negativa"])
Results in:
Api response: 'Pos'
Solution: ['Pos']
Api response: 'it'
Solution: ['Pos', 'it']
Api response: 'iva'
Solution: ['Pos', 'it', 'iva']
Answer: Positiva
How would you like it to behave?
The proposed solution is heuristic. It is based on trying "optimistic" requests. Sketched algorithm:
We unmask all tokens that may construct a
valid answer and directly request max_tokens from the API.
After that:
- In case the response exactly matches one of the complete
choices, return it.
- If it does not completely match any:
* If it contains a valid prefix for at least one choice,
filter the choices by the given prefix, extend the prompt
with the current prefix, and iterate again.
* If there was no valid prefix, make a _greedy_ call to the
API just with the valid next tokens unmasked and max_tokens
set to 1, and iterate again.
Additionally, we can monitor when the number of still-valid
choices drops to 1, and then directly construct the answer
without querying the API any further.
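The loop above can be sketched as follows. Here `api_call(prompt, allowed, max_tokens)` is a hypothetical helper standing in for an OpenAI request with only the tokens of the still-valid suffixes unmasked; it is not part of the library, and character counts are used as a crude stand-in for token counts.

```python
def complete_choice(prompt, choices, api_call):
    """Heuristic sketch: optimistic multi-token requests with greedy fallback.

    `api_call(prompt, allowed, max_tokens)` is an assumed helper returning
    the model's continuation with only tokens of `allowed` unmasked.
    """
    prefix = ""
    remaining = list(choices)
    while len(remaining) > 1:
        suffixes = [c[len(prefix):] for c in remaining]
        # Optimistic request: ask for enough tokens to finish any choice
        # (character length used here as a rough upper bound).
        response = api_call(prompt + prefix, suffixes, max(len(s) for s in suffixes))
        if response in suffixes:
            return prefix + response   # exact match with a complete choice
        if any(s.startswith(response) for s in suffixes):
            prefix += response         # valid prefix: extend and iterate
        else:
            # No valid prefix: greedy call with max_tokens set to 1.
            prefix += api_call(prompt + prefix, suffixes, 1)
        remaining = [c for c in remaining if c.startswith(prefix)]
    # Only one consistent choice left: construct the answer directly.
    return remaining[0]
```

With a well-behaved model the first optimistic request already matches a full choice and the loop returns after a single call; the greedy branch only fires on the problematic cases discussed below.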
Pros:
Much reduced token consumption. The final response will in general coincide with that of the one-token-per-call approach.
Cons:
Sometimes it may involve retries, when the API responds with a string/token that is not a prefix of any valid choice. This can be resolved with a subsequent greedy call.
(Theoretical) worst-case: Inconsistent prefixes in string and token domains:
Say you have choices "No, I don't know" and "Isn't it obvious?". The API could return "I don't know". That response has no common token prefix with any choice, but it does have a string one, namely with "Isn't it obvious?". That would lead the algorithm to return "Isn't it obvious?" which has the opposite semantics to the LLM response.
Note that this could be solved by filtering the responses based on token prefixes instead of string ones.
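The mismatch, and the token-based fix, can be illustrated with hand-written tokenizations (illustrative only, not produced by a real tokenizer):

```python
def common_token_prefix_len(response_tokens, choice_tokens):
    """Length of the shared leading token run between response and choice."""
    n = 0
    for r, c in zip(response_tokens, choice_tokens):
        if r != c:
            break
        n += 1
    return n

# Hand-written tokenizations for the example above (illustrative only):
response = ["I", " don't", " know"]
choice_a = ["No", ",", " I", " don't", " know"]   # "No, I don't know"
choice_b = ["Is", "n't", " it", " obvious", "?"]  # "Isn't it obvious?"

# In the string domain, "I" is a prefix of "Isn't it obvious?" ...
print("Isn't it obvious?".startswith("I"))         # True
# ... but in the token domain neither choice shares a prefix with the
# response, so token-based filtering discards both and the algorithm
# falls back to a greedy call instead of returning the wrong choice.
print(common_token_prefix_len(response, choice_b))  # 0
print(common_token_prefix_len(response, choice_a))  # 0
```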
Examples
The proposed solution would, in the first case, change nothing:
requesting 1 token(s)
Api response: 'Positive'
Solution: ['Positive']
Answer: Positive
But it would resolve the second in just one call:
requesting 3 token(s)
Api response: 'Positiva'
Solution: ['Positiva']
Answer: Positiva
Problematic cases
In general, to get into trouble one needs to do some adversarial prompting: for instance, misleading the LLM into answering first with a token that is not at the beginning of any correct choice.
prompt = """You are a sentiment-labelling assistant.
Is the next review 'positive' or 'negative'?
Review: This restaurant is just awesome!
"""
answer = model(prompt, is_in=["the review is positive", "the review is negative"])
Output:
requesting 4 token(s)
Api response: ' positive is positive review'
Current solution: []
requesting 1 token(s) # <-- greedy call
Api response: 'the'
Current solution: ['the']
requesting 3 token(s)
Api response: ' positive is positive'
Current solution: ['the', ' ']
requesting 3 token(s)
Api response: 'review is positive'
Current solution: ['the', ' ', 'review is positive']
Answer: the review is positive
I'm on board with this change, would you like to open a PR? You'd probably want to tokenize the possible answers, set max_tokens to the length of the largest array, and bias the logits in the first request?
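That suggestion might look something like the sketch below. `encode` stands in for a tokenizer callable (e.g. tiktoken's `Encoding.encode`), and the helper name is made up here; OpenAI's `logit_bias` parameter does accept token-id-to-value maps with values in [-100, 100], where +100 effectively restricts sampling to the biased tokens.

```python
def first_request_params(choices, encode):
    """Build max_tokens and logit_bias for the first optimistic request.

    `encode` is an assumed tokenizer callable mapping a string to a list
    of token ids. Biasing every token appearing in any choice by +100
    restricts sampling to those tokens.
    """
    token_lists = [encode(choice) for choice in choices]
    max_tokens = max(len(tokens) for tokens in token_lists)
    logit_bias = {tok: 100 for tokens in token_lists for tok in tokens}
    return max_tokens, logit_bias

# Toy one-token-per-character tokenizer, just to exercise the helper:
toy_encode = lambda s: [ord(ch) for ch in s]
mt, bias = first_request_params(["Positiva", "Negativa"], toy_encode)
print(mt)  # 8
```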