feat: improve tools to include name and add tests #1693
Conversation
Thanks to you, you are doing a great job with TGI; it is an honor to be able to contribute a little grain of sand. I have looked at the changes in this PR, and I can report that during my local tests the tool prompt greatly influences the final performance of tool selection (or of error notification when no tool can be selected). Adding `-------------{}{} -----------` delimiters when building the final prompt influences the results very heavily, so I recommend formatting the final prompt with delimiters like this:
Also, changing the prompt in lib.rs
to:
has a positive impact on the LLM's performance on the tool-selection task. To get the delimited final prompt I had to change the order of processing in the request: in the original code the tokenizer was used very early, so to build a delimited final prompt I moved the tokenizer call to after the grammar generation and the function definitions, so that everything is ready before the tokenizer is used.

PS: another important behaviour I noticed in these tests is that with grammar-guided generation, when the prompt is clear and well engineered, LLM inference runs a lot faster; when the prompt lacks context, the LLM takes a long time to produce the generated JSON output. I guess a refined prompt biases the statistical process toward tokens that the guidance framework will accept, so the validation is less restrictive and the LLM generates tokens in line with what the guidance expects.
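As a minimal sketch of the reordering described above (all function names and the delimiter style here are hypothetical illustrations, not TGI's actual code): render the tool prompt and tool definitions into a delimited block first, and only hand the fully assembled prompt to the tokenizer afterwards:

```rust
// Hypothetical sketch: build the delimited tool section *before* any
// tokenizer call, so the delimiters wrap a complete tool block.
fn build_final_prompt(chat_prompt: &str, tool_prompt: &str, tools_json: &str) -> String {
    // Delimit the tool definitions so the model can clearly separate them
    // from the conversation (the exact delimiter string is illustrative).
    format!("{chat_prompt}\n-------------\n{tool_prompt}\n{tools_json}\n-------------\n")
}

fn main() {
    let prompt = build_final_prompt(
        "User: what is the weather in Paris?",
        "You may call one of the following tools:",
        r#"{"get_current_weather": {...}}"#,
    );
    // Only now would `prompt` be passed to the tokenizer.
    println!("{prompt}");
}
```

The point is purely ordering: grammar generation and function rendering happen first, tokenization last.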
Hi @puppetm4st3r, thank you for the feedback on the PR. I've updated the default prompt and looked into changing how the tools are appended to the input. Currently I'm not 100% sure which approach is best for passing tools to the LLM. At the moment we send:
where input is the user's input after the template is applied to the incoming messages. The method proposed in your PR seems to append the tools to the last user message instead. It's unclear to me which approach is best, and I would love to find an alternative that enables this flexibility without adding more complexity (an extra template)...
Yes, that part is a bit complex across use cases. My motivation was to append to the last user message in order to try not to confuse the LLM (<70B LLMs are very sensitive to malformed chat-template messages). I also implemented the call_id property of the OpenAI spec, because it is useful in the chat history for the LLM to understand previous turns of the conversation that called a function; the LLM has to distinguish function calls between turns so as not to get confused. Honestly, I didn't give much thought to the best strategy for including the tool_prompt and the tool definitions; I simply tried to keep the chat templating/messaging consistent so as not to confuse the LLM, and I chose to modify the user's last message since that modification is ephemeral and does not come back to the user (it is like a patch; I don't like it, but I didn't find another direct and simple way to implement it for the moment). The main focus was to support a consistent multi-turn chat history; that was my use case and it works fine, but I agree that appending to the last message is not the best and clearest approach.
PS: Additionally, my first experiment was to add a separate user-role message at the end of the conversation so as not to have to modify the user's last message, but this produces two user-role messages in one turn, and some LLMs (<34B and non-ChatML templates) gave very bad results, since the LLM expects user/assistant pairs rather than user/user/assistant tuples. Another variable was that in my use case the functions/tools involved quite complex reasoning, so they became very sensitive to a "malformed" prompt or chat template. Another interesting observation: for simple function calls, such as the typical OpenAI weather-query API example, probably any approach will work relatively well; the point is that as the reasoning needed to choose a function becomes more complex, the LLM becomes more sensitive to what it expects in terms of the chat template. Attaching the tool_prompt after the template possibly works for many cases, but in situations closer to production, where the LLM must reason in more complex ways, not respecting the chat template the LLM was trained on will affect results (especially if you send the API a large chat history).
Another option I tested a few months ago was to build a wrapper around the official OpenAI client, which presets the chat template with the stuff required to execute function calling using JSON grammar guidance on the server, mimicking the OpenAI behaviour in the client and in the final response to the user. The benefit is that you don't touch the server; the cons are that you create a new dependency/artifact to call functions, which breaks the concept of an OpenAI-compatible API; it is more like an OpenAI-compatible client. I feel this option takes us away from the original purpose.
@puppetm4st3r I think for now we'll include the current changes and keep the discussion open regarding tools and the template. It seems like there may be some convergence on using tools within templates https://github.com/huggingface/transformers/blob/76fa17c1663a0efeca7208c20579833365584889/src/transformers/models/cohere/tokenization_cohere_fast.py#L292 and this would require a larger change to the template logic.
Great, yes, I worked on something like that locally at the chat-template level. I'm at the gym now; later tonight I can share my chat templates, which have resulted in good LLM tool understanding over chat history and long multi-turn conversations. I'll be back with more detailed ideas later! c ya
router/src/infer.rs
impl ToolGrammar {
    pub fn apply(
        tools: Option<&Vec<Tool>>,
You can make this owned directly in the function signature, making all those cloning ops vanish (instead of relying on the compiler being clever).
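A minimal sketch of the suggestion (the Tool struct below is a stand-in, not TGI's actual type): taking the vector by value lets the function move the tools out instead of cloning each one:

```rust
// Stand-in type for illustration; the real Tool lives in the router crate.
#[derive(Debug)]
struct Tool {
    name: String,
}

// Owned signature: the function consumes the Vec and can move its elements,
// so no `.clone()` calls are needed. With `Option<&Vec<Tool>>` every tool
// the function wants to keep would have to be cloned.
fn apply_owned(tools: Option<Vec<Tool>>) -> Vec<Tool> {
    // `into_iter()` moves each Tool out of the Vec.
    tools.unwrap_or_default().into_iter().collect()
}

fn main() {
    let tools = vec![Tool { name: "get_current_weather".into() }];
    let applied = apply_owned(Some(tools));
    println!("{}", applied.len()); // 1
}
```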
router/src/server.rs
@@ -764,6 +763,23 @@ async fn chat_completions(
    let logprobs = req.logprobs.unwrap_or(false);
    let seed = req.seed;
    let stop = req.stop.unwrap_or_default();
    let tool_prompt = req.tool_prompt.unwrap_or_default();
I think unwrapping ownership should work (see above).
let Request{ stop, seed, tool_prompt } = req;
router/src/infer.rs
fn test_apply_with_tools_none() {
    let result =
        ToolGrammar::apply(None, Some(&ToolType::OneOf)).expect("Failed to apply grammar");
    assert_eq!(result, None);
Is this normal? If you send a grammar and a ToolType, shouldn't that be an error?
Looking at all the other tests, I'm confused about the behavior.
I like the ability to let the LLM respond that there's an error; is it possible to make a test case for it?
In my testing I used the typical OpenAI weather demo: I asked for the weather on the moon, and the LLM returned, in my case, an error because the tools are for city weather.
This is roughly what I got in my tests.
For a valid query the LLM response is:
{
"function": {
"format": "celsius",
"location": "Paris",
"name": "get_current_weather_by_city"
}
}
If I ask for the temperature on the moon, for example, the LLM response is:
{
"function": {
"error": "The request cannot be resolved with the available tools. The request requests the temperature on the moon, but the available tools can only provide weather information for cities on Earth. Please try again with a city on the earth."
}
}
Now, due to the non-deterministic LLM responses, I will focus on the existence of the error key in the JSON results; the message may change on every turn.
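A minimal sketch of such a test (the helper below is hypothetical, not TGI's actual test code): rather than asserting the exact error message, assert only that the response contains an error key. Using just the standard library, a naive substring check is enough for a smoke test:

```rust
// Hypothetical check: treat the response as an error notification if the
// JSON contains an "error" key, without comparing the non-deterministic
// message text. A real test would parse the JSON properly (e.g. serde_json).
fn is_error_response(response: &str) -> bool {
    response.contains("\"error\"")
}

fn main() {
    let valid = r#"{"function": {"name": "get_current_weather_by_city", "location": "Paris"}}"#;
    let error = r#"{"function": {"error": "The request cannot be resolved with the available tools."}}"#;
    println!("{} {}", is_error_response(valid), is_error_response(error)); // false true
}
```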
Continuing the refinement of the idea: let's review the key findings we have uncovered during the implementation of this feature.
Conclusion: that's it for now; we will need to keep digging into these ideas and concepts.
PS: In my use case, what worked for me was modifying the ChatML template to support tools for LLMs not pretrained for tool-call understanding, plus appending the tool_prompt and prompt_str to the last user-role message. That works fine, but it is not the most direct and clean solution. We now know what works, but we need to implement it in a clean, best-practice way. This is my modified ChatML template:
I think the biggest challenge is that mimicking tool selection requires both development on the server side and actions such as modifying a chat template on the client side, which apparently rules out the possibility of the responsibility falling solely on the server. This is always thinking of multi-turn chat interactions; for single-turn or instruction interactions with no chat history, the story is relatively much simpler.
router/src/server.rs
    top_logprobs: _,
} = req;
Suggested change:
-    top_logprobs: _,
-} = req;
+    ..
+} = req;
You can remove all the _ placeholders by using .. instead.
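A small self-contained illustration of the .. rest pattern in a struct destructuring (the Request struct below is a stand-in; the router's actual type has many more fields):

```rust
// Stand-in struct for illustration; the real Request is larger.
#[allow(dead_code)]
struct Request {
    seed: Option<u64>,
    stop: Vec<String>,
    tool_prompt: Option<String>,
    logprobs: bool,
}

fn main() {
    let req = Request {
        seed: Some(42),
        stop: vec!["</s>".to_string()],
        tool_prompt: None,
        logprobs: false,
    };
    // `..` ignores every remaining field, so no `field: _` lines are needed.
    let Request { seed, stop, tool_prompt, .. } = req;
    println!("{:?} {:?} {:?}", seed, stop, tool_prompt);
}
```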
Force-pushed from 88f8df8 to 8589bae
note: details of tool -> grammar

TGI is able to use tools by leveraging the grammar functionality. A tool is simply a specifically formatted grammar that includes a _name property identifying the function. In order to allow the LLM to not choose one of the user-passed functions, we append a notify_error function. An example of such a grammar can be seen expanded below. Note the simplified structure:

{
  "$functions": {
    "get_current_weather": { .. },
    "get_n_day_weather_forecast": { .. },
    "notify_error": { .. }
  },
  "properties": {
    "function": {
      "anyOf": [
        "get_current_weather",
        "get_n_day_weather_forecast",
        "notify_error"
      ]
    }
  }
}

and the actual grammar:

{
"$functions": {
"get_current_weather": {
"description": "Get the current weather",
"properties": {
"_name": {
"const": "get_current_weather",
"type": "string"
},
"format": {
"description": "The temperature unit to use. Infer this from the users location.",
"enum": [
"celsius",
"fahrenheit"
],
"type": "string"
},
"location": {
"description": "The city and state, e.g. San Francisco, CA",
"type": "string"
}
},
"required": [
"location",
"format",
"_name"
],
"type": "object"
},
"get_n_day_weather_forecast": {
"description": "Get an N-day weather forecast",
"properties": {
"_name": {
"const": "get_n_day_weather_forecast",
"type": "string"
},
"format": {
"description": "The temperature unit to use. Infer this from the users location.",
"enum": [
"celsius",
"fahrenheit"
],
"type": "string"
},
"location": {
"description": "The city and state, e.g. San Francisco, CA",
"type": "string"
},
"num_days": {
"description": "The number of days to forecast",
"type": "integer"
}
},
"required": [
"location",
"format",
"num_days",
"_name"
],
"type": "object"
},
"notify_error": {
"properties": {
"_name": {
"const": "notify_error",
"type": "string"
},
"error": {
"description": "The error or issue to notify",
"type": "string"
}
},
"required": [
"error",
"_name"
],
"type": "object"
}
},
"properties": {
"function": {
"anyOf": [
{
"$ref": "#/$functions/get_current_weather"
},
{
"$ref": "#/$functions/get_n_day_weather_forecast"
},
{
"$ref": "#/$functions/notify_error"
}
]
}
}
}
Force-pushed from e26b5fb to eba4dd8
LGTM
This PR makes tool calling aware of the name of the function selected.
Fixes: #1657
Thank you @puppetm4st3r for the helpful snippets, large parts of this PR are simply refactors of the code shared 🙏
Note: opening as a draft PR because small tweaks are needed before merging.