
feat: improve tools to include name and add tests #1693

Merged
merged 9 commits into main from support-function-aware-tools on Apr 16, 2024

Conversation

@drbh (Collaborator) commented Apr 2, 2024

This PR makes tool calling aware of the name of the function selected.

Fixes: #1657

Thank you @puppetm4st3r for the helpful snippets, large parts of this PR are simply refactors of the code shared 🙏

**Opening as a draft PR because small tweaks are needed before merging.
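For context, the kind of response this enables looks roughly like the following (a hedged illustration only: the field names follow the OpenAI tool-call shape and the exact values/ids are hypothetical, not TGI's verbatim output):

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "tool_calls": [
          {
            "id": 0,
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": { "format": "celsius", "location": "Paris" }
            }
          }
        ]
      }
    }
  ]
}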

@puppetm4st3r commented Apr 2, 2024

Thank you as well; you are doing a great job with TGI, and it is an honor to contribute even a small grain of sand.

I have looked at the changes in this PR, and I can say that during my local tests the tool prompt greatly influences the final quality of the tool selection (and of the error notification when no tool can be selected). Adding ------------- delimiters around the {tool_prompt}{tools_str} block when building the final prompt has a very strong influence on the results, so I recommend formatting the final prompt with delimiters like this:

// Proceed only if tool_prompt is not None
if let Some(tool_prompt) = &req.tool_prompt {
    // Find the last message with role 'user'
    if let Some(last_user_message) = req.messages.iter_mut().rev().find(|msg| msg.role == "user") {
        // Generate the additional content combining tool_prompt and tools_str
        let additional_content = format!(
            "\n\n---------------------------\n{}{}\n---------------------------",
            tool_prompt, tools_str
        );

        // If the last user message has existing content, append the additional content to it
        if let Some(content) = &mut last_user_message.content {
            content.push_str(&additional_content); // Append to the existing content
        } else {
            // If, for some reason, the content is None, replace it with the additional content
            last_user_message.content = Some(additional_content);
        }
    }
    // Note: if there is no message with role 'user', this block does nothing
}

Also, changing the default prompt in lib.rs from:

fn default_tool_prompt() -> Option<String> {
    Some(
        "\nBased on the conversation, please choose the most appropriate tool to use: ".to_string(),
    )
}

to:

You will be presented with a JSON schema representing a set of tools.\nIf the user request lacks sufficient information to make a precise tool selection: Do not invent any tool's properties; instead, notify with an error message.\n\nJSON Schema:\n

has a positive impact on the LLM's performance on the tool selection task.
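Concretely, the replacement in lib.rs would then look roughly like this (same function as above, with the proposed string):

fn default_tool_prompt() -> Option<String> {
    Some(
        "You will be presented with a JSON schema representing a set of tools.\nIf the user request lacks sufficient information to make a precise tool selection: Do not invent any tool's properties; instead, notify with an error message.\n\nJSON Schema:\n".to_string(),
    )
}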

To get the final prompt delimited this way, I had to change the order of processing in the request: in the original code the tokenizer is used very early, so to build a delimited final prompt I had to move the tokenizer usage to after the grammar generation and the function definitions, so that everything is ready before the tokenizer is applied.
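A rough sketch of that reordering (hypothetical helper names, not the actual router code):

// 1. Build the grammar and the serialized function definitions first.
let tools_grammar = build_tools_grammar(&req.tools, &req.tool_choice);
let tools_str = serde_json::to_string(&req.tools).unwrap_or_default();

// 2. Patch the last user message with the delimited tool prompt (as in the snippet above).
append_tool_prompt_to_last_user_message(&mut req.messages, &req.tool_prompt, &tools_str);

// 3. Only now apply the chat template and tokenize, so the final prompt is complete.
let inputs = apply_chat_template(&tokenizer_config, &req.messages)?;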

PS: another important behaviour I noticed in these tests is that with grammar-guided generation the LLM infers much faster when the prompt is clear and well engineered; when the prompt lacks context, the LLM takes a long time to produce the generated JSON output. I guess a refined prompt biases the statistical process toward tokens the guidance framework expects, so the validation is less restrictive and the LLM generates tokens that are already in line with what the guidance expects.

@drbh (Collaborator, Author) commented Apr 4, 2024

Hi @puppetm4st3r, thank you for the feedback on the PR. I've updated the default prompt and looked into changing how the tools are appended to the input.

Currently I'm not 100% sure what approach is best for passing tools to the LLM.

At the moment we send:

{inputs}{tool_prompt}{tools_str}

where inputs is the user's input after the chat template has been applied to the incoming messages.

The method proposed in your PR seems to append the {tool_prompt}{tools_str} to the last message, and then applies the chat template.

It's unclear to me which approach is best, and I would love to find an alternative that enables this flexibility without adding more complexity (an extra template)...

@puppetm4st3r

Yes, that part is a bit complex in real use cases. My motivation for appending to the last user message was to try not to confuse the LLM (<70B LLMs are very sensitive to malformed chat-template messages). I also implemented the call_id property of the OpenAI spec, because in the chat history it helps the LLM understand previous turns that called a function; the LLM has to distinguish between the function calls of different turns so it does not get confused. Honestly, I did not give much thought to the best strategy for including the tool_prompt and the tool definitions; I simply tried to keep the chat templating/messaging consistent so as not to confuse the LLM, and I chose to modify the user's last message since that modification is ephemeral and is never returned to the user. It is like a patch: I don't like it, but I did not find another direct and simple way to implement it for the moment.

The main focus was supporting a consistent multi-turn chat history; that was my use case and it works fine, but I agree that appending to the last message is not the cleanest or clearest approach.
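For reference, the kind of OpenAI-style multi-turn history I mean looks roughly like this (illustrative values):

[
  { "role": "user", "content": "What is the weather in Paris?" },
  {
    "role": "assistant",
    "tool_calls": [
      {
        "id": "call_0",
        "type": "function",
        "function": { "name": "get_current_weather", "arguments": "{\"location\": \"Paris\", \"format\": \"celsius\"}" }
      }
    ]
  },
  { "role": "tool", "tool_call_id": "call_0", "content": "22°C and sunny" },
  { "role": "user", "content": "And what about tomorrow?" }
]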

@puppetm4st3r commented Apr 4, 2024

PS: additionally, my first experiment was to add a separate user-role message at the end of the conversation so as not to modify the user's last message, but that results in two user-role messages in one turn, and some LLMs (<34B and non-ChatML templates) gave very bad results, since the LLM expects user/assistant pairs and not user/user/assistant tuples.

Another variable is that in my use case the functions/tools involve quite complex reasoning, so they became very sensitive to a "malformed" prompt or chat template.

Another interesting observation is that for simple function calls, such as the typical OpenAI weather-API example, probably any approach will work relatively well. The point is that as the reasoning needed to choose a function becomes more complex, the LLM becomes more sensitive to what it expects from the chat template. Appending the tool_prompt after the template may work for many cases, but in situations closer to production, where the LLM must reason in more complex ways, not respecting the chat template the LLM was trained on will likely hurt (especially if you send a large chat history to the API).

@puppetm4st3r commented Apr 4, 2024

Another option I tested a few months ago is to build a wrapper around the official OpenAI client that presets the chat template with the stuff required to execute function calling using JSON grammar guidance on the server, and that mimics the OpenAI behaviour on the client and in the final response to the user. The benefit is that you don't touch the server; the downside is that you create a new dependency/artifact to call functions, which breaks the concept of an OpenAI-compatible API; it is more like an OpenAI-compatible client.

I feel this option takes us away from the original purpose.

@drbh (Collaborator, Author) commented Apr 5, 2024

@puppetm4st3r I think for now we'll include the current changes and keep the discussion open regarding tools and the template. It seems like there may be some convergence toward using tools within templates (https://github.com/huggingface/transformers/blob/76fa17c1663a0efeca7208c20579833365584889/src/transformers/models/cohere/tokenization_cohere_fast.py#L292), and this would require a larger change to the template logic.

@drbh marked this pull request as ready for review April 5, 2024 17:24
@puppetm4st3r commented Apr 5, 2024

Great. Yes, I have been working on something like that locally at the chat-template level. I'm at the gym now; later tonight I can share my chat templates, which have resulted in good LLM tool understanding over chat history and long multi-turn conversations. I'll be back with more detailed ideas later!


impl ToolGrammar {
pub fn apply(
tools: Option<&Vec<Tool>>,
Collaborator review comment:

You can make this owned directly in the function signature, making all those cloning ops vanish (instead of relying on the compiler being clever).
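A sketch of what that suggestion would look like (the second parameter and the return types here are assumptions inferred from the tests below, not the final signature):

impl ToolGrammar {
    // Taking `tools` by value lets the function consume it, so no .clone() is needed.
    pub fn apply(
        tools: Option<Vec<Tool>>,
        tool_choice: Option<&ToolType>,
    ) -> Result<Option<Tools>, InferError> {
        // ... move out of `tools` directly instead of cloning from a borrowed Vec
        todo!()
    }
}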

@@ -764,6 +763,23 @@ async fn chat_completions(
let logprobs = req.logprobs.unwrap_or(false);
let seed = req.seed;
let stop = req.stop.unwrap_or_default();
let tool_prompt = req.tool_prompt.unwrap_or_default();
Collaborator review comment:

I think unwrapping ownership should work (see above).

let Request{ stop, seed, tool_prompt } = req;

fn test_apply_with_tools_none() {
let result =
ToolGrammar::apply(None, Some(&ToolType::OneOf)).expect("Failed to apply grammar");
assert_eq!(result, None);
Collaborator review comment:

Is this normal?

If you send a grammar and a ToolType, shouldn't that be an error?

Collaborator review comment:

Looking at all the other tests, I'm confused about the behavior.

I like the ability to let the LLM respond that there's an error; is it possible to make a test case for it?

@puppetm4st3r replied Apr 9, 2024:

In my testing I used the typical OpenAI weather demo. I asked for the weather on the moon, and in my case the LLM returned an error because the tools are for city weather.

This is roughly what I got in my tests:

For a valid query the LLM response is:

{
  "function": {
    "format": "celsius",
    "location": "Paris",
    "name": "get_current_weather_by_city"
  }
}

If I ask for the temperature on the moon, for example, the LLM response is:

{
  "function": {
    "error": "The request cannot be resolved with the available tools. The request requests the temperature on the moon, but the available tools can only provide weather information for cities on Earth. Please try again with a city on the earth."
  }
}

Now, because the LLM responses are nondeterministic, I will focus on the existence of the error key in the JSON results; the message itself may change on every run.
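A minimal sketch of such a check (the run_tool_completion helper is hypothetical; the point is to assert only on the error key, not on the wording):

#[test]
fn test_notify_error_key_present() {
    // Ask for something the available tools cannot answer.
    let raw = run_tool_completion("What is the temperature on the moon?");
    let response: serde_json::Value =
        serde_json::from_str(&raw).expect("response should be valid JSON");
    // Only the presence of the error key is asserted; the message is nondeterministic.
    assert!(response["function"].get("error").is_some());
}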

@puppetm4st3r

Continuing the refinement of the idea:

Let's review the key findings we have uncovered during the implementation of this feature.

  1. Models are highly sensitive to the chat template, with some templates not performing well with sentences containing non-alphanumeric characters, such as a JSON schema. Through experimentation, the "ChatML" template was found to be the most effective.

  2. Models not natively trained for function calling, when mimicking this behavior, are constrained to generating a JSON structure and are therefore incapable of returning feedback or natural text when no tool can be selected. To mitigate this, the previous PR added the capability for the LLM to communicate errors (by appending a "system" tool, in a not entirely clean or straightforward way), which is crucial since this kind of model does not support optional use of tools.

  3. Due to the sensitivity to chat templates, the user's prompt must include instructions so the model knows which JSON schema to use for text generation. Without this, the generation process can be significantly delayed or produce empty tokens until the context window is filled.

  4. Effective prompt engineering significantly enhances the model's tool selection quality; even the error notification can fail if the prompt is unclear or does not adhere to the chat template.

  5. For chat history, only ChatML considers roles for tools (representing a tool call by the LLM), where the call ID is essential for the LLM to understand and logically sequence the calls and responses of the selected tools. However, for models not trained to understand the tool role, a workaround has been to modify the Jinja chat template to render the tool call as part of the assistant role, like "I have decided to call the following tool: …".

Conclusion:
Incorporating tool functionality into the TGI API presents significant challenges, particularly in maintaining clean integration without resorting to less desirable practices like modifying the last user role message. The interplay between chat templates, prompt engineering, and model capabilities is crucial for effective function calling, necessitating careful design and customization to achieve seamless and efficient tool integration and communication.

That's it for now; we will need to keep digging into these ideas and concepts.

@puppetm4st3r commented Apr 9, 2024

PS: what worked for me in my use case was modifying the ChatML template so it supports tools for LLMs not pretrained for tool-call understanding, plus appending the tool_prompt and tools_str to the last user-role message. That works fine, but it is not the most direct and clean solution. We now know what works, but we need to implement it in a clear, best-practice way.

This is my modified ChatML template:

{% for message in messages %}
  {%- if message['role'] == 'system' %}
<|im_start|>system\n{{ message['content'] }}<|im_end|>\n
  {%- elif message['role'] == 'user' %}
<|im_start|>user\n{{ message['content'] }}<|im_end|>\n
  {%- elif message['role'] == 'tool' %}
<|im_start|>tool\ncall_id: {{ message['tool_call_id'] }}\nresult: {{ message['content'] }}<|im_end|>\n
  {%- elif message['role'] == 'assistant' %}
<|im_start|>assistant{%- if message.tool_calls -%}
      {%- for tool_call in message.tool_calls %}
tool_execution: call_id: {{ tool_call.id }}, function: {{ tool_call.function.name }}, arguments: {{ tool_call.function.arguments }}
      {%- endfor -%}
      {%- if not loop.last or not add_generation_prompt %}<|im_end|>{% endif -%}
    {%- else %}
{{message['content'] }}{%- if not loop.last or not add_generation_prompt %}<|im_end|>{% endif -%}
    {% endif -%}
  {% endif -%}
{% endfor -%}
{% if add_generation_prompt -%}
<|im_start|>assistant
{% endif %}
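For illustration, a conversation containing one tool call renders with this template roughly as follows (hypothetical values; exact whitespace is approximate):

<|im_start|>user
What is the weather in Paris?<|im_end|>
<|im_start|>assistant
tool_execution: call_id: 0, function: get_current_weather, arguments: {"location": "Paris", "format": "celsius"}<|im_end|>
<|im_start|>tool
call_id: 0
result: 22°C and sunny<|im_end|>
<|im_start|>assistant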

I think the biggest challenge is that mimicking tool selection requires both development on the server side and actions such as modifying a chat template on the client side, which apparently rules out the possibility of the responsibility falling solely on the server. That is always thinking of multi-turn chat interactions; for single-turn or instruction-style interactions with no chat history, the story is relatively much simpler.

Comment on lines 774 to 771
top_logprobs: _,
} = req;
Collaborator review comment:

Suggested change:
-    top_logprobs: _,
-} = req;
+    ..
+} = req;

Collaborator review comment:

You can remove all the _ bindings by using the .. rest pattern.

@drbh force-pushed the support-function-aware-tools branch from 88f8df8 to 8589bae on April 10, 2024 22:53
@drbh (Collaborator, Author) commented Apr 11, 2024

note: details of tool -> grammar

TGI is able to use tools by leveraging the grammar functionality. A tool is simply a specially formatted grammar that includes "$functions" and requires choosing anyOf the listed functions.

In order to allow the LLM not to choose one of the user-passed functions, we append a default notify_error function that the LLM can pick if none of the other functions match the request.

An example of the resulting grammar can be seen expanded below. Note that the _name property is reserved to enable this, and if specified by a user it may result in naming conflicts.

simplified

{
    "$functions": {
      "get_current_weather": { .. },
      "get_n_day_weather_forecast": { .. },
      "notify_error": { .. }
    },
    "properties": {
      "function": { "anyOf": [
        "get_current_weather",
        "get_n_day_weather_forecast",
        "notify_error"
      ]}
    }
}
actual
{
    "$functions": {
        "get_current_weather": {
            "description": "Get the current weather",
            "properties": {
                "_name": {
                    "const": "get_current_weather",
                    "type": "string"
                },
                "format": {
                    "description": "The temperature unit to use. Infer this from the users location.",
                    "enum": [
                        "celsius",
                        "fahrenheit"
                    ],
                    "type": "string"
                },
                "location": {
                    "description": "The city and state, e.g. San Francisco, CA",
                    "type": "string"
                }
            },
            "required": [
                "location",
                "format",
                "_name"
            ],
            "type": "object"
        },
        "get_n_day_weather_forecast": {
            "description": "Get an N-day weather forecast",
            "properties": {
                "_name": {
                    "const": "get_n_day_weather_forecast",
                    "type": "string"
                },
                "format": {
                    "description": "The temperature unit to use. Infer this from the users location.",
                    "enum": [
                        "celsius",
                        "fahrenheit"
                    ],
                    "type": "string"
                },
                "location": {
                    "description": "The city and state, e.g. San Francisco, CA",
                    "type": "string"
                },
                "num_days": {
                    "description": "The number of days to forecast",
                    "type": "integer"
                }
            },
            "required": [
                "location",
                "format",
                "num_days",
                "_name"
            ],
            "type": "object"
        },
        "notify_error": {
            "properties": {
                "_name": {
                    "const": "notify_error",
                    "type": "string"
                },
                "error": {
                    "description": "The error or issue to notify",
                    "type": "string"
                }
            },
            "required": [
                "error",
                "_name"
            ],
            "type": "object"
        }
    },
    "properties": {
        "function": {
            "anyOf": [
                {
                    "$ref": "#/$functions/get_current_weather"
                },
                {
                    "$ref": "#/$functions/get_n_day_weather_forecast"
                },
                {
                    "$ref": "#/$functions/notify_error"
                }
            ]
        }
    }
}
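As a rough sketch of the idea (not the actual TGI code; the function name and the (name, schema) input shape are hypothetical), folding the user tools plus notify_error into such a grammar could look like this:

use serde_json::{json, Map, Value};

fn tools_to_grammar(tools: &[(String, Value)]) -> Value {
    let mut functions = Map::new();
    for (name, mut schema) in tools.iter().cloned() {
        // Reserve "_name" so the generated JSON always carries the selected function's name.
        schema["properties"]["_name"] = json!({ "const": name, "type": "string" });
        functions.insert(name, schema);
    }
    // Default tool that lets the model report an error instead of forcing a tool choice.
    functions.insert(
        "notify_error".to_string(),
        json!({
            "type": "object",
            "properties": {
                "_name": { "const": "notify_error", "type": "string" },
                "error": { "description": "The error or issue to notify", "type": "string" }
            },
            "required": ["error", "_name"]
        }),
    );
    // anyOf over $refs into the $functions map.
    let refs: Vec<Value> = functions
        .keys()
        .map(|name| json!({ "$ref": format!("#/$functions/{}", name) }))
        .collect();
    json!({
        "$functions": functions,
        "properties": { "function": { "anyOf": refs } }
    })
}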

@drbh force-pushed the support-function-aware-tools branch from e26b5fb to eba4dd8 on April 12, 2024 17:03
@Narsil (Collaborator) left a comment:

LGTM

@drbh self-assigned this Apr 16, 2024
@drbh merged commit 7276d43 into main Apr 16, 2024
7 checks passed
@drbh deleted the support-function-aware-tools branch April 16, 2024 13:02
Nilabhra pushed a commit to TII-AI-Research-Center/text-generation-inference that referenced this pull request May 14, 2024
kdamaszk pushed a commit to kdamaszk/tgi-gaudi that referenced this pull request May 27, 2024
kdamaszk pushed a commit to kdamaszk/tgi-gaudi that referenced this pull request Jun 3, 2024