Auto-apply chat template in SequenceGenerator and SequenceGeneratorAdapter, if available #1019

Open · wants to merge 4 commits into main
Conversation

@leloykun (Contributor) commented on Jul 5, 2024

This PR auto-applies chat templates by default when using instruct/chat models. It doesn't support LlamaCPP for now, though.


Why?

Instruct/chat models tend to be annoyingly template-dependent (i.e. they perform worse if prompts don't follow the chat template used during fine-tuning), and the longer they are fine-tuned, the worse the problem gets. Hence this PR.

See also issue #987 raised by @lapp0.


Interface changes

This PR has minimal impact on the interface; it only changes the default behavior of the generation step.

However, this feature can be disabled either on the creation of the generator:

generator = outlines.generate.choice(model, ["Positive", "Negative"], apply_chat_template=False)

Or when calling the generator:

answer = generator(prompt, apply_chat_template=False)
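
For context, a minimal sketch of roughly what the auto-application amounts to for a transformers-backed model. The helper name wrap_prompt and the single-user-message structure are assumptions, and models that expect a system prompt would need more:

from transformers import AutoTokenizer

# Any instruct model whose tokenizer ships a chat template works here.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

def wrap_prompt(prompt: str) -> str:
    # Wrap the raw prompt in a single user message and render it with the
    # tokenizer's chat template, leaving it as a string for the generator.
    messages = [{"role": "user", "content": prompt}]
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

wrapped = wrap_prompt("Is this review positive or negative?")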

@lapp0 (Collaborator) left a comment

Thanks so much for implementing this!

There are a few things I'd like to see before this is ready to merge. Could you please address the following:

  • Test cases which ensure the chat template is applied correctly in the standard case and handled gracefully when the chat template is missing
  • Consider that some users of Outlines rely on the current behavior, and make apply_chat_template=False the default. Perhaps we could also warn users who don't set the parameter.
  • Add apply_chat_template explanation to the docs
  • Use apply_chat_template=True in all models.transformers generator examples

from outlines.models.transformers import TransformerTokenizer

if isinstance(prompts, str):
    prompts = [prompts]
@lapp0 (Collaborator):

I think the signature should be List[str] -> List[str], and it should raise an error if a list isn't passed. In transformers.py, this function is called after the prompts are normalized to 2D anyway.
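
A minimal sketch of the suggested shape; the function name apply_chat_templates and its body are hypothetical, since only a few diff lines are visible here, and the tokenizer attribute path mirrors the TransformerTokenizer wrapper used in the PR:

from typing import List

def apply_chat_templates(model, prompts: List[str]) -> List[str]:
    # Expect prompts to already be normalized to a list; fail loudly otherwise
    # instead of silently wrapping a bare string.
    if not isinstance(prompts, list):
        raise TypeError(f"prompts must be a List[str], got {type(prompts).__name__}")
    tokenizer = model.tokenizer.tokenizer  # HF tokenizer inside the wrapper
    return [
        tokenizer.apply_chat_template(
            [{"role": "user", "content": p}], tokenize=False, add_generation_prompt=True
        )
        for p in prompts
    ]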

)
return prompts
tokenizer: "TransformerTokenizer" = model.tokenizer
if getattr(tokenizer.tokenizer, "chat_template", None) is None:

    **model_specific_params,
):
    """Return a text generator from a prompt or a list of prompts."""
    if apply_chat_template is None:
@lapp0 (Collaborator):

To make this pythonic, we should have one obvious way of applying a chat template. IMO the argument should only be accepted in the constructor.
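
In other words, a usage sketch under that proposal (not the behavior currently in the PR): the flag is set once when building the generator and never again at call time. model and prompt are as in the examples above.

import outlines

generator = outlines.generate.choice(
    model, ["Positive", "Negative"], apply_chat_template=True
)
# The call site no longer takes the flag; it just consumes prompts.
answer = generator(prompt)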

)

def apply_chat_template(self, prompt: str) -> str:
    messages = self.get_messages(prompt)
@lapp0 (Collaborator):

We should probably make this part of the Tokenizer class and have it raise NotImplementedError by default.
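
A minimal sketch of that idea, assuming the base class and method name stay as they are in this PR:

class Tokenizer:
    # ... existing abstract interface ...

    def apply_chat_template(self, prompt: str) -> str:
        # Backends without chat template support inherit this and fail explicitly.
        raise NotImplementedError(
            f"{type(self).__name__} does not support chat templates"
        )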

@rlouf (Member) commented on Jul 5, 2024

I don't think this design is coherent with the rest of the library; we want to avoid kwargs as much as we possibly can. Here I would simply add an apply_chat_template method to the model instance.

@leloykun (Contributor, Author) commented on Jul 7, 2024

@lapp0 @rlouf

How do you think the interface should look?


Here, I mirrored Hugging Face's Pipeline interface, where we can specify configs/args in the constructor and (optionally) override them in the model call. I like it because it's more flexible, but it does make things a bit more complicated and less pythonic.

I took a quick look at other libraries, and it seems that they either (1) don't auto-apply chat templates at all or (2) have a separate code path for base and instruct/chat models (e.g. TGI & MixEval). I think there are two reasons why:

  1. It's hard to know which models are base models and which are instruct/chat models. I thought checking whether chat_template is None would suffice, but some chat models apparently just leave it out (especially older ones and third-party finetunes). Additionally, transformers' PreTrainedTokenizerBase base class has a default_chat_template property, so, if I'm not mistaken, we can run tokenizer.apply_chat_template on all tokenizers without erroring out. And

  2. Some models don't support system prompts, and it's hard to know which ones do and which ones don't.

So, for now, we need a way to let the library know whether we're dealing with a base model or an instruct/chat model. Worst case, we might also need to ask the user to specify the system prompt. But if we're going to force them to go to all that trouble anyway, we might as well not do this by default.

I think a good compromise is to (1) not apply the chat templates by default but (2) warn the user if the chat template is specified in the tokenizer config but is not being used.
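
A minimal sketch of that compromise for the transformers backend; the helper name is hypothetical, the warning text is illustrative, and the attribute path mirrors the TransformerTokenizer wrapper shown in the diff above:

import warnings

def maybe_warn_unused_chat_template(model) -> None:
    # Warn when the tokenizer ships a chat template that is not being applied.
    chat_template = getattr(model.tokenizer.tokenizer, "chat_template", None)
    if chat_template is not None:
        warnings.warn(
            "This model's tokenizer defines a chat template, but it is not being "
            "applied. Instruct/chat models may perform worse on raw prompts."
        )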

@rlouf (Member) commented on Jul 7, 2024

answer = generator(
    model.apply_chat_template(prompt)
)

It would substantially simplify the code in this PR as well.
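
A minimal sketch of what that could look like on the transformers model wrapper; the message structure and the tokenizer attribute path are assumptions based on the diff context above:

class Transformers:
    # ... existing model wrapper ...

    def apply_chat_template(self, prompt: str) -> str:
        # Render the prompt as a single user message with the underlying
        # Hugging Face tokenizer's chat template and return the plain string.
        messages = [{"role": "user", "content": prompt}]
        return self.tokenizer.tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )

With this, the generator itself stays unchanged and only ever sees strings.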

@alonsosilvaallende (Contributor) commented on Jul 13, 2024

I like this pull request that adds the possibility to use the tokenizer's apply_chat_template, but I wonder if it's a good idea to make it the default behavior. I have had very bad experiences with apply_chat_template where it adds or removes spaces when it shouldn't, or, even worse, in function-calling cases where it completely ignores the functions given without even raising an error (see this example). Many people might complain about something that is outside the control of Outlines.

@rlouf (Member) commented on Jul 13, 2024

Indeed, we are not going to make it the default behavior. Users should be able to inspect the modified prompt before they pass it to the generator.
