Implement prompt/generation alignment #531

RobinPicard · 2024-01-11T23:31:55Z

[updated 2024-06-28]

The aim of this PR to implement prompt token alignment

The idea is to modify the states_to_token_maps of the Guide to include in it the characters of some of the last tokens of the prompt that could be replaced by a different token that contains the same set of characters plus characters for the generation (a crossing token).

To do so, when receiving the prompts of the user (so after the OutlinesLogitsProcessor has already been initialized with its FSM), we copy the FSM as many as their are prompts and we apply to each of them prompt token alignment (as the modification of the states_to_token_maps depends on the content of each prompt).

At the end of the process, we modify the generated sequences to remove the characters at the beginning that correspond to the ends of the user prompts

rlouf · 2024-01-27T09:40:49Z

This is not intended to be merged, I was just wondering whether you think this is a promising direction to look into

I think this is the right general direction.

The case in which the text after the "boundary" of a token matching the end of the prompt does not exist in the vocabulary by itself is not covered

Could you illustrate this? I had a PR opened (can't find it right now) where I iterated once over the vocabulary to find the overlapping tokens.

RobinPicard · 2024-01-27T10:03:01Z

Could you illustrate this? I had a PR opened (can't find it right now) where I iterated once over the vocabulary to find the overlapping tokens.

Making up a fake example. My prompt is "Good mor". Let's say there's a token for "mor" and it's the last one of the prompt. We would want token alignment to replace "mor" with "morning". However, if the token "ning" by itself does not exist, then there's nothing in the states_to_token_maps that correspond to it as, at this point, the character-based FSM that would allow to generate "ning" has already been turned into a token-based mapping.

I was looking at creating states_to_token_maps only after the call is made (and the FSM is updated) but that would add too much overhead.

I was then thinking that a solution could be to create at initialization a mapping that contains both information about characters and about tokens (so we would have some states with no tokens leading to them that would be used for the token alignement)

rlouf · 2024-01-27T10:18:12Z

How about looping over the entire vocabulary and store the tokens that accept mor as a prefix. Then, in the unconstrained case the first state of the FSM would have transitions to the overlapping tokens only?

Haven't taken the time to think about the constrained case yet.

RobinPicard · 2024-01-30T09:08:24Z

I had not realized that I could walk the states_to_token_maps character by character for the postfix part of the crossing tokens in the constrained case. I think it works with almost no additional overhead like that. Let me know if you think it's fine and I'll update the tests afterward

outlines/fsm/fsm.py

rlouf · 2024-02-10T21:49:58Z

Yes I think that's the right approach. There's some stuff to figure out in terms of design, but otherwise looks good.

RobinPicard · 2024-02-17T12:16:30Z

I'll write unit tests next if you think having those separate functions is the right design

outlines/fsm/fsm.py

outlines/generate/api.py

outlines/fsm/fsm.py

rlouf

I have made several comments on the overall design, but nothing that would dramatically affect your implementation. You can start implementing tests.

shawnz

Hi there, I am just a user here who is looking forward to this change. However, I noticed that there is an error if the model is running on a GPU. I think it could be fixed by passing the device in these two statements here (at least, this fixes it for me)

outlines/generate/api.py

rlouf · 2024-03-01T12:40:33Z

We're getting really close. There are a few design changes remaining, and mostly we should have comprehensive tests before merging.

rlouf · 2024-03-11T21:13:47Z

I rebased your branch on main after a big refactor of the FSM interface. I will take a closer look this week.

RobinPicard · 2024-04-12T08:10:23Z

Is this still something we want to work on?

rlouf · 2024-04-12T08:41:12Z

Yes! I'm currently thinking about how we could integrate that to the logits processors since most integration are going to use this :)

shawnz · 2024-05-30T16:52:59Z

Sorry to prod but please don't lose sight of this! I think this is a very important change to make Outlines the most competitive structured generation system

rlouf · 2024-06-19T09:08:52Z

I think it is time to revisit this as #966 is about to be merged and the custom sampling loop will be removed. We can still implement this via passing logit processors to downstream libraries. Effectively we will be adding this feature to every upstream library :)

@RobinPicard are you still interested in implementing this?

RobinPicard · 2024-06-20T05:26:22Z

@RobinPicard are you still interested in implementing this?

I can look at adapting it to the change made this week end

lapp0 · 2024-06-20T10:23:49Z

@RobinPicard are you still interested in implementing this?

I can look at adapting it to the change made this week end

That's great news!

Please let me know if you run into any issues or have any questions about OutlinesLogitsProcessor.

You probably want to branch from #966 since it has fixes to the logits processors and a more detailed docstring.

RobinPicard · 2024-06-25T09:55:59Z

To make sure I understand the wider context, the plan is to eventually remove SequenceGenerator and only use SequenceGeneratorAdapter @lapp0, right? If so, should we implement it for both of those or only the latter?

rlouf · 2024-06-27T00:03:04Z

Indeed

RobinPicard · 2024-06-28T00:11:32Z

I rebased on your branch and modified my initial commit @lapp0

rlouf · 2024-07-16T13:17:28Z

Could you rebase on main now that #966 was merged?

rlouf added enhancement transformers Linked to the `transformers` integration correctness Everything related to the generation correctness labels Jan 16, 2024

rlouf linked an issue Jan 27, 2024 that may be closed by this pull request

Implement prompt/generation alignment #161

Open

RobinPicard marked this pull request as ready for review January 30, 2024 09:08

rlouf reviewed Feb 10, 2024

View reviewed changes