
RegexPrefixAllowedTokens does not work for batch #789

Closed
randomcodelookup opened this issue Apr 5, 2024 · 15 comments · Fixed by #966 · May be fixed by lapp0/outlines#31
Labels
bug, help wanted, JSON, transformers (Linked to the `transformers` integration)

Comments

@randomcodelookup

Describe the issue as clearly as possible:

In these lines of code,

def __call__(self, batch_id: int, sent: torch.Tensor) -> List[int]:
    """Use the FSM to bias the logits before sampling the next token.

    Parameters
    ----------
    batch_id
        The index of the current batch.
    sent
        The tokens of the current sentence.

    Returns
    -------
    List[int]
        The indices of the tokens that are allowed to be sampled next.
    """

it seems that `batch_id` is not used at all, and an error is triggered when a batch is used and `input_ids` is empty, on this line:

last_token = input_ids[-1]

Do you know how this can be fixed? Thanks! @saattrupdan @rlouf
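
For illustration, here is the kind of guard I would expect (a hypothetical sketch only, not the actual Outlines code; VOCAB_SIZE and the function name are placeholders):

from typing import List

import torch

VOCAB_SIZE = 32_000  # placeholder; in practice use the tokenizer's vocabulary size


def guarded_prefix_allowed_tokens_fn(batch_id: int, input_ids: torch.Tensor) -> List[int]:
    # Hypothetical guard: when this row of the batch has no tokens yet, return
    # the full vocabulary instead of indexing input_ids[-1] on an empty tensor.
    if input_ids.numel() == 0:
        return list(range(VOCAB_SIZE))
    last_token = int(input_ids[-1])  # safe: the row is non-empty here
    # ...in a real fix, the FSM transition keyed on last_token would go here...
    return list(range(VOCAB_SIZE))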

Steps/code to reproduce the bug:

# Based on https://github.com/outlines-dev/outlines/blob/3a41b0edf389ceba49126e811ed0c1b23b50f235/examples/transformers_integration.py#L6
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoProcessor

from outlines.integrations.transformers import JSONPrefixAllowedTokens


class Person(BaseModel):
    first_name: str
    surname: str


processor = AutoProcessor.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
prefix_allowed_tokens_fn = JSONPrefixAllowedTokens(
    schema=Person, tokenizer_or_pipe=processor.tokenizer,
)

inputs = processor(...)
model.generate(
    **inputs,
    prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
)
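
For reference, a hypothetical batched call that hits the same code path (the prompts and padding setup below are assumptions, not the exact inputs I used):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token by default
batched_inputs = tokenizer(
    ["Describe Ada Lovelace as JSON: ", "Describe Alan Turing as JSON: "],
    return_tensors="pt",
    padding=True,
)
model.generate(
    **batched_inputs,
    prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
    max_new_tokens=64,
)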

Expected result:

When `inputs` is a batch, generation should not trigger an empty-list indexing error.

Error message:

No response

Outlines/Python version information:

Version information

``` (command output here) ```

Context for the issue:

No response

@rlouf
Member

rlouf commented Apr 8, 2024

You are correct, this needs to be fixed.

@rlouf added the transformers (Linked to the `transformers` integration), JSON, and help wanted labels on Apr 8, 2024
@randomcodelookup
Author

Thanks @rlouf. I'm looking to fix this too, but just to get more context: it seems this function generally does not allow parallelized operations across the batch, is that true? I also wonder why an equivalent of JSONLogitsProcessor was not implemented instead; is this method faster?

@rlouf
Member

rlouf commented Apr 12, 2024

Thank you for considering contributing. I am not sure why this choice was made at the time; you'd have to ask @saattrupdan.

@lapp0
Collaborator

lapp0 commented May 23, 2024

@rlouf what are your thoughts on replacing outlines/integrations/transformers.py with a set of logits processor classes (similar to outlines/integrations/llamacpp.py) and updating examples/transformers_integration.py accordingly?

This would ensure consistency between the different inference engine integrations and move us closer to having one method of applying a Guide to an inference engine / model.

@saattrupdan
Contributor

saattrupdan commented May 23, 2024

@lapp0
I'm quite surprised that batching doesn't work for you, as I've been using this with batches just fine. The batch ID doesn’t need to be used in these prefix allowed tokens functions.

Here’s how I currently use it:
https://github.com/ScandEval/ScandEval/blob/c1932e6bd9580a53bd23504ea6e6d6669e8a307f/src/scandeval/generation.py#L474

Also, I'm not sure having a consistent interface between the transformers integration and the llama.cpp integration makes sense, as they are different packages and thus need different treatment. Were you thinking of any change in particular?
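
For context, here is roughly how generate consumes the function internally (a sketch, not the exact transformers source): it is called once per sequence, so a constraint that is the same for every row never needs batch_id.

import torch


def apply_prefix_constraint(prefix_allowed_tokens_fn, input_ids, scores, num_beams=1):
    # Sketch of what transformers' PrefixConstrainedLogitsProcessor does: the
    # user function is called once per row (batch element x beam) with that
    # row's tokens, so it never has to slice across the batch itself.
    mask = torch.full_like(scores, float("-inf"))
    for batch_id, beam_sent in enumerate(
        input_ids.view(-1, num_beams, input_ids.shape[-1])
    ):
        for beam_id, sent in enumerate(beam_sent):
            allowed = prefix_allowed_tokens_fn(batch_id, sent)
            mask[batch_id * num_beams + beam_id, allowed] = 0
    return scores + mask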

@rlouf
Member

rlouf commented May 23, 2024

> @rlouf what are your thoughts on replacing outlines/integrations/transformers.py with a set of logits processor classes (similar to outlines/integrations/llamacpp.py) and updating examples/transformers_integration.py accordingly?

Yes, this is where I was going with the change in the llama.cpp integration and the vLLM offline integration. You can open a new issue, but make sure there isn’t one already first.

@lapp0
Collaborator

lapp0 commented May 23, 2024

> @lapp0 I'm quite surprised that batching doesn't work for you, as I've been using this with batches just fine. The batch ID doesn't need to be used in these prefix allowed tokens functions.

> Here's how I currently use it: https://github.com/ScandEval/ScandEval/blob/c1932e6bd9580a53bd23504ea6e6d6669e8a307f/src/scandeval/generation.py#L474

> Also, I'm not sure having a consistent interface between the transformers integration and the llama.cpp integration makes sense, as they are different packages and thus need different treatment. Were you thinking of any change in particular?

It works for me except for the case where no input IDs are passed, as described in the issue. If you pass just a BOS token, it works.

I think the main issue is that this regression was introduced through a distinct interface. vLLM and llama.cpp use a LogitsProcessor; transformers should as well.

> Yes, this is where I was going with the change in the llama.cpp integration and the vLLM offline integration. You can open a new issue, but make sure there isn't one already first.

It does exist. We should close this issue in favor of #806

@saattrupdan
Contributor

@lapp0 What I don't understand is exactly what you mean by "replacing the transformers integration with logits processors".

Logits processors have a different type signature than the prefix allowed tokens functions (for instance, one outputs floats and the other ints), so I can't see how they can replace the current integration.
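
Concretely, the two shapes I'm comparing look roughly like this (a type sketch only, not the exact Outlines classes):

from typing import Callable, List

import torch

# prefix_allowed_tokens_fn: called per sequence, returns the token IDs (ints)
# that are allowed next.
PrefixAllowedTokensFn = Callable[[int, torch.Tensor], List[int]]


# transformers-style logits processor: called on the whole batch, returns the
# modified scores tensor (floats).
class LogitsProcessorLike:
    def __call__(
        self, input_ids: torch.LongTensor, scores: torch.FloatTensor
    ) -> torch.FloatTensor:
        raise NotImplementedError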

Further, and most importantly, will such a change still mean that I can take a transformers model and call generate with the proposed logits processors? I'm not interested in using the outlines wrapper classes.

@rlouf closed this as not planned (won't fix, can't repro, duplicate, stale) on May 24, 2024
@lapp0
Collaborator

lapp0 commented May 24, 2024

> @lapp0 What I don't understand is exactly what you mean by "replacing the transformers integration with logits processors".

> Logits processors have a different type signature than the prefix allowed tokens functions (for instance, one outputs floats and the other ints), so I can't see how they can replace the current integration.

Logits processors can accomplish the same goal, but they require code changes to use. I see only a handful of public repos using these classes. https://github.com/search?q=%22JSONPrefixAllowedTokens%22&ref=opensearch&type=code

> Further, and most importantly, will such a change still mean that I can take a transformers model and call generate with the proposed logits processors? I'm not interested in using the outlines wrapper classes.

Yes, I will design it such that you can use it directly with

model.generate(
    **inputs,
    logits_processor=LogitsProcessorList([RegexLogitsProcessor(*args)]),
)

@saattrupdan
Contributor

@lapp0 But the transformers model classes don't accept a logits_processor argument. Are you talking about the outlines wrapper class here?

@lapp0
Collaborator

lapp0 commented May 25, 2024

> @lapp0 But the transformers model classes don't accept a logits_processor argument. Are you talking about the outlines wrapper class here?

Here's a dummy example that shows a logits processor which restricts tokens to those with IDs > 2000. Please make sure you're on the latest version of transformers.

>>> import transformers
>>> tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/opt-125m")
>>> model = transformers.AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
>>> model.generate(**tokenizer("hello", return_tensors="pt"))
tensor([[    2, 42891,     6,    38,   437,    10,    92,   869,     8,    38,
           437,   546,    13,    10,   205,   165,     7,   310,    19,     4]])
>>> class NoSmallTokenIDsProcessor(transformers.LogitsProcessor):
...     def __call__(self, input_ids, scores):
...         scores[:, :2000] = float("-inf")
...         return scores
>>> model.generate(**tokenizer("hello", return_tensors="pt"), logits_processor=transformers.LogitsProcessorList([NoSmallTokenIDsProcessor()]))
tensor([[    2, 42891,  2598, 39275,  7852, 50118, 13368,  2598, 39275,  7852,
         50118, 12229,  2598, 39275,  7852, 50118, 12229,  2598, 39275,  7852]])

@saattrupdan
Contributor

@lapp0 Aha, I didn't realise they included that now - thanks for explaining! 😊

@lapp0
Collaborator

lapp0 commented Jun 13, 2024

@saattrupdan could you please tell me if pip install git+https://github.com/lapp0/outlines@transformers-use-logits-processor meets your requirements?

Rendered docs: https://github.com/lapp0/outlines/blob/transformers-use-logits-processor/docs/reference/models/transformers.md#example-direct-transformers-library-use

@sigjhl

sigjhl commented Jul 11, 2024

@lapp0 It's not working as expected. When it's used in a loop, only the first generation succeeds.

Code:

import outlines
import transformers

model_uri = "microsoft/Phi-3-mini-4k-instruct"

outlines_tokenizer = outlines.models.TransformerTokenizer(
    transformers.AutoTokenizer.from_pretrained(model_uri)
)
phone_number_logits_processor = outlines.processors.RegexLogitsProcessor(
    r"\+?[1-9][0-9]{7,14}",  # phone number pattern
    outlines_tokenizer,
)

generator = transformers.pipeline('text-generation', model=model_uri)

for _ in range(3):
    output = generator(
        "Jenny gave me her number it's ",
        logits_processor=transformers.LogitsProcessorList([phone_number_logits_processor])
    )
    print(output)

Output:
[{'generated_text': "Jenny gave me her number it's 5551234567"}]
[{'generated_text': "Jenny gave me her number it's "}]
[{'generated_text': "Jenny gave me her number it's "}]

@lapp0
Collaborator

lapp0 commented Jul 13, 2024

@sigjhl logits processors are stateful and for one-time use. I'll be adding this to the documentation, thanks for pointing it out.

For now you could try

generator = transformers.pipeline('text-generation', model=model_uri)

for _ in range(3):
    phone_number_logits_processor = outlines.processors.RegexLogitsProcessor(
        "\+?[1-9][0-9]{7,14}", # phone number pattern
        outlines_tokenizer,
    )
    output = generator(
        "Jenny gave me her number it's ",
        logits_processor=transformers.LogitsProcessorList([phone_number_logits_processor])
    )
    print(output)
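
A variation (a hypothetical helper, not part of the linked branch) is to wrap the construction in a small factory so a fresh processor is built for every generation:

def make_phone_number_processors():
    # Hypothetical convenience wrapper: a RegexLogitsProcessor carries
    # per-generation state, so build a new one for each generation.
    return transformers.LogitsProcessorList([
        outlines.processors.RegexLogitsProcessor(
            r"\+?[1-9][0-9]{7,14}",  # phone number pattern
            outlines_tokenizer,
        )
    ])


for _ in range(3):
    output = generator(
        "Jenny gave me her number it's ",
        logits_processor=make_phone_number_processors(),
    )
    print(output)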

Please let me know if you have any other questions.
