
RegexPrefixAllowedTokens does not work for batch #789

Closed
randomcodelookup opened this issue Apr 5, 2024 · 15 comments · Fixed by #966 · May be fixed by lapp0/outlines#31
Labels
bug, help wanted, JSON, transformers (Linked to the `transformers` integration)

Comments

@randomcodelookup

Describe the issue as clearly as possible:

In these lines of code,

def __call__(self, batch_id: int, sent: torch.Tensor) -> List[int]:
    """Use the FSM to bias the logits before sampling the next token.

    Parameters
    ----------
    batch_id
        The index of the current batch.
    sent
        The tokens of the current sentence.

    Returns
    -------
    List[int]
        The indices of the tokens that are allowed to be sampled next.
    """

it seems that `batch_id` is not used at all, and an error is triggered when a batch is used and `input_ids` is empty, on this line:

last_token = input_ids[-1]

Do you know how this can be fixed? Thanks! @saattrupdan @rlouf
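
For illustration, here is the kind of guard I would expect (a hypothetical sketch only, not the actual Outlines code; VOCAB_SIZE and the function name are placeholders):

from typing import List

import torch

VOCAB_SIZE = 32_000  # placeholder; in practice use the tokenizer's vocabulary size


def guarded_prefix_allowed_tokens_fn(batch_id: int, input_ids: torch.Tensor) -> List[int]:
    # Hypothetical guard: when this row of the batch has no tokens yet, return
    # the full vocabulary instead of indexing input_ids[-1] on an empty tensor.
    if input_ids.numel() == 0:
        return list(range(VOCAB_SIZE))
    last_token = int(input_ids[-1])  # safe: the row is non-empty here
    # ...in a real fix, the FSM transition keyed on last_token would go here...
    return list(range(VOCAB_SIZE))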

Steps/code to reproduce the bug:

# Based on https://github.com/outlines-dev/outlines/blob/3a41b0edf389ceba49126e811ed0c1b23b50f235/examples/transformers_integration.py#L6
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoProcessor

from outlines.integrations.transformers import JSONPrefixAllowedTokens


class Person(BaseModel):
    first_name: str
    surname: str


processor = AutoProcessor.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
prefix_allowed_tokens_fn = JSONPrefixAllowedTokens(
    schema=Person, tokenizer_or_pipe=processor.tokenizer,
)

inputs = processor(...)
model.generate(
    **inputs,
    prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
)
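
For reference, a hypothetical batched call that hits the same code path (the prompts and padding setup below are assumptions, not the exact inputs I used):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token by default
batched_inputs = tokenizer(
    ["Describe Ada Lovelace as JSON: ", "Describe Alan Turing as JSON: "],
    return_tensors="pt",
    padding=True,
)
model.generate(
    **batched_inputs,
    prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
    max_new_tokens=64,
)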

Expected result:

When `inputs` is a batch, generation should not trigger an empty-list indexing error.

Error message:

No response

Outlines/Python version information:

Version information

``` (command output here) ```

Context for the issue:

No response

@rlouf
Member

rlouf commented Apr 8, 2024

You are correct, this needs to be fixed.

@rlouf added the transformers (Linked to the `transformers` integration), JSON, and help wanted labels on Apr 8, 2024
@randomcodelookup
Author

Thanks @rlouf. I'm looking to fix this too, but just to get more context: it seems this function generally does not allow parallelized operations across the batch, is that true? I also wonder why an equivalent of JSONLogitsProcessor was not implemented instead; is this method faster?

@rlouf
Member

rlouf commented Apr 12, 2024

Thank you for considering contributing. I am not sure why this choice was made at the time; you'd have to ask @saattrupdan.

@lapp0
Collaborator

lapp0 commented May 23, 2024

@rlouf what are your thoughts on replacing outlines/integrations/transformers.py with a set of logits processor classes (similar to outlines/integrations/llamacpp.py) and updating examples/transformers_integration.py accordingly?

This would ensure consistency between the different inference engine integrations and move us closer to having one method of applying a Guide to an inference engine / model.

@saattrupdan
Contributor

saattrupdan commented May 23, 2024

@lapp0
I'm quite surprised that batching doesn't work for you, as I've been using this with batches just fine. The batch ID doesn’t need to be used in these prefix allowed tokens functions.

Here’s how I currently use it:
https://github.com/ScandEval/ScandEval/blob/c1932e6bd9580a53bd23504ea6e6d6669e8a307f/src/scandeval/generation.py#L474

Also, I'm not sure having a consistent interface between the transformers integration and the llama.cpp integration makes sense, as they are different packages and thus need different treatment. Were you thinking of any change in particular?
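
For context, here is roughly how generate consumes the function internally (a sketch, not the exact transformers source): it is called once per sequence, so a constraint that is the same for every row never needs batch_id.

import torch


def apply_prefix_constraint(prefix_allowed_tokens_fn, input_ids, scores, num_beams=1):
    # Sketch of what transformers' PrefixConstrainedLogitsProcessor does: the
    # user function is called once per row (batch element x beam) with that
    # row's tokens, so it never has to slice across the batch itself.
    mask = torch.full_like(scores, float("-inf"))
    for batch_id, beam_sent in enumerate(
        input_ids.view(-1, num_beams, input_ids.shape[-1])
    ):
        for beam_id, sent in enumerate(beam_sent):
            allowed = prefix_allowed_tokens_fn(batch_id, sent)
            mask[batch_id * num_beams + beam_id, allowed] = 0
    return scores + mask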

@rlouf
Member

rlouf commented May 23, 2024

> @rlouf what are your thoughts on replacing outlines/integrations/transformers.py with a set of logits processor classes (similar to outlines/integrations/llamacpp.py) and updating examples/transformers_integration.py accordingly?

Yes, this is where I was going with the change in the llama.cpp integration and the vLLM offline integration. You can open a new issue, but make sure there isn’t one already first.

@lapp0
Collaborator

lapp0 commented May 23, 2024

> @lapp0 I'm quite surprised that batching doesn't work for you, as I've been using this with batches just fine. The batch ID doesn't need to be used in these prefix allowed tokens functions.

> Here's how I currently use it: https://github.com/ScandEval/ScandEval/blob/c1932e6bd9580a53bd23504ea6e6d6669e8a307f/src/scandeval/generation.py#L474

> Also, I'm not sure having a consistent interface between the transformers integration and the llama.cpp integration makes sense, as they are different packages and thus need different treatment. Were you thinking of any change in particular?

It works for me except for the case where no input IDs are passed, as described in the issue. If you pass just a BOS token, it works.

I think the main issue is that this regression was introduced through a distinct interface. vLLM and llama.cpp use a LogitsProcessor; transformers should as well.

> Yes, this is where I was going with the change in the llama.cpp integration and the vLLM offline integration. You can open a new issue, but make sure there isn't one already first.

It does exist. We should close this issue in favor of #806

@saattrupdan
Contributor

@lapp0 What I don't understand is exactly what you mean by "replacing the transformers integration with logits processors".

Logits processors have a different type signature than the prefix allowed tokens functions (for instance, one outputs floats and the other ints), so I can't see how they can replace the current integration.
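
Concretely, the two shapes I'm comparing look roughly like this (a type sketch only, not the exact Outlines classes):

from typing import Callable, List

import torch

# prefix_allowed_tokens_fn: called per sequence, returns the token IDs (ints)
# that are allowed next.
PrefixAllowedTokensFn = Callable[[int, torch.Tensor], List[int]]


# transformers-style logits processor: called on the whole batch, returns the
# modified scores tensor (floats).
class LogitsProcessorLike:
    def __call__(
        self, input_ids: torch.LongTensor, scores: torch.FloatTensor
    ) -> torch.FloatTensor:
        raise NotImplementedError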

Further, and most importantly, will such a change still mean that I can take a transformers model and call generate with the proposed logits processors? I'm not interested in using the outlines wrapper classes.

@rlouf closed this as not planned (won't fix, can't repro, duplicate, stale) on May 24, 2024
@lapp0
Collaborator

lapp0 commented May 24, 2024

> @lapp0 What I don't understand is exactly what you mean by "replacing the transformers integration with logits processors".

> Logits processors have a different type signature than the prefix allowed tokens functions (for instance, one outputs floats and the other ints), so I can't see how they can replace the current integration.

Logits processors can accomplish the same goal, but they require code changes to use. I see only a handful of public repos using these classes. https://github.com/search?q=%22JSONPrefixAllowedTokens%22&ref=opensearch&type=code

> Further, and most importantly, will such a change still mean that I can take a transformers model and call generate with the proposed logits processors? I'm not interested in using the outlines wrapper classes.

Yes, I will design it such that you can use it directly with

model.generate(
    **inputs,
    logits_processor=LogitsProcessorList([RegexLogitsProcessor(*args)]),
)

@saattrupdan
Contributor

@lapp0 But the transformers model classes don't accept a logits_processor argument. Are you talking about the outlines wrapper class here?

@lapp0
Collaborator

lapp0 commented May 25, 2024

> @lapp0 But the transformers model classes don't accept a logits_processor argument. Are you talking about the outlines wrapper class here?

Here's a dummy example that shows a logits processor which restricts tokens to those with IDs > 2000. Please make sure you're on the latest version of transformers.

>>> import transformers
>>> tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/opt-125m")
>>> model = transformers.AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
>>> model.generate(**tokenizer("hello", return_tensors="pt"))
tensor([[    2, 42891,     6,    38,   437,    10,    92,   869,     8,    38,
           437,   546,    13,    10,   205,   165,     7,   310,    19,     4]])
>>> class NoSmallTokenIDsProcessor(transformers.LogitsProcessor):
...     def __call__(self, input_ids, scores):
...         scores[:, :2000] = float("-inf")
...         return scores
>>> model.generate(**tokenizer("hello", return_tensors="pt"), logits_processor=transformers.LogitsProcessorList([NoSmallTokenIDsProcessor()]))
tensor([[    2, 42891,  2598, 39275,  7852, 50118, 13368,  2598, 39275,  7852,
         50118, 12229,  2598, 39275,  7852, 50118, 12229,  2598, 39275,  7852]])

@saattrupdan
Contributor

@lapp0 Aha, I didn't realise they included that now - thanks for explaining! 😊

@lapp0
Collaborator

lapp0 commented Jun 13, 2024

@saattrupdan could you please tell me if pip install git+https://github.com/lapp0/outlines@transformers-use-logits-processor meets your requirements?

Rendered docs: https://github.com/lapp0/outlines/blob/transformers-use-logits-processor/docs/reference/models/transformers.md#example-direct-transformers-library-use

@sigjhl

sigjhl commented Jul 11, 2024

@lapp0 It's not working as expected. When it's used in a loop, only the first generation succeeds.

Code:

import outlines
import transformers

model_uri = "microsoft/Phi-3-mini-4k-instruct"

outlines_tokenizer = outlines.models.TransformerTokenizer(
    transformers.AutoTokenizer.from_pretrained(model_uri)
)
phone_number_logits_processor = outlines.processors.RegexLogitsProcessor(
    r"\+?[1-9][0-9]{7,14}",  # phone number pattern
    outlines_tokenizer,
)

generator = transformers.pipeline('text-generation', model=model_uri)

for _ in range(3):
    output = generator(
        "Jenny gave me her number it's ",
        logits_processor=transformers.LogitsProcessorList([phone_number_logits_processor])
    )
    print(output)

Output:
[{'generated_text': "Jenny gave me her number it's 5551234567"}]
[{'generated_text': "Jenny gave me her number it's "}]
[{'generated_text': "Jenny gave me her number it's "}]

@lapp0
Collaborator

lapp0 commented Jul 13, 2024

@sigjhl logits processors are stateful and for one-time use. I'll be adding this to the documentation, thanks for pointing it out.

For now you could try

generator = transformers.pipeline('text-generation', model=model_uri)

for _ in range(3):
    phone_number_logits_processor = outlines.processors.RegexLogitsProcessor(
        "\+?[1-9][0-9]{7,14}", # phone number pattern
        outlines_tokenizer,
    )
    output = generator(
        "Jenny gave me her number it's ",
        logits_processor=transformers.LogitsProcessorList([phone_number_logits_processor])
    )
    print(output)
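
A variation (a hypothetical helper, not part of the linked branch) is to wrap the construction in a small factory so a fresh processor is built for every generation:

def make_phone_number_processors():
    # Hypothetical convenience wrapper: a RegexLogitsProcessor carries
    # per-generation state, so build a new one for each generation.
    return transformers.LogitsProcessorList([
        outlines.processors.RegexLogitsProcessor(
            r"\+?[1-9][0-9]{7,14}",  # phone number pattern
            outlines_tokenizer,
        )
    ])


for _ in range(3):
    output = generator(
        "Jenny gave me her number it's ",
        logits_processor=make_phone_number_processors(),
    )
    print(output)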

Please let me know if you have any other questions.
