Update `models.transformers` to use `SequenceGeneratorAdapter` and `OutlinesLogitsProcessors` #966

lapp0 · 2024-06-13T22:09:33Z

~~In draft until huggingface/transformers#31448 makes it into a new transformers release~~

Fixes #1021

Fixes #789

Fixes #806 (does everything except the issue's requirement to "remove torch")

Closes #910 (device inconsistency issue handled through other means)

Problem

For #789 (RegexPrefixAllowedTokens does not work for batch): in SequenceGenerator, generation fails if input_ids is empty.
More generally, the models.transformers implementation is inconsistent with other modules, which makes the codebase harder to manage.

Solution

Use SequenceGeneratorAdapter for transformers instead of SequenceGenerator
Implement Transformers.generate and Transformers.stream, which pass an outlines.processors.OutlinesLogitsProcessor to model.generate() through its logits_processor=... argument
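
To illustrate the hook this relies on, here is a minimal, library-free sketch of where a logits processor sits in a decoding loop. `toy_generate`, `allow_only`, and `fake_model` are hypothetical names made up for this example and plain lists stand in for tensors; the only thing borrowed from transformers is the calling convention, in which each processor is called with `(input_ids, scores)` and returns the adjusted scores.

```python
from typing import Callable, List

Scores = List[float]
Processor = Callable[[List[int], Scores], Scores]


def allow_only(allowed: set) -> Processor:
    """Build a processor that sets the score of every other token to -inf."""

    def processor(input_ids: List[int], scores: Scores) -> Scores:
        return [s if tok in allowed else float("-inf") for tok, s in enumerate(scores)]

    return processor


def toy_generate(next_scores, prompt_ids, max_new_tokens, logits_processor=()):
    """Greedy decoding loop; a hypothetical stand-in for model.generate()."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        scores = next_scores(ids)
        # Every processor sees the tokens so far and may rewrite the scores.
        for processor in logits_processor:
            scores = processor(ids, scores)
        # Greedy choice: index of the highest remaining score.
        ids.append(max(range(len(scores)), key=scores.__getitem__))
    return ids


def fake_model(ids):
    """Toy model over a 3-token vocabulary that always prefers token 1."""
    return [0.1, 0.9, 0.5]


unconstrained = toy_generate(fake_model, [7], 3)
constrained = toy_generate(fake_model, [7], 3, logits_processor=[allow_only({0, 2})])
```

Because the constraint lives entirely in the processor, the generation loop itself does not need to know anything about regexes or FSMs.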

Additional Changes

Temporarily disables the stop_at argument for transformers and implements a test to determine whether upstream is fixed (huggingface/transformers#31435: "stop_strings Argument in model.generate() Results in Exception if Generation Completes Without stop_string Being Generated")
- fixed upstream, change reverted
Update docs and examples to use seed instead of rng
Disables outlines.generate.cfg for now (#959: "Using context-free grammars to guide generation does not work")

TODO:

Determine the cause of the regression when generating JSON
- was caused by GenerationConfig default max_tokens being 20
Replace rng with seed in all tests and all documentation.
Awaiting the transformers fix for huggingface/transformers#31435 ("stop_strings Argument in model.generate() Results in Exception if Generation Completes Without stop_string Being Generated") so that stop_at works.

outlines/generate/cfg.py

lapp0 · 2024-06-13T22:12:36Z

outlines/generate/regex.py

@@ -39,8 +40,9 @@ def regex(model, regex_str: str, sampler: Sampler = multinomial()):


 @regex.register(MLXLM)
-def regex_mlxlm(
-    model: MLXLM,
+@regex.register(Transformers)


The _unified dispatchers will become the default dispatchers as a next step. In this PR they are used only by MLXLM and Transformers.

rlouf · 2024-06-19T09:00:20Z

docs/reference/models/transformers.md

@@ -30,4 +30,55 @@ tokenizer = AutoTokenizer.from_pretrained("gpt2")
 model = models.Transformers(llm, tokenizer)
 ```

+# Using Logits Processors


We'll need to improve the documentation to reach something similar to the llamacpp integration's.

We should plan a restructuring and cleanup of the documentation in a separate issue. I could share some ideas in a call on how we might approach this.

In this case, a lot of the information documented for llamacpp applies to all other models, including transformers. We shouldn't repeat ourselves. We should explain the behavior of all models generally, highlight the models' differences with a feature table, and document only transformers-specific information on its documentation page.

What I meant was listing the main arguments you can pass when initialising and calling the model, cf. https://outlines-dev.github.io/outlines/reference/models/llamacpp/

outlines/processors/base_logits_processor.py

….py (#998)

A lot of these fixes were intended for #966; however, that is blocked until there's a new `transformers` release. These improvements are general to all models and will enable PRs resolving #806 and #965.

# Structure of `OutlinesLogitsProcessor`

The goal is to create a base class which allows a logits processor to be implemented once and used for any `outlines.models` inference library.

To accomplish this we must normalize the input array. It must have a consistent type (`torch.Tensor`) and consistent dimensionality (2). We can normalize both of these simply, and without any copy operations.

`mlx.core.array`, `numpy.array`, and `torch.Tensor` all support [Python's array standard `__dlpack__`](https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack__.html). This standard allows for casting between array types without copying.

`torch.Tensor` is the only input type which cannot always be cast to any other type because torch tensors may live in GPU memory. Therefore, we cast all arrays to `torch.Tensor`, implement logits processors using torch methods, and convert back to the original array type in `OutlinesLogitsProcessor`. See the docstring of `OutlinesLogitsProcessor.__call__()` for more details.

# Detailed Changes

- Rename `BaseLogitsProcessor` to `OutlinesLogitsProcessor`
- Ensure `OutlinesLogitsProcessor.process_logits()` is always passed a 2D batch request with `torch.Tensor` logits and `List` input_ids. Also clean up code to be more readable in `OutlinesLogitsProcessor.__call__()`
- Ensure `FSMLogitsProcessor` allows unstable sequence ordering (beam search in transformers and vLLM change the order of sequences)
- Update `tests/generate/test_generate.py` to cover more permutations of
  - regex / text
  - batch / single
  - greedy / multinomial / beam search
  - `stream()` / `generate()`
- Ensure performance stability with different array libraries through `benchmark_processors.py`
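
The two ideas above, normalizing every request to a 2D batch and tolerating reordered sequences, can be sketched without any array library. This is only an illustration under stated assumptions: `ToyBaseProcessor`, `ToyFSMProcessor`, and `TRANSITIONS` are hypothetical names, plain lists stand in for `torch.Tensor`, the zero-copy `__dlpack__` cast is not shown, and keying state by the token sequence is one possible way to survive reordering rather than a description of how `FSMLogitsProcessor` is actually written.

```python
from typing import Dict, List, Tuple

NEG_INF = float("-inf")


class ToyBaseProcessor:
    """Hypothetical stand-in for a base class: normalizes inputs to a 2D batch."""

    def process_logits(self, input_ids: List[List[int]], logits: List[List[float]]):
        raise NotImplementedError

    def __call__(self, input_ids, logits):
        # A 1D request (one sequence) is wrapped into a batch of size 1 ...
        is_single = not isinstance(logits[0], list)
        if is_single:
            input_ids, logits = [input_ids], [logits]
        processed = self.process_logits(input_ids, logits)
        # ... and unwrapped again so the caller gets back the shape it passed in.
        return processed[0] if is_single else processed


class ToyFSMProcessor(ToyBaseProcessor):
    """Masks tokens a toy FSM disallows; state is looked up by sequence content."""

    def __init__(self, transitions: Dict[Tuple[int, int], int], start_state: int = 0):
        self.transitions = transitions  # (state, token_id) -> next state
        self._state_of: Dict[Tuple[int, ...], int] = {(): start_state}

    def _state_for(self, ids: Tuple[int, ...]) -> int:
        # Keyed by the tokens themselves, so a row may move within the batch.
        if ids not in self._state_of:
            previous = self._state_for(ids[:-1])
            self._state_of[ids] = self.transitions[(previous, ids[-1])]
        return self._state_of[ids]

    def process_logits(self, input_ids, logits):
        masked = []
        for ids, row in zip(input_ids, logits):
            state = self._state_for(tuple(ids))
            allowed = {tok for (s, tok) in self.transitions if s == state}
            masked.append([x if tok in allowed else NEG_INF for tok, x in enumerate(row)])
        return masked


# Toy FSM for the token pattern 0 1* 2 over a 3-token vocabulary.
TRANSITIONS = {(0, 0): 1, (1, 1): 1, (1, 2): 2}
processor = ToyFSMProcessor(TRANSITIONS)
```

In this sketch, `input_ids` holds generated tokens only, so an empty sequence maps to the start state. Swapping two rows of a batch swaps the corresponding rows of the output, which is the property beam search needs.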

outlines/models/transformers.py

…ate.*

rlouf · 2024-07-15T08:00:11Z

Very good work, thank you!

lapp0 changed the title ~~Transformers use logits processor~~ Update outlines.models.transformers to use SequenceGeneratorAdapter and OutlinesLogitsProcessors Jun 13, 2024

lapp0 commented Jun 13, 2024

outlines/generate/cfg.py (resolved)

lapp0 commented Jun 13, 2024

lapp0 force-pushed the transformers-use-logits-processor branch 18 times, most recently from d94527a to 5bd8832 Compare June 13, 2024 23:20

lapp0 marked this pull request as ready for review June 13, 2024 23:21

lapp0 marked this pull request as draft June 13, 2024 23:36

lapp0 force-pushed the transformers-use-logits-processor branch from 5bd8832 to 1537695 Compare June 13, 2024 23:59

lapp0 marked this pull request as ready for review June 14, 2024 00:00

lapp0 marked this pull request as draft June 14, 2024 02:08

lapp0 force-pushed the transformers-use-logits-processor branch 4 times, most recently from d0fb1a6 to 0167700 Compare June 14, 2024 22:39

lapp0 force-pushed the transformers-use-logits-processor branch 2 times, most recently from 8f9c317 to 6ea583e Compare June 18, 2024 04:46

rlouf reviewed Jun 19, 2024

rlouf mentioned this pull request Jun 19, 2024

Implement prompt/generation alignment #531

Open

lapp0 commented Jun 20, 2024

outlines/processors/base_logits_processor.py (outdated, resolved)

lapp0 force-pushed the transformers-use-logits-processor branch 2 times, most recently from f5ae15e to b75beeb Compare June 21, 2024 14:59

This was referenced Jun 21, 2024

Use outlines.processors for models.llamacpp #997

Merged

Improve outlines.processors, add integration tests to test_generate.py #998

Merged

rlouf assigned lapp0 Jun 22, 2024

This was referenced Jun 23, 2024

Use LogitsProcessor for transformers integration #926

Closed

Added generate.probabilities for BeamSearch #895

Open

lapp0 force-pushed the transformers-use-logits-processor branch from b75beeb to cdc78f0 Compare June 30, 2024 21:14

lapp0 commented Jun 30, 2024

outlines/models/transformers.py (outdated, resolved)

lapp0 commented Jun 30, 2024

outlines/models/transformers.py (outdated, resolved)

lapp0 force-pushed the transformers-use-logits-processor branch 4 times, most recently from b495c2c to 326ab77 Compare June 30, 2024 21:49

lapp0 marked this pull request as ready for review June 30, 2024 21:59

lapp0 force-pushed the transformers-use-logits-processor branch 2 times, most recently from c24b1fa to 32319df Compare July 2, 2024 20:11

Use LogitsProcessors for models.transformers, apply in outlines.generate.* (7d43bbd)

lapp0 force-pushed the transformers-use-logits-processor branch from 32319df to 7d43bbd Compare July 3, 2024 14:42

This was referenced Jul 10, 2024

Major bug & fix: Fix bug in batched multi sample generation #1025

Merged

When consecutively entering the same text input, calling the RegexPrefixAllowedTokens instance results in an error at the second time. #1021

Closed

rlouf approved these changes Jul 15, 2024

rlouf merged commit 8224855 into outlines-dev:main Jul 15, 2024
7 checks passed
