Update the transformers integration #806

Closed
rlouf opened this issue Apr 12, 2024 · 0 comments · Fixed by #966 · May be fixed by lapp0/outlines#31
Labels: `enhancement`, `transformers` (Linked to the `transformers` integration)

Comments

@rlouf
Member

rlouf commented Apr 12, 2024

In the vein of #782 and #772, we should refactor the transformers integration to use logits processors.

We will keep the custom sampling loop, but via an Outlines model that wraps transformers models. As a result, we should be able to remove torch and transformers as default dependencies.

@rlouf rlouf added the `enhancement` and `transformers` labels Apr 12, 2024
rlouf pushed a commit that referenced this issue Jun 30, 2024
….py (#998)

Many of these fixes were intended for #966; however, that PR is blocked until there's a new `transformers` release.

These improvements are general to all models and will enable the PRs resolving #806 and #965.

# Structure of `OutlinesLogitsProcessor`

The goal is to create a base class that allows a logits processor to be implemented once and used with any `outlines.models` inference library.

To accomplish this we must normalize the input arrays: they must have a consistent type (`torch.Tensor`) and a consistent dimensionality (2D). Both can be normalized simply, without any copy operations.

`mlx.core.array`, `numpy.array`, and `torch.Tensor` all support [Python's array API standard `__dlpack__`](https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack__.html). This standard allows casting between array types without copying.
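The zero-copy behavior of the DLPack protocol can be illustrated with NumPy alone (torch and mlx expose the same `__dlpack__` interface; they are omitted here so the sketch stays dependency-free):

```python
import numpy as np

# A source array; with torch or mlx installed, a tensor/array from
# those libraries could stand here instead.
a = np.arange(6, dtype=np.float32)

# np.from_dlpack consumes any object implementing __dlpack__ and
# produces an array that shares the underlying buffer -- no copy.
b = np.from_dlpack(a)

# Zero-copy: both arrays view the same memory.
assert np.shares_memory(a, b)
```

With torch installed, the analogous conversions are `torch.from_dlpack(...)` to enter torch-land and `np.from_dlpack(tensor)` (or the library's equivalent) to leave it.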

`torch.Tensor` is the only input type that cannot always be cast to any other type, because torch tensors may live in GPU memory. Therefore, we cast all arrays to `torch.Tensor`, implement logits processors using torch methods, and convert back to the original array type in `OutlinesLogitsProcessor`. See the docstring of `OutlinesLogitsProcessor.__call__()` for more details.
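The normalize-then-delegate pattern described above can be sketched as follows. This is a hypothetical, simplified sketch: NumPy stands in for torch so it runs anywhere, and the `MaskFirstToken` subclass is invented for illustration; the real class casts inputs to `torch.Tensor` via DLPack and restores the caller's array type on the way out.

```python
import numpy as np
from typing import List


class OutlinesLogitsProcessor:
    """Sketch: normalize inputs so subclasses always see a 2D batch."""

    def process_logits(self, input_ids: List[List[int]], logits):
        # Subclasses implement the constraint logic on a 2D batch.
        raise NotImplementedError

    def __call__(self, input_ids, logits):
        logits = np.asarray(logits)
        # Promote a single sequence (1D logits) to a batch of one.
        squeeze = logits.ndim == 1
        if squeeze:
            logits = logits[None, :]
            input_ids = [input_ids]
        out = self.process_logits(input_ids, logits)
        # Return in the caller's original shape (the real implementation
        # also converts back to the caller's original array type).
        return out[0] if squeeze else out


class MaskFirstToken(OutlinesLogitsProcessor):
    """Toy subclass: forbid token 0 by setting its logit to -inf."""

    def process_logits(self, input_ids, logits):
        logits = logits.copy()
        logits[:, 0] = -np.inf
        return logits
```

Calling `MaskFirstToken()([1, 2], np.zeros(4))` exercises the 1D path: the logits are promoted to shape `(1, 4)`, masked, and returned as a 1D array with `-inf` at index 0.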

# Detailed Changes
- Rename `BaseLogitsProcessor` to `OutlinesLogitsProcessor`
- Ensure `OutlinesLogitsProcessor.process_logits()` is always passed a 2D batch request with `torch.Tensor` logits and `List` input_ids; also clean up `OutlinesLogitsProcessor.__call__()` to be more readable
- Ensure `FSMLogitsProcessor` allows unstable sequence ordering (beam search in transformers and vLLM changes the order of sequences)
- Update `tests/generate/test_generate.py` to cover more permutations of
  - regex / text 
  - batch / single
  - greedy / multinomial / beam search
  - `stream()` / `generate()`
- Ensure performance stability with different array libraries through `benchmark_processors.py`
Projects
Status: Done
1 participant