Update the transformers integration #806

Closed
rlouf opened this issue Apr 12, 2024 · 0 comments · Fixed by #966 · May be fixed by lapp0/outlines#31
Labels: `enhancement`, `transformers` (Linked to the `transformers` integration)

Comments

@rlouf
Member

rlouf commented Apr 12, 2024

In the vein of #782 and #772, we should refactor the transformers integration to use logits processors.

We will keep the custom sampling loop, but via an Outlines model that wraps transformers models. As a result, we should be able to remove torch and transformers as default dependencies.

@rlouf rlouf added the `enhancement` and `transformers` labels Apr 12, 2024
rlouf pushed a commit that referenced this issue Jun 30, 2024
….py (#998)

Many of these fixes were intended for #966; however, that PR is blocked until there's a new `transformers` release.

These improvements are general to all models and will enable the PRs resolving #806 and #965.

# Structure of `OutlinesLogitsProcessor`

The goal is to create a base class that allows a logits processor to be implemented once and used with any `outlines.models` inference library.

To accomplish this we must normalize the input arrays: they must have a consistent type (`torch.Tensor`) and a consistent dimensionality (2D). Both can be normalized simply, without any copy operations.

`mlx.core.array`, `numpy.array`, and `torch.Tensor` all support [Python's array API standard `__dlpack__`](https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack__.html). This standard allows casting between array types without copying.
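The zero-copy behavior of the DLPack protocol can be illustrated with NumPy alone (torch and mlx expose the same `__dlpack__` interface; they are omitted here so the sketch stays dependency-free):

```python
import numpy as np

# A source array; with torch or mlx installed, a tensor/array from
# those libraries could stand here instead.
a = np.arange(6, dtype=np.float32)

# np.from_dlpack consumes any object implementing __dlpack__ and
# produces an array that shares the underlying buffer -- no copy.
b = np.from_dlpack(a)

# Zero-copy: both arrays view the same memory.
assert np.shares_memory(a, b)
```

With torch installed, the analogous conversions are `torch.from_dlpack(...)` to enter torch-land and `np.from_dlpack(tensor)` (or the library's equivalent) to leave it.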

`torch.Tensor` is the only input type that cannot always be cast to any other type, because torch tensors may live in GPU memory. Therefore, we cast all arrays to `torch.Tensor`, implement logits processors using torch methods, and convert back to the original array type in `OutlinesLogitsProcessor`. See the docstring of `OutlinesLogitsProcessor.__call__()` for more details.
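The normalize-then-delegate pattern described above can be sketched as follows. This is a hypothetical, simplified sketch: NumPy stands in for torch so it runs anywhere, and the `MaskFirstToken` subclass is invented for illustration; the real class casts inputs to `torch.Tensor` via DLPack and restores the caller's array type on the way out.

```python
import numpy as np
from typing import List


class OutlinesLogitsProcessor:
    """Sketch: normalize inputs so subclasses always see a 2D batch."""

    def process_logits(self, input_ids: List[List[int]], logits):
        # Subclasses implement the constraint logic on a 2D batch.
        raise NotImplementedError

    def __call__(self, input_ids, logits):
        logits = np.asarray(logits)
        # Promote a single sequence (1D logits) to a batch of one.
        squeeze = logits.ndim == 1
        if squeeze:
            logits = logits[None, :]
            input_ids = [input_ids]
        out = self.process_logits(input_ids, logits)
        # Return in the caller's original shape (the real implementation
        # also converts back to the caller's original array type).
        return out[0] if squeeze else out


class MaskFirstToken(OutlinesLogitsProcessor):
    """Toy subclass: forbid token 0 by setting its logit to -inf."""

    def process_logits(self, input_ids, logits):
        logits = logits.copy()
        logits[:, 0] = -np.inf
        return logits
```

Calling `MaskFirstToken()([1, 2], np.zeros(4))` exercises the 1D path: the logits are promoted to shape `(1, 4)`, masked, and returned as a 1D array with `-inf` at index 0.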

# Detailed Changes
- Rename `BaseLogitsProcessor` to `OutlinesLogitsProcessor`
- Ensure `OutlinesLogitsProcessor.process_logits()` is always passed a 2D batch request with `torch.Tensor` logits and `List` input_ids; also clean up `OutlinesLogitsProcessor.__call__()` to be more readable
- Ensure `FSMLogitsProcessor` allows unstable sequence ordering (beam search in transformers and vLLM changes the order of sequences)
- Update `tests/generate/test_generate.py` to cover more permutations of
  - regex / text 
  - batch / single
  - greedy / multinomial / beam search
  - `stream()` / `generate()`
- Ensure performance stability with different array libraries through `benchmark_processors.py`
Projects
Status: Done
1 participant