Allow to generate several samples for each prompt #533

Merged: rlouf merged 7 commits into outlines-dev:main from the restore-multiple-samples branch on Feb 6, 2024

@rlouf (Member) commented Jan 13, 2024

Closes #416

TODO

  • Use `torch.repeat_interleave` so that the samples drawn for the same prompt are contiguous in the batch (easier for beam search)
  • Make sure the FSMs are correctly duplicated
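Both TODO items concern row ordering. Below is a minimal sketch in plain Python (so it runs without PyTorch) contrasting the two layouts: `torch.repeat_interleave(x, n, dim=0)` places the n copies of each row next to each other, while `x.repeat(n, 1)` tiles the whole batch n times. The helpers `interleave` and `tile` are hypothetical, and the FSM states are stand-in strings, not Outlines objects. The point is that any per-sequence state must be duplicated with the same ordering as the token ids.

```python
from typing import List, TypeVar

T = TypeVar("T")


def interleave(rows: List[T], num_samples: int) -> List[T]:
    """Mimic torch.repeat_interleave(x, num_samples, dim=0): copies of each row are adjacent."""
    return [row for row in rows for _ in range(num_samples)]


def tile(rows: List[T], num_samples: int) -> List[T]:
    """Mimic x.repeat(num_samples, 1): the whole batch is repeated back to back."""
    return [row for _ in range(num_samples) for row in rows]


prompts = ["prompt_a", "prompt_b"]
# Hypothetical per-prompt FSM states (plain strings here). They must be
# duplicated with the same function as the prompts, or a sample would be
# guided by another prompt's FSM.
fsm_states = ["fsm_a", "fsm_b"]

print(interleave(prompts, 3))
# ['prompt_a', 'prompt_a', 'prompt_a', 'prompt_b', 'prompt_b', 'prompt_b']
print(tile(prompts, 3))
# ['prompt_a', 'prompt_b', 'prompt_a', 'prompt_b', 'prompt_a', 'prompt_b']

# Duplicating rows and FSM states the same way keeps them aligned row by row.
pairs = list(zip(interleave(prompts, 3), interleave(fsm_states, 3)))
```

With interleaving, the samples for prompt `b` occupy rows `b * num_samples` through `(b + 1) * num_samples - 1`, which is the contiguity that beam search benefits from.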

Questions

  • Should the sequence generator return arrays of shape (num_samples * batch_size, num_tokens) and leave the reshaping to SequenceGenerator? Reshaping inside the generator makes it considerably more complex. We could also skip reshaping entirely and let users do it manually, which would simplify chained calls in the future.
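If the generator returns a flat batch of num_samples * batch_size results laid out with `repeat_interleave`, the reshape in question is a simple regrouping. A minimal sketch in plain Python, with strings standing in for generated sequences; `group_samples` is a hypothetical helper, not part of the Outlines API. On a PyTorch token tensor of shape (batch_size * num_samples, num_tokens), the equivalent would be `tokens.reshape(batch_size, num_samples, num_tokens)`.

```python
from typing import List, TypeVar

T = TypeVar("T")


def group_samples(flat: List[T], batch_size: int, num_samples: int) -> List[List[T]]:
    """Regroup a flat, interleaved batch into one list of samples per prompt.

    Assumes row ``b * num_samples + s`` holds sample ``s`` for prompt ``b``,
    which is the order torch.repeat_interleave produces.
    """
    if len(flat) != batch_size * num_samples:
        raise ValueError("expected num_samples * batch_size results")
    return [flat[b * num_samples:(b + 1) * num_samples] for b in range(batch_size)]


flat_results = ["a0", "a1", "a2", "b0", "b1", "b2"]  # 2 prompts x 3 samples
print(group_samples(flat_results, batch_size=2, num_samples=3))
# [['a0', 'a1', 'a2'], ['b0', 'b1', 'b2']]
```

Leaving this step to the caller keeps the generator's output shape uniform, which is the trade-off the question raises for chained calls.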

@rlouf added the enhancement and transformers (Linked to the `transformers` integration) labels on Jan 16, 2024
@rlouf force-pushed the restore-multiple-samples branch 6 times, most recently from f558105 to 5cebac7, on January 20, 2024 10:19
Resolved review threads:
  • outlines/generate/generator.py (outdated)
  • outlines/generate/api.py (three threads)
@rlouf force-pushed the restore-multiple-samples branch 5 times, most recently from a1f9a8a to 88c8bc6, on January 23, 2024 07:00
@rlouf force-pushed the restore-multiple-samples branch 3 times, most recently from 19f8eb5 to 4358032, on January 23, 2024 16:28
@rlouf marked this pull request as ready for review on January 23, 2024 16:28
@lapp0 (Collaborator) left a comment


I'm still learning the outlines/generate/ side of the code-base. If there are any components you would like test cases for, that would help me learn :)

IMHO we shouldn't reshape at all. Users should expect an array of results with the same length as their input prompts. A shape-agnostic decoder should be the final step, so that we don't have to convert to a decodable array and back again in multiple places.
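A "shape-agnostic decoder" run once at the end could walk whatever nesting the token ids have and decode only the innermost sequences, leaving the outer shape untouched. A minimal sketch, assuming token ids arrive as nested Python lists of ints; `decode_nested`, `decode_one` and the toy `VOCAB` are hypothetical stand-ins for a real tokenizer's decode method.

```python
from typing import Callable, Dict, List, Union

# Token ids nested to any depth, e.g. (batch, samples, tokens).
Nested = Union[List[int], List["Nested"]]


def decode_nested(token_ids: Nested, decode_one: Callable[[List[int]], str]):
    """Decode the innermost lists of ints and preserve the outer nesting.

    Note: an empty list is treated as an empty token sequence.
    """
    if all(isinstance(t, int) for t in token_ids):
        return decode_one(token_ids)
    return [decode_nested(inner, decode_one) for inner in token_ids]


# Hypothetical toy vocabulary standing in for a tokenizer.
VOCAB: Dict[int, str] = {0: "yes", 1: "no", 2: "maybe"}


def decode_one(ids: List[int]) -> str:
    # Join vocabulary entries with spaces.
    return " ".join(VOCAB[i] for i in ids)


batch = [[[0], [1, 2]], [[2], [0]]]  # 2 prompts x 2 samples
print(decode_nested(batch, decode_one))
# [['yes', 'no maybe'], ['maybe', 'yes']]
```

Because the decoder only inspects the innermost level, the same function handles a flat batch, a (batch, samples) grouping, or deeper nesting from chained calls.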

Resolved review threads:
  • outlines/generate/api.py (three threads, two outdated)
  • outlines/generate/generator.py (outdated)
@rlouf merged commit e00c53f into outlines-dev:main on Feb 6, 2024
5 checks passed
@rlouf deleted the restore-multiple-samples branch on February 6, 2024 15:41
Labels: enhancement, transformers (Linked to the `transformers` integration)
Linked issue: Restore the ability to draw multiple samples with Open Source models
2 participants