
Add integration for ExllamaV2 #462

Merged 3 commits into outlines-dev:main on Dec 22, 2023

Conversation

kimjaewon96 (Contributor)

library: https://github.com/turboderp/exllamav2
To install exllamav2, you have to install the build that matches your Python and CUDA versions from the releases page.

With proper caching, it is 3-4x faster than GPTQ.

rlouf (Member) commented Dec 21, 2023

Thank you! Did you try guided generation with this integration?

kimjaewon96 (Contributor, Author)

import outlines

# `model` is an ExLlamaV2 model loaded from a local folder; the loading code
# was omitted in the original comment (see the note below).
prompt = "What is the IP address of the Google DNS servers? "
unguided = outlines.generate.text(model, max_tokens=30)(prompt)
guided = outlines.generate.regex(
    model,
    r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)",
    max_tokens=30,
)(prompt)
print(unguided)
#\n\n Google Public DNS is a free, global, open recursive DNS service from Google. There are two IP addresses for Google Public D
print(guided)
#0.0.0.0

Yes, it works.

One thing I forgot to mention: exl2 does not automatically download models from Hugging Face, so you have to pass the path of the local folder that contains the model.
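
A minimal loading sketch, assuming the integration is exposed as outlines.models.exl2 and takes the path of a local model directory; the loader name, argument, and path below are illustrative, not verbatim from this thread.

from outlines import models

# Assumed loader added by this PR: point it at a local directory that already
# contains the ExLlamaV2-quantized weights; exl2 will not fetch them from the
# Hugging Face Hub for you.
model = models.exl2("/path/to/local/exl2-model")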

rlouf merged commit 6084f4c into outlines-dev:main on Dec 22, 2023
5 checks passed
rlouf (Member) commented Dec 22, 2023

Thank you for contributing! We will need to add some documentation for that in the near future.

benlipkin pushed a commit to benlipkin/outlines that referenced this pull request Jan 5, 2024

dnhkng (Contributor) commented Jan 25, 2024

Can someone provide a quick example for loading the exllama model?

rlouf linked an issue on Feb 10, 2024 that may be closed by this pull request
Successfully merging this pull request may close these issues.

Exllama support