
Add LlamaCpp integration #486

Merged · 1 commit merged into outlines-dev:main from llama_cpp on Jan 8, 2024
Conversation

@dtiarks (Contributor) commented Dec 27, 2023

fixes #422

This is a draft PR for a LlamaCpp integration using the low-level API.
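
For context, a minimal sketch of how the integration could be used once merged. This is hypothetical usage, not the PR's actual code: the module paths `outlines.models.llamacpp` and `outlines.generate.regex`, the GGUF file path, and the regex are all assumptions, and the exact API may differ.

```python
import outlines

# Sketch only: entry point names and the local GGUF path are assumptions.
model = outlines.models.llamacpp("./llama-2-7b.Q4_K_M.gguf")
generator = outlines.generate.regex(model, r"[0-9]{4}")
print(generator("In what year did the French Revolution start? "))
```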

@rlouf (Member) commented Dec 27, 2023

Great work! I still need to review the PR, but one remark for now: can we name the model llamacpp instead of llama to avoid confusion?

@dtiarks (Contributor, Author) commented Dec 27, 2023

Sure!

@brandonwillard brandonwillard marked this pull request as draft December 27, 2023 20:24
@dtiarks (Contributor, Author) commented Dec 27, 2023

Is there any guidance on how to handle artifacts (the GGUF model used in the tests) in CI/CD? Should I just check it in using LFS?

@brandonwillard brandonwillard linked an issue Dec 28, 2023 that may be closed by this pull request
@rlouf (Member) commented Dec 28, 2023

So far we've been downloading models from the Hub each time the tests are run. Since they're tiny models, it doesn't affect the runtime much, but it would certainly be better to cache them.
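
A sketch of what a cached test fixture could look like, relying on `huggingface_hub`'s built-in local cache; the repo id and filename below are placeholders, not the actual test model:

```python
import pytest
from huggingface_hub import hf_hub_download

@pytest.fixture(scope="session")
def model_path():
    # hf_hub_download stores files under the local HF cache
    # (~/.cache/huggingface by default), so repeated test runs
    # reuse the downloaded copy instead of fetching it again.
    return hf_hub_download(
        repo_id="some-org/tiny-llama-gguf",  # placeholder repo id
        filename="tiny-llama.Q2_K.gguf",     # placeholder filename
    )
```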

@dtiarks (Contributor, Author) commented Dec 28, 2023

OK. The Llama test model is around 2 MB, so it should be fine.
To have the Llama tests run in CI/CD, I added llama-cpp-python as a test dependency. The question is whether we should add it as a project dependency as well.

Another open question is the intention behind the outer dimension of the tokenizer's encode method. Is it supposed to be a batch dimension? If so, I have to implement a padding mechanism, because the llama.cpp tokenizer doesn't support that out of the box.

@rlouf (Member) commented Dec 28, 2023

We won't add it as a dependency by default but will document it in the installation section of the documentation.

And yes, `token_ids` tensors are of shape `n_batch x n_tokens`, and we keep the outer dimension throughout, even for a batch size of 1. Does llamacpp handle batch inference?
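
For reference, a padding shim like the following would be enough to give per-sequence tokenizer output that shape. A minimal sketch, assuming NumPy arrays and left padding; the pad token id and padding side are up to the implementation:

```python
import numpy as np

def pad_batch(sequences: list[list[int]], pad_token_id: int) -> np.ndarray:
    """Pad variable-length token id lists into an (n_batch, n_tokens) array."""
    max_len = max(len(seq) for seq in sequences)
    batch = np.full((len(sequences), max_len), pad_token_id, dtype=np.int64)
    for i, seq in enumerate(sequences):
        # Left padding keeps the most recent tokens right-aligned,
        # the usual convention for decoder-only models.
        batch[i, max_len - len(seq):] = seq
    return batch
```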

@dtiarks (Contributor, Author) commented Dec 28, 2023

It does have a batch-based API for inference (which is also the recommended one), but not for tokenization. This creates a little overhead in the Outlines-side implementation of the tokenizer, but that should be OK.

I will incorporate the batch API in the coming days...
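
For anyone following along, this is roughly what llama.cpp's batch decoding flow looks like through the low-level bindings. Names follow `llama.h`, but the exact llama-cpp-python signatures vary between versions, so treat this as a sketch rather than the PR's implementation:

```python
import llama_cpp

def decode_sequences(ctx, sequences):
    """Sketch: decode several token id sequences in one llama.cpp batch."""
    n_total = sum(len(seq) for seq in sequences)
    batch = llama_cpp.llama_batch_init(n_total, 0, len(sequences))
    n = 0
    for seq_id, tokens in enumerate(sequences):
        for pos, tok in enumerate(tokens):
            batch.token[n] = tok
            batch.pos[n] = pos
            batch.seq_id[n][0] = seq_id
            batch.n_seq_id[n] = 1
            # Only request logits for the last token of each sequence.
            batch.logits[n] = pos == len(tokens) - 1
            n += 1
    batch.n_tokens = n
    llama_cpp.llama_decode(ctx, batch)  # returns 0 on success
    llama_cpp.llama_batch_free(batch)
```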

@rlouf (Member) commented Jan 3, 2024

@dtiarks is this ready for review?

@dtiarks (Contributor, Author) commented Jan 3, 2024

Not yet. I'm still trying to figure out why I get so much whitespace in the output JSON, and whether it has something to do with my implementation.

Will ping you once I have news.
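
To illustrate the failure mode being discussed: if the JSON-guided regex allows unbounded whitespace between structural tokens, nothing stops the model from sampling whitespace tokens indefinitely. A hypothetical illustration, not the actual patterns used by Outlines:

```python
# Unbounded: the model may keep sampling whitespace tokens between elements.
WHITESPACE = r"[ \t\n\r]*"

# One possible mitigation: cap the length of any whitespace run.
BOUNDED_WHITESPACE = r"[ \t\n\r]{0,4}"
```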

@rlouf (Member) commented Jan 3, 2024

It might not be your implementation: #484 (comment)

@dtiarks (Contributor, Author) commented Jan 4, 2024

OK. I have a few smaller issues on my list, but those should be done today or tomorrow.

@dtiarks dtiarks marked this pull request as ready for review January 4, 2024 16:07
@rlouf rlouf added the transformers label (Linked to the `transformers` integration) Jan 5, 2024
@rlouf rlouf force-pushed the llama_cpp branch 3 times, most recently from 01bfac9 to 824cc48, on January 8, 2024 at 13:59
@rlouf (Member) commented Jan 8, 2024

Thank you so much for this addition! I'm sure it will be greatly appreciated by the community :)

@rlouf rlouf merged commit 03b749a into outlines-dev:main Jan 8, 2024
5 checks passed
Labels: enhancement, transformers (Linked to the `transformers` integration)

Linked issues this pull request may close: llama.cpp or llama-cpp-python support? · Add llamacpp integration

3 participants