
Add LlamaCpp integration #486

Merged · 1 commit merged into outlines-dev:main from llama_cpp on Jan 8, 2024
Conversation

@dtiarks (Contributor) commented Dec 27, 2023

fixes #422

This is a draft PR for a LlamaCpp integration using the low-level API.
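
For context, a minimal sketch of how the integration could be used once merged. This is hypothetical usage, not the PR's actual code: the module paths `outlines.models.llamacpp` and `outlines.generate.regex`, the GGUF file path, and the regex are all assumptions, and the exact API may differ.

```python
import outlines

# Sketch only: entry point names and the local GGUF path are assumptions.
model = outlines.models.llamacpp("./llama-2-7b.Q4_K_M.gguf")
generator = outlines.generate.regex(model, r"[0-9]{4}")
print(generator("In what year did the French Revolution start? "))
```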

@rlouf (Member) commented Dec 27, 2023

Great work! I still need to review the PR, but one remark for now: can we name the model llamacpp instead of llama to avoid confusion?

@dtiarks (Contributor, Author) commented Dec 27, 2023

Sure!

@brandonwillard brandonwillard marked this pull request as draft December 27, 2023 20:24
@dtiarks (Contributor, Author) commented Dec 27, 2023

Is there any guidance on how to handle artifacts (the GGUF model used in the tests) in CI/CD? Should I just check it in using LFS?

@brandonwillard brandonwillard linked an issue Dec 28, 2023 that may be closed by this pull request
@rlouf (Member) commented Dec 28, 2023

So far we've been downloading models from the Hub each time the tests are run. Since they're tiny models, it doesn't affect the runtime much, but it would certainly be better to cache them.
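
A sketch of what a cached test fixture could look like, relying on `huggingface_hub`'s built-in local cache; the repo id and filename below are placeholders, not the actual test model:

```python
import pytest
from huggingface_hub import hf_hub_download

@pytest.fixture(scope="session")
def model_path():
    # hf_hub_download stores files under the local HF cache
    # (~/.cache/huggingface by default), so repeated test runs
    # reuse the downloaded copy instead of fetching it again.
    return hf_hub_download(
        repo_id="some-org/tiny-llama-gguf",  # placeholder repo id
        filename="tiny-llama.Q2_K.gguf",     # placeholder filename
    )
```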

@dtiarks (Contributor, Author) commented Dec 28, 2023

OK. The Llama test model is around 2 MB, so it should be fine.
To have the Llama tests run in CI/CD, I added llama-cpp-python as a test dependency. The question is whether we should add it as a project dependency as well.

Another open question is the intention behind the outer dimension of the tokenizer's encode method. Is it supposed to be a batch dimension? If so, I have to implement a padding mechanism, because the llama.cpp tokenizer doesn't support that out of the box.

@rlouf (Member) commented Dec 28, 2023

We won't add it as a dependency by default but will document it in the installation section of the documentation.

And yes, `token_ids` tensors are of shape `n_batch x n_tokens`, and we keep the outer dimension throughout, even for a batch size of 1. Does llamacpp handle batch inference?
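
For reference, a padding shim like the following would be enough to give per-sequence tokenizer output that shape. A minimal sketch, assuming NumPy arrays and left padding; the pad token id and padding side are up to the implementation:

```python
import numpy as np

def pad_batch(sequences: list[list[int]], pad_token_id: int) -> np.ndarray:
    """Pad variable-length token id lists into an (n_batch, n_tokens) array."""
    max_len = max(len(seq) for seq in sequences)
    batch = np.full((len(sequences), max_len), pad_token_id, dtype=np.int64)
    for i, seq in enumerate(sequences):
        # Left padding keeps the most recent tokens right-aligned,
        # the usual convention for decoder-only models.
        batch[i, max_len - len(seq):] = seq
    return batch
```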

@dtiarks (Contributor, Author) commented Dec 28, 2023

It does have a batch-based API for inference (which is also the recommended one), but not for tokenization. This creates a little overhead in the Outlines-side implementation of the tokenizer, but that should be OK.

I will incorporate the batch API in the coming days...
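
For anyone following along, this is roughly what llama.cpp's batch decoding flow looks like through the low-level bindings. Names follow `llama.h`, but the exact llama-cpp-python signatures vary between versions, so treat this as a sketch rather than the PR's implementation:

```python
import llama_cpp

def decode_sequences(ctx, sequences):
    """Sketch: decode several token id sequences in one llama.cpp batch."""
    n_total = sum(len(seq) for seq in sequences)
    batch = llama_cpp.llama_batch_init(n_total, 0, len(sequences))
    n = 0
    for seq_id, tokens in enumerate(sequences):
        for pos, tok in enumerate(tokens):
            batch.token[n] = tok
            batch.pos[n] = pos
            batch.seq_id[n][0] = seq_id
            batch.n_seq_id[n] = 1
            # Only request logits for the last token of each sequence.
            batch.logits[n] = pos == len(tokens) - 1
            n += 1
    batch.n_tokens = n
    llama_cpp.llama_decode(ctx, batch)  # returns 0 on success
    llama_cpp.llama_batch_free(batch)
```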

@rlouf (Member) commented Jan 3, 2024

@dtiarks is this ready for review?

@dtiarks (Contributor, Author) commented Jan 3, 2024

Not yet. I'm still trying to figure out why I get so much whitespace in the output JSON, and whether it has something to do with my implementation.

Will ping you once I have news.
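
To illustrate the failure mode being discussed: if the JSON-guided regex allows unbounded whitespace between structural tokens, nothing stops the model from sampling whitespace tokens indefinitely. A hypothetical illustration, not the actual patterns used by Outlines:

```python
# Unbounded: the model may keep sampling whitespace tokens between elements.
WHITESPACE = r"[ \t\n\r]*"

# One possible mitigation: cap the length of any whitespace run.
BOUNDED_WHITESPACE = r"[ \t\n\r]{0,4}"
```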

@rlouf (Member) commented Jan 3, 2024

It might not be your implementation: #484 (comment)

@dtiarks (Contributor, Author) commented Jan 4, 2024

OK. I have a few smaller issues on my list, but those should be done today or tomorrow.

@dtiarks dtiarks marked this pull request as ready for review January 4, 2024 16:07
@rlouf rlouf added the transformers label (Linked to the `transformers` integration) Jan 5, 2024
@rlouf rlouf force-pushed the llama_cpp branch 3 times, most recently from 01bfac9 to 824cc48, on January 8, 2024 at 13:59
@rlouf (Member) commented Jan 8, 2024

Thank you so much for this addition! I'm sure it will be greatly appreciated by the community :)

@rlouf rlouf merged commit 03b749a into outlines-dev:main Jan 8, 2024
5 checks passed
Labels: enhancement, transformers (Linked to the `transformers` integration)

Linked issues this pull request may close: llama.cpp or llama-cpp-python support? · Add llamacpp integration

3 participants