Add llama logits processor #556

Merged: 2 commits into outlines-dev:main on Feb 16, 2024

Conversation

@dtiarks (Contributor) commented Jan 18, 2024

Note that this is super early.
We should probably provide a core implementation that all the other processors could inherit. Also the implementation of the tokenizer seems to be redundant.

Looking forward to some input.

@rlouf (Member) commented Jan 20, 2024

I think we should use a hybrid approach where we subclass Llama to be able to get the logits (useful for debugging), choose the sampling method, and give this the same interface as the generation loop used for transformers models. What do you think?
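For illustration, a minimal sketch of that hybrid, assuming llama-cpp-python's `Llama` API; the class name and method below are illustrative, not this PR's final code:

```python
from llama_cpp import Llama


class LlamaCpp(Llama):
    """Sketch: expose llama.cpp generation behind the same interface
    as the transformers generation loop."""

    def generate(self, prompt: str, logits_processor=None, **kwargs):
        # llama-cpp-python threads a logits processor through its
        # sampling loop, which also makes the logits inspectable for
        # debugging.
        return super().__call__(
            prompt, logits_processor=logits_processor, **kwargs
        )
```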

@dtiarks (Contributor, Author) commented Jan 22, 2024

Sounds good. I'm going to move the logit processor into the integration file.

@dtiarks dtiarks changed the title from "[WIP] Add llama logits processor" to "Add llama logits processor" on Jan 23, 2024
@dtiarks dtiarks marked this pull request as ready for review January 23, 2024 11:03
@lapp0 (Collaborator) left a comment:

IMHO, we should have an abstract implementation of RegexLogitsProcessor, CFGLogitsProcessor, and JSONLogitsProcessor that handles the state management logic, then subclass it for the specific interfacing with each inference library.
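A rough sketch of that shape; every name below is illustrative, not code from this PR:

```python
from abc import ABC, abstractmethod


class BaseLogitsProcessor(ABC):
    """Hypothetical base class owning the FSM state management that is
    common to regex-, CFG- and JSON-guided generation."""

    def __init__(self, tokenizer, fsm):
        self.tokenizer = tokenizer
        self.fsm = fsm
        self.fsm_state = 0

    @abstractmethod
    def __call__(self, input_ids, scores):
        """Mask `scores` down to the tokens allowed in the current FSM
        state. Subclasses adapt the array types and calling convention
        of a specific inference library (llama.cpp, vLLM, ...)."""
        ...
```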

(Resolved review thread: outlines/models/llamacpp.py)
@dtiarks (Contributor, Author) commented Jan 24, 2024

Yep, that's a good idea...

(Two resolved review threads: outlines/models/llamacpp.py)
@rlouf rlouf force-pushed the llama-logits-processor branch 2 times, most recently from ff5abb8 to adeced3, on January 26, 2024 at 11:42
@rlouf (Member) commented Jan 26, 2024

I updated the code in the PR to give you a better idea of the interface I'm thinking about (a rough sketch follows the list below). There are still several things to think about / do:

  • We need to add a stream method to the LlamaCpp class. Signature should be the same as SequenceGenerator.stream
  • We can use the Llama.__call__ method, and make sure that the signature is identical to SequenceGenerator.__call__
  • There is potential for simplification: JSONLogitsProcessor is redundant with RegexLogitsProcessor
  • For JSON-structured generation we need to format the output so we have the same interface as the one we have with transformers models. We will need to add a format_sequence method to LlamaCpp and apply it to the output of __call__. This way we could also remove the need for the json_llamacpp function.
  • If llama.cpp provides tiny models just like transformers does we should add integration tests.
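Putting those points together, the wrapper could look roughly like this; a sketch assuming the `SequenceGenerator` signatures of the time, with all names illustrative:

```python
class LlamaCpp:
    """Sketch of a wrapper mirroring SequenceGenerator's interface."""

    def __init__(self, model, logits_processor=None, format_sequence=None):
        self.model = model  # the underlying llama_cpp.Llama instance
        self.logits_processor = logits_processor
        # e.g. json.loads for JSON-structured generation, identity otherwise
        self.format_sequence = format_sequence or (lambda sequence: sequence)

    def __call__(self, prompts, max_tokens=None, stop_at=None, rng=None):
        # Same signature as SequenceGenerator.__call__, so callers can
        # swap a transformers model for a llama.cpp one transparently.
        raise NotImplementedError

    def stream(self, prompts, max_tokens=None, stop_at=None, rng=None):
        # Same signature as SequenceGenerator.stream; yields tokens as
        # they are generated.
        raise NotImplementedError
```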

@dtiarks (Contributor, Author) commented Feb 3, 2024

  • Finalize integration tests
  • Clarify whether we need multi-prompt streaming in the first iteration (@rlouf)
  • Fix failing integration tests (llama-cpp-python 0.2.30 works fine, the latest doesn't)

@rlouf (Member) commented Feb 6, 2024

We can forget about multi-prompt streaming in the first iteration, raise an informative NotImplementedError when a user asks for it, and open an issue for this.
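Something along these lines would do; a sketch, not the exact message that was merged:

```python
def stream(self, prompts, max_tokens=None, stop_at=None, rng=None):
    if isinstance(prompts, list) and len(prompts) > 1:
        # Multi-prompt streaming is deferred to a follow-up issue.
        raise NotImplementedError(
            "Streaming several prompts at once is not yet supported "
            "for llama.cpp models; please pass a single prompt."
        )
    ...
```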

(Resolved review thread: outlines/generate/text.py)
@lapp0 (Collaborator) commented Feb 9, 2024

Hi @dtiarks, thanks for implementing this! Could you let me know what work remains and if there's any way I could help?

@dtiarks (Contributor, Author) commented Feb 9, 2024

This is essentially ready for review. I talked with @rlouf about it in private, assuming he would review it.

@rlouf (Member) commented Feb 9, 2024

I will, as soon as I have time!

@rlouf (Member) left a comment:

I am currently simplifying a few things in the code. Please wait until I'm done to make changes.

(Resolved review thread: outlines/generate/processors.py)
```diff
@@ -30,8 +30,7 @@ class Character(BaseModel):


 if __name__ == "__main__":
-    # Download model from https://huggingface.co/TheBloke/phi-2-GGUF
-    model = outlines.models.llamacpp("./phi-2.Q3_K_M.gguf", device="cpu")
+    model = outlines.models.llamacpp("./my_model.gguf")
```
Member: Even better in the documentation.

@rlouf (Member) commented Feb 12, 2024

I simplified the code in this PR, among other things removing all the tokenizer code that is no longer needed. Since the code in outlines.generate.processors does not generalize, I chose to leave it out of the model.

@rlouf rlouf linked an issue Feb 12, 2024 that may be closed by this pull request
@dtiarks (Contributor, Author) commented Feb 13, 2024

> I simplified the code in this PR, among other things removing all the tokenizer code that is no longer needed. Since the code in outlines.generate.processors does not generalize, I chose to leave it out of the model.

All the processors I've looked at so far (transformers, vLLM, TensorRT-LLM, etc.) have similar interfaces, but not exactly the same one. So it makes sense to have a model-specific implementation for each.

```diff
@@ -3,6 +3,7 @@
 from outlines.fsm.fsm import RegexFSM
 from outlines.generate.api import SequenceGenerator
 from outlines.models import OpenAI
+from outlines.models.llamacpp import LlamaCpp, RegexLogitsProcessor
```
Collaborator: We should import within def regex_llamacpp; otherwise, using outlines.generate.regex and outlines.generate.json will require installing llama_cpp_python.
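The pattern would look roughly like this; a sketch, with the body reduced to the processor binding discussed later in this thread:

```python
def regex_llamacpp(model, regex_str, sampler):
    # Deferred import: llama_cpp is an optional dependency, so only
    # import it when a llama.cpp model is actually used.
    from outlines.models.llamacpp import RegexLogitsProcessor

    model.logits_processor = RegexLogitsProcessor(regex_str, model.tokenizer)
    return model
```

Note that dispatching on the LlamaCpp type still requires the type at import time, which is the design issue raised in the next comment.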

Member: We register on the LlamaCpp type, which is a subclass of Llama, so there's clearly a design issue here.

@rlouf (Member) commented Feb 13, 2024

I rebased the branch on main and fixed the merge conflicts commit by commit, so some errors may pop up. In the future, please rebase on main instead of adding a merge commit, to preserve the history.

We still need to move a few things around: as noted by @lapp0, the current code makes it necessary to install llama-cpp-python regardless of whether we're using llama.cpp or not.

@rlouf rlouf force-pushed the llama-logits-processor branch 2 times, most recently from 11ab047 to 726ec24, on February 13, 2024 at 17:59
@rlouf (Member) commented Feb 13, 2024

I found a workaround; can someone review before we merge?

(Resolved review thread: outlines/fsm/json_schema.py)
```python
)

logits_processor = RegexLogitsProcessor(regex_str, model.tokenizer)
model.logits_processor = logits_processor
```
Collaborator: Does it make sense to bind the logits processor to the model? Won't this have side effects for other generations?

Member: But a new generation would bind a different logits processor?

Collaborator: Except for outlines.generate.text. Perhaps we can just delete the attribute, if it exists, in outlines.generate.text to get this out the door, and I can address the design in a separate PR, refactoring with an abstract logits processor that has cleaner logic? A sketch of that stopgap follows.
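That stopgap could be as small as the following sketch (function name illustrative):

```python
def text_llamacpp(model, sampler=None):
    # Unset any processor left over from a previous structured
    # generation so plain text generation is unconstrained.
    if getattr(model, "logits_processor", None) is not None:
        model.logits_processor = None
    return model
```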

Member: I think it's fine for now given how __call__ works; we may revisit this in the future, though.

```python
for prompt in prompts:
    processors = []
    if self.logits_processor is not None:
        processors = [copy.copy(self.logits_processor)]
```
Collaborator: Won't the logits processors share a CFGFSM if we shallow-copy?

Member: True, logits processors should have a copy method.

Contributor (Author): We should probably open an issue for this.


```python
formatted = [self.format_sequence(sequence) for sequence in results]

return formatted if len(formatted) > 1 else formatted[0]
```
Collaborator (nit): IMO, List[prompt] should always correspond to List[generation]. I think we should condition on the type of the prompts argument rather than on the length of formatted.
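A sketch of that suggestion, conditioning on the input type:

```python
formatted = [self.format_sequence(sequence) for sequence in results]
# Mirror the shape of the input: a list of prompts always maps to a
# list of generations, even a one-element list.
if isinstance(prompts, str):
    return formatted[0]
return formatted
```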

(Resolved review thread: outlines/models/llamacpp.py)
(Resolved review thread: pyproject.toml)
@rlouf (Member) commented Feb 15, 2024

@dtiarks, you can take a last look.

@rlouf rlouf force-pushed the llama-logits-processor branch 4 times, most recently from 1e220e1 to 743b054, on February 16, 2024 at 10:15
@rlouf (Member) commented Feb 16, 2024

I added a copy method to LogitsProcessor that uses the FSMs' copy method. Fixes #672.
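For reference, a minimal sketch of such a copy method, assuming the FSM exposes its own copy():

```python
class LogitsProcessor:
    def __init__(self, tokenizer, fsm):
        self.tokenizer = tokenizer
        self.fsm = fsm

    def copy(self) -> "LogitsProcessor":
        # Share the tokenizer, but give each generation its own FSM so
        # per-sequence state is never shared across prompts.
        return LogitsProcessor(self.tokenizer, self.fsm.copy())
```

The generation loop above can then call `self.logits_processor.copy()` instead of `copy.copy(...)`.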

@rlouf rlouf linked an issue Feb 16, 2024 that may be closed by this pull request
@rlouf (Member) commented Feb 16, 2024

Merging, great job everyone!

@rlouf rlouf merged commit e99d92d into outlines-dev:main Feb 16, 2024
5 checks passed
joennlae added a commit to joennlae/text-generation-inference that referenced this pull request Mar 19, 2024
This was introduced in version 0.32 by pull request
outlines-dev/outlines#556,
which changed the function name from
`build_regex_from_object` to `build_regex_from_schema`.
This leads to an error when starting TGI in newer Docker containers.
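Downstream code that must work across this rename can shim the import; a sketch, assuming both names live in `outlines.fsm.json_schema` (the module touched earlier in this PR):

```python
try:
    # Newer outlines, after outlines-dev/outlines#556.
    from outlines.fsm.json_schema import build_regex_from_schema
except ImportError:
    # Older releases only expose the previous name.
    from outlines.fsm.json_schema import (
        build_regex_from_object as build_regex_from_schema,
    )
```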
Labels: enhancement, llama.cpp (related to the `llama.cpp` integration)
3 participants