Test the CFG integration for llama.cpp #796

Status: Closed
rlouf opened this issue Apr 10, 2024 · 4 comments

Labels: bug, grammar, help wanted, llama.cpp (Related to the `llama.cpp` integration), tests (Linked to library tests)

Comments

rlouf (Member) commented Apr 10, 2024

I took a quick look at the llama.cpp CFG integration and it doesn't seem to give the correct results. We need to investigate and understand what is wrong.

rlouf added the bug, llama.cpp (Related to the `llama.cpp` integration), and help wanted labels on Apr 10, 2024
brandonwillard added the grammar and tests (Linked to library tests) labels on May 9, 2024
lapp0 (Contributor) commented May 21, 2024

llama.cpp is the only model integration that supports a CFG logits processor, and CFGGuide's handling of unambiguously incomplete terminals appears to be broken.

Previously we handled incomplete terminals by catching the UnexpectedToken / UnexpectedCharacters exceptions raised by calls to interactive.exhaust_lexer(), but the current CFGGuide implementation doesn't.
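
For reference, here is a minimal sketch of that previous pattern using Lark's interactive LALR parser directly; the toy grammar and helper below are illustrative, not Outlines code:

import lark
from lark.exceptions import UnexpectedCharacters, UnexpectedToken

# Toy stand-in for outlines.grammars.json: an object wrapping one string.
parser = lark.Lark(r'''
    start: "{" STRING "}"
    STRING: /"[^"]*"/
''', parser="lalr")

def feed_prefix(text):
    """Advance the parser over `text`, tolerating a trailing incomplete
    terminal such as the unterminated string in the prefix `{"`."""
    interactive = parser.parse_interactive(text)
    try:
        interactive.exhaust_lexer()
    except (UnexpectedCharacters, UnexpectedToken):
        # The tail of `text` is a partially generated terminal; swallow
        # the error and keep the parser state built from complete tokens.
        pass
    return interactive

feed_prefix('{"')  # tolerated here; the current CFGGuide raises instead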

Code:

import llama_cpp
import torch

import outlines.grammars as grammars
import outlines.models as models
from outlines.integrations.llamacpp import CFGLogitsProcessor

# Load the model through the llama.cpp integration.
model = models.llamacpp(
    repo_id="QuantFactory/Meta-Llama-3-8B-Instruct-GGUF",
    filename="Meta-Llama-3-8B-Instruct.Q8_0.gguf",
    tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
        "mlx-community/Meta-Llama-3-8B-Instruct-4bit"
    ),
    n_gpu_layers=-1,
)

# A zero logits vector is enough to exercise the processor's masking.
mock_scores = torch.zeros(model.model.n_vocab())

logits_processor = CFGLogitsProcessor(grammars.json, model.model)

# Feed the token ids for the JSON prefix `{"`: the trailing quote is an
# unambiguously incomplete UNESCAPED_STRING terminal.
input_ids = []
generation = [90, 702]
for token in generation:
    logits_mask = logits_processor(input_ids, mock_scores)
    input_ids.append(token)
logits_processor(input_ids, mock_scores)  # raises UnexpectedCharacters

Error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 673, in lex
    token = self.root_lexer.next_token(lexer_state, parser_state)
  File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 598, in next_token
    raise UnexpectedCharacters(lex_state.text, line_ctr.char_pos, line_ctr.line, line_ctr.column,
lark.exceptions.UnexpectedCharacters: No terminal matches '"' in the current parser context, at line 1 col 2

{"
 ^
Expected one of: 
        * SIGNED_NUMBER
        * NULL
        * RSQB
        * FALSE
        * LBRACE
        * LSQB
        * UNESCAPED_STRING
        * COMMA
        * RBRACE
        * TRUE
        * COLON

Previous tokens: Token('LBRACE', '{')


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/repro_796.py", line 20, in <module>
    lp(response[:i], mock_scores)
  File "/opt/conda/lib/python3.10/site-packages/outlines/integrations/llamacpp.py", line 123, in __call__
    allowed_tokens = self.fsm.get_next_instruction(self._fsm_state).tokens
  File "/opt/conda/lib/python3.10/site-packages/outlines/fsm/guide.py", line 350, in get_next_instruction
    interactive.exhaust_lexer()
  File "/opt/conda/lib/python3.10/site-packages/lark/parsers/lalr_interactive_parser.py", line 52, in exhaust_lexer
    return list(self.iter_parse())
  File "/opt/conda/lib/python3.10/site-packages/lark/parsers/lalr_interactive_parser.py", line 43, in iter_parse
    for token in self.lexer_thread.lex(self.parser_state):
  File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 676, in lex
    raise e  # Raise the original UnexpectedCharacters. The root lexer raises it with the wrong expected set.
  File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 665, in lex
    yield lexer.next_token(lexer_state, parser_state)
  File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 598, in next_token
    raise UnexpectedCharacters(lex_state.text, line_ctr.char_pos, line_ctr.line, line_ctr.column,
lark.exceptions.UnexpectedCharacters: No terminal matches '"' in the current parser context, at line 1 col 2

{"
 ^
Expected one of: 
        * UNESCAPED_STRING
        * RBRACE

Previous tokens: Token('LBRACE', '{')

Investigating how to resolve this.

lapp0 (Contributor) commented May 22, 2024

There are a number of improvements that can be made to ensure CFGLogitsProcessor doesn't fail with unexpected runtime errors.

Here is a summary of the issues that would need to be resolved:

major

  • In CFGGuide, an UnexpectedCharacters error occurs if a terminal is incomplete when calling interactive.exhaust_lexer().
  • In CFGGuide, if $EOS is a legal next terminal, but not the only legal next terminal, under some conditions we get ValueError("The vocabulary does not allow us to build a sequence that matches the input regex").
  • CFGGuide generates a new RegexGuide and resets the state to 0 when the existing RegexGuide's state is possibly terminal, not only when it is necessarily terminal. This disallows some legal grammars.
  • Structural issue: in CFGGuide, state is updated in get_next_instruction() rather than in get_next_state().
  • There are other conditions where parsing breaks down, resulting in illegal generations; I haven't yet determined their cause.

minor

  • CFGGuide has self.generation += self.tokenizer.decode([token_id])[0], but for llamacpp the passed tokenizer returns a str, not List[str]. This means the tokens ["aaaa", "bbbb"] will be read by the parser as "ab" rather than "aaaabbbb" (see the sketch after this list).
  • CFGGuide checks for eos_token_id in RegexGuide.get_next_instruction(), but it should be checking RegexGuide.is_final_state.
  • UNESCAPED_STRING allows invalid JSON (I have a patch ready for this).
  • LogitsProcessor incorrectly handles Write instructions when a Write contains more than one token; see #855 ("Logits Processors Guide integration will be buggy when len(tokens) > 1 in a Write instruction").
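
The first minor item could be addressed by normalizing the decode output. A minimal sketch, assuming the only divergence is the str vs. List[str] return type (the helper name is hypothetical):

from typing import List, Union

def decoded_text(tokenizer, token_id: int) -> str:
    # Hypothetical helper. llamacpp's tokenizer returns a plain str, so
    # decode([token_id])[0] takes only its first character and silently
    # truncates multi-character tokens ("aaaa" becomes "a").
    decoded: Union[str, List[str]] = tokenizer.decode([token_id])
    if isinstance(decoded, str):
        return decoded
    return decoded[0]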

Proposal

@rlouf @brandonwillard could you share your thoughts please?

Due to the issues with CFGGuide, and the fact that any improvements to the CFGGuide will eventually be superseded by parsing.py, I propose getting parsing.py working as a next step, rather than correcting all the bugs in CFGGuide. This will reduce technical debt substantially, and provide fully functioning CFG structured generation.

rlouf (Member, Author) commented May 22, 2024

I agree on fixing issues in parsing.py first.

brandonwillard (Member) commented May 22, 2024

> Due to the issues with CFGGuide, and the fact that any improvements to the CFGGuide will eventually be superseded by parsing.py, I propose getting parsing.py working as a next step, rather than correcting all the bugs in CFGGuide. This will reduce technical debt substantially, and provide fully functioning CFG structured generation.

A necessary change that will almost certainly fix a few of those issues is the use of PartialLark. That is a top priority.

The partial parsing code in parsing.py should already work. At most, the way that lark is being extended can be cleaned up (e.g. remove the new options conflict when using Lark after PartialLark).
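
For orientation, a minimal sketch of constructing the partial parser, assuming PartialLark keeps Lark's constructor signature (the option shown is illustrative):

from outlines import grammars
from outlines.fsm.parsing import PartialLark

# Unlike vanilla Lark, PartialLark is designed to accept input that ends
# mid-terminal, which covers the incomplete-terminal failures above.
parser = PartialLark(grammars.json, parser="lalr")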
