Test the CFG integration for llama.cpp #796

Status: Closed
rlouf opened this issue Apr 10, 2024 · 4 comments

Labels: bug, grammar, help wanted, llama.cpp (Related to the `llama.cpp` integration), tests (Linked to library tests)

Comments

rlouf (Member) commented Apr 10, 2024

I took a quick look at the llama.cpp CFG integration and it doesn't seem to give the correct results. We need to investigate and understand what is wrong.

rlouf added the bug, llama.cpp (Related to the `llama.cpp` integration), and help wanted labels on Apr 10, 2024
brandonwillard added the grammar and tests (Linked to library tests) labels on May 9, 2024
lapp0 (Contributor) commented May 21, 2024

llama.cpp is the only model integration that supports a CFG logits processor, and CFGGuide's handling of unambiguously incomplete terminals appears to be broken.

Previously we handled incomplete terminals by catching the UnexpectedToken / UnexpectedCharacters exceptions raised by calls to interactive.exhaust_lexer(), but the current CFGGuide implementation doesn't.
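
For reference, here is a minimal sketch of that previous pattern using Lark's interactive LALR parser directly; the toy grammar and helper below are illustrative, not Outlines code:

import lark
from lark.exceptions import UnexpectedCharacters, UnexpectedToken

# Toy stand-in for outlines.grammars.json: an object wrapping one string.
parser = lark.Lark(r'''
    start: "{" STRING "}"
    STRING: /"[^"]*"/
''', parser="lalr")

def feed_prefix(text):
    """Advance the parser over `text`, tolerating a trailing incomplete
    terminal such as the unterminated string in the prefix `{"`."""
    interactive = parser.parse_interactive(text)
    try:
        interactive.exhaust_lexer()
    except (UnexpectedCharacters, UnexpectedToken):
        # The tail of `text` is a partially generated terminal; swallow
        # the error and keep the parser state built from complete tokens.
        pass
    return interactive

feed_prefix('{"')  # tolerated here; the current CFGGuide raises instead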

Code:

import llama_cpp
import torch

import outlines.grammars as grammars
import outlines.models as models
from outlines.integrations.llamacpp import CFGLogitsProcessor

# Load the model through the llama.cpp integration.
model = models.llamacpp(
    repo_id="QuantFactory/Meta-Llama-3-8B-Instruct-GGUF",
    filename="Meta-Llama-3-8B-Instruct.Q8_0.gguf",
    tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
        "mlx-community/Meta-Llama-3-8B-Instruct-4bit"
    ),
    n_gpu_layers=-1,
)

# A zero logits vector is enough to exercise the processor's masking.
mock_scores = torch.zeros(model.model.n_vocab())

logits_processor = CFGLogitsProcessor(grammars.json, model.model)

# Feed the token ids for the JSON prefix `{"`: the trailing quote is an
# unambiguously incomplete UNESCAPED_STRING terminal.
input_ids = []
generation = [90, 702]
for token in generation:
    logits_mask = logits_processor(input_ids, mock_scores)
    input_ids.append(token)
logits_processor(input_ids, mock_scores)  # raises UnexpectedCharacters

Error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 673, in lex
    token = self.root_lexer.next_token(lexer_state, parser_state)
  File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 598, in next_token
    raise UnexpectedCharacters(lex_state.text, line_ctr.char_pos, line_ctr.line, line_ctr.column,
lark.exceptions.UnexpectedCharacters: No terminal matches '"' in the current parser context, at line 1 col 2

{"
 ^
Expected one of: 
        * SIGNED_NUMBER
        * NULL
        * RSQB
        * FALSE
        * LBRACE
        * LSQB
        * UNESCAPED_STRING
        * COMMA
        * RBRACE
        * TRUE
        * COLON

Previous tokens: Token('LBRACE', '{')


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/repro_796.py", line 20, in <module>
    lp(response[:i], mock_scores)
  File "/opt/conda/lib/python3.10/site-packages/outlines/integrations/llamacpp.py", line 123, in __call__
    allowed_tokens = self.fsm.get_next_instruction(self._fsm_state).tokens
  File "/opt/conda/lib/python3.10/site-packages/outlines/fsm/guide.py", line 350, in get_next_instruction
    interactive.exhaust_lexer()
  File "/opt/conda/lib/python3.10/site-packages/lark/parsers/lalr_interactive_parser.py", line 52, in exhaust_lexer
    return list(self.iter_parse())
  File "/opt/conda/lib/python3.10/site-packages/lark/parsers/lalr_interactive_parser.py", line 43, in iter_parse
    for token in self.lexer_thread.lex(self.parser_state):
  File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 676, in lex
    raise e  # Raise the original UnexpectedCharacters. The root lexer raises it with the wrong expected set.
  File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 665, in lex
    yield lexer.next_token(lexer_state, parser_state)
  File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 598, in next_token
    raise UnexpectedCharacters(lex_state.text, line_ctr.char_pos, line_ctr.line, line_ctr.column,
lark.exceptions.UnexpectedCharacters: No terminal matches '"' in the current parser context, at line 1 col 2

{"
 ^
Expected one of: 
        * UNESCAPED_STRING
        * RBRACE

Previous tokens: Token('LBRACE', '{')

Investigating how to resolve this.

lapp0 (Contributor) commented May 22, 2024

There are a number of improvements that can be made to ensure CFGLogitsProcessor doesn't fail with unexpected runtime errors.

Here is a summary of the issues that would need to be resolved:

major

  • In CFGGuide, an UnexpectedCharacters error occurs if a terminal is incomplete when calling interactive.exhaust_lexer().
  • In CFGGuide, if $EOS is a legal next terminal, but not the only legal next terminal, under some conditions we get ValueError("The vocabulary does not allow us to build a sequence that matches the input regex").
  • CFGGuide generates a new RegexGuide and resets the state to 0 when the existing RegexGuide's state is possibly terminal, not only when it is necessarily terminal. This disallows some legal grammars.
  • Structural issue: in CFGGuide, state is updated in get_next_instruction() rather than in get_next_state().
  • There are other conditions where parsing breaks down, resulting in illegal generations; I haven't yet determined their cause.

minor

  • CFGGuide has self.generation += self.tokenizer.decode([token_id])[0], but for llamacpp the passed tokenizer returns a str, not List[str]. This means the tokens ["aaaa", "bbbb"] will be read by the parser as "ab" rather than "aaaabbbb" (see the sketch after this list).
  • CFGGuide checks for eos_token_id in RegexGuide.get_next_instruction(), but it should be checking RegexGuide.is_final_state.
  • UNESCAPED_STRING allows invalid JSON (I have a patch ready for this).
  • LogitsProcessor incorrectly handles Write instructions when a Write contains more than one token; see #855 ("Logits Processors Guide integration will be buggy when len(tokens) > 1 in a Write instruction").
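
The first minor item could be addressed by normalizing the decode output. A minimal sketch, assuming the only divergence is the str vs. List[str] return type (the helper name is hypothetical):

from typing import List, Union

def decoded_text(tokenizer, token_id: int) -> str:
    # Hypothetical helper. llamacpp's tokenizer returns a plain str, so
    # decode([token_id])[0] takes only its first character and silently
    # truncates multi-character tokens ("aaaa" becomes "a").
    decoded: Union[str, List[str]] = tokenizer.decode([token_id])
    if isinstance(decoded, str):
        return decoded
    return decoded[0]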

Proposal

@rlouf @brandonwillard could you share your thoughts please?

Due to the issues with CFGGuide, and the fact that any improvements to the CFGGuide will eventually be superseded by parsing.py, I propose getting parsing.py working as a next step, rather than correcting all the bugs in CFGGuide. This will reduce technical debt substantially, and provide fully functioning CFG structured generation.

rlouf (Member, Author) commented May 22, 2024

I agree on fixing issues in parsing.py first.

brandonwillard (Member) commented May 22, 2024

> Due to the issues with CFGGuide, and the fact that any improvements to the CFGGuide will eventually be superseded by parsing.py, I propose getting parsing.py working as a next step, rather than correcting all the bugs in CFGGuide. This will reduce technical debt substantially, and provide fully functioning CFG structured generation.

A necessary change that will almost certainly fix a few of those issues is the use of PartialLark. That is a top priority.

The partial parsing code in parsing.py should already work. At most, the way that lark is being extended can be cleaned up (e.g. remove the new options conflict when using Lark after PartialLark).
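
For orientation, a minimal sketch of constructing the partial parser, assuming PartialLark keeps Lark's constructor signature (the option shown is illustrative):

from outlines import grammars
from outlines.fsm.parsing import PartialLark

# Unlike vanilla Lark, PartialLark is designed to accept input that ends
# mid-terminal, which covers the incomplete-terminal failures above.
parser = PartialLark(grammars.json, parser="lalr")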
