Add Grammars #2105

Closed
wants to merge 77 commits into from
77 commits
b255b36
functional grammar.py, missing EBNF
Dec 14, 2023
5e1b0d2
removed redundant code
Dec 14, 2023
383e8e2
change ordering
Dec 14, 2023
c116407
bug fix
Dec 14, 2023
8f234f6
remove unused code
Dec 14, 2023
fc05133
remove dead code
Dec 15, 2023
72e12f1
basic functioning EBNF grammar generator based on lark
Dec 19, 2023
50ea652
clean up
Dec 19, 2023
77f347a
clean up and add TokenTrie
Dec 19, 2023
69fbd80
clean up and add NextTokenValidator
Dec 19, 2023
6af9a38
clean up, add NextTokenValidator.valid_token_id_set
Dec 19, 2023
c783db7
clean up
Dec 19, 2023
bd94202
clean up
Dec 19, 2023
ec6cb14
implement GrammarLogitProcessor
Dec 19, 2023
648c676
commit grammar work before I delete it so its in my history
Dec 19, 2023
2b9b265
remove old grammar, rename new grammar
Dec 19, 2023
783bbff
add grammar docs
Dec 19, 2023
57081a6
cleanup
Dec 19, 2023
9052a54
add cache to TokenTrie, resulting in 35% speedup
Dec 20, 2023
e55ae6b
clean up
Dec 20, 2023
7706a9c
WIP: tests
Dec 20, 2023
1fdb0f5
misc bug fixes, optimizations + add EOS token generation
Dec 21, 2023
b3f6502
fix doc rendering
Dec 21, 2023
517e329
Update grammars.rst
lapp0 Dec 21, 2023
2087a82
I had an 8x speedup when fixing ESCAPE_STRING, but a 10x speedup from…
lapp0 Dec 21, 2023
45cce24
clean up and fix bugs
Dec 22, 2023
5015ed3
Merge remote-tracking branch 'lapp0/grammar' into grammar
Dec 22, 2023
b8a625f
update docs
Dec 22, 2023
9d2b1f0
bug fix
Dec 22, 2023
64ef881
remove stray prints
Dec 22, 2023
90b54d7
remove stray debug code
Dec 22, 2023
e441584
clean up tests, add more tests
Dec 22, 2023
8ba330a
remove unused import
Dec 22, 2023
df704b1
don't modify requirements-dev.txt
Dec 22, 2023
f7eb37f
remove unused imports
Dec 22, 2023
577e02e
fix and clean tests
Dec 22, 2023
e1bc7ac
return tensor
Dec 22, 2023
3a55edc
update tests
Dec 22, 2023
236553d
handle batch requests
Dec 22, 2023
cda4711
bug fixes
Dec 22, 2023
bc66e56
bugfix
Dec 22, 2023
4c9de04
fix test
Dec 22, 2023
461d318
reduce number of prompts
Dec 22, 2023
6fe8bd4
fix test
Dec 22, 2023
7d7f8cf
fix yarp
Dec 22, 2023
fcb13d5
yarp ruff
Dec 22, 2023
e71b4ed
fix ruff yarp
Dec 22, 2023
b11256c
write test case for issue noticed: can't use more than one char token…
Dec 23, 2023
a400fc8
use new recursive IncrementalParser instead, cleaner, fixes prev comm…
Dec 23, 2023
2bf537f
improve with memoization
Dec 23, 2023
4d6eeb2
much better memoization + bugfixes
Dec 24, 2023
988e95f
don't use a trie, with memoization it's inefficient
Dec 24, 2023
8db40b2
improve performance 15% with generator
Dec 24, 2023
40f9438
refactor to be cleaner, faster, stateless
Dec 24, 2023
db143ba
do lookup rather than recursive cache check to prevent recursion limi…
Dec 24, 2023
40df2cf
refactor: cleaner, faster
Dec 25, 2023
aee2a8e
bug fixes and test fixes
Dec 25, 2023
8fd06fa
memoize instances where a unique instance is defined by (state stack,…
Dec 26, 2023
ab51de9
update comments
Dec 26, 2023
eb4b7a6
update docs
Dec 26, 2023
af59f2c
try this for docs?
Dec 26, 2023
6eb3820
try adding toc
Dec 26, 2023
befb92c
ruff yapf
Dec 26, 2023
7afc27b
add grammars to openai
Dec 27, 2023
c41f8e4
fixes
Dec 27, 2023
2b2b024
use grammars in /v1/completions only, shouldn't apply to /v1/chat/com…
Dec 27, 2023
c2b026d
yapf
Dec 27, 2023
0fa9d07
add grammar to protocol
Dec 27, 2023
e23f93c
bug fix in api server
Dec 27, 2023
81ec332
integrate into engine
Dec 28, 2023
a1f9352
remove dead code
Dec 29, 2023
db09714
cleaner
Dec 29, 2023
350f364
implementation with no partial state tracked by parser
Dec 29, 2023
a88e506
faster cache decorator via lru_cache
Dec 29, 2023
4f31fc0
add attribution
Dec 29, 2023
f130c98
remove dead code
Dec 29, 2023
344f27b
yapf
Dec 30, 2023
172 changes: 172 additions & 0 deletions docs/source/grammars/grammars.rst
@@ -0,0 +1,172 @@
.. contents:: Table of Contents
   :depth: 3


.. _grammars:


Grammars
========

vLLM offers `Lark <https://lark-parser.readthedocs.io/en/stable/>`_-style EBNF grammars via ``vllm.grammar.GrammarLogitsProcessor``.

``GrammarLogitsProcessor`` ensures generated text follows the rules of a grammar, making it possible to guarantee that output is syntactically valid JSON, SQL, Python, regex, and so on.

Sample Code for JSON
---------------------

.. code-block:: python

    json_grammar = r"""
    start: value
    value: WS* object WS*
    object: dict
          | list
          | string
          | signed_number -> number
          | "true" -> true
          | "false" -> false
          | "null" -> null

    list : "[" [value ("," value)*] "]"

    dict : "{" [pair ("," pair)*] "}"
    pair : WS* string WS* ":" value

    string : "\"" escaped_string_char* "\""
    escaped_string_char: _STR_INNER_CHAR | _ESCAPED_CHAR
    _ESCAPED_CHAR: "\\" _ESCAPABLE_CHAR
    _STR_INNER_CHAR: /[^\\\"]/
    _ESCAPABLE_CHAR: /[\\\/bfnrtu]/

    signed_number: ["+"|"-"] number
    number: float | int
    float: int exp | decimal exp?
    decimal: int "." int? | "." int
    exp: ("e"|"E") signed_int
    signed_int: ["+"|"-"] int
    int: DIGIT+
    DIGIT: "0".."9"

    WS: /[ \t\f\r\n]/
    """
    grammar_logits_processor = GrammarLogitsProcessor(
        tokenizer,
        json_grammar,
        grammar_start="value"
    )
    SamplingParams(logits_processor=grammar_logits_processor)
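
Putting it together, here is a minimal end-to-end sketch using the offline ``LLM`` entry point. The model name and prompt are placeholders, tokenizer access may differ by vLLM version, and depending on the version the sampling parameter may be ``logits_processors`` (a list) rather than ``logits_processor``:

.. code-block:: python

    from vllm import LLM, SamplingParams
    from vllm.grammar import GrammarLogitsProcessor

    llm = LLM(model="codellama/CodeLlama-7b-hf")  # placeholder model
    tokenizer = llm.get_tokenizer()  # tokenizer access may differ by version

    grammar_logits_processor = GrammarLogitsProcessor(
        tokenizer,
        json_grammar,  # the grammar string defined above
        grammar_start="value",
    )
    sampling_params = SamplingParams(
        max_tokens=512,
        logits_processor=grammar_logits_processor,
    )
    outputs = llm.generate(["Describe the solar system as JSON:"], sampling_params)
    print(outputs[0].outputs[0].text)  # syntactically valid JSON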


Performance
-----------

For the JSON grammar provided in the subsection below, constrained to keyboard characters only, generation on the author's mid-range laptop using CodeLlama-7b's vocabulary occurred at the following rates:

- first 10 tokens: 3.47 tokens / second
- first 100 tokens: 8.61 tokens / second
- first 1,000 tokens: 14.41 tokens / second
- first 10,000 tokens: 23.80 tokens / second

There is a "warmup" period during which token legality is cached per parser state; the first generation, and the earliest tokens within a generation, are the slowest.
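
To see why the warmup amortizes, here is a toy sketch of state-keyed caching. This is illustrative only, not this PR's implementation; ``token_is_legal`` and the tuple state key are hypothetical stand-ins:

.. code-block:: python

    from functools import lru_cache

    VOCAB_SIZE = 32_000  # e.g. a Llama-style vocabulary

    def token_is_legal(state_key: tuple, token_id: int) -> bool:
        # Hypothetical stand-in: a real check would feed the token's
        # characters through the parser from the state behind ``state_key``.
        return token_id % 2 == 0

    @lru_cache(maxsize=None)
    def legal_token_ids(state_key: tuple) -> frozenset:
        # Expensive the first time a parser state is seen, cached afterwards;
        # this is why the earliest tokens of a generation are the slowest.
        return frozenset(
            t for t in range(VOCAB_SIZE) if token_is_legal(state_key, t)
        )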

**Design your EBNF grammar with minimal regexp**

Regexp processing is the most expensive operation in ``GrammarLogitsProcessor``. When designing your EBNF grammar, keep regular expressions short and simple wherever possible.

Breaking the imported ``ESCAPED_STRING`` terminal down into several faster-terminating terminals, as in the grammar below, resulted in a dramatic speedup:

.. code-block::

    start: value
    ?value: dict
          | list
          | string
          | signed_number -> number
          | "true" -> true
          | "false" -> false
          | "null" -> null

    list : "[" [value ("," value)*] "]"

    dict : "{" [pair ("," pair)*] "}"
    pair : string ":" value

    string : "\"" escaped_string_char* "\""
    escaped_string_char: STR_INNER_CHAR | ESCAPED_CHAR
    ESCAPED_CHAR: "\\" ANY_CHAR
    STR_INNER_CHAR: /[^\\\"]/
    ANY_CHAR: /./

    signed_number: ["+"|"-"] number
    number: float | int
    float: int exp | decimal exp?
    decimal: int "." int? | "." int
    exp: ("e"|"E") signed_int
    signed_int: ["+"|"-"] int
    int: DIGIT+
    DIGIT: "0".."9"

    WS: /[ \t\f\r\n]/
    %ignore WS

    # old slow regex-based expressions:

    # %import common.ESCAPED_STRING
    # %import common.SIGNED_NUMBER
    # %import common.WS

**Constrain legal characters**

By default, every legal character in the tokenizer's alphabet must be checked against the parser. The Mistral tokenizer, for example, has an alphabet of 3,298 characters; here are 40 random examples:

.. code-block::

    [ '堂', 'ู', 'ɔ', '🙌', 'Б', '레', '允', 'ả', '\ue934', '如', '試', 'K', '¯', '卷', '園', 'ए', '\\', '酒', 'थ', 'グ', '터', '연', 'Ș', 'ブ', '星', 'ြ', 'å', '軍', '案', '题', '银', '映', '표', '\x11', '級', '醒', 'ေ', '✭', '約', '😤']

Many of these characters are likely not useful for your generation.

Expect increased performance if you constrain generation to the first 256 Unicode code points (the Latin-1 range, matching ``range(256)`` below), eliminating 3,042 unnecessary characters:

.. code-block::

    GrammarLogitsProcessor(
        tokenizer,
        grammar,
        legal_chars=set(map(chr, range(256))),
    )

Example 2: constrain generation to the set of keyboard-typeable characters:

.. code-block::

    def keyboard_chars():
        keyboard_chars = ""
        keyboard_chars += "abcdefghijklmnopqrstuvwxyz"
        keyboard_chars += "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        keyboard_chars += "0123456789"
        keyboard_chars += "`~!@#$%^&*()-_=+[{]}\\|;:'\",<.>/? "
        keyboard_chars += "\t\n"
        return keyboard_chars

    GrammarLogitsProcessor(
        tokenizer,
        grammar,
        legal_chars=set(keyboard_chars()),
    )
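
In both examples the trade-off is the same: a smaller ``legal_chars`` alphabet leaves fewer characters to check against the parser at each step, in exchange for excluding tokens that use characters outside the set.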


Resources
---------

- `How to write an EBNF grammar for Lark <https://lark-parser.readthedocs.io/en/latest/grammar.html>`_
- `Wikipedia - EBNF <https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form>`_
- `Wikipedia - LALR Parser <https://en.wikipedia.org/wiki/LALR_parser>`_

Example Lark Grammars
---------------------

Note: These grammars may need adaptation (for example, a different start rule or simpler terminals, per the performance notes above) before use with ``GrammarLogitsProcessor``.

- `JSON <https://lark-parser.readthedocs.io/en/latest/examples/advanced/_json_parser.html>`_
- `Python3 <https://github.com/python-poetry/poetry-core/blob/main/src/poetry/core/_vendor/lark/grammars/python.lark>`_
- `Resource with many grammars including SQLite, TOML, YAML, Lua, and more <https://github.com/ligurio/lark-grammars>`_
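
Before wiring a third-party grammar into ``GrammarLogitsProcessor``, it can be worth validating it standalone with lark. A quick check, reusing the ``json_grammar`` string from the sample code above:

.. code-block:: python

    from lark import Lark

    # ``json_grammar`` is the grammar string from the sample code above.
    parser = Lark(json_grammar, start="value")
    parser.parse('{"name": "vLLM", "grammars": true}')  # raises on invalid input
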
2 changes: 1 addition & 1 deletion format.sh
@@ -34,7 +34,7 @@ tool_version_check() {
}

tool_version_check "yapf" $YAPF_VERSION "$(grep yapf requirements-dev.txt | cut -d'=' -f3)"
tool_version_check "ruff" $RUFF_VERSION "$(grep "ruff==" requirements-dev.txt | cut -d'=' -f3)"
#tool_version_check "ruff" $RUFF_VERSION "$(grep "ruff==" requirements-dev.txt | cut -d'=' -f3)"
tool_version_check "mypy" "$MYPY_VERSION" "$(grep mypy requirements-dev.txt | cut -d'=' -f3)"

YAPF_FLAGS=(
1 change: 0 additions & 1 deletion requirements-dev.txt
@@ -12,4 +12,3 @@ types-setuptools
pytest
pytest-forked
pytest-asyncio

1 change: 1 addition & 0 deletions requirements.txt
@@ -12,3 +12,4 @@ fastapi
uvicorn[standard]
pydantic == 1.10.13 # Required for OpenAI server.
aioprometheus[starlette]
lark == 1.1.8 # Required for Grammars