build_regex_from_schema generates incorrect regexes when used with "const" and "enum" for booleans, nulls, and strings needing escaping #971

Closed
mwootten opened this issue Jun 14, 2024 · 0 comments · Fixed by #972
@mwootten

Describe the issue as clearly as possible:

  • The regex expects booleans to be capitalized as in Python (True/False), not lowercase as in JSON (true/false).
  • The regex expects nulls to be written as None, as in Python, not null as in JSON.
  • The regex escapes strings for use in a regular expression, but does not account for the fact that a string containing quotes or backslashes must also be escaped in the JSON output.
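
All three cases come down to the difference between Python's string conversion and JSON serialization. A minimal sketch using only the standard library; the helper name `compare` is hypothetical and not part of Outlines:

```python
import json


def compare(value):
    """Return (Python text, JSON text) for a scalar value."""
    return str(value), json.dumps(value)


# A boolean, a null, a lone double quote, and a string with a backslash
for value in (True, None, '"', "a\\b"):
    python_text, json_text = compare(value)
    print(f"{python_text!r:>8}  vs  {json_text!r}")
```

For booleans and nulls the two texts differ in spelling; for strings, only the JSON text carries the surrounding quotes and the backslash escapes that a JSON parser requires.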

Steps/code to reproduce the bug:

import json
from outlines.generate.json import build_regex_from_schema
print(build_regex_from_schema(json.dumps({"const": True}))) # => True
print(build_regex_from_schema(json.dumps({"const": None}))) # => None
print(build_regex_from_schema(json.dumps({"const": '"'})))  # => """

Expected result:

true
null
"\\""

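One way to sanity-check the expected patterns is to match them against the JSON text a conforming generator should emit. The sketch below uses only `json` and `re` from the standard library and does not call Outlines; the patterns are copied from the reproduction and the expected result, and the variable names are illustrative.

```python
import json
import re

# (expected regex, regex currently produced, Python value of the "const")
cases = [
    (r'true', r'True', True),
    (r'null', r'None', None),
    (r'"\\""', r'"""', '"'),
]

for expected, actual, value in cases:
    target = json.dumps(value)  # the JSON text that should be accepted
    # The expected pattern accepts the valid JSON text...
    assert re.fullmatch(expected, target) is not None
    # ...while the pattern currently produced rejects it.
    assert re.fullmatch(actual, target) is None
```
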
Error message:

No response

Outlines/Python version information:

Version information

```
pip freeze
0.0.44
Python 3.12.3 (main, Apr 17 2024, 00:00:00) [GCC 13.2.1 20240316 (Red Hat 13.2.1-7)]
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
attrs==23.2.0
beartype==0.15.0
certifi==2024.6.2
cfgv==3.4.0
chardet==5.2.0
charset-normalizer==3.3.2
cloudpickle==3.0.0
cmake==3.29.5.1
coverage==7.5.3
datasets==2.20.0
diff_cover==9.0.0
dill==0.3.8
diskcache==5.6.3
distlib==0.3.8
distro==1.9.0
filelock==3.15.1
flake8==7.0.0
frozenlist==1.4.1
fsspec==2024.5.0
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
huggingface-hub==0.23.4
identify==2.5.36
idna==3.7
iniconfig==2.0.0
interegular==0.3.3
Jinja2==3.1.4
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
lark==1.1.9
llama_cpp_python==0.2.78
llvmlite==0.43.0
MarkupSafe==2.1.5
mccabe==0.7.0
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.16
nest-asyncio==1.6.0
networkx==3.3
ninja==1.11.1.1
nodeenv==1.9.1
numba==0.60.0
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.5.40
nvidia-nvtx-cu12==12.1.105
openai==1.34.0
outlines @ file:///home/mwootten/Projects/outlines
packaging==24.1
pandas==2.2.2
platformdirs==4.2.2
pluggy==1.5.0
pre-commit==3.7.1
py-cpuinfo==9.0.0
pyairports==2.1.1
pyarrow==16.1.0
pyarrow-hotfix==0.6
pycodestyle==2.11.1
pycountry==24.6.1
pydantic==2.7.4
pydantic_core==2.18.4
pyflakes==3.2.0
Pygments==2.18.0
pytest==8.2.2
pytest-benchmark==4.0.0
pytest-cov==5.0.0
pytest-mock==3.14.0
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.1
referencing==0.35.1
regex==2024.5.15
requests==2.32.3
responses==0.25.3
rpds-py==0.18.1
safetensors==0.4.3
setuptools==70.0.0
six==1.16.0
sniffio==1.3.1
sympy==1.12.1
tokenizers==0.19.1
torch==2.3.0
tqdm==4.66.4
transformers==4.41.2
typing_extensions==4.12.2
tzdata==2024.1
urllib3==2.2.1
virtualenv==20.26.2
wheel==0.43.0
xxhash==3.4.1
yarl==1.9.4
```

Context for the issue:

No response

@mwootten mwootten added the bug label Jun 14, 2024
mwootten added a commit to mwootten/outlines that referenced this issue Jun 14, 2024
… nulls, and strings

Use JSON's serialization, rather than Python's, in this case.

Fixes outlines-dev#971. Note that this still does not correctly handle arrays
and objects, which are allowed by the JSON schema spec; however,
those would be more complex to handle correctly.
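
The approach described in this commit message can be sketched roughly as follows. This is an illustration of the idea (serialize with JSON first, then escape for the regex), not the code from the actual commit; the helper names `scalar_to_regex`, `const_to_regex`, and `enum_to_regex` are hypothetical.

```python
import json
import re


def scalar_to_regex(value):
    """Hypothetical helper: regex matching the JSON text of one scalar."""
    if isinstance(value, (list, dict)):
        # Mirrors the limitation noted in the commit message.
        raise NotImplementedError("arrays and objects are not handled here")
    # json.dumps yields true/false/null and escapes quotes and backslashes;
    # re.escape then protects any regex metacharacters in that text.
    return re.escape(json.dumps(value))


def const_to_regex(schema):
    """Regex for a schema of the form {"const": <scalar>}."""
    return scalar_to_regex(schema["const"])


def enum_to_regex(schema):
    """Regex for a schema of the form {"enum": [<scalar>, ...]}."""
    return "(" + "|".join(scalar_to_regex(v) for v in schema["enum"]) + ")"


print(const_to_regex({"const": True}))  # true
print(const_to_regex({"const": None}))  # null
```

One possible reason arrays and objects are harder: `json.dumps([1, 2])` produces a single whitespace layout, `[1, 2]`, so escaping that text would reject equivalent JSON such as `[1,2]`.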
mwootten added a commit to mwootten/outlines that referenced this issue Jun 17, 2024
rlouf pushed a commit to mwootten/outlines that referenced this issue Jun 17, 2024
@rlouf rlouf closed this as completed in 7d8269f Jun 18, 2024