Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON strings cannot generate with "\" character in them #935

Closed
posionus opened this issue May 31, 2024 · 2 comments · Fixed by #991
Closed

JSON strings cannot generate with "\" character in them #935

posionus opened this issue May 31, 2024 · 2 comments · Fixed by #991
Labels

Comments

@posionus
Copy link
Contributor

posionus commented May 31, 2024

Describe the issue as clearly as possible:

I have fine tuned a model to generate json. I am using outlines with vLLM to force the model to generate correct json schema. Outlines is failing to generate:

{
  "cost": null,
  "description": "Net Equity Attributable to Holders of Redeemable Units (\"Net Equity\")",
  "footnotes": [],
  "percent_net_assets": 100.0,
  "value": 3767045854
}

Specifically, it is failing to generate the "(\"Net Equity\")" in the description field.

The issue seems to be with this line

STRING_INNER = r'([^"\\\x00-\x1f\x7f-\x9f]|\\\\)'

It is prohibiting any \ character inside of the string.

"Net Equity Attributable to Holders of Redeemable Units (\"Net Equity\")" is a valid json string. I have tested parsing the json above using python json.loads. It should be able to generate using outlines.

Steps/code to reproduce the bug:

My schema:

class StatementInvestmentsTotal(BaseModel):
    cost: Optional[float]
    description: Optional[str]
    footnotes: list[str]
    percent_net_assets: Optional[float]
    value: Optional[float]

Expected result:

Outlines should allow generating backslashes in json strings

Error message:

No response

Outlines/Python version information:

Version information

``` (command output here) ```

Context for the issue:

No response

@posionus posionus added the bug label May 31, 2024
@lapp0
Copy link
Collaborator

lapp0 commented Jun 12, 2024

Edit: sorry replied to the wrong thread, I'll take a look at this soon.

@lapp0
Copy link
Collaborator

lapp0 commented Jun 13, 2024

@posionus any interest verifying and creating a PR for this fix? I have a bit on my backlog.

>>> interegular.parse_pattern(json_schema.STRING).to_fsm().accepts(r'"my escaped string: \" okay"')
False

>>> STRING_INNER = r'(?:[^"\\]|\\.)*?'
>>> STRING = r'"' + STRING_INNER + r'"'
>>> interegular.parse_pattern(STRING).to_fsm().accepts(r'"my escaped string:  \" okay"')
True

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants