I am fine-tuning a model to convert non-standard JSONs to a standard schema. I'm using vLLM with Outlines to constrain the output generation so that it follows my desired schema. After a recent Outlines update, the model no longer seems able to generate parentheses in JSON strings. Some example diffs are below (a minus sign marks the expected line and a plus sign marks the actual output).
{
"call_date": null,
"cost": null,
"coupon_max": null,
"coupon_min": 0.05,
"currency_code": null,
- "description": "Bank of Nova Scotia (Dated 1/31/22, Repurchase Value $182,200,000, collateralized by U.S. Treasury Bill 0.000%, 2/24/22–6/16/22, and U.S. Treasury Note/Bond 0.125%–2.875%, 8/31/23–11/15/51, with a value of $185,844,000)",
+ "description": "Bank of Nova Scotia, Dated 1/31/22, Repurchase Value $182,200,000, collateralized by U.S. Treasury Bill 0.000%, 2/24/22–6/16/22, and U.S. Treasury Note/Bond 0.125%–2.875%, 8/31/23–11/15/51, with a value of $185,844,000",
"face_amount": 182200000,
"footnotes": [],
"maturity_date_max": null,
"maturity_date_min": "2/1/22",
"percent_net_assets": null,
"quantity": null,
"reference_rate": null,
"spread": null,
"value": 182200000,
"yield_max": null,
"yield_min": null
}
{
"call_date": null,
"cost": null,
"coupon_max": null,
"coupon_min": 0.05,
"currency_code": null,
- "description": "Barclays Capital Inc. (Dated 1/31/22, Repurchase Value $2,900,000, collateralized by U.S. Treasury Note/Bond 3.000%, 5/15/47, with a value of $2,958,000)",
+ "description": "Barclays Capital Inc. [Dated 1/31/22, Repurchase Value $2,900,000, collateralized by U.S. Treasury Note/Bond 3.000%, 5/15/47, with a value of $2,958,000]",
"face_amount": 2900000,
"footnotes": [],
"maturity_date_max": null,
"maturity_date_min": "2/1/22",
"percent_net_assets": null,
"quantity": null,
"reference_rate": null,
"spread": null,
"value": 2900000,
"yield_max": null,
"yield_min": null
}
{
"call_date": null,
"cost": null,
"coupon_max": null,
"coupon_min": null,
"currency_code": null,
- "description": "Other Assets and Liabilities (net)",
+ "description": "Other Assets and Liabilities",
"face_amount": null,
"footnotes": [],
"maturity_date_max": null,
"maturity_date_min": null,
"percent_net_assets": 4.3,
"quantity": null,
"reference_rate": null,
"spread": null,
"value": 141357139,
"yield_max": null,
"yield_min": null
}
As you can see above, it will either drop parentheses, replace them with a similar character like a square bracket, or completely skip the text in parentheses. I have seen about 100 other similar examples.
This previously was not an issue for me, so I checked recent PRs in this repo to see if one of them affected the relevant code. This one seems to be the culprit: #829
It looks like for some reason, opening and closing parentheses were added to the prohibited characters for strings.
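To make the effect concrete, here is a minimal sketch that isolates the two character classes from the patterns quoted in this report. Inside a `[...]` class, `(` and `)` are literal characters rather than grouping syntax, so placing them in a negated class excludes them. The helper name `rejected_chars` and the candidate list are hypothetical and exist only for illustration; this is not code from Outlines.

```python
import re

# Character class taken from the new pattern quoted in this report.
NEW_CLASS = r'[^("\\\x00-\x1f\x7f-\x9f)]'
# The same class without the two parentheses, as in the old pattern.
OLD_CLASS = r'[^"\\\x00-\x1f\x7f-\x9f]'


def rejected_chars(char_class, candidates):
    """Return the candidates that the character class refuses to match."""
    return [c for c in candidates if re.fullmatch(char_class, c) is None]


candidates = ["a", "(", ")", "[", "]", '"']
print(rejected_chars(NEW_CLASS, candidates))  # ['(', ')', '"']
print(rejected_chars(OLD_CLASS, candidates))  # ['"']
```

Square brackets are still accepted by both classes, which would be consistent with the model substituting `[` and `]` for parentheses in one of the diffs above.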
I'm not confident enough to submit a PR to fix this bug, because I'm not sure of the motivation behind the PR that caused it.
Steps/code to reproduce the bug:
import re

# new version (doesn't match parentheses)
print(re.match(r'([^("\\\x00-\x1f\x7f-\x9f)]|\\\\)', "("))

# old version (matches parentheses)
print(re.match(r'(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)', "("))
Expected result:
The regex should match parentheses
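A rough sketch of how this expectation could be checked against whole string values: it wraps the old inner pattern quoted in this report in double quotes and repeats it. The names `STRING_INNER`, `STRING`, and `is_allowed_string` are hypothetical, and the assumption that a full JSON string is just a quoted repetition of the inner pattern is mine, not something confirmed from the Outlines source. The second sample is a shortened version of one of the descriptions above.

```python
import re

# Assumption: a JSON string is a quoted run of the old inner pattern from this report.
STRING_INNER = r'(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)'
STRING = re.compile(f'"{STRING_INNER}*"')


def is_allowed_string(value: str) -> bool:
    """Return True if the quoted value is fully matched by the string pattern."""
    return STRING.fullmatch(f'"{value}"') is not None


samples = [
    "Other Assets and Liabilities (net)",
    "Barclays Capital Inc. (Dated 1/31/22, Repurchase Value $2,900,000)",
]
for sample in samples:
    # With the old pattern, both parenthesized descriptions are accepted.
    print(is_allowed_string(sample))
```

The helper does not escape its input, so it is only meant for values without embedded quotes or backslashes, like the descriptions in this report.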
Error message:
No response
Outlines/Python version information:
0.0.40
Python 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0]
Context for the issue:
No response