introduce WHITESPACE regex constant #599

Merged
merged 2 commits into from
Feb 2, 2024

jc-louis
Contributor

@jc-louis jc-louis commented Feb 1, 2024

To harmonize with other regexes that are already in constants.

This will also allow monkey patching this regex for experiments related to #484.

@jc-louis jc-louis marked this pull request as ready for review February 1, 2024 14:45
@rlouf
Member

rlouf commented Feb 1, 2024

This is a very good idea. We might want to be able to pass that as an argument in the near future as well; from our internal experiments, it looks like it has a substantial effect on the generation.

@rlouf rlouf added the structured generation (Linked to structured generation), JSON, and correctness (Everything related to the generation correctness) labels Feb 1, 2024
@rlouf
Member

rlouf commented Feb 1, 2024

It looks like a warning that’s raised in transformers is making the tests fail.

@Soufiane-Ra

+1 for changing outlines/fsm/json_schema.py from whitespace = r"[\n ]*" to whitespace = r"[\n ]?" (or just deleting whitespace altogether) to avoid endless generation of newlines.
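To make the difference between the two patterns concrete, here is a minimal sketch using only Python's standard `re` module. The `object_regex` helper and the toy `{"a": <digit>}` object are hypothetical and only for illustration; they are not taken from outlines. The sketch shows that `[\n ]*` accepts any number of newlines between tokens, while `[\n ]?` accepts at most one whitespace character.

```python
import re

# The two candidate whitespace patterns discussed in this comment.
WHITESPACE_STAR = r"[\n ]*"      # zero or more newlines/spaces
WHITESPACE_OPTIONAL = r"[\n ]?"  # at most one newline or space


def object_regex(whitespace: str) -> re.Pattern:
    """Hypothetical, simplified regex for the JSON object {"a": <digit>}."""
    ws = whitespace
    return re.compile(r"\{" + ws + r'"a"' + ws + ":" + ws + r"[0-9]" + ws + r"\}")


compact = '{"a":1}'
padded = '{\n\n\n\n"a": 1}'  # several newlines after the opening brace

star = object_regex(WHITESPACE_STAR)
optional = object_regex(WHITESPACE_OPTIONAL)

# Each tuple is (accepts compact, accepts padded).
star_accepts = (bool(star.fullmatch(compact)), bool(star.fullmatch(padded)))
optional_accepts = (bool(optional.fullmatch(compact)), bool(optional.fullmatch(padded)))
```

Under the `*` pattern both strings are valid, so a model constrained by that regex is always allowed to emit one more newline. Under the `?` pattern the padded string is rejected.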

@rlouf rlouf force-pushed the patch-2 branch 2 times, most recently from 106717c to 499d23c February 2, 2024 20:23
@rlouf
Member

rlouf commented Feb 2, 2024

It is not clear from our experiments that this is actually better: #484. This PR will help a lot with experiments as one can now modify the value of fsm.json_schema.WHITESPACE.
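As a rough sketch of what such an experiment could look like, the snippet below temporarily overrides a module-level `WHITESPACE` constant with `unittest.mock.patch.object`. To keep it runnable without outlines installed, `json_schema` here is a stand-in module built with `types.ModuleType`, and `separator_regex` is a hypothetical helper; with the real library one would presumably patch the module at `outlines/fsm/json_schema.py` instead, which is an assumption based on the path mentioned in this thread.

```python
import types
from unittest import mock

# Stand-in for the real module so the sketch runs without outlines installed.
# The default value is the pattern quoted earlier in this conversation.
json_schema = types.ModuleType("json_schema")
json_schema.WHITESPACE = r"[\n ]*"


def separator_regex() -> str:
    """Hypothetical helper that reads the constant at call time."""
    return json_schema.WHITESPACE + "," + json_schema.WHITESPACE


default_regex = separator_regex()

# Override the constant only for the duration of the experiment.
with mock.patch.object(json_schema, "WHITESPACE", r"[\n ]?"):
    patched_regex = separator_regex()

# The original value is restored when the context manager exits.
restored_regex = separator_regex()
```

Patching this way only affects code that looks the constant up when it runs; a value copied elsewhere at import time would keep the old pattern.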

@jc-louis thank you for the PR!

@rlouf rlouf merged commit 9a2f8de into outlines-dev:main Feb 2, 2024
5 checks passed
Labels
correctness (Everything related to the generation correctness), JSON, structured generation (Linked to structured generation)

3 participants