Add JSON generation method #140

Merged
rlouf merged 4 commits into outlines-dev:main from generate-schedule-from-json-schema on Aug 14, 2023

Conversation

rlouf
Member

@rlouf rlouf commented Jun 9, 2023

In this PR we add a new generation method that can generate valid JSON following a Pydantic model or a JSON Schema specification.

We first parse the (possibly nested) JSON schema into a generation schedule: a list containing strings that represent the JSON structure and annotations that correspond to the field specifications. The following types and constraints are currently supported, each with minimal functionality:

  • string
  • int
  • float
  • bool
  • Union[type1, type2]
  • enum
  • List[type]
  • List[Union[type1, type2]]
  • Optional[type]
  • Nested schema
  • maxLength for strings
  • minLength for strings
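
To make the idea of a generation schedule concrete, here is a minimal sketch using only the standard library. It is not the code from this PR: `build_schedule` and `merge_literals` are hypothetical names, and the exact literal strings (spacing, quoting) are assumptions. It only handles object schemas with a `properties` key and shows how fixed JSON structure and field specifications might alternate in a list.

```python
import json


def merge_literals(items: list) -> list:
    """Concatenate adjacent literal strings so literals and field specs alternate."""
    merged = []
    for item in items:
        if isinstance(item, str) and merged and isinstance(merged[-1], str):
            merged[-1] += item
        else:
            merged.append(item)
    return merged


def build_schedule(schema: dict) -> list:
    """Hypothetical helper: flatten an object schema into a generation schedule.

    Strings are fixed JSON structure; dicts are field specs left for the model to fill.
    """
    schedule = ["{"]
    for i, (name, spec) in enumerate(schema.get("properties", {}).items()):
        separator = ", " if i > 0 else ""
        schedule.append(f"{separator}{json.dumps(name)}: ")
        if spec.get("type") == "object":
            # Nested schema: splice its schedule into the parent's.
            schedule.extend(build_schedule(spec))
        else:
            schedule.append(spec)
    schedule.append("}")
    return merge_literals(schedule)


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "maxLength": 10},
        "age": {"type": "integer"},
        "address": {"type": "object", "properties": {"city": {"type": "string"}}},
    },
}
schedule = build_schedule(schema)
```

In this sketch the nested `address` object contributes only literal text and one more field spec, so the schedule stays a flat list regardless of nesting depth.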

The tests demonstrate how to build a schedule from Pydantic models, using the models' schema_json method.

We also add an outlines.text.generate.json function that parses a JSON schema, turns it into a regex and instantiates the Regex class.
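
The schema-to-regex step can be sketched as follows. This is an illustrative approximation under stated assumptions, not the conversion implemented in this PR: `schema_to_regex`, `field_to_regex` and `TYPE_PATTERNS` are hypothetical names, whitespace is fixed to a single space after `:` and `,`, strings may not contain quotes or backslashes, and only a few of the listed types are covered. The resulting pattern is checked with the standard `re` module rather than passed to the library's Regex class.

```python
import json
import re

# Assumed, simplified patterns for a few JSON Schema types.
TYPE_PATTERNS = {
    "string": r'"[^"\\]*"',
    "integer": r"-?(0|[1-9][0-9]*)",
    "number": r"-?(0|[1-9][0-9]*)(\.[0-9]+)?",
    "boolean": r"(true|false)",
}


def field_to_regex(spec: dict) -> str:
    """Hypothetical helper: regex for a single field specification."""
    if "enum" in spec:
        # Each allowed value is matched literally in its JSON form.
        return "(" + "|".join(re.escape(json.dumps(v)) for v in spec["enum"]) + ")"
    if spec.get("type") == "object":
        return schema_to_regex(spec)
    if spec.get("type") == "string" and ("minLength" in spec or "maxLength" in spec):
        lo = spec.get("minLength", 0)
        hi = spec.get("maxLength", "")  # empty upper bound means "no maximum"
        return rf'"[^"\\]{{{lo},{hi}}}"'
    return TYPE_PATTERNS[spec["type"]]


def schema_to_regex(schema: dict) -> str:
    """Hypothetical helper: regex matching objects with exactly the schema's fields, in order."""
    parts = [
        re.escape(json.dumps(name)) + ": " + field_to_regex(spec)
        for name, spec in schema.get("properties", {}).items()
    ]
    return r"\{" + ", ".join(parts) + r"\}"


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "maxLength": 5},
        "age": {"type": "integer"},
        "color": {"enum": ["red", "blue"]},
    },
}
pattern = schema_to_regex(schema)
valid = re.fullmatch(pattern, '{"name": "Ann", "age": 42, "color": "red"}')
too_long = re.fullmatch(pattern, '{"name": "Toolong", "age": 42, "color": "red"}')
not_an_int = re.fullmatch(pattern, '{"name": "Ann", "age": 4.2, "color": "red"}')
```

Here `valid` is a match object, while `too_long` and `not_an_int` are `None` because they violate `maxLength` and the integer type respectively.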

@rlouf rlouf added text Linked to text generation enhancement structured generation Linked to structured generation labels Jul 17, 2023
@rlouf rlouf force-pushed the generate-schedule-from-json-schema branch 3 times, most recently from d128cc4 to eb4b9f4 Compare August 1, 2023 11:21
@rlouf rlouf changed the title Parse JSON schema into a generation schedule Add JSON generation method Aug 1, 2023
@rlouf rlouf force-pushed the generate-schedule-from-json-schema branch 4 times, most recently from d58164c to beb3b90 Compare August 4, 2023 23:31
@havietisov

off topic : would be interesting to see constrained generation of HTML

@rlouf
Member Author

rlouf commented Aug 6, 2023

> off topic : would be interesting to see constrained generation of HTML

Coming with #178

@rlouf rlouf force-pushed the generate-schedule-from-json-schema branch 2 times, most recently from e09827d to 3d66893 Compare August 12, 2023 21:41
@rlouf rlouf force-pushed the generate-schedule-from-json-schema branch 3 times, most recently from 28c8671 to 92495f5 Compare August 12, 2023 21:46
@rlouf rlouf marked this pull request as ready for review August 12, 2023 21:52
@rlouf rlouf force-pushed the generate-schedule-from-json-schema branch from 92495f5 to 8e3e33d Compare August 12, 2023 21:52
@rlouf rlouf force-pushed the generate-schedule-from-json-schema branch 2 times, most recently from c39d3eb to a2e08ff Compare August 13, 2023 03:30
@rlouf rlouf force-pushed the generate-schedule-from-json-schema branch 2 times, most recently from b3555cd to 08dc150 Compare August 14, 2023 18:12
@rlouf rlouf force-pushed the generate-schedule-from-json-schema branch 2 times, most recently from bb8675c to 34e4539 Compare August 14, 2023 18:20
@rlouf rlouf force-pushed the generate-schedule-from-json-schema branch from 34e4539 to e1b136f Compare August 14, 2023 18:21
@rlouf rlouf merged commit df993c4 into outlines-dev:main Aug 14, 2023
4 checks passed
@rlouf rlouf deleted the generate-schedule-from-json-schema branch August 14, 2023 18:39
@rlouf rlouf mentioned this pull request Nov 9, 2023
Labels
enhancement structured generation Linked to structured generation text Linked to text generation

3 participants