[feat] Add json constraint decoding to core-schema-impl/0.5.0 branch. #835

Merged


JosephCatrambone
Contributor

@JosephCatrambone JosephCatrambone commented Jun 14, 2024

Adds the formatters module and the JSONFormatter module, which in turn uses JSONFormer to better constrain the output of Hugging Face-based models.
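
As a rough illustration of what constraining output to a schema means, the toy sketch below lets the schema dictate the JSON structure and asks a stand-in "model" only for leaf values. It is not JSONFormer's implementation or the JSONFormatter added in this PR; `fill_schema`, `fake_model`, and the example schema are hypothetical names made up for the sketch.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical stand-in for a language model: given a field path and a JSON
# type name, return a raw value. A real formatter would sample tokens instead.
ValueFn = Callable[[str, str], Any]


def fill_schema(schema: Dict[str, Any], generate_value: ValueFn, path: str = "$") -> Any:
    """Walk a JSON-schema-like dict and emit a value that matches its shape."""
    kind = schema["type"]
    if kind == "object":
        # Keys come from the schema, never from the model.
        return {
            name: fill_schema(sub, generate_value, f"{path}.{name}")
            for name, sub in schema["properties"].items()
        }
    if kind == "array":
        # Toy simplification: always emit exactly one item.
        return [fill_schema(schema["items"], generate_value, f"{path}[0]")]
    raw = generate_value(path, kind)
    # Coerce the model's raw output to the declared primitive type.
    if kind == "number":
        return float(raw)
    if kind == "integer":
        return int(raw)
    if kind == "boolean":
        return bool(raw)
    if kind == "string":
        return str(raw)
    raise ValueError(f"Unsupported schema type: {kind!r}")


def fake_model(path: str, kind: str) -> Any:
    """Deterministic stub so the sketch runs without downloading any model."""
    return {"string": "rex", "integer": "3", "number": "4.5", "boolean": True}[kind]


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}
result = fill_schema(schema, fake_model)
print(json.dumps(result))  # {"name": "rex", "age": 3, "tags": ["rex"]}
```

Because the structure is emitted by the walker rather than by the model, the result always serializes to valid JSON; the trade-off is one model call per leaf value.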

Performance delta on a 2023 MacBook Pro with Apple M3 Max:

  • With formatter: 1.48 s ± 23.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  • Without formatter: 738 ms ± 130 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
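
For reference, the figures above work out to roughly a 2x slowdown. The sketch below shows that arithmetic, plus one way to collect a comparable mean and spread over 7 single-call runs using only the standard library. `time_call` and `slowdown` are hypothetical helper names, and the use of the population standard deviation is an assumption; this is not necessarily how the numbers above were produced.

```python
import statistics
import timeit
from typing import Callable, Tuple


def time_call(fn: Callable[[], object], runs: int = 7) -> Tuple[float, float]:
    """Return (mean, population std. dev.) in seconds over `runs` single-call timings."""
    samples = timeit.repeat(fn, repeat=runs, number=1)
    return statistics.mean(samples), statistics.pstdev(samples)


def slowdown(with_formatter_s: float, without_formatter_s: float) -> float:
    """Ratio of the two mean wall-clock times."""
    return with_formatter_s / without_formatter_s


# Means reported above: 1.48 s with the formatter, 738 ms without.
ratio = slowdown(1.48, 0.738)
print(f"{ratio:.2f}x")  # 2.01x

# Placeholder workload, only to show the call shape.
mean_s, std_s = time_call(lambda: sum(range(10_000)))
print(f"{mean_s * 1e3:.3f} ms ± {std_s * 1e3:.3f} ms per loop")
```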

These deltas may be a result of...

  • Increased prompt size when using the JSONFormatter.
  • The 'Without formatter' run breaking early because of an invalid JSON return. (I.e., rather than iterating multiple times to complete generation, it stops as soon as invalid JSON is encountered. This is consistent with observations of the returned data.)

Further investigation is planned. Additionally, the tests currently use gpt-2 as a sample model, but we shouldn't be doing any HF imports in the tests. Once core-schema-impl is merged we will have access to the new testing harnesses, including the mock models.

Base automatically changed from core-schema-impl to 0.5.0-dev June 17, 2024 13:05

@JosephCatrambone JosephCatrambone marked this pull request as ready for review June 18, 2024 16:52
@JosephCatrambone JosephCatrambone changed the title from "Feature: Add json constraint decoding to core-schema-impl/0.5.0 branch." to "[feat] Add json constraint decoding to core-schema-impl/0.5.0 branch." Jun 18, 2024
@@ -496,6 +500,14 @@ def from_pydantic(
        guard._exec_opts = exec_opts
        guard._output_type = schema.output_type
        guard._base_model = output_class
        if isinstance(output_formatter, str):
            if isinstance(output_class, list):
Collaborator

We do support root level lists for other flows. Is this restriction because of a limitation with JSONFormer?

Contributor Author

Yes. :(

Collaborator

Fair enough. Let's make the error message more specific, then, and explicitly state that output_formatter="jsonformer" can't be used with root-level lists.

Collaborator

btw root-level lists are apparently an anti-pattern from an OpenAI function calling perspective - they only support top-level object for structured data

Contributor Author

@JosephCatrambone JosephCatrambone Jun 19, 2024

It turns out they're valid JSON, though! I did not know that. I've edited the message to say the following:

Root-level arrays are not supported with the jsonformer argument, but can be used with other json generation methods. Omit the output_formatter argument to use the other methods.
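
A minimal sketch of the kind of check discussed in this thread, assuming the formatter is selected by the string "jsonformer" and that a root-level list may arrive as `list`, a list instance, or a `List[...]` annotation. `is_root_level_list` and `check_formatter` are hypothetical helpers written for illustration, not the code merged in this PR. The last lines also confirm the point above that a root-level array is valid JSON.

```python
import json
from typing import Any, List, Optional, get_origin

# Message text as quoted in the comment above.
ROOT_ARRAY_ERROR = (
    "Root-level arrays are not supported with the jsonformer argument, "
    "but can be used with other json generation methods. "
    "Omit the output_formatter argument to use the other methods."
)


def is_root_level_list(output_class: Any) -> bool:
    """True for `list`, a list instance, or a parameterized `List[...]` annotation."""
    return (
        output_class is list
        or isinstance(output_class, list)
        or get_origin(output_class) is list
    )


def check_formatter(output_class: Any, output_formatter: Optional[str] = None) -> None:
    """Hypothetical helper: reject root-level lists only on the jsonformer path."""
    if output_formatter == "jsonformer" and is_root_level_list(output_class):
        raise ValueError(ROOT_ARRAY_ERROR)


check_formatter(List[dict])                # fine: no formatter requested
check_formatter(dict, "jsonformer")        # fine: root-level object

# A root-level array parses without error, so it is valid JSON.
parsed = json.loads('[{"name": "rex"}]')
print(type(parsed).__name__)  # list
```

Calling `check_formatter(List[dict], "jsonformer")` raises `ValueError` with the message above, while every other combination passes through untouched.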

Collaborator

> btw root-level lists are apparently an anti-pattern from an OpenAI function calling perspective - they only support top-level object for structured data

@zsimjee Is this a new limitation with tools vs. the old functions args? We've been using function calling with top-level arrays with that syntax for a while now.

@CalebCourier CalebCourier mentioned this pull request Jun 18, 2024
24 tasks
…hub.com:guardrails-ai/guardrails into jc/feature-add-json-constraint-core-schema-impl
…hub.com:guardrails-ai/guardrails into jc/feature-add-json-constraint-core-schema-impl
…:guardrails-ai/guardrails into jc/feature-add-json-constraint-core-schema-impl
…jc/feature-add-json-constraint-core-schema-impl
…thub.com:guardrails-ai/guardrails into jc/feature-add-json-constraint-core-schema-impl"

This reverts commit a77f4ad, reversing
changes made to 9e36480.
@JosephCatrambone JosephCatrambone merged commit 9c7b5d2 into 0.5.0-dev Jun 20, 2024
12 checks passed
@JosephCatrambone JosephCatrambone deleted the jc/feature-add-json-constraint-core-schema-impl branch June 20, 2024 21:49
3 participants