Streaming chunk accumulation #741

Merged

Collaborator

@nichwch nichwch commented May 7, 2024

Implements a new validate_stream method on Validator and changes stream_runner to pass individual chunks to validators. Validators now accumulate chunks internally until they reach a specified chunk size, then emit a validation result up to the iterator in stream_runner.

These changes only apply to string_schema and SequentialValidatorService for now.

EDIT: This PR now also contains the error_spans schema changes.
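
For readers skimming the thread, the accumulation behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `AccumulatingValidator`, its `check` callable, the `final` flag, and the simplified `stream_runner` function are all hypothetical names chosen for the example, and chunk size is assumed to be measured in characters.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple


class AccumulatingValidator:
    """Hypothetical stand-in for a validator with a validate_stream-style method."""

    def __init__(self, check: Callable[[str], bool], chunk_size: int) -> None:
        self.check = check            # validation applied to each accumulated chunk
        self.chunk_size = chunk_size  # assumed unit: characters buffered before validating
        self.buffer = ""

    def validate_stream(self, chunk: str, final: bool = False) -> Optional[Tuple[str, bool]]:
        """Buffer `chunk`; return (text, passed) once enough text has accumulated."""
        self.buffer += chunk
        if len(self.buffer) < self.chunk_size and not final:
            return None  # not enough text yet, nothing to emit
        if not self.buffer:
            return None  # final flush with an empty buffer
        text, self.buffer = self.buffer, ""
        return text, self.check(text)


def stream_runner(
    llm_chunks: Iterable[str], validator: AccumulatingValidator
) -> Iterator[Tuple[str, bool]]:
    """Pass each LLM chunk to the validator and yield whatever it emits."""
    for chunk in llm_chunks:
        result = validator.validate_stream(chunk)
        if result is not None:
            yield result
    # Flush any text left in the buffer once the stream ends.
    result = validator.validate_stream("", final=True)
    if result is not None:
        yield result


validator = AccumulatingValidator(lambda text: "bad" not in text, chunk_size=10)
results = list(stream_runner(["Hello ", "wor", "ld, this", " is bad", "!"], validator))
```

In this sketch the runner never needs to know the validator's chunk size: it forwards every LLM chunk and only yields when the validator decides it has accumulated enough text.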

Collaborator

@zsimjee zsimjee left a comment

1: In stream_runner, how can I tell what the validator chunk size is? I need to know this to decide if the validator chunk size is smaller than the LLM chunk size, in which case I'll have to split the validator chunk size to be smaller.

See comments in stream_runner, string_schema

2: Is it true that the parse method for string_schema is just supposed to check that the output isn't empty?

It also makes sure that the output is a string, i.e. not a number, object, etc.
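
A rough sketch of the check described in this answer, assuming the parse step only needs to confirm the output is a non-empty string. The function name `parse_string_output` and its `(value, error)` return shape are made up for illustration and are not the actual string_schema API:

```python
from typing import Any, Optional, Tuple


def parse_string_output(output: Any) -> Tuple[Optional[str], Optional[str]]:
    """Return (parsed_output, error_message); exactly one of the two is None."""
    if not isinstance(output, str):
        # Reject numbers, objects, and anything else that is not a string.
        return None, f"expected a string, got {type(output).__name__}"
    if not output:
        return None, "output is empty"
    return output, None
```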

3: What would be a good way to test these changes? What general workflow do you guys use to test your changes (besides the built-in tests)?

The best way to check for backwards compatibility would be to use the integration tests and run the streaming notebook (https://github.com/guardrails-ai/guardrails/blob/14bb0bccde051338dbc29cc50c77970a4bfb1a08/docs/how_to_guides/streaming.ipynb)

guardrails/schema/string_schema.py (outdated, resolved)
guardrails/validator_base.py (outdated, resolved)
@nichwch nichwch changed the title preliminary code and pseudocode Streaming chunk accumulation May 10, 2024
@nichwch nichwch marked this pull request as ready for review May 16, 2024 22:46
@nichwch nichwch requested a review from CalebCourier May 17, 2024 21:47
Collaborator

@zsimjee zsimjee left a comment

Good to approve, pending tests and checks.

@CalebCourier CalebCourier changed the base branch from main to feat/streaming-update May 30, 2024 17:50
@CalebCourier CalebCourier changed the base branch from feat/streaming-update to main May 30, 2024 17:55
@CalebCourier CalebCourier changed the base branch from main to feat/streaming-update May 30, 2024 17:55
@CalebCourier CalebCourier merged commit 4e5d479 into feat/streaming-update Jun 3, 2024
20 checks passed
@CalebCourier CalebCourier deleted the nichwch/chunk-accumulation-rewrite branch June 3, 2024 19:47
3 participants