Add outlines.generate.fsm() API entrypoint #699

Merged on Feb 25, 2024 (1 commit)

miftahmoha
Contributor

This adds the outlines.generate.fsm() API entrypoint following #670.

Pre-commit:

Tests:

tests

@miftahmoha miftahmoha marked this pull request as draft February 21, 2024 23:21
Collaborator

@lapp0 lapp0 left a comment

Could you please review the test cases tests/fsm/test_fsm.py and introduce test cases for this new method?

It's probably sufficient to simply verify that RegexFSM.from_interegular_fsm(interegular.parse_pattern(pattern)) == RegexFSM(pattern) for a few patterns.

Review thread on outlines/fsm/fsm.py (resolved)
@miftahmoha
Contributor Author

miftahmoha commented Feb 22, 2024

I added a type correction. Because of the cache, mypy wasn't able to check the annotation Tuple[Tuple[str, int]], which should have been Tuple[Tuple[str, int], ...].
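
For readers unfamiliar with the distinction, here is a minimal sketch of the two annotations. `Tuple[Tuple[str, int]]` describes a tuple holding exactly one pair, while `Tuple[Tuple[str, int], ...]` describes a tuple holding any number of pairs. The helper `freeze_vocabulary` is hypothetical and is not taken from the outlines code base; it only illustrates the kind of hashable value one might pass to a cached function.

```python
from typing import Dict, Tuple, get_args

Pair = Tuple[str, int]

# A tuple holding exactly one (str, int) pair.
ExactlyOnePair = Tuple[Pair]
# A tuple holding any number of (str, int) pairs, including zero.
AnyNumberOfPairs = Tuple[Pair, ...]


def freeze_vocabulary(vocabulary: Dict[str, int]) -> AnyNumberOfPairs:
    """Hypothetical helper: turn a dict into a hashable, sorted tuple of pairs."""
    return tuple(sorted(vocabulary.items()))


frozen = freeze_vocabulary({"b": 2, "a": 1, "c": 3})
print(frozen)

# Inspect the type arguments: only the variadic form ends with Ellipsis.
print(get_args(ExactlyOnePair))
print(get_args(AnyNumberOfPairs))
```

With the first annotation, a type checker that does see the function would reject a three-pair return value like `frozen`; the variadic form accepts it.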

@miftahmoha miftahmoha marked this pull request as ready for review February 22, 2024 11:17
@rlouf rlouf added the enhancement and structured generation labels Feb 22, 2024
@rlouf
Member

rlouf commented Feb 22, 2024

Would you mind rebasing your branch on main (and overwriting the merge)? We also need to document this new feature and briefly explain why it is useful.

@miftahmoha
Contributor Author

@rlouf Done.

According to #670 (comment), interegular.FSM offers operations that RegexFSM doesn't seem to have. For the examples, we've got #666 and #156 that we can use.

What do you think @lapp0?

@lapp0
Collaborator

lapp0 commented Feb 22, 2024

@miftahmoha yes, we should document those use cases. This perfectly solves those problems.

Use a pattern but disallow certain keywords.

Example: A list of strings, but none of those strings can be "pink" or "elephant"

import interegular

# Raw strings keep the regex backslashes (\[ and \s) from being read as string escapes.
list_of_strings_pattern = r"""\["[^"\s]*"(?:,"[^"\s]*")*\]"""
pink_elephant_pattern = r""".*(pink|elephant).*"""

# Compile each pattern to an interegular FSM.
list_of_strings_fsm = interegular.parse_pattern(list_of_strings_pattern).to_fsm()
pink_elephant_fsm = interegular.parse_pattern(pink_elephant_pattern).to_fsm()

list_of_strings_fsm.accepts('["a","pink","elephant"]')
# True

# Difference: strings accepted by the first FSM but not by the second.
subtracted_fsm = list_of_strings_fsm - pink_elephant_fsm

subtracted_fsm.accepts('["a","pink","elephant"]')
# False
subtracted_fsm.accepts('["a","blue","donkey"]')
# True
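
To make the subtraction above more concrete, here is a minimal, standard-library-only sketch of how a difference of two deterministic automata can be built with a product construction. It is an illustration under simplifying assumptions, not interegular's implementation: the `DFA` class, the `difference` function, and the toy automata are all hypothetical, and a missing transition is treated as an implicit rejecting state.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

State = Hashable


class DFA:
    """A deterministic automaton; a missing transition means 'reject'."""

    def __init__(
        self,
        alphabet: Iterable[str],
        transitions: Dict[Tuple[State, str], State],
        start: State,
        accepting: Iterable[State],
    ) -> None:
        self.alphabet: Set[str] = set(alphabet)
        self.transitions = dict(transitions)
        self.start = start
        self.accepting: Set[State] = set(accepting)

    def accepts(self, text: str) -> bool:
        state = self.start
        for symbol in text:
            if (state, symbol) not in self.transitions:
                return False
            state = self.transitions[(state, symbol)]
        return state in self.accepting


def difference(a: DFA, b: DFA) -> DFA:
    """Build a DFA accepting strings that `a` accepts and `b` rejects."""
    start = (a.start, b.start)
    transitions: Dict[Tuple[State, str], State] = {}
    accepting: Set[State] = set()
    seen = {start}
    stack = [start]
    while stack:
        state_a, state_b = stack.pop()
        # None stands for "b has already rejected", so it is never accepting.
        if state_a in a.accepting and state_b not in b.accepting:
            accepting.add((state_a, state_b))
        for symbol in a.alphabet | b.alphabet:
            next_a = a.transitions.get((state_a, symbol))
            if next_a is None:
                continue  # a rejects here, so the difference rejects too
            next_b = b.transitions.get((state_b, symbol))  # None if b is dead
            target = (next_a, next_b)
            transitions[((state_a, state_b), symbol)] = target
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return DFA(a.alphabet | b.alphabet, transitions, start, accepting)


# Toy automata over {a, b}: every string, and strings containing "ab".
any_string = DFA("ab", {(0, "a"): 0, (0, "b"): 0}, 0, {0})
contains_ab = DFA(
    "ab",
    {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 2, (2, "a"): 2, (2, "b"): 2},
    0,
    {2},
)

no_ab = any_string - contains_ab if False else difference(any_string, contains_ab)
print(no_ab.accepts("bbaa"))
print(no_ab.accepts("aab"))
```

The product tracks both automata at once and accepts only where the first accepts and the second does not, which is the same idea as excluding "pink" and "elephant" from the list pattern.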

If you're talking about implementing not_in, I think that's out of scope for this PR and should be discussed in a separate issue.

@miftahmoha
Contributor Author

@lapp0 @rlouf I added some documentation.

@rlouf rlouf merged commit 7a21043 into outlines-dev:main Feb 25, 2024
4 checks passed
@rlouf
Member

rlouf commented Feb 25, 2024

Thank you so much for the valuable addition, @miftahmoha!
