Create masks from regex #124

Merged
merged 2 commits into from
Jun 6, 2023
Conversation

rlouf
Member

@rlouf rlouf commented Jun 1, 2023

This PR adds a function that creates a mask from a string that represents a regex. I also add aliases for a few frequent use cases:

  • floats
  • integers
  • set of characters

Masks are returned as boolean np.ndarray, and can thus be combined using | (element-wise logical or) and & (element-wise logical and):

>>> import numpy as np
>>> a = np.array([True, False, True])
>>> b = np.array([True, True, False])
>>> a & b
array([ True, False, False])
>>> a | b
array([ True,  True,  True])
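To make the idea concrete, here is a minimal sketch of how a regex-based mask over a vocabulary might be built and combined. The helper `regex_mask`, the toy `vocabulary`, the two patterns, and the choice to require a full match are all assumptions made for illustration; they are not the functions or regexes added by this PR.

```python
import re

import numpy as np


def regex_mask(vocabulary, pattern):
    """Hypothetical helper: True where the whole token matches `pattern`."""
    compiled = re.compile(pattern)
    return np.array(
        [compiled.fullmatch(token) is not None for token in vocabulary],
        dtype=bool,
    )


# Toy vocabulary, assumed for the example only.
vocabulary = ["1", "42", "3.14", "a", "b2", ""]

int_mask = regex_mask(vocabulary, r"[0-9]+")
float_mask = regex_mask(vocabulary, r"[0-9]+\.[0-9]+")

# Boolean arrays combine element-wise with | and &.
number_mask = int_mask | float_mask
```

Because both masks are plain boolean arrays of the same length, any number of them can be combined this way before being handed to the generation step.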

I originally used torch.Tensor but settled on numpy.ndarray in the end. This lets us support models written with different frameworks, but I am still wondering whether converting to and from those frameworks would slow down generation by copying the contents of the arrays.

In the NumPy -> framework direction, jax.numpy.asarray does not copy if the input is a numpy.ndarray, and neither does torch.from_numpy. So we should be able to return NumPy arrays here without adding any overhead.
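One way to check the torch.from_numpy half of this claim is to write through the tensor and see whether the NumPy array changes. The sketch below assumes PyTorch may or may not be installed, so the import is guarded; the helper name `torch_view_shares_memory` is made up for this example. It does not test the JAX side.

```python
import numpy as np

try:
    import torch  # optional dependency for this sketch
except ImportError:
    torch = None


def torch_view_shares_memory(mask):
    """Hypothetical check: True if torch.from_numpy aliases the NumPy buffer.

    Returns None when PyTorch is not installed.
    """
    if torch is None:
        return None
    tensor = torch.from_numpy(mask)
    original = bool(mask[0])
    tensor[0] = not original            # write through the tensor
    shared = bool(mask[0]) != original  # did the NumPy array see the write?
    tensor[0] = original                # restore the original value
    return shared


mask = np.array([True, False, True])
result = torch_view_shares_memory(mask)
```

If `result` is True, the tensor and the array share the same memory, which is what makes returning NumPy masks cheap on the PyTorch side.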

@rlouf rlouf added text Linked to text generation enhancement labels Jun 1, 2023
@rlouf rlouf requested a review from brandonwillard June 2, 2023 11:37
@rlouf
Member Author

rlouf commented Jun 6, 2023

I benchmarked create_int_mask, create_float_mask and create_char_set_mask against their pure Python (no regex) equivalents. The regex versions are systematically faster.
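For readers who want to run a similar comparison, here is a rough sketch of such a benchmark for the integer case. The two functions, the synthetic vocabulary, and the repetition count are assumptions for illustration; they are not the benchmark or the implementations used in this PR, and timings will vary by machine.

```python
import re
import timeit

import numpy as np

# Synthetic vocabulary, assumed for the example only.
vocabulary = [str(i) for i in range(500)] + ["tok%d" % i for i in range(500)]


def int_mask_regex(vocab):
    """Regex-based variant: token is made only of ASCII digits."""
    pattern = re.compile(r"[0-9]+")
    return np.array([pattern.fullmatch(t) is not None for t in vocab], dtype=bool)


def int_mask_python(vocab):
    """Pure Python variant: same rule, checked character by character."""
    digits = set("0123456789")
    return np.array(
        [len(t) > 0 and all(c in digits for c in t) for t in vocab],
        dtype=bool,
    )


# Both variants should agree before their timings are compared.
same = bool(np.array_equal(int_mask_regex(vocabulary), int_mask_python(vocabulary)))

t_regex = timeit.timeit(lambda: int_mask_regex(vocabulary), number=50)
t_python = timeit.timeit(lambda: int_mask_python(vocabulary), number=50)
print(f"regex: {t_regex:.4f}s  pure Python: {t_python:.4f}s  same result: {same}")
```

Checking that the two masks are identical first keeps the comparison honest: a faster variant that returns a different mask would not be a valid replacement.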

@rlouf rlouf merged commit cea7ac3 into outlines-dev:main Jun 6, 2023
3 checks passed
@rlouf rlouf deleted the add-regex-mask branch June 6, 2023 15:44
Labels
enhancement text Linked to text generation

2 participants