Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove broken final state loop #874

Merged
merged 3 commits into from
May 11, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 1 addition & 5 deletions outlines/fsm/guide.py
Original file line number Diff line number Diff line change
Expand Up @@ -193,12 +193,8 @@ def get_next_state(self, state: int, token_id: int) -> int:
The new state of the guide.

"""
if token_id == self.eos_token_id:
if token_id == self.eos_token_id or state not in self.states_to_token_maps:
Copy link

@ekagra-ranjan ekagra-ranjan May 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@br3no I was wondering if we really need the 2nd condition state not in self.states_to_token_maps ? The condition basically checks for states which do not have outgoing edges. But such states would be a part of final states in the FSM and this block of code adds EOS as an edge to such states which makes them have atleast one outgoing edges. Therefore, no states in FSM will be absent in the states_to_token_maps. Wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ekagra-ranjan yes, we do need this second condition.

The block of code you linked to does not add an EOS outbound transition to these states. It only adds transitions to final states which are present in states_to_token_maps. But these states are not present there. states_to_token_subsets.get(state) will return None for these states.

I'm not really knowledgeable about the way Outlines (and interegular) build the state machines out of regexes. The matter of fact is that the states_to_token_maps does not contain all states that are reachable. I have noticed this while debugging the code for some example regexes.

This is not a problem in principle, as these states are considered to be final and states_to_token_subsets.get(state) is None is used all over the code to handle this special case (as in the block you linked to).

I actually believe this could be improved and Outlines would profit from removing this special case that needs to be thought of all over the place and could lead to bugs. But this is, as I said, not a problem in principle.

return -1
elif (
state in self.final_states
): # Necessary because we keep generating EOS tokens when finished
return state

last_token_to_end_state = self.states_to_token_maps[state]
next_state = last_token_to_end_state.get(token_id)
Expand Down
4 changes: 1 addition & 3 deletions tests/fsm/test_fsm.py
Original file line number Diff line number Diff line change
Expand Up @@ -82,9 +82,7 @@ def convert_token_to_string(self, token):
assert fsm.is_final_state(state)

state = fsm.next_state(state=5, token_id=103)
assert state == 5

assert fsm.is_final_state(-1)
assert fsm.is_final_state(state)


def test_cfg():
Expand Down
4 changes: 1 addition & 3 deletions tests/fsm/test_guide.py
Original file line number Diff line number Diff line change
Expand Up @@ -180,9 +180,7 @@ def convert_token_to_string(self, token):
assert fsm.is_final_state(state)

state = fsm.get_next_state(state=5, token_id=103)
assert state == 5

assert fsm.is_final_state(-1)
assert fsm.is_final_state(state)


def test_cfg():
Expand Down
Loading