-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Address issues with source with component names and replacing listeners #12701
Address issues with source with component names and replacing listeners #12701
Conversation
I'll let the tests run in the CI first to show the issues. The new tests have a few temporary modifications (marked with TODOs for later removal) so that the test failures show the interesting failures rather than more minor technical issues. |
With the pipeline state kept in sync for the new pipeline by It doesn't cause any problems, but it should never apply. In particular, it was it looks like it was unintentional that |
Tests for
|
…rs (explosion#12701) When sourcing a component, the object from the original pipeline is added to the new pipeline as the same object. This creates a situation where there are several attributes that cannot be in sync between the original pipeline and the new pipeline at the same time for this one object: * component.name * component.listener_map / component.listening_components for tok2vec and transformer When running replace_listeners on a component, the config is not updated correctly if the state of the component is incorrect for the current pipeline (in particular changes that should be applied from model.attrs["replace_listener_cfg"] as used in spacy-transformers) due to the fact that: * find_listeners relies on component.name to set the name in the listener_map * replace_listeners relies on listener_map to determine how to modify the configs In addition, there are several places where pipeline components are modified and the listener map and/or internal component names aren't currently updated. In cases where there is a component shared by two pipelines that cannot be in sync, this PR chooses to prioritize the most recently modified or initialized pipeline. There is no actual solution with the current source behavior that will make both pipelines usable, so the current pipeline is updated whenever components are added/renamed/removed or the pipeline is initialized for training.
Follow-up to explosion#12701.
Description
When sourcing a component, the object from the original pipeline is added to the new pipeline as the same object. This creates a situation where there are several attributes that cannot be in sync between the original pipeline and the new pipeline at the same time for this one object:
component.name
component.listener_map
/component.listening_components
fortok2vec
andtransformer
When running
replace_listeners
on a component, the config is not updated correctly if the state of the component is incorrect for the current pipeline (in particular changes that should be applied frommodel.attrs["replace_listener_cfg"]
as used inspacy-transformers
) due to the fact that:find_listeners
relies oncomponent.name
to set the name in thelistener_map
replace_listeners
relies onlistener_map
to determine how to modify the configsIn addition, there are several places where pipeline components are modified and the listener map and/or internal component names aren't currently updated.
In cases where there is a component shared by two pipelines that cannot be in sync, this PR chooses to prioritize the most recently modified or initialized pipeline. There is no actual solution with the current
source
behavior that will make both pipelines usable, so the current pipeline is updated whenever components are added/renamed/removed or the pipeline is initialized for training.Types of change
Bug fix.
Checklist