Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address issues with source with component names and replacing listeners #12701

Conversation

adrianeboyd
Copy link
Contributor

@adrianeboyd adrianeboyd commented Jun 6, 2023

Description

When sourcing a component, the object from the original pipeline is added to the new pipeline as the same object. This creates a situation where there are several attributes that cannot be in sync between the original pipeline and the new pipeline at the same time for this one object:

  • component.name
  • component.listener_map / component.listening_components for tok2vec and transformer

When running replace_listeners on a component, the config is not updated correctly if the state of the component is incorrect for the current pipeline (in particular changes that should be applied from model.attrs["replace_listener_cfg"] as used in spacy-transformers) due to the fact that:

  • find_listeners relies on component.name to set the name in the listener_map
  • replace_listeners relies on listener_map to determine how to modify the configs

In addition, there are several places where pipeline components are modified and the listener map and/or internal component names aren't currently updated.

In cases where there is a component shared by two pipelines that cannot be in sync, this PR chooses to prioritize the most recently modified or initialized pipeline. There is no actual solution with the current source behavior that will make both pipelines usable, so the current pipeline is updated whenever components are added/renamed/removed or the pipeline is initialized for training.

Types of change

Bug fix.

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@adrianeboyd adrianeboyd added bug Bugs and behaviour differing from documentation feat / pipeline Feature: Processing pipeline and components labels Jun 6, 2023
@adrianeboyd
Copy link
Contributor Author

I'll let the tests run in the CI first to show the issues. The new tests have a few temporary modifications (marked with TODOs for later removal) so that the test failures show the interesting failures rather than more minor technical issues.

@adrianeboyd
Copy link
Contributor Author

With the pipeline state kept in sync for the new pipeline by add_pipe, the code block at the end of Language.from_config to sync listener status should no longer be needed.

It doesn't cause any problems, but it should never apply. In particular, it was it looks like it was unintentional that replace_listeners was being checked in two different places and I suspect that at least that part probably should have been removed in #7620.

@svlandeg svlandeg mentioned this pull request Jun 9, 2023
3 tasks
@adrianeboyd
Copy link
Contributor Author

@honnibal honnibal merged commit c067b52 into explosion:master Jun 27, 2023
adrianeboyd added a commit to adrianeboyd/spaCy that referenced this pull request Jun 28, 2023
…rs (explosion#12701)

When sourcing a component, the object from the original pipeline is added to the new pipeline as the same object. This creates a situation where there are several attributes that cannot be in sync between the original pipeline and the new pipeline at the same time for this one object:

* component.name
* component.listener_map / component.listening_components for tok2vec and transformer

When running replace_listeners on a component, the config is not updated correctly if the state of the component is incorrect for the current pipeline (in particular changes that should be applied from model.attrs["replace_listener_cfg"] as used in spacy-transformers) due to the fact that:

* find_listeners relies on component.name to set the name in the listener_map
* replace_listeners relies on listener_map to determine how to modify the configs

In addition, there are several places where pipeline components are modified and the listener map and/or internal component names aren't currently updated.

In cases where there is a component shared by two pipelines that cannot be in sync, this PR chooses to prioritize the most recently modified or initialized pipeline. There is no actual solution with the current source behavior that will make both pipelines usable, so the current pipeline is updated whenever components are added/renamed/removed or the pipeline is initialized for training.
adrianeboyd added a commit to adrianeboyd/spaCy that referenced this pull request Jul 18, 2023
adrianeboyd added a commit that referenced this pull request Jul 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bugs and behaviour differing from documentation feat / pipeline Feature: Processing pipeline and components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants