Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connectors Refactoring Discussion #4510

Closed
justusschock opened this issue Nov 4, 2020 · 9 comments · Fixed by #5743
Closed

Connectors Refactoring Discussion #4510

justusschock opened this issue Nov 4, 2020 · 9 comments · Fixed by #5743
Assignees
Labels
design Includes a design discussion discussion In a discussion stage feature Is an improvement or enhancement
Milestone

Comments

@justusschock
Copy link
Member

This is a brief summary of our refactoring discussion with @tchaton @awaelchli .

The easy part

Refactor all connectors to be completely self-contained and establish call hierarchy.
This means:

  • Connectors should not call other connectors
  • Connectors should not have a trainer reference or change anything on trainer
  • Trainer properties will just gather information from respective connector.

A potential call hierarchy could be like this:
Trainer -> Loops -> Connectors -> Accelerators -> Plugins

That also means, that every class can only call types that are below it in the hierarchy and nothing that is on the same level or higher. I.e. Accelerators cannot call other accelerators, trainer, loops or connectors but only plugins.

The more difficult part

Refactor all Accelerators and Plugins to further separate them.
Accelerators should only contain hardware specific stuff, whereas plugins include changes in the training routine.

So we would end up with something about 3-5 accelerators and everything else (like DDP, DDP2, DDP-Spawn, DP, AMP etc.) would become a Plugin. This has the advantages that
1.) it is cleaner to implement once the things are more separated
2.) It is easier to look at specific parts since currently the DDP stuff for example is still scattered across the plugin and the accelerators
3.) Every new plugin should apply to all/most accelerators with no change.

How to do this

The first part can be changed slowly while fixing bugs and implementing features.

The second part however needs to be done more carefully since this requires breaking changes in our internals.
So I'll start there with a draft and see, how hard it would be and how the result would look like before we apply this to all the accelerators and plugins.

cc @edenlightning

@justusschock justusschock added feature Is an improvement or enhancement design Includes a design discussion labels Nov 4, 2020
@justusschock justusschock self-assigned this Nov 4, 2020
@tchaton tchaton added this to the 1.1 milestone Nov 5, 2020
@tchaton
Copy link
Contributor

tchaton commented Nov 5, 2020

Hey @justusschock,

Great summary!
Should plugins have access to the trainer ?

Best regards,
T.C

@tarepan
Copy link
Contributor

tarepan commented Nov 6, 2020

Separation of Concern with controlled dependency is great!
Can we check above idea with a concrete case?

For example, CheckpointConnector change trainer state.
It break the rule Connectors should not have a trainer reference or change anything on trainer,
but changing trainer state is concern of this connector.
How we should refactor?

@justusschock
Copy link
Member Author

@tchaton Ideally they wouldn't.

@tarepan This is not yet sure. But I'm currently trying to prototype this.

@edenlightning
Copy link
Contributor

@justusschock any updates?

@justusschock
Copy link
Member Author

@edenafek I have refactored some of the accelerators and plugins as proof of concept. currently working on trainer integration.

@edenlightning
Copy link
Contributor

link to draft PR?

@justusschock
Copy link
Member Author

There is no so far. I'm doing this in my own fork and I'll open a draft PR, as soon as I got a running version.

@Borda Borda modified the milestones: 1.1, 1.2 Nov 30, 2020
@awaelchli
Copy link
Contributor

awaelchli commented Dec 28, 2020

Quick update since there was some interest: current state is here in accelerator_refactor branch in Justus's fork.
https://github.com/justusschock/pytorch-lightning/tree/accelerator-refactor

Slides:
https://docs.google.com/presentation/d/1v5EOO9yl3rxQ_uXGPPc4VURNOvx4odqPCiP5eRRPEy0/edit?usp=sharing

@Borda
Copy link
Member

Borda commented Jan 8, 2021

Slides: https://docs.google.com/presentation/d/1v5EOO9yl3rxQ_uXGPPc4VURNOvx4odqPCiP5eRRPEy0/edit?usp=sharing

mind open it for comments? :]

@edenlightning edenlightning changed the title Refactoring Discussion Connectors Refactoring Discussion Jan 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Includes a design discussion discussion In a discussion stage feature Is an improvement or enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants