This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

[WIP] Add a pytorch lightning trainer #3860

Closed
wants to merge 1 commit into from

Conversation

matt-gardner
Contributor

I'm still noodling around, but in about 20 minutes I was able to get a simple test passing. This needs a lot of work to actually be usable, but the proof of concept is here, and it doesn't look like it's very much work, or very much code, that we'd have to maintain.

I only used Params here because that's what the test I was copying did. Still to do:

  • Completely remove the Params bit from the code that's added here, and just make a TrainWithLightning object that takes concrete objects as arguments (that we can construct FromParams if anyone wants to use that pipeline). This probably also requires a Registrable AllenNlpLightningModule that can be subclassed for different configurations (a rough sketch of what such a module might look like follows this comment).
  • Make sure this actually trains a real model, and serializes stuff how we want (we almost certainly want to create our typical model archive, for example, though I'm not sure what pytorch lightning does by default)
  • Add whatever other bells and whistles people might want (not really sure what's necessary here; if anyone reading this uses or wants to use pytorch lightning, let me know what features should definitely be supported)

Bottom line: this is definitely possible, and should be very easy to support. I don't know exactly what that support should look like, as I have no experience with this other trainer. If you have specific requests, comment below. I also will be out of town for a week and likely won't be working on this for a little bit, so if anyone is super eager and wants to try fleshing this out, let me know.
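
For readers who haven't seen the shim pattern being described, here is a minimal, hypothetical sketch of a LightningModule wrapping an AllenNLP Model. The class name AllenNlpLightningModule is borrowed from the list above; none of this is the code from this PR, and the optimizer choice and learning rate are placeholders.

```python
# A rough, hypothetical sketch (not the code in this PR): a LightningModule
# that wraps an AllenNLP Model so that pytorch lightning's Trainer can drive
# the training loop.  The class name is borrowed from the to-do list above;
# the optimizer and learning rate are placeholder choices.
import pytorch_lightning as pl
import torch
from allennlp.models import Model


class AllenNlpLightningModule(pl.LightningModule):
    def __init__(self, model: Model, lr: float = 1e-3):
        super().__init__()
        self.model = model
        self.lr = lr

    def training_step(self, batch, batch_idx):
        # AllenNLP models take a batch as keyword arguments and return an
        # output dict that includes a "loss" key when labels are present.
        output = self.model(**batch)
        return output["loss"]

    def configure_optimizers(self):
        return torch.optim.Adam(self.model.parameters(), lr=self.lr)
```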

@matt-gardner
Contributor Author

Also, if you're excited about this possibility, please put an emoji reaction on the PR description, so we know how much to prioritize this work.

@bryant1410
Contributor

Also, if you're excited about this possibility, please put an emoji reaction on the PR description, so we know how much to prioritize this work.

Now that you mention this, there's FeatHub, which lets people propose and vote for features. See this for an example.

@DeNeutoy
Contributor

@matt-gardner how does this interact with the integration of some callbacks into the original trainer? Should I wait on that, given this PR? We should have either this or the current trainer with some additional callbacks; we should not retain two trainers. I would be in favour of this pytorch lightning approach, as I think it will make it easier for (and encourage) people to use allennlp as a library.

@matt-gardner
Contributor Author

I think we should have both for now, until we have some evidence on what people prefer and what actually works better. This is not a Trainer subclass, it's an entirely separate way of training. We just have two different ways of doing things.

@matt-gardner
Contributor Author

And before you say, "two ways of doing things is confusing", I don't think so in this case, because we're not actually implementing a trainer here. We're implementing a shim that connects allennlp model and data code with the pytorch lightning trainer, just as our TrainModel shim connects allennlp model and data code with our trainer. Having these two options, and clearly explaining them, shows people that you can use whatever pieces of allennlp you want with whatever other libraries you want to use. We have a trainer that works, and has some allennlp-specific customizations. If you don't care about those, and want a different trainer, that's fine; here's how to do it.
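
To make the "two options" concrete, here is a hedged usage sketch of the lightning side, assuming the AllenNlpLightningModule sketch earlier in this thread, plus a my_model and train_loader built with the usual allennlp model and data code (both names are placeholders, not anything defined in this PR):

```python
import pytorch_lightning as pl

# Placeholder objects: `my_model` is an allennlp Model and `train_loader` is a
# torch DataLoader over batches it understands; both are assumed, not from this PR.
lightning_module = AllenNlpLightningModule(my_model)  # shim sketched above
trainer = pl.Trainer(max_epochs=10)  # pytorch lightning drives the loop
trainer.fit(lightning_module, train_loader)
```

Checkpointing and the usual allennlp model-archive creation (the second to-do item above) would still need to be wired in separately, e.g. through a lightning callback.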

@matt-gardner
Contributor Author

Another argument: we should definitely not make our library entirely dependent on another trainer, because then, if that other trainer decides to change things in a way that we're not happy with, we're in a pretty bad position and our library is totally unusable. We should have a built-in trainer.

What this work is trying to do is demonstrate that our data and model code are also compatible with other trainers. I'm not really sure why this is a bad thing.

@bryant1410
Contributor

What about doing it in another repo, that has both dependencies?

@williamFalcon

Cool that you guys are adding this! Happy to help if needed.
Feel free to join the lightning slack and ping any of us there.

https://join.slack.com/t/pytorch-lightning/shared_invite/enQtODU5ODIyNTUzODQwLTFkMDg5Mzc1MDBmNjEzMDgxOTVmYTdhYjA1MDdmODUyOTg2OGQ1ZWZkYTQzODhhNzdhZDA3YmNhMDhlMDY4YzQ

@mikerossgithub
Contributor

@matt-gardner I plan to take a shot at the details in the next few days. Should I submit as a separate PR or make a pull-request into this pull-request?

@DeNeutoy
Contributor

DeNeutoy commented Mar 4, 2020

@mikerossgithub Matt is traveling at the moment, so I'd suggest a separate PR - we can merge them or close this one later. Thanks for contributing!

@mikerossgithub
Contributor

I checked in a version to the pytorch-lightning library. I'm not really sure which project the code belongs in, as it needs to remain compatible with both allennlp and pytorch-lightning.

@matt-gardner
Contributor Author

Thanks @mikerossgithub, that's great! For the next week, we're pretty focused on EMNLP submissions and our 1.0 release, but I should be able to look at what you have and what else is left in a week or two. I'm fine with this code living in either place; @williamFalcon, any opinion on which repo should have it?

@matt-gardner
Contributor Author

Within the next week, I'm going to make a github template repository that gives an easy starting place for someone who wants to use pytorch lightning with allennlp data and model code. It'll be similar to this, but with the fixes I mentioned above. A template project seems like the right way to provide this kind of integration.

Given this planned work, I'm closing this PR.

@williamFalcon

williamFalcon commented Jun 18, 2020

@mikerossgithub (@matt-gardner) It depends what you are looking for:

  1. If it's just a demo, then we can add it to the lightning demos (which will be tested in our CI, including on GPUs).
  2. If it's more functional and meant to intermingle with other models, then bolts is the best place.
    https://github.com/PyTorchLightning/pytorch-lightning-bolts.
    Bolts is meant to be a collection of contributions built in lightning (i.e. allenAI can have its own path in bolts with your models, etc., including attribution and author information).

In terms of using lightning as a backbone, we're heavily used at many labs and companies. As such, we work closely with them to make sure they have what they need. If there's something specific you might want, just ping me on our Slack.

The downside of having a parallel trainer is that we are moving very fast and adding new features/testing/support, which you'll likely miss out on. For instance, we now support TPU training and are currently working on doing TPU testing on CI as well. Not to mention all the integrations and other partners that we have, in addition to the almost 200 contributors + 8 core members + staff members on the lightning team working around the clock to add new features and make the library more robust :)

@tkornuta-nvidia

It'll be similar to this, but with the fixes I mentioned above. A template project seems like the right way to provide this kind of integration.

Hi @matt-gardner any updates on this project?

@williamFalcon

Matt and I spoke a few weeks back. We’re allocating some resources to build the templates for the team. cc @nateraw
