Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LightningCLI: Support config files on object stores #7360

Closed
leezu opened this issue May 4, 2021 · 13 comments · Fixed by #7521
Closed

LightningCLI: Support config files on object stores #7360

leezu opened this issue May 4, 2021 · 13 comments · Fixed by #7521
Labels
argparse (removed) Related to argument parsing (argparse, Hydra, ...) feature Is an improvement or enhancement help wanted Open to be worked on

Comments

@leezu
Copy link
Contributor

leezu commented May 4, 2021

🚀 Feature

In addition to python trainer.py --config config.yaml, LightningCLI should enable python trainer.py --config s3://bucket/config.yaml

Motivation

pytorch-lightning has good support for remote filesystems thanks to #2164 #3320 (and possibly others). Thanks to that, it's straightforward to deploy lightning-based scripts to compute cluster environments without having to write extra code preparing / post-processing the local file-systems https://pytorch-lightning.readthedocs.io/en/latest/clouds/cluster.html It would be nice to extend this to LightningCLI.

When using LightningCLI however, it's currently necessary to deploy the config.yaml file to each node, instead of referencing a copy in an object store such as S3.

cc @mauvilsa?

@leezu leezu added feature Is an improvement or enhancement help wanted Open to be worked on labels May 4, 2021
@leezu leezu changed the title LightningCLI: Support config files on remote filesystems LightningCLI: Support config files on object stores May 4, 2021
@mauvilsa
Copy link
Contributor

mauvilsa commented May 5, 2021

@leezu I think this is a good idea, will look into it.

@edgarriba
Copy link
Contributor

@leezu @mauvilsa thanks for bringing this up. Please, let us know if you need any support with the CLI integration /cc @carmocca

@carmocca carmocca added the argparse (removed) Related to argument parsing (argparse, Hydra, ...) label May 5, 2021
@mauvilsa
Copy link
Contributor

mauvilsa commented May 6, 2021

Had a quick look at this and I have the following thoughts/doubts:

  • The idea would be to use fsspec, which means this should be implemented in such a way that any protocol registered in fsspec in principle could work.
  • The fsspec filesystems might require additional settings, e.g. key, secret, token, etc. for s3fs. Having just s3://bucket/config.yaml is not enough in common use cases. How would these additional settings be specified?
  • I see that fsspec is a requirement of lightning. But in the tests I only see simple http loading, e.g. load_from_checkpoint, nothing more complex like s3 or non-url based additional options.

@carmocca
Copy link
Contributor

carmocca commented May 6, 2021

The idea would be to use fsspec

Do you think this should be done here or in upstream jsonargparse?

How would these additional settings be specified?

I guess environment variables are the best option here

nothing more complex like s3 or non-url based additional options.

Not sure if it's worth for us to have those tests instead of just trusting upstream fsspec

@leezu
Copy link
Contributor Author

leezu commented May 6, 2021

The fsspec filesystems might require additional settings, e.g. key, secret, token, etc. for s3fs. Having just s3://bucket/config.yaml is not enough in common use cases. How would these additional settings be specified?

I guess environment variables are the best option here

Manually setting environment variables is possible, but may not be the so convenient :) If your code is running within an AWS network (eg. EC2), the credentials can be automatically retrieved (assuming your instance is configured to have access to the bucket in question) and fsspec will "just work". See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials

@mauvilsa
Copy link
Contributor

mauvilsa commented May 6, 2021

The idea would be to use fsspec

Do you think this should be done here or in upstream jsonargparse?

jsonargparse already supports http/https for the config, though requires optional packages and be explicitly enabled. The fsspec support should be added extending that. Here what we can do is enable fsspec by default, since it can be assumed that fsspec is installed.

nothing more complex like s3 or non-url based additional options.

Not sure if it's worth for us to have those tests instead of just trusting upstream fsspec

I am interested in looking at code and tests using fsspec. What I was able to find in lightning is not enough to know how to best do this. I have not used fsspec. It looks very powerful but the documentation seems to lack a bit.

The fsspec filesystems might require additional settings, e.g. key, secret, token, etc. for s3fs. Having just s3://bucket/config.yaml is not enough in common use cases. How would these additional settings be specified?

I guess environment variables are the best option here

Manually setting environment variables is possible, but may not be the so convenient :) If your code is running within an AWS network (eg. EC2), the credentials can be automatically retrieved (assuming your instance is configured to have access to the bucket in question) and fsspec will "just work". See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials

That is nice. Though I think the scope of this shouldn't be to make it work only for s3 on aws. If someone wants to use this in some private cluster which uses a minio server as storage, would fsspec "just work"? Or what if someone wants to use some other file system supported by fsspec? How would users know how to do this or that this is possible? The documentation in lightning and in fsspec is not enough. Or maybe I missed where these things are explained better.

@leezu when you say "environment variables is possible" do you mean that we could implement it like this, or fsspec already supports environment variables? Even if we initially implement it so that it works only for s3 on aws it would be good to have an idea how it can be extended without needing completely change the original implementation.

@leezu
Copy link
Contributor Author

leezu commented May 6, 2021

when you say "environment variables is possible" do you mean that we could implement it like this, or fsspec already supports environment variables? Even if we initially implement it so that it works only for s3 on aws it would be good to have an idea how it can be extended without needing completely change the original implementation.

fsspec just calls the AWS SDK for Python which then attempts to find credentials. Above example just serves as documentation to understand how the SDK does that. Sorry if it was confusing. The takeaway is that you don't need to worry about authentication in jsonargparse/pytorch-lightning and that each of the fsspec backends should have their own ways (via environment variables or other features) to ensure authentication is taken care of.

@mauvilsa
Copy link
Contributor

mauvilsa commented May 7, 2021

I have implemented the fsspec support in jsonargparse. @leezu would you mind trying it out with LightningCLI and an s3 bucket? To make it work would be installing jsonargparse from the branch fsspec-support and then in the cli tool add

from jsonargparse import set_config_read_mode
set_config_read_mode(fsspec_enabled=True)

@leezu
Copy link
Contributor Author

leezu commented May 7, 2021

Thank you @mauvilsa. It works great!

@edenlightning
Copy link
Contributor

Closing this issue, feel free to reopen if there's anything to be done on the lightning front.

@leezu
Copy link
Contributor Author

leezu commented May 9, 2021

@edenlightning it'd be helpful for lightning to call

from jsonargparse import set_config_read_mode
set_config_read_mode(fsspec_enabled=True)

as lightning already depends on fsspec

@carmocca
Copy link
Contributor

@leezu Want to open a PR with it?

@mauvilsa
Copy link
Contributor

There isn't yet a jsonargparse release with the fsspec support. When this gets released I can create the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
argparse (removed) Related to argument parsing (argparse, Hydra, ...) feature Is an improvement or enhancement help wanted Open to be worked on
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants