
Enable any ML experiment tracking framework #47

Closed
williamFalcon opened this issue Aug 6, 2019 · 22 comments · Fixed by #223
Labels: feature (Is an improvement or enhancement) · help wanted (Open to be worked on)
Comments

@williamFalcon
Contributor

People seem to have strong preferences for MLFlow, test-tube, polyaxon, etc.

Let's just add generic support for whatever people want to use. I don't know if fully generic support is possible, but each framework can easily be supported individually.

To make this work we'd need to:

  • Change the logging to be non-test-tube-specific. Logging only happens in two places (train and validation completion).
  • Make each call to log process-safe, meaning that when using distributed training only rank 0 logs (sketched below).
  • Generalize the experiment param in Trainer (keeping the signature the same) so it can take any logger.

I think that's all that's needed to add this support.
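
To make the second and third bullets concrete, here is a rough sketch of what a generic, process-safe logging call could look like (the log_metrics name and the rank check are illustrative assumptions, not an existing API):

import torch.distributed as dist


def log_metrics(logger, metrics):
    """Forward a dict of metrics to whichever logger was passed to Trainer,
    but only from rank 0 when running distributed."""
    if dist.is_available() and dist.is_initialized() and dist.get_rank() != 0:
        return
    logger.log(metrics)  # placeholder for the framework-specific call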

Any suggestions and takers for working on this integration?
@Borda @alok

@williamFalcon changed the title from "Enable any ML exp framework" to "Enable any ML experiment tracking framework" on Aug 6, 2019
@williamFalcon added the feature (Is an improvement or enhancement) and help wanted (Open to be worked on) labels on Aug 6, 2019
@williamFalcon
Contributor Author

I also wonder if SlurmManager should just be brought into lightning. Thoughts?

@Borda
Member

Borda commented Aug 6, 2019

As far as I can see, all the methods used are the following:

from test_tube import HyperOptArgumentParser, Experiment, SlurmCluster

I would remove the test_tube dependency from pytorch_lightning.testing.lm_test_module so that the pytorch_lightning package is independent of these additional frameworks, and then keep test_tube just for testing/examples.
And instead of HyperOptArgumentParser I would use the standard Python argparse:
https://docs.python.org/3/library/argparse.html
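
For reference, the plain-argparse version of the hyperparameter handling would look something like this (the argument names are just examples):

from argparse import ArgumentParser

parser = ArgumentParser(description='pytorch-lightning example')
parser.add_argument('--learning_rate', type=float, default=0.02)
parser.add_argument('--batch_size', type=int, default=32)
hparams = parser.parse_args()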

@williamFalcon
Contributor Author

There are also a few other lines which need to be generic for each logger (but that seems hard, given they all have different interfaces). So maybe have a function that calls the appropriate thing; basically, search for self.experiment to find them.

I kind of like the idea of limiting support to a few options here so that people aren't lost in what they can use. We can support many different ones, but it would be nice if the docs told people which ones they can choose from.

A big part of the help Lightning provides is helping people who don't know what they don't know. For instance, some people outside CS/ML have never heard of TensorBoard. But if they see it listed here, they'll know they can get support for it for free if they use it.

Also remember that the experiment object in test-tube is a subclass of PyTorch's SummaryWriter.

@williamFalcon
Contributor Author

As far as I can see, all the methods used are the following:

from test_tube import HyperOptArgumentParser, Experiment, SlurmCluster

I would remove the test_tube dependency from pytorch_lightning.testing.lm_test_module so that the pytorch_lightning package is independent of these additional frameworks, and then keep test_tube just for testing/examples.
And instead of HyperOptArgumentParser I would use the standard Python argparse:
https://docs.python.org/3/library/argparse.html

HyperOptArgumentParser is a subclass of argparse's ArgumentParser. It's the same thing, except it adds more features for hyperparameter optimization.

@williamFalcon
Contributor Author

Most people also use ArgumentParser to pass args into their models, so this is the standard way of doing it in Lightning. Remove the choices for things that should be standard.

A core idea that I like about Lightning is that it removes the unnecessary choices you might need to make, so it's highly opinionated about standard practices (i.e. to log we all do this, to visualize we all do this, to grid search we all do this, etc.). Instead of doing something a million ways, there's one way.

@Borda
Member

Borda commented Aug 6, 2019

Sure, I was thinking about keeping it really lightweight, so we could even drop the arg parser class or move it into this framework. About the logging: what about having one more parameter like log_callback which is used if it is specified? Then in the examples you just keep your proposal of using test-tube, but if someone comes along and wants to use anything else, they just change the callback function. I think Keras uses callbacks in a similar way.

@williamFalcon
Contributor Author

williamFalcon commented Aug 6, 2019

Sure, I was thinking about keeping it really lightweight, so we could even drop the arg parser class or move it into this framework. About the logging: what about having one more parameter like log_callback which is used if it is specified? Then in the examples you just keep your proposal of using test-tube, but if someone comes along and wants to use anything else, they just change the callback function. I think Keras uses callbacks in a similar way.

Good suggestion; however, I think that's just adding another thing for the user to remember. I'd rather they just plug in the logger and have it work.

So, in the meantime, why don't we just internally call the correct function depending on the logger?

something like:

def log(logger, metrics):
    # dispatch on the top-level package the logger comes from
    package_name = logger.__module__.split('.')[0]
    if package_name == 'test_tube':
        logger.log(metrics)  # test-tube's own way of logging
    elif package_name == 'mlflow':
        ...  # the MLflow way of logging

And replace lines like:

self.experiment.log(x)

# replace with
self.log(logger, x)

Just tried this:

(ddt) comp:~ user$ python
Python 3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from test_tube import Experiment
WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
>>> Experiment.__module__
'test_tube.log'
>>>
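
Following up on that: since __module__ gives the top-level package name, the dispatch could also be table-driven, so adding support for a new framework is just one more entry. This is only a sketch of the idea; the test-tube entry assumes its Experiment.log(dict) API, and the commented entries would need each framework's real calls.

# maps a top-level package name to a function that logs a metrics dict
_LOGGER_DISPATCH = {
    'test_tube': lambda exp, metrics: exp.log(metrics),
    # 'mlflow': ...,
    # 'polyaxon': ...,
}


def log(logger, metrics):
    package_name = logger.__module__.split('.')[0]
    log_fn = _LOGGER_DISPATCH.get(package_name)
    if log_fn is None:
        raise ValueError(f'No logging support for package: {package_name}')
    log_fn(logger, metrics)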

@karanchahal

karanchahal commented Aug 6, 2019

I'd like to pick one of the frameworks up, MLflow if it isn't already taken. I'll look at it tomorrow morning and try to figure out how it can be integrated. I'm a bit confused though: can't TensorBoard be used to track experiments and different model runs?

Also, could Lightning add a feature for quantisation or pruning?

Maybe expose an API so that people can try out different quantisation and pruning strategies to compress their Network.

I was recently able to quantise an fp32 model to 8 bits (only the fully connected and conv layers) with next to no loss in accuracy; this was an MNIST network. I'm following the same algorithms the TensorFlow Lite people are using. I can share a Google Colab notebook with a sample implementation if you think it's a viable thing for this project.

@williamFalcon
Contributor Author

@karanchahal I agree with you... most people I know use TensorBoard, but I'm happy to allow this to be flexible. There's no real need for the framework to force an opinion on this particular thing.

I think @Borda is probably going to look at it (but we can deconflict here if you want to look at it).

However, a way of doing network pruning and quantisation would be good. I don't know those in detail, so maybe write up a proposal with the use cases? (Different issue, with the enhancement tag.)

@Borda
Member

Borda commented Aug 6, 2019

I would keep #44 simple, just adding CI, so I would leave resolving test-tube to another PR/user 😄

@williamFalcon
Contributor Author

williamFalcon commented Aug 7, 2019

@karanchahal feel free to give this a shot!
link to this issue

@williamFalcon added this to the 3.7.1 milestone on Aug 8, 2019
@williamFalcon
Contributor Author

@karanchahal still interested in making these improvements?

@karanchahal

karanchahal commented Aug 10, 2019 via email

@neggert
Contributor

neggert commented Sep 6, 2019

Alright, started taking a look at this. I think it's going to be more straightforward than I initially thought. I'd like some feedback on a proposal though, before I start working on it.

I'd like to create a base class for experiment loggers. Loggers for individual tracking frameworks would inherit from this base class. The particular logger that a user wants can be passed to the Trainer constructor, just like the test-tube experiment is today.

Proposed base class:

class LightningLoggerBase:

    def log_metrics(self, metrics_dict, epoch_num):
        pass
    
    def log_hyperparams(self, param_dict):
        pass
    
    def save(self):
        pass

Implementation of a logger for a particular tracking framework would look like this:

class MLFlowLogger(LightningLoggerBase):
    ...

And using it would be straightforward

from pytorch_lightning.loggers import MLFlowLogger

logger = MLFlowLogger(...)
trainer = pl.Trainer(logger=logger, ...)
trainer.fit(model)

Any feedback on this? In particular, I'm wondering if I'm missing methods that LightningLoggerBase should have. Off the top of my head, do we want properties for experiment_name and experiment_version? Or perhaps an optional method to save a model or other artifact?

I'm also wondering what we do with the existing test-tube integration. We could rip the existing stuff out and replace it with a test-tube logger that inherits from LightningLoggerBase. Or we could leave it in, since it's optional, and add other frameworks using the new interface. Thoughts?
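
To illustrate how small a concrete logger could be, here is a toy subclass that just appends everything to CSV files (the method names follow the proposal above; the rest is made up for the example):

import csv


class CsvLogger(LightningLoggerBase):
    """Toy logger that writes metrics and hyperparameters to CSV files."""

    def __init__(self, metrics_path='metrics.csv', hparams_path='hparams.csv'):
        self.metrics_path = metrics_path
        self.hparams_path = hparams_path

    def log_metrics(self, metrics_dict, epoch_num):
        with open(self.metrics_path, 'a', newline='') as f:
            writer = csv.writer(f)
            for name, value in metrics_dict.items():
                writer.writerow([epoch_num, name, value])

    def log_hyperparams(self, param_dict):
        with open(self.hparams_path, 'w', newline='') as f:
            csv.writer(f).writerows(param_dict.items())

    def save(self):
        pass  # nothing is buffered in this toy example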

@williamFalcon
Contributor Author

williamFalcon commented Sep 6, 2019

Beautiful solution. Then we don't have to wrap them; each user can wrap their own?

I think we should also move these options from the Trainer to this logger class:

log_save_interval=100,
add_log_row_interval=10,

If no logger is passed in, these flags won't be used, so it makes sense to define them in the logger.

The root experiment also needs:

  • A rank property, to make sure it only logs from process 0 in ddp mode (see the rank-guard sketch after the code block below).
  • get_non_ddp_exp(): when launching ddp, PyTorch pickles everything. If a logger isn't picklable, ddp won't work. Either we let those loggers fail (i.e. be unusable) or we allow the user to create a pickle-safe version. The test-tube experiment has the following metadata class:
from test_tube import Experiment  # needed to rebuild the full experiment


class DDPExperiment(object):
    def __init__(self, exp):
        """
        Used as metadata storage if the experiment needs to be pickled.
        :param exp: the test_tube Experiment to snapshot
        """
        self.tag_markdown_saved = exp.tag_markdown_saved
        self.no_save_dir = exp.no_save_dir
        self.metrics = exp.metrics
        self.tags = exp.tags
        self.name = exp.name
        self.debug = exp.debug
        self.version = exp.version
        self.autosave = exp.autosave
        self.description = exp.description
        self.create_git_tag = exp.create_git_tag
        self.exp_hash = exp.exp_hash
        self.created_at = exp.created_at
        self.save_dir = exp.save_dir

    def get_non_ddp_exp(self):
        # rebuild a full Experiment from the pickled metadata
        return Experiment(
            name=self.name,
            debug=self.debug,
            version=self.version,
            save_dir=self.save_dir,
            autosave=self.autosave,
            description=self.description,
            create_git_tag=self.create_git_tag
        )
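
On the rank point from the first bullet above, here is a minimal sketch of how the base logger could guard its own methods so users never check the process rank themselves (the rank attribute and the decorator are assumptions, not existing Lightning code):

import functools


def rank_zero_only(fn):
    """Run the wrapped logger method only on the rank 0 process."""
    @functools.wraps(fn)
    def wrapped(self, *args, **kwargs):
        if getattr(self, 'rank', 0) == 0:
            return fn(self, *args, **kwargs)
    return wrapped


class LightningLoggerBase:
    def __init__(self):
        # the Trainer would set this to the process rank when running ddp
        self.rank = 0

    @rank_zero_only
    def log_metrics(self, metrics_dict, epoch_num):
        pass

    @rank_zero_only
    def log_hyperparams(self, param_dict):
        pass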

Experiment name and version

A nice thing about the test-tube logger is that you get a clean organization of experiments, each with a version and a set of hyperparams saved in meta_tags.csv.

This should just be provided by the core class that people use to wrap their experiments. We can add a flag to turn it off in the case where an ML tracker already does this.

I think those are all the main points we need to account for. If you have a cleaner way of handling the above needs, we can go with that too.

@alok
Contributor

alok commented Sep 6, 2019

@neggert For the test tube logger, I think that it would be best to make it just another logger and remove the special-casing it has right now. Since this is meant to become more generic in the PyTorch ecosystem, I think making it more pluggable is a better idea.

@neggert
Contributor

neggert commented Sep 9, 2019

I was envisioning that Lightning could provide loggers for some of the more common frameworks (maybe Test Tube, MLFlow, and polyaxon?). Of course, users would be free to write their own as well.

I'm not sure I agree about log_save_interval and add_log_row_interval. Those control how often the trainer calls the logger methods. IMO they should stay in the trainer, but I don't feel strongly about it. On a similar note, IMO the Trainer should be responsible for logging only from the rank 0 process.

What were you running into that made test-tube Experiments non-pickleable? I would have guessed that it would just work.

@williamFalcon
Contributor Author

williamFalcon commented Sep 9, 2019

The process = 0 check would have to be inside the abstraction.

The user will for sure call self.logger(...) at arbitrary points in their code. They won't know which process they're on, and it will get annoying adding an if statement every time; more room for user error.

We can keep the logging-interval stuff in the Trainer for now.

@neggert
Contributor

neggert commented Sep 10, 2019

Ah, okay. Makes sense. I was thinking that the user wouldn't ever call the logger themselves, but I guess there's no way to stop them :P.

Any insight into what was causing the problems with pickling that you ran into? It's not clear to me if this is a problem we'll run into frequently, or if there's something specific to Test Tube that causes it.

@williamFalcon
Contributor Author

I don't remember if it was test-tube itself; I think it's the PyTorch SummaryWriter. I didn't spend too much time debugging because the PT code is a bit messy there.

It might be helpful to try a quick pickle of an experiment instance, though.

@neggert
Contributor

neggert commented Sep 10, 2019

Does it make sense to do #183 along the way? This would let users load models from whatever logger they want.

I'm imagining adding a few methods to LightningLoggerBase:

    def get_hyperparams(self):
        pass
    
    def save_weights(self, trainer, filename):
        pass
    
    def get_weights_path(self, filename):
        pass

Then a model can be loaded like so:

    new_hparams = logger.get_hyperparams()
    new_weights_path = logger.get_weights_path("save_test.ckpt")
    model_2 = LightningTemplateModel.load(new_weights_path, new_hparams, on_gpu=False)

Much of the logic in the existing load_from_metrics would get moved into TestTubeLogger.get_hyperparams.
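
A rough sketch of what TestTubeLogger.get_hyperparams might do, assuming meta_tags.csv keeps the usual key,value layout that test-tube writes (the tags_csv attribute is hypothetical):

import csv
from argparse import Namespace


def get_hyperparams(self):
    # self.tags_csv would point at the experiment's meta_tags.csv
    with open(self.tags_csv) as f:
        tags = {row['key']: row['value'] for row in csv.DictReader(f)}
    return Namespace(**tags)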

@williamFalcon
Contributor Author

I think the Trainer should be the one to save hyperparameters. That will allow loading when people don't use a logger.

Same for weights.

Let's do the .load separately because we need to think it through. On one hand it is more flexible to allow users to load using arbitrary args, but it breaks reproducibility. So most likely I think we should force users to use the hyperparams to restore the model; this means the Trainer should save hyperparams automatically instead of the logger. This will all be part of the style guide I'm creating.
