
5-Fold with PyTorchLightning + Wandb seems to log to the same experiment #8614

Closed
tchaton opened this issue Jul 29, 2021 · 10 comments · Fixed by #8714
Labels: good first issue (Good for newcomers), logger (Related to the Loggers)

Comments

tchaton (Contributor) commented Jul 29, 2021

I am training 5-fold CV with PyTorch Lightning in a for loop and logging all the results to wandb. I want wandb to reinitialize the run after each fold, but it seems to continue with the same run and logs all the results to it. I also tried passing kwargs to the WandbLogger as mentioned in the docs, with no luck.
Here's pseudocode of it:

from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger

# CFG, checkpoint_callback, lit_model and data_module are defined elsewhere.

def run(fold):
    # Extra kwargs are forwarded to wandb.init().
    kwargs = {
        "reinit": True,
        "group": f"{CFG['exp_name']}",
    }
    wandb_logger = WandbLogger(
        project="<name>",
        entity="<entity>",
        config=CFG,
        name=f"fold_{fold}",
        **kwargs
    )
    trainer = Trainer(
        precision=16,
        gpus=1,
        fast_dev_run=False,
        callbacks=[checkpoint_callback],
        logger=wandb_logger,
        progress_bar_refresh_rate=1,
        max_epochs=2,
        log_every_n_steps=1,
    )
    trainer.fit(lit_model, data_module)

if __name__ == "__main__":
    for fold in range(5):
        run(fold)

Originally posted by @Gladiator07 in #8572

tchaton added the good first issue label Jul 29, 2021
tchaton (Contributor, Author) commented Jul 29, 2021

Dear @Gladiator07,

I converted this to an issue, as it should have created several runs.

Best,
T.C

awaelchli (Contributor):

@Gladiator07 I think I have a workaround for you. Put this

import wandb
wandb.finish()

before instantiating WandbLogger.

This will make sure that the experiment from the previous "fold" gets finished. For context, our WandbLogger simply wraps the wandb.Run object, which, as far as I understand, is essentially a global variable in wandb. I will try to turn this into a real fix for our WandbLogger. Any feedback is appreciated. Maybe @borisdayma has another idea :)
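
Applied to the pseudocode above, the workaround might look like the following sketch (CFG, lit_model and data_module are the same placeholders as in the original snippet; most Trainer settings are trimmed for brevity):

import wandb
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger

def run(fold):
    # Close whatever run is still open from the previous fold
    # before creating the next logger (a no-op if no run exists).
    wandb.finish()

    wandb_logger = WandbLogger(
        project="<name>",
        entity="<entity>",
        config=CFG,
        name=f"fold_{fold}",
        group=CFG["exp_name"],
    )
    trainer = Trainer(logger=wandb_logger, max_epochs=2)
    trainer.fit(lit_model, data_module)

if __name__ == "__main__":
    for fold in range(5):
        run(fold)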

awaelchli (Contributor):

Actually, I just found the PR that enforced the current behavior and also added tests for it: #4648.
As written in that PR, users are advised to call wandb.finish() when they want to manually create multiple distinct experiments, as the OP does. So what I presented as a workaround is actually the intended usage.

borisdayma (Contributor):

Yes, this is perfectly correct @awaelchli.
In some cases users may want to keep using the same run, for example when training in multiple stages.
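
For illustration, a minimal sketch of that multi-stage pattern, assuming a lit_model and data_module as in the snippet above: the same logger object, and therefore the same wandb run, is passed to both trainers.

from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger

# One logger object -> one wandb run, shared by both stages.
wandb_logger = WandbLogger(project="<name>", name="two_stage_training")

# Stage 1: e.g. a short warm-up.
trainer = Trainer(logger=wandb_logger, max_epochs=2)
trainer.fit(lit_model, data_module)

# Stage 2: continue training; metrics are logged to the same run.
trainer = Trainer(logger=wandb_logger, max_epochs=10)
trainer.fit(lit_model, data_module)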

awaelchli added the logger label Jul 29, 2021
justusschock (Member):

@borisdayma @awaelchli could we say that one run corresponds to one logger object? So when training in multiple stages you just reuse the logger object, and if you recreate it, you get a new run?

borisdayma (Contributor):

The main issue is that users could already have a run created beforehand, even without the logger: when using sweeps, or for example when they use an artifact (like a previously logged checkpoint).

What do you think about adding a warning when a run already exists, saying that we will reuse it and that they can manually call wandb.finish()?
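
For illustration only, a minimal sketch of what such a warning could look like (hypothetical placement inside WandbLogger.experiment, using rank_zero_warn; not the actual fix):

import wandb
from pytorch_lightning.utilities import rank_zero_warn

# Hypothetical check before WandbLogger attaches to a pre-existing run.
if wandb.run is not None:
    rank_zero_warn(
        "There is a wandb run already in progress and it will be reused."
        " Call `wandb.finish()` before instantiating `WandbLogger` if this is not desired."
    )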

awaelchli (Contributor):

@borisdayma I like that.

Perhaps my PR #8617, which adds a finish() method, should be closed, as it is confusing to have finish() alongside the existing finalize() and close() methods.

borisdayma (Contributor):

Yes, I think I would just close it.
Basically we just need to change this line to add a warning.

awaelchli (Contributor):

See the new PR attached. Let me know what you think of it.

borisdayma (Contributor):

I like it!
