
5-Fold with PyTorchLightning + Wandb seems to log to the same experiment #8614

Closed
tchaton opened this issue Jul 29, 2021 · 10 comments · Fixed by #8714
Labels: good first issue (Good for newcomers), logger (Related to the Loggers)

Comments

tchaton (Contributor) commented Jul 29, 2021

I am training 5-fold CV with PyTorch Lightning in a for loop and logging all the results to wandb. I want wandb to reinitialize the run after each fold, but it seems to continue with the same run and logs all the results to it. I also tried passing kwargs to the WandbLogger as mentioned in the docs, with no luck.
Here's pseudocode of it:

from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger

# CFG, checkpoint_callback, lit_model and data_module are defined elsewhere.

def run(fold):
    # Extra kwargs are forwarded to wandb.init().
    kwargs = {
        "reinit": True,
        "group": f"{CFG['exp_name']}",
    }
    wandb_logger = WandbLogger(
        project="<name>",
        entity="<entity>",
        config=CFG,
        name=f"fold_{fold}",
        **kwargs
    )
    trainer = Trainer(
        precision=16,
        gpus=1,
        fast_dev_run=False,
        callbacks=[checkpoint_callback],
        logger=wandb_logger,
        progress_bar_refresh_rate=1,
        max_epochs=2,
        log_every_n_steps=1,
    )
    trainer.fit(lit_model, data_module)

if __name__ == "__main__":
    for fold in range(5):
        run(fold)

Originally posted by @Gladiator07 in #8572

tchaton added the good first issue label Jul 29, 2021
tchaton (Contributor, Author) commented Jul 29, 2021

Dear @Gladiator07,

I converted this to an issue, as it should have created several runs.

Best,
T.C

awaelchli (Contributor):

@Gladiator07 I think I have a workaround for you. Put this

import wandb
wandb.finish()

before instantiating WandbLogger.

This will make sure that the experiment from the previous "fold" gets finished. For context, our WandbLogger simply wraps the wandb.Run object, which, as far as I understand, is essentially a global variable in wandb. I will try to turn this into a real fix for our WandbLogger. Any feedback is appreciated. Maybe @borisdayma has another idea :)
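
Applied to the pseudocode above, the workaround might look like the following sketch (CFG, lit_model and data_module are the same placeholders as in the original snippet; most Trainer settings are trimmed for brevity):

import wandb
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger

def run(fold):
    # Close whatever run is still open from the previous fold
    # before creating the next logger (a no-op if no run exists).
    wandb.finish()

    wandb_logger = WandbLogger(
        project="<name>",
        entity="<entity>",
        config=CFG,
        name=f"fold_{fold}",
        group=CFG["exp_name"],
    )
    trainer = Trainer(logger=wandb_logger, max_epochs=2)
    trainer.fit(lit_model, data_module)

if __name__ == "__main__":
    for fold in range(5):
        run(fold)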

awaelchli (Contributor):

Actually, I just found the PR that enforced the current behavior and also added tests for it: #4648.
As written in that PR, users are advised to call wandb.finish() when they want to manually create multiple distinct experiments, as the OP does. So what I presented as a workaround is actually the intended usage.

borisdayma (Contributor):

Yes, this is perfectly correct @awaelchli.
In some cases users may want to keep using the same run, for example when training in multiple stages.
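
For illustration, a minimal sketch of that multi-stage pattern, assuming a lit_model and data_module as in the snippet above: the same logger object, and therefore the same wandb run, is passed to both trainers.

from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger

# One logger object -> one wandb run, shared by both stages.
wandb_logger = WandbLogger(project="<name>", name="two_stage_training")

# Stage 1: e.g. a short warm-up.
trainer = Trainer(logger=wandb_logger, max_epochs=2)
trainer.fit(lit_model, data_module)

# Stage 2: continue training; metrics are logged to the same run.
trainer = Trainer(logger=wandb_logger, max_epochs=10)
trainer.fit(lit_model, data_module)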

awaelchli added the logger label Jul 29, 2021
justusschock (Member):

@borisdayma @awaelchli could we say that one run corresponds to one logger object? So when training in multiple stages you just reuse the logger object, and if you recreate it, you get a new run?

borisdayma (Contributor):

The main issue is that users could already have a run created beforehand, even without the logger: when using sweeps, or for example when they use an artifact (like a previously logged checkpoint).

What do you think about adding a warning when a run already exists, saying that we will reuse it and that they can manually call wandb.finish()?
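
For illustration only, a minimal sketch of what such a warning could look like (hypothetical placement inside WandbLogger.experiment, using rank_zero_warn; not the actual fix):

import wandb
from pytorch_lightning.utilities import rank_zero_warn

# Hypothetical check before WandbLogger attaches to a pre-existing run.
if wandb.run is not None:
    rank_zero_warn(
        "There is a wandb run already in progress and it will be reused."
        " Call `wandb.finish()` before instantiating `WandbLogger` if this is not desired."
    )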

awaelchli (Contributor):

@borisdayma I like that.

Perhaps my PR #8617, which adds a finish() method, should be closed, as it is confusing to have finish() alongside the existing finalize() and close() methods.

borisdayma (Contributor):

Yes, I think I would just close it.
Basically we just need to change this line to add a warning.

awaelchli (Contributor):

See the new PR attached. Let me know what you think of it.

borisdayma (Contributor):

I like it!
