
Fixed two issues: SSIM evaluation and DDP subsampling bug #60

Closed

Conversation

@z-fabian (Contributor) commented:

(1) There was a bug in validation_epoch_end in the MriModule where the slices were concatenated along a spatial dimension instead of being stacked along a new slice dimension. As a result, evaluate.ssim operated on these single concatenated slices instead of volumes, so there was no averaging along the slice dimension. I also added a print statement to show validation metrics in the command-line output.
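A minimal standalone sketch of the difference (illustrative shapes, not the fastMRI code):

    import torch

    # Ten hypothetical 2D slices of one volume, each (H, W) = (320, 320).
    slices = [torch.randn(320, 320) for _ in range(10)]

    # Buggy pattern: concatenating along an existing spatial dimension yields
    # one tall (3200, 320) "slice", so SSIM is computed once on that image
    # with no per-slice averaging over the volume.
    concatenated = torch.cat(slices, dim=0)   # shape: (3200, 320)

    # Fixed pattern: stacking creates a new leading slice dimension, giving a
    # (10, 320, 320) volume that SSIM can average over slice by slice.
    stacked = torch.stack(slices, dim=0)      # shape: (10, 320, 320)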

(2) The random seed of SliceDataset was not explicitly set, so different processes during DDP training had different random seeds. This led to different parts of the training data being selected by different workers when sample_rate < 1. I set the random seed for all workers to 0, but we could also pass the seed in from the argparser if needed.
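A sketch of the failure mode, assuming the shuffle-then-slice subsampling in SliceDataset (file names here are hypothetical):

    import random

    # Hypothetical volume files; in SliceDataset these come from the data root.
    files = ["file_{:03d}.h5".format(i) for i in range(100)]
    sample_rate = 0.5

    # Without an explicit seed, each DDP process draws a different shuffle and
    # therefore trains on a different subset of the volumes.
    random.seed(0)  # the fix: same seed in every process -> same subset
    random.shuffle(files)
    num_files = round(len(files) * sample_rate)
    files = files[:num_files]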

@facebook-github-bot commented:

Hi @z-fabian!

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@facebook-github-bot added the "CLA Signed" label on Aug 13, 2020.
@facebook-github-bot commented:

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@mmuckley (Contributor) left a comment:

Nice find on the stack bug. It looks to be an error I introduced during the refactor. I'd like to merge that one right away and do the seed separately (maybe another PR). For the seed you could update the unet and varnet examples in experimental to use deterministic=True and seed_everything().
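A minimal sketch of that setup, assuming a PyTorch Lightning version that exposes seed_everything and the deterministic trainer flag (42 is a placeholder seed value):

    import pytorch_lightning as pl

    # Seeds Python's random, NumPy, and PyTorch identically in every process.
    pl.seed_everything(42)

    # Ask PyTorch/cuDNN for reproducible ops where possible.
    trainer = pl.Trainer(deterministic=True)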

@@ -114,7 +114,7 @@ class SliceDataset(Dataset):
             what fraction of the volumes should be loaded.
     """

-    def __init__(self, root, transform, challenge, sample_rate=1):
+    def __init__(self, root, transform, challenge, sample_rate=1, seed=0):
@mmuckley (Contributor) commented on the diff:

Random seeds should be done at the project/trainer level. See the following:

https://pytorch-lightning.readthedocs.io/en/latest/trainer.html#reproducibility

And this:

Lightning-AI/pytorch-lightning#1572

@z-fabian (Contributor, Author) replied:

So for example I should add seed_everything(seed_val) at the start of varnet.py and set deterministic = True for the trainer object?

@mmuckley (Contributor) replied on Aug 13, 2020:

Yeah. I don't know if it will fix your val/test discrepancy but it should make your experiment behave better.

The SSIM bug may be impacting things, but I don't know if that will fix the discrepancy either. I'd be interested in what you see first.

@@ -126,6 +126,7 @@ def __init__(self, root, transform, challenge, sample_rate=1):

         files = list(pathlib.Path(root).iterdir())
         if sample_rate < 1:
+            random.seed(seed)  # get the same files in every process
@mmuckley (Contributor) commented:
See init arg comment.

Comment on lines 216 to 217:

    output = torch.stack([out for _, out in sorted(outputs[fname])], dim=0).numpy()
    target = torch.stack([tgt for _, tgt in sorted(targets[fname])], dim=0).numpy()
@mmuckley (Contributor) commented:
Nice find on this one. You can remove the dim=0 here.
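(For reference: torch.stack already defaults to dim=0, so dropping the argument does not change the result.)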

    log_metrics = {
        f"metrics/{metric}": values / tot_examples
        for metric, values in metrics.items()
    }
    print(log_metrics)
@mmuckley (Contributor) commented:
Remove this - should not be default behavior.

@z-fabian (Contributor, Author) commented:

Okay, thanks for the feedback. I will make two separate PRs shortly for the two issues.
