
Fixed two issues: SSIM evaluation and DDP subsampling bug #60

Closed

Conversation

@z-fabian (Contributor) commented:

(1) There was a bug in validation_epoch_end in the MriModule where the slices were concatenated along a spatial dimension instead of being stacked along a new slice dimension. As a result, evaluate.ssim operated on these single concatenated slices instead of volumes, so there was no averaging along the slice dimension. I also added a print statement to show validation metrics in the command-line output.
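A minimal standalone sketch of the difference (illustrative shapes, not the fastMRI code):

    import torch

    # Ten hypothetical 2D slices of one volume, each (H, W) = (320, 320).
    slices = [torch.randn(320, 320) for _ in range(10)]

    # Buggy pattern: concatenating along an existing spatial dimension yields
    # one tall (3200, 320) "slice", so SSIM is computed once on that image
    # with no per-slice averaging over the volume.
    concatenated = torch.cat(slices, dim=0)   # shape: (3200, 320)

    # Fixed pattern: stacking creates a new leading slice dimension, giving a
    # (10, 320, 320) volume that SSIM can average over slice by slice.
    stacked = torch.stack(slices, dim=0)      # shape: (10, 320, 320)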

(2) The random seed of SliceDataset was not explicitly set, so different processes during DDP training had different random seeds. This led to different parts of the training data being selected by different workers when sample_rate < 1. I set the random seed for all workers to 0, but we could also pass the seed in from the argparser if needed.
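A sketch of the failure mode, assuming the shuffle-then-slice subsampling in SliceDataset (file names here are hypothetical):

    import random

    # Hypothetical volume files; in SliceDataset these come from the data root.
    files = ["file_{:03d}.h5".format(i) for i in range(100)]
    sample_rate = 0.5

    # Without an explicit seed, each DDP process draws a different shuffle and
    # therefore trains on a different subset of the volumes.
    random.seed(0)  # the fix: same seed in every process -> same subset
    random.shuffle(files)
    num_files = round(len(files) * sample_rate)
    files = files[:num_files]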

@facebook-github-bot commented:

Hi @z-fabian!

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@facebook-github-bot added the "CLA Signed" label on Aug 13, 2020.
@facebook-github-bot commented:

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@mmuckley (Contributor) left a comment:

Nice find on the stack bug. It looks to be an error I introduced during the refactor. I'd like to merge that one right away and do the seed separately (maybe another PR). For the seed you could update the unet and varnet examples in experimental to use deterministic=True and seed_everything().
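A minimal sketch of that setup, assuming a PyTorch Lightning version that exposes seed_everything and the deterministic trainer flag (42 is a placeholder seed value):

    import pytorch_lightning as pl

    # Seeds Python's random, NumPy, and PyTorch identically in every process.
    pl.seed_everything(42)

    # Ask PyTorch/cuDNN for reproducible ops where possible.
    trainer = pl.Trainer(deterministic=True)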

@@ -114,7 +114,7 @@ class SliceDataset(Dataset):
             what fraction of the volumes should be loaded.
     """

-    def __init__(self, root, transform, challenge, sample_rate=1):
+    def __init__(self, root, transform, challenge, sample_rate=1, seed=0):
@mmuckley (Contributor) commented on the diff:

Random seeds should be done at the project/trainer level. See the following:

https://pytorch-lightning.readthedocs.io/en/latest/trainer.html#reproducibility

And this:

Lightning-AI/pytorch-lightning#1572

@z-fabian (Contributor, Author) replied:

So for example I should add seed_everything(seed_val) at the start of varnet.py and set deterministic = True for the trainer object?

@mmuckley (Contributor) replied on Aug 13, 2020:

Yeah. I don't know if it will fix your val/test discrepancy but it should make your experiment behave better.

The SSIM bug may be impacting things, but I don't know if that will fix the discrepancy either. I'd be interested in what you see first.

@@ -126,6 +126,7 @@ def __init__(self, root, transform, challenge, sample_rate=1):

         files = list(pathlib.Path(root).iterdir())
         if sample_rate < 1:
+            random.seed(seed)  # get the same files in every process
@mmuckley (Contributor) commented:
See init arg comment.

Comment on lines 216 to 217:

    output = torch.stack([out for _, out in sorted(outputs[fname])], dim=0).numpy()
    target = torch.stack([tgt for _, tgt in sorted(targets[fname])], dim=0).numpy()
@mmuckley (Contributor) commented:
Nice find on this one. You can remove the dim=0 here.
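(For reference: torch.stack already defaults to dim=0, so dropping the argument does not change the result.)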

    log_metrics = {
        f"metrics/{metric}": values / tot_examples
        for metric, values in metrics.items()
    }
    print(log_metrics)
@mmuckley (Contributor) commented:
Remove this - should not be default behavior.

@z-fabian (Contributor, Author) commented:

Okay, thanks for the feedback. I will make two separate PRs shortly for the two issues.
