Model validation code is not called #2351
As a note, the validation code is called twice when leaving out the […]
Your dataset is wrong; that's causing the issue, since your batch size and dataset length don't work out well.
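The interaction being alluded to (a sketch with illustrative numbers, not Lightning internals) is that a training loop which schedules validation based on a dataset's reported length can skip it entirely when the iterator yields fewer batches than `len()` implies:

```python
import math

def expected_batches(dataset_len, batch_size):
    # number of batches a len()-based schedule would plan for
    return math.ceil(dataset_len / batch_size)

def actual_batches(samples_yielded, batch_size):
    # number of batches the iterator actually produces
    return math.ceil(samples_yielded / batch_size)

# dataset reports 100 samples but its iterator only yields 96
planned = expected_batches(100, 8)   # 13 planned batches
produced = actual_batches(96, 8)     # 12 real batches
# a loop that waits for batch 13 before triggering validation never gets there
```

When the two numbers agree (e.g. 128 samples, batch size 8, 16 batches), this failure mode disappears, which is why the dataset length was suspected.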
@williamFalcon Changing the dataset to produce 128 samples does not change this either:

```python
import pytorch_lightning as pl
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader


class Dataset(torch.utils.data.IterableDataset):
    def __init__(self):
        super().__init__()

    def __iter__(self):
        def get_sample():
            for _ in range(128):
                yield torch.randn(20)
        return get_sample()

    def __len__(self):
        return 128


class Model(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(20, 10)
        self.dec = nn.Linear(10, 20)

    def forward(self, x):
        x = self.enc(x)
        x = F.relu(x)
        x = self.dec(x)
        return x

    def training_step(self, batch, batch_idx):
        x = self.forward(batch)
        return {'loss': torch.mean(x)}

    def validation_step(self, batch, batch_idx):
        # should be hit during validation, but never is
        raise NotImplementedError()
        x = self.forward(batch)
        return {'val_loss': torch.mean(x)}

    def validation_epoch_end(self, outputs):
        return {'val_loss': torch.mean(torch.stack([x['val_loss'] for x in outputs]))}

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters())


if __name__ == '__main__':
    trainer = pl.Trainer(num_sanity_val_steps=0)
    net = Model()
    dataset = Dataset()
    trainer.fit(
        net,
        train_dataloader=DataLoader(dataset, batch_size=8, num_workers=0),
        val_dataloaders=DataLoader(dataset, batch_size=8, num_workers=0),
    )
```
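Raising `NotImplementedError` is one way to prove a hook never fires; a framework-agnostic alternative (a sketch with made-up class names, not Lightning API) is to count invocations so the run still completes and the counts can be inspected afterwards:

```python
class CallCounter:
    """Mixin that records how often selected hooks are invoked."""
    def __init__(self):
        self.calls = {}

    def record(self, name):
        self.calls[name] = self.calls.get(name, 0) + 1


class ToyModel(CallCounter):
    def training_step(self, batch, batch_idx):
        self.record("training_step")

    def validation_step(self, batch, batch_idx):
        self.record("validation_step")


# drive a toy loop: 16 training batches, then 16 validation batches
model = ToyModel()
for i in range(16):
    model.training_step(None, i)
for i in range(16):
    model.validation_step(None, i)
# model.calls now shows whether each hook actually ran, and how often
```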
Also changing the […]
Same. I have to set the validation to a minimal number... I can't train on all the data and leave validation out when training with TPU; it hangs after the first epoch. To narrow it down: this is a TPU problem, because it works fine with CPU and GPU.
Same problem here. I use […]
@Steve-Tod Can you check the spelling on your […]
Any update on this? |
@SamPusegaonkar What's your issue? I cannot reproduce the behaviour in the description of this issue. Could you create another issue, since this one is quite outdated?
#13726 (reply in thread)
Never mind, fixed my error: I wasn't passing a validation dataloader. That said, if someone defines a validation step but passes no validation dataloader, surely an error should be thrown?
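One possible shape for such a guard (a sketch using a stand-in base class and hypothetical helper names, not actual Lightning internals) is to detect whether the hook was overridden and fail fast when no dataloader accompanies it:

```python
class LightningModuleStub:
    """Stand-in for the framework base class (hypothetical)."""
    def validation_step(self, batch, batch_idx):
        pass  # default: no validation logic


def is_overridden(method_name, obj, parent=LightningModuleStub):
    # True if obj's class replaced the parent's default hook
    return getattr(type(obj), method_name) is not getattr(parent, method_name)


def check_val_config(model, val_dataloaders):
    # user wrote validation code but gave the trainer nothing to validate on
    if is_overridden("validation_step", model) and val_dataloaders is None:
        raise ValueError(
            "validation_step is defined but no validation dataloader was passed"
        )


class MyModel(LightningModuleStub):
    def validation_step(self, batch, batch_idx):
        return batch
```

With this guard, `check_val_config(MyModel(), None)` raises immediately, while a model that never overrides the hook passes the check silently.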
In my case, Lightning skips validation when I set a […]
Thanks @lee-junjie! That helped me debug my issue. It's annoying that it doesn't throw any warning.
This inspired me to solve my issue! It turned out that the `len()` of my custom batch sampler was larger than the number of batches it actually yielded. After I corrected the `__len__()` of my batch sampler, it works!
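That mismatch is easy to check in isolation. A minimal sketch (loosely mirroring `torch.utils.data.BatchSampler` semantics, but pure Python) shows the invariant a custom batch sampler must keep: `__len__` must equal the number of batches `__iter__` yields, including the `drop_last` case:

```python
import math

class ToyBatchSampler:
    """Minimal batch sampler: yields lists of indices of size batch_size."""
    def __init__(self, n, batch_size, drop_last=True):
        self.n = n
        self.batch_size = batch_size
        self.drop_last = drop_last

    def __iter__(self):
        batch = []
        for i in range(self.n):
            batch.append(i)
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if batch and not self.drop_last:
            yield batch  # final short batch

    def __len__(self):
        # must match the number of batches __iter__ actually yields
        if self.drop_last:
            return self.n // self.batch_size
        return math.ceil(self.n / self.batch_size)


sampler = ToyBatchSampler(n=100, batch_size=8, drop_last=True)
assert len(sampler) == len(list(sampler))  # consistent: 12 == 12
```

If `__len__` overstated the count (say, returning `math.ceil(n / batch_size)` while dropping the last batch), a loop keyed on `len()` would wait for a batch that never arrives.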
🐛 Bug
My defined methods for `validation_step` as well as `validation_epoch_end` do not seem to get called.

To Reproduce

Just run the provided code sample. Python should raise the `NotImplementedError`. Instead, the model completes 'successfully'.

Code sample
Expected behavior
I'd expect the code sample above to fail with the `NotImplementedError`.
Environment