
RuntimeError: OrderedDict mutated during iteration #2281

Closed
Kshitij09 opened this issue Jun 19, 2020 · 4 comments · Fixed by #2298
Labels: bug (Something isn't working), help wanted (Open to be worked on)

@Kshitij09 (Contributor)

🐛 Bug

I was getting RuntimeError: OrderedDict mutated during iteration.
It seems that passing the same LightningModule object to both ModelSummary and Trainer causes this error.

To Reproduce

from pytorch_lightning import Trainer
from pytorch_lightning.core.memory import ModelSummary

model = CifarNet()  # any pl module would work here
ModelSummary(model, mode='full')  # registers forward hooks on the model's layers

trainer = Trainer(fast_dev_run=True, gpus=1)
trainer.fit(model)  # fails during the first training forward pass

Steps to reproduce the behavior:

  1. View the model summary using the ModelSummary class.
  2. Call trainer.fit with the same model object.

Stacktrace

RuntimeError                              Traceback (most recent call last)
<ipython-input-20-8badc092c0ba> in <module>()
      1 # Checking for errors
      2 trainer = Trainer(fast_dev_run=True,gpus=1)
----> 3 trainer.fit(model)
11 frames
/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders)
    916 
    917         elif self.single_gpu:
--> 918             self.single_gpu_train(model)
    919 
    920         elif self.use_tpu:  # pragma: no-cover
/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/distrib_parts.py in single_gpu_train(self, model)
    174             self.reinit_scheduler_properties(self.optimizers, self.lr_schedulers)
    175 
--> 176         self.run_pretrain_routine(model)
    177 
    178     def tpu_train(self, tpu_core_idx, model):
/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py in run_pretrain_routine(self, model)
   1091 
   1092         # CORE TRAINING LOOP
-> 1093         self.train()
   1094 
   1095     def test(
/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/training_loop.py in train(self)
    373                 # RUN TNG EPOCH
    374                 # -----------------
--> 375                 self.run_training_epoch()
    376 
    377                 if self.max_steps and self.max_steps == self.global_step:
/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/training_loop.py in run_training_epoch(self)
    456             # RUN TRAIN STEP
    457             # ---------------
--> 458             _outputs = self.run_training_batch(batch, batch_idx)
    459             batch_result, grad_norm_dic, batch_step_metrics, batch_output = _outputs
    460 
/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/training_loop.py in run_training_batch(self, batch, batch_idx)
    632 
    633                 # calculate loss
--> 634                 loss, batch_output = optimizer_closure()
    635 
    636                 # check if loss or model weights are nan
/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/training_loop.py in optimizer_closure()
    596                                                                     opt_idx, self.hiddens)
    597                         else:
--> 598                             output_dict = self.training_forward(split_batch, batch_idx, opt_idx, self.hiddens)
    599 
    600                         # format and reduce outputs accordingly
/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/training_loop.py in training_forward(self, batch, batch_idx, opt_idx, hiddens)
    771             batch = self.transfer_batch_to_gpu(batch, gpu_id)
    772             args[0] = batch
--> 773             output = self.model.training_step(*args)
    774 
    775         # TPU support
<ipython-input-11-2482ebcf9d12> in training_step(self, batch, batch_idx)
     55   def training_step(self,batch,batch_idx):
     56     x, y = batch
---> 57     y_hat = self(x)
     58 
     59     return {'loss': F.cross_entropy(y_hat, y)}
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)
<ipython-input-11-2482ebcf9d12> in forward(self, x)
     11 
     12   def forward(self,x):
---> 13     return self.model(x)
     14 
     15   def prepare_data(self):
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    549         else:
    550             result = self.forward(*input, **kwargs)
--> 551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)
    553             if hook_result is not None:
RuntimeError: OrderedDict mutated during iteration

Expected behavior

We should be able to use the same object with both classes.
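
For context: the failing frame is PyTorch's forward-hook dispatch loop, so the likely mechanism (consistent with the traceback, though I haven't confirmed the ModelSummary internals) is that ModelSummary leaves forward hooks attached, and a hook then removes itself during the training forward pass. A standalone sketch in plain PyTorch that reproduces the same error on torch 1.5, with no Lightning involved:

import torch
import torch.nn as nn

layer = nn.Linear(4, 2)

def self_removing_hook(module, inputs, output):
    # Removing a hook while nn.Module.__call__ is still iterating
    # module._forward_hooks.values() mutates the OrderedDict mid-iteration.
    # (PyTorch 1.5 iterates the dict directly; newer versions copy it first.)
    handle.remove()

handle = layer.register_forward_hook(self_removing_hook)
layer(torch.randn(1, 4))  # RuntimeError: OrderedDict mutated during iteration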

Environment

* CUDA:
	- GPU:
		- Tesla T4
	- available:         True
	- version:           10.1
* Packages:
	- numpy:             1.18.5
	- pyTorch_debug:     False
	- pyTorch_version:   1.5.0+cu101
	- pytorch-lightning: 0.8.1
	- tensorboard:       2.2.2
	- tqdm:              4.41.1
* System:
	- OS:                Linux
	- architecture:
		- 64bit
		- 
	- processor:         x86_64
	- python:            3.6.9
	- version:           #1 SMP Wed Feb 19 05:26:34 PST 2020
@Kshitij09 added the bug and help wanted labels on Jun 19, 2020
@github-actions (Contributor)

Hi! Thanks for your contribution, great first issue!

@awaelchli (Member)

model = CifarNet() # any pl module would work here

Could you paste the minimal code for CifarNet? I cannot reproduce it with PL examples, sorry.

@Kshitij09 (Contributor, Author)

Okay! I'm not sure which part pertains to this issue, so here is the link to my Colab notebook.
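
For reference, the parts of CifarNet visible in the traceback look roughly like this (a hypothetical reconstruction, not the actual notebook code; the timm backbone and optimizer are guesses, based only on the notebook installing timm):

import torch
import torch.nn.functional as F
import pytorch_lightning as pl
import timm  # assumed: the notebook installs timm, so the backbone likely comes from it

class CifarNet(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # Hypothetical backbone; only self.model(x) is visible in the traceback.
        self.model = timm.create_model('resnet18', pretrained=False, num_classes=10)

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        return {'loss': F.cross_entropy(y_hat, y)}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)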

@awaelchli (Member)

awaelchli commented Jun 20, 2020

Thanks, your notebook was very helpful. I fixed the bug in #2298.
You can verify that it works by installing from

!pip install --upgrade git+https://github.com/awaelchli/pytorch-lightning@bugfix/summary_hook_handles timm wandb

in the first cell of your notebook.
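
For anyone curious about the fix itself: judging by the branch name (summary_hook_handles), the idea is to keep the RemovableHandle objects returned by register_forward_hook and detach them after the summary's forward pass, instead of letting each hook remove itself mid-dispatch. A minimal sketch of that pattern, not the actual PR code:

import torch
import torch.nn as nn

def summarize_shapes(model: nn.Module, example_input: torch.Tensor):
    # Sketch only: record output shapes via hooks, then detach the hooks
    # outside the forward pass, so no hook dict is mutated during iteration.
    shapes, handles = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            shapes[name] = tuple(output.shape)  # record only; never remove here
        return hook

    for name, module in model.named_modules():
        handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(example_input)
    finally:
        for handle in handles:
            handle.remove()  # safe: the forward pass has already finished
    return shapes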
