Traceback (most recent call last):
File "src/run.py", line 96, in <module>
args.func(args)
File "src/run.py", line 65, in train
trainer.fit(model, train_dl, valid_dl)
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
self._run(model)
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
self.dispatch()
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
self.accelerator.start_training(self)
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
self.training_type_plugin.start_training(trainer)
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
self._results = trainer.run_stage()
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
return self.run_train()
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 842, in run_train
self.run_sanity_check(self.lightning_module)
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1107, in run_sanity_check
self.run_evaluation()
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 962, in run_evaluation
output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 174, in evaluation_step
output = self.trainer.accelerator.validation_step(args)
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 226, in validation_step
return self.training_type_plugin.validation_step(*args)
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 326, in validation_step
return self.model(*args, **kwargs)
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 1098, in forward
loss = self.module(*inputs, **kwargs)
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/deepspeed.py", line 62, in forward
return super().forward(*inputs, **kwargs)
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/pytorch_lightning/overrides/base.py", line 57, in forward
output = self.module.validation_step(*inputs, **kwargs)
File "/qiuzihan/image-gpt/src/image_gpt.py", line 125, in validation_step
logits = self.gpt(x)
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/qiuzihan/image-gpt/src/gpt2.py", line 74, in forward
h = layer(h)
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/qiuzihan/image-gpt/src/gpt2.py", line 24, in forward
x = self.ln_1(x)
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/torch/nn/modules/normalization.py", line 171, in forward
input, self.normalized_shape, self.weight, self.bias, self.eps)
File "/qiuzihan/image-gpt/gpt2-image/lib/python3.6/site-packages/torch/nn/functional.py", line 2202, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Float but found Half
Environment
Note: Bugs with code are solved faster! Colab Notebook should be made public!
Colab Notebook: Please copy and paste the output from our environment collection script (or fill out the checklist below manually).
You can get the script and run it with:
wget https://github.com/raw/PyTorchLightning/pytorch-lightning/master/tests/collect_env_details.py
# For security purposes, please check the contents of collect_env_details.py before running it.
python collect_env_details.py
PyTorch Version (e.g., 1.0): 1.8.0
OS (e.g., Linux): Linux
How you installed PyTorch (conda, pip, source): pip3
Build command you used (if compiling from source):
I've made a fix in Lightning Bolts here: Lightning-Universe/lightning-bolts#694. With the latest DeepSpeed this works, as they've fixed the underlying issue with GPT vision models as well :)
For anyone writing custom code, make sure that any tensors you create within the forward pass of your module have the correct dtype.
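To illustrate that advice, here is a minimal sketch of one common way such a mismatch can arise. It is not the actual code from this repository or from the Bolts fix. The module name `PrependStartToken` and its shapes are hypothetical. The idea is that a tensor created inside `forward` without an explicit `dtype` defaults to float32, so under 16-bit precision it no longer matches the half-precision activations. Taking `dtype` and `device` from the input avoids that.

```python
import torch
import torch.nn as nn


class PrependStartToken(nn.Module):
    """Hypothetical module: shifts a sequence right and prepends a zero vector."""

    def __init__(self, embed_dim: int):
        super().__init__()
        self.embed_dim = embed_dim

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h has shape (seq_len, batch, embed_dim).
        # torch.zeros(...) without dtype would be float32 by default,
        # even when h is float16. Copy dtype and device from the input instead.
        start = torch.zeros(
            1, h.shape[1], self.embed_dim, dtype=h.dtype, device=h.device
        )
        # Drop the last position and prepend the start vector.
        return torch.cat([start, h[:-1]], dim=0)


if __name__ == "__main__":
    module = PrependStartToken(embed_dim=4)
    for dtype in (torch.float16, torch.float32):
        out = module(torch.ones(3, 2, 4, dtype=dtype))
        print(dtype, "->", out.dtype)
```

`h.new_zeros(1, h.shape[1], self.embed_dim)` is an equivalent shorthand that also inherits the dtype and device of `h`.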
🐛 Bug
Please reproduce using the BoringModel
To Reproduce
1. Add the plugin deepspeed_stage_2 to pl.Trainer.
2. Change:
into:
3. Run.
Use following BoringModel and post here
Expected behavior