
to() got an unexpected keyword argument 'non_blocking' for DGLGraph #2637

Closed
JiangYize opened this issue Jul 18, 2020 · 9 comments · Fixed by #2910
Labels
bug: Something isn't working
help wanted: Open to be worked on

Comments

@JiangYize

JiangYize commented Jul 18, 2020

🐛 Bug

To Reproduce

I use the dgl library to build a GNN and batch DGLGraphs.
Training runs without problems, but during testing I get a TypeError: to() got an unexpected keyword argument 'non_blocking'

The .to() method of <class 'dgl.graph.DGLGraph'> has no keyword argument 'non_blocking'.

Environment

  • OS: Linux
  • CUDA: 10.1
  • Python Version: 3.7
  • PyTorch Version: 1.5.1
  • DGL Version: 0.4.3post2
  • PyTorch-Lightning Version: 0.8.5

Additional context

   File "../src/main.py", line 131, in <module>
    run(params)
  File "../src/main.py", line 92, in run
    trainer.test(model)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in test
    results = self.__test_given_model(model, test_dataloaders)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1346, in __test_given_model
    results = self.fit(model)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1003, in fit
    results = self.single_gpu_train(model)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 186, in single_gpu_train
    results = self.run_pretrain_routine(model)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in run_pretrain_routine
    results = self.run_evaluation(test_mode=True)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 391, in run_evaluation
    eval_results = self._evaluate(self.model, dataloaders, max_batches, test_mode)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 293, in _evaluate
    output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 458, in evaluation_forward
    batch = self.transfer_batch_to_gpu(batch, root_gpu)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 159, in transfer_batch_to_gpu
    return self.__transfer_batch_to_device(batch, device)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 164, in __transfer_batch_to_device
    return model.transfer_batch_to_device(batch, device)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/core/hooks.py", line 242, in transfer_batch_to_device
    return move_data_to_device(batch, device)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/utilities/apply_func.py", line 109, in move_data_to_device
    return apply_to_collection(batch, dtype=(TransferableDataType, Batch), function=batch_to)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/utilities/apply_func.py", line 40, in apply_to_collection
    for k, v in data.items()})
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/utilities/apply_func.py", line 40, in <dictcomp>
    for k, v in data.items()})
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/utilities/apply_func.py", line 35, in apply_to_collection
    return function(data, *args, **kwargs)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/utilities/apply_func.py", line 107, in batch_to
    return data.to(device, non_blocking=True)
TypeError: to() got an unexpected keyword argument 'non_blocking'
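The traceback shows the mechanism: Lightning's move_data_to_device calls data.to(device, non_blocking=True) on every element of the batch, but DGLGraph.to only accepts a context argument. A dependency-free sketch of that mismatch, using a hypothetical FakeGraph class in place of DGLGraph:

```python
# Minimal reproduction of the failure mode, no DGL or Lightning required.
# FakeGraph stands in for dgl.DGLGraph, whose .to() (as of 0.4.3) takes
# only a context and no non_blocking keyword.

class FakeGraph:
    def to(self, ctx):          # note: no non_blocking parameter
        self.ctx = ctx
        return self

def batch_to(data, device):
    # mirrors the call in pytorch_lightning/utilities/apply_func.py
    return data.to(device, non_blocking=True)

try:
    batch_to(FakeGraph(), "cuda:0")
except TypeError as err:
    print(err)   # message mentions the unexpected keyword 'non_blocking'
```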
@JiangYize added the bug and help wanted labels on Jul 18, 2020
@github-actions
Contributor

Hi! Thanks for your contribution, and great first issue!

@jacobdanovitch

Having the same problem; it's because DGLGraph.to (docs, source) doesn't take the non_blocking argument. Example:

dgl.DGLGraph().to('cuda', non_blocking=True)

Here's my temporary solution:

import torch
from dgl import DGLGraph

class LightningDGLGraph(DGLGraph):
    def to(self, ctx, *args, **kwargs):
        # drop non_blocking (and anything else Lightning passes) before delegating
        return super().to(torch.device(ctx))

g = LightningDGLGraph()
g.to('cuda', non_blocking=True)

Works, but probably not ideal.
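The trick here is that the override accepts and discards the extra keyword before delegating. The same pattern can be shown without DGL or torch installed, with a hypothetical Base class standing in for DGLGraph:

```python
# Dependency-free sketch of the subclass workaround above: the override
# swallows arbitrary keyword arguments, so a caller passing
# non_blocking=True (as Lightning does) no longer triggers a TypeError.
# "Base" stands in for dgl.DGLGraph.

class Base:
    def to(self, ctx):
        self.ctx = ctx
        return self

class LightningFriendly(Base):
    def to(self, ctx, *args, **kwargs):
        # silently ignore non_blocking and any other extras
        return super().to(ctx)

g = LightningFriendly()
g.to("cuda:0", non_blocking=True)   # no TypeError
print(g.ctx)                        # cuda:0
```

One caveat with this design: silently dropping non_blocking means asynchronous transfers are never requested, which is harmless for correctness but worth knowing about.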

@Emrys-Hong

Emrys-Hong commented Jul 22, 2020

> Having the same problem; it's because DGLGraph.to (docs, source) doesn't take the non_blocking argument. [...] Here's my temporary solution: [...] Works, but probably not ideal.

Hi, I wonder how this will work with dgl.batch, since the type it returns is a DGLGraph.

OK, they also have a quick fix here: dmlc/dgl#1600.
Uninstalling the stable version and installing the latest build from main solved my problem:

pip install --pre dgl           # For CPU Build
pip install --pre dgl-cu90      # For CUDA 9.0 Build
pip install --pre dgl-cu92      # For CUDA 9.2 Build
pip install --pre dgl-cu100     # For CUDA 10.0 Build
pip install --pre dgl-cu101     # For CUDA 10.1 Build

@jacobdanovitch

It seems to work for me for now.

@jacobdanovitch

jacobdanovitch commented Jul 26, 2020

> Ok, they also have this quick fix here: dmlc/dgl#1600. so uninstall the stable version and install the latest version from main solves my problem:

Just saw your edit. This seems to work if I don't specify the number of GPUs; when I do, I get the same error. Edit: it's the distributed backend; it never calls graph.to. You can throw a 0/0 into that method and it will never raise with distributed_backend='ddp'.

@awaelchli
Member

awaelchli commented Aug 1, 2020

For a clean solution in Lightning, override the transfer_batch_to_device model hook and call .to() yourself on the graph object.
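A sketch of what that override might look like, assuming the batch is a (graph, labels) tuple. Stub classes stand in for pl.LightningModule and DGLGraph so the shape of the hook is visible without the real dependencies; with the real libraries, only the hook body would matter:

```python
# Sketch of overriding Lightning's transfer_batch_to_device hook
# (the one at pytorch_lightning/core/hooks.py in the traceback above).
# StubGraph and StubLightningModule are hypothetical stand-ins.

class StubGraph:                      # stands in for dgl.DGLGraph
    def to(self, ctx):                # note: no non_blocking kwarg
        self.ctx = ctx
        return self

class StubLightningModule:            # stands in for pl.LightningModule
    def transfer_batch_to_device(self, batch, device):
        raise NotImplementedError

class MyModel(StubLightningModule):
    def transfer_batch_to_device(self, batch, device):
        # Move the graph ourselves, without non_blocking, instead of
        # letting Lightning's generic move_data_to_device do it.
        graph, labels = batch
        return graph.to(device), labels

model = MyModel()
graph, labels = model.transfer_batch_to_device((StubGraph(), [0, 1]), "cuda:0")
print(graph.ctx)    # cuda:0
```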

@awaelchli
Member

Regarding DDP, is DGLGraph supposed to work with that in plain PyTorch? I don't think it can work with scatter and gather.

@jacobdanovitch

> For a clean solution in Lightning, override this model hook and call .to() yourself on the graph object.

@awaelchli Is this supposed to be overridden in the model? It doesn't seem to get called for me in a distributed setting.

@awaelchli
Member

awaelchli commented Aug 11, 2020

@jacobdanovitch Yes, this hook only works for single-GPU training, because in distributed mode we need to scatter and gather the batch, and if it is a custom object we don't know how to do that. For that case you would have to define your own DistributedDataParallel module and configure it in the configure_ddp model hook. We should probably update the docs regarding that.
