
to() got an unexpected keyword argument 'non_blocking' for DGLGraph #2637

Closed
JiangYize opened this issue Jul 18, 2020 · 9 comments · Fixed by #2910
Labels
bug: Something isn't working
help wanted: Open to be worked on

Comments

@JiangYize

JiangYize commented Jul 18, 2020

🐛 Bug

To Reproduce

I use the dgl library to build a GNN and batch DGLGraphs.
Training runs without problems, but during testing I get a TypeError: to() got an unexpected keyword argument 'non_blocking'

The .to() method of <class 'dgl.graph.DGLGraph'> has no keyword argument 'non_blocking'.

Environment

  • OS: Linux
  • CUDA: 10.1
  • Python Version: 3.7
  • PyTorch Version: 1.5.1
  • DGL Version: 0.4.3post2
  • PyTorch-Lightning Version: 0.8.5

Additional context

   File "../src/main.py", line 131, in <module>
    run(params)
  File "../src/main.py", line 92, in run
    trainer.test(model)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in test
    results = self.__test_given_model(model, test_dataloaders)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1346, in __test_given_model
    results = self.fit(model)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1003, in fit
    results = self.single_gpu_train(model)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 186, in single_gpu_train
    results = self.run_pretrain_routine(model)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in run_pretrain_routine
    results = self.run_evaluation(test_mode=True)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 391, in run_evaluation
    eval_results = self._evaluate(self.model, dataloaders, max_batches, test_mode)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 293, in _evaluate
    output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 458, in evaluation_forward
    batch = self.transfer_batch_to_gpu(batch, root_gpu)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 159, in transfer_batch_to_gpu
    return self.__transfer_batch_to_device(batch, device)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 164, in __transfer_batch_to_device
    return model.transfer_batch_to_device(batch, device)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/core/hooks.py", line 242, in transfer_batch_to_device
    return move_data_to_device(batch, device)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/utilities/apply_func.py", line 109, in move_data_to_device
    return apply_to_collection(batch, dtype=(TransferableDataType, Batch), function=batch_to)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/utilities/apply_func.py", line 40, in apply_to_collection
    for k, v in data.items()})
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/utilities/apply_func.py", line 40, in <dictcomp>
    for k, v in data.items()})
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/utilities/apply_func.py", line 35, in apply_to_collection
    return function(data, *args, **kwargs)
  File "/home/jiangyize/miniconda3/envs/galixir/lib/python3.7/site-packages/pytorch_lightning/utilities/apply_func.py", line 107, in batch_to
    return data.to(device, non_blocking=True)
TypeError: to() got an unexpected keyword argument 'non_blocking'
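The traceback shows the mechanism: Lightning's move_data_to_device calls data.to(device, non_blocking=True) on every element of the batch, but DGLGraph.to only accepts a context argument. A dependency-free sketch of that mismatch, using a hypothetical FakeGraph class in place of DGLGraph:

```python
# Minimal reproduction of the failure mode, no DGL or Lightning required.
# FakeGraph stands in for dgl.DGLGraph, whose .to() (as of 0.4.3) takes
# only a context and no non_blocking keyword.

class FakeGraph:
    def to(self, ctx):          # note: no non_blocking parameter
        self.ctx = ctx
        return self

def batch_to(data, device):
    # mirrors the call in pytorch_lightning/utilities/apply_func.py
    return data.to(device, non_blocking=True)

try:
    batch_to(FakeGraph(), "cuda:0")
except TypeError as err:
    print(err)   # message mentions the unexpected keyword 'non_blocking'
```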
@JiangYize added the bug and help wanted labels on Jul 18, 2020
@github-actions
Contributor

Hi! Thanks for your contribution, and great first issue!

@jacobdanovitch

Having the same problem; it's because DGLGraph.to (docs, source) doesn't take the non_blocking argument. Example:

dgl.DGLGraph().to('cuda', non_blocking=True)

Here's my temporary solution:

import torch
from dgl import DGLGraph

class LightningDGLGraph(DGLGraph):
    def to(self, ctx, *args, **kwargs):
        # drop non_blocking (and anything else Lightning passes) before delegating
        return super().to(torch.device(ctx))

g = LightningDGLGraph()
g.to('cuda', non_blocking=True)

Works, but probably not ideal.
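The trick here is that the override accepts and discards the extra keyword before delegating. The same pattern can be shown without DGL or torch installed, with a hypothetical Base class standing in for DGLGraph:

```python
# Dependency-free sketch of the subclass workaround above: the override
# swallows arbitrary keyword arguments, so a caller passing
# non_blocking=True (as Lightning does) no longer triggers a TypeError.
# "Base" stands in for dgl.DGLGraph.

class Base:
    def to(self, ctx):
        self.ctx = ctx
        return self

class LightningFriendly(Base):
    def to(self, ctx, *args, **kwargs):
        # silently ignore non_blocking and any other extras
        return super().to(ctx)

g = LightningFriendly()
g.to("cuda:0", non_blocking=True)   # no TypeError
print(g.ctx)                        # cuda:0
```

One caveat with this design: silently dropping non_blocking means asynchronous transfers are never requested, which is harmless for correctness but worth knowing about.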

@Emrys-Hong

Emrys-Hong commented Jul 22, 2020

> Having the same problem; it's because DGLGraph.to (docs, source) doesn't take the non_blocking argument. [...] Here's my temporary solution: [...] Works, but probably not ideal.

Hi, I wonder how this will work with dgl.batch, since the type it returns is a DGLGraph.

OK, they also have a quick fix here: dmlc/dgl#1600.
Uninstalling the stable version and installing the latest build from main solved my problem:

pip install --pre dgl           # For CPU Build
pip install --pre dgl-cu90      # For CUDA 9.0 Build
pip install --pre dgl-cu92      # For CUDA 9.2 Build
pip install --pre dgl-cu100     # For CUDA 10.0 Build
pip install --pre dgl-cu101     # For CUDA 10.1 Build

@jacobdanovitch

It seems to work for me for now.

@jacobdanovitch

jacobdanovitch commented Jul 26, 2020

> Ok, they also have this quick fix here: dmlc/dgl#1600. so uninstall the stable version and install the latest version from main solves my problem:

Just saw your edit. This seems to work if I don't specify the number of GPUs; when I do, I get the same error. Edit: it's the distributed backend; it never calls graph.to. You can throw a 0/0 into that method and it will never raise with distributed_backend='ddp'.

@awaelchli
Member

awaelchli commented Aug 1, 2020

For a clean solution in Lightning, override the transfer_batch_to_device model hook and call .to() yourself on the graph object.
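A sketch of what that override might look like, assuming the batch is a (graph, labels) tuple. Stub classes stand in for pl.LightningModule and DGLGraph so the shape of the hook is visible without the real dependencies; with the real libraries, only the hook body would matter:

```python
# Sketch of overriding Lightning's transfer_batch_to_device hook
# (the one at pytorch_lightning/core/hooks.py in the traceback above).
# StubGraph and StubLightningModule are hypothetical stand-ins.

class StubGraph:                      # stands in for dgl.DGLGraph
    def to(self, ctx):                # note: no non_blocking kwarg
        self.ctx = ctx
        return self

class StubLightningModule:            # stands in for pl.LightningModule
    def transfer_batch_to_device(self, batch, device):
        raise NotImplementedError

class MyModel(StubLightningModule):
    def transfer_batch_to_device(self, batch, device):
        # Move the graph ourselves, without non_blocking, instead of
        # letting Lightning's generic move_data_to_device do it.
        graph, labels = batch
        return graph.to(device), labels

model = MyModel()
graph, labels = model.transfer_batch_to_device((StubGraph(), [0, 1]), "cuda:0")
print(graph.ctx)    # cuda:0
```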

@awaelchli
Member

Regarding DDP, is DGLGraph supposed to work with that in plain PyTorch? I don't think it can work with scatter and gather.

@jacobdanovitch

> For a clean solution in Lightning, override this model hook and call .to() yourself on the graph object.

@awaelchli Is this supposed to be overridden in the model? It doesn't seem to get called for me in a distributed setting.

@awaelchli
Member

awaelchli commented Aug 11, 2020

@jacobdanovitch Yes, this hook only works for single-GPU training, because in distributed mode we need to scatter and gather the batch, and if it is a custom object we don't know how to do that. For that case you would have to define your own DistributedDataParallel module and configure it in the configure_ddp model hook. We should probably update the docs regarding that.
