
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cuda:0! #19

Open
Haruka1307 opened this issue May 25, 2024 · 3 comments


@Haruka1307

Hi!

I'm trying to run step 2 on cuda:6 since cuda:0 is in use, so I moved the batches and the model to cuda:6. I printed the device of the batch and of the model inside the obtain_gradients_with_adam function to confirm this.

But the following error occurs:

Traceback (most recent call last):
File "/home/u2019000171/.conda/envs/less/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/u2019000171/.conda/envs/less/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/u2019000171/cjy/LESS/less/data_selection/get_info.py", line 156, in
collect_grads(dataloader,
File "/home/u2019000171/cjy/LESS/less/data_selection/collect_grad_reps.py", line 263, in collect_grads
vectorized_grads = obtain_gradients_with_adam(model, batch, m, v)
File "/home/u2019000171/cjy/LESS/less/data_selection/collect_grad_reps.py", line 121, in obtain_gradients_with_adam
loss = model(**batch,).loss
File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/peft/peft_model.py", line 1081, in forward
return self.base_model(
File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 103, in forward
return self.model.forward(*args, **kwargs)
File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1183, in forward
outputs = self.model(
File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1026, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
return F.embedding(
File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/functional.py", line 2233, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
Using Adam gradients
cuda:6
cuda:6

I can't figure out what's going on...
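For context, the accelerate/hooks.py frames in the traceback suggest the model was dispatched with a device_map, in which case some weights can stay pinned on cuda:0 even after the inputs are moved. A minimal pure-PyTorch sketch of the same mismatch, assuming a machine where both cuda:0 and cuda:6 exist; the module and tensor here are placeholders, not LESS code:

import torch

# Weights pinned on one device, inputs moved to another: this is exactly
# the situation F.embedding complains about in the traceback above.
embed = torch.nn.Embedding(10, 4).to("cuda:0")      # weights stay on cuda:0
input_ids = torch.tensor([[1, 2, 3]]).to("cuda:6")  # inputs moved to cuda:6
out = embed(input_ids)  # RuntimeError: Expected all tensors to be on the same device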

@xiamengzhou
Collaborator

Could you try export CUDA_VISIBLE_DEVICES=6 instead?
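For reference, CUDA_VISIBLE_DEVICES restricts which physical GPUs the process can see and renumbers the visible ones from zero, so everything that defaults to cuda:0 then lands on physical GPU 6. A quick way to check, assuming the variable is set before Python starts:

# launched as: CUDA_VISIBLE_DEVICES=6 python ...
import torch

print(torch.cuda.device_count())    # 1 -- only one GPU is visible
print(torch.cuda.current_device())  # 0 -- physical GPU 6, renumbered as cuda:0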

@DDrShieh

> Could you try export CUDA_VISIBLE_DEVICES=6 instead?

It works for 80G-memory devices but not for devices with less memory. Could you please share any advice for smaller devices, e.g. 32G, other than the CPU-based method?

@Haruka1307
Author

> Could you try export CUDA_VISIBLE_DEVICES=6 instead?
>
> It works for 80G-memory devices but not for devices with less memory. Could you please share any advice for smaller devices, e.g. 32G, other than the CPU-based method?

You may try fixing the device_map to device_map = {'': 'cuda:0'}. I think the auto device map may place parts of the model on another GPU (e.g. cuda:1).
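A minimal sketch of that suggestion, assuming the model is loaded through transformers' from_pretrained (the checkpoint name is a placeholder; LESS builds its model elsewhere, so treat this as illustrative):

from transformers import AutoModelForCausalLM

# The empty-string key maps the root module, i.e. the whole model,
# to a single device instead of letting device_map="auto" shard it.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder checkpoint
    device_map={"": "cuda:0"},   # pin everything to one GPU
)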
