Tensor with negative dimensions / overflow error using Accelerate #352

Open

frutiemax92 opened this issue Jul 6, 2024 · 0 comments
When I try to use internlm/internlm-xcomposer2-vl-1_8b on 2 GPUs, I get an error from Accelerate at the usual line:
model = accelerator.prepare(model)

This is the error:

[rank1]:     model = accelerator.prepare(model)
[rank1]:   File "C:\Users\lucas\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\accelerator.py", line 1274, in prepare
[rank1]:     result = tuple(
[rank1]:   File "C:\Users\lucas\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\accelerator.py", line 1275, in <genexpr>
[rank1]:     self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
[rank1]:   File "C:\Users\lucas\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\accelerator.py", line 1151, in _prepare_one
[rank1]:     return self.prepare_model(obj, device_placement=device_placement)
[rank1]:   File "C:\Users\lucas\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\accelerator.py", line 1403, in prepare_model
[rank1]:     model = torch.nn.parallel.DistributedDataParallel(
[rank1]:   File "C:\Users\lucas\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\parallel\distributed.py", line 812, in __init__
[rank1]:     self._ddp_init_helper(
[rank1]:   File "C:\Users\lucas\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\parallel\distributed.py", line 1152, in _ddp_init_helper
[rank1]:     self.reducer = dist.Reducer(
[rank1]: RuntimeError: Trying to create tensor with negative dimension -2146648064: [-2146648064]
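The magnitude of the negative dimension is suggestive: reinterpreting -2146648064 as if it had wrapped around in a signed 32-bit integer gives a size just over 2**31 (about 2 GiB), which would be consistent with an int32 overflow in DDP's bucket/size bookkeeping. This is only a guess from the number itself, not something the traceback confirms:

```python
# Reported negative dimension from the traceback.
reported = -2146648064

# If a size slightly above 2**31 were stored in a signed 32-bit
# integer, it would wrap to a negative value. Undo the wrap:
actual = reported + 2**32

print(actual)          # 2148319232, roughly 2.0 GiB
print(actual > 2**31)  # True: it does not fit in a signed int32
```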

I've seen code examples where a single model is loaded across 2 separate GPUs, but what I want is to run two simultaneous processes with the internlm/internlm-xcomposer2-vl-1_8b model. My setup has 2 RTX 4070 cards, which I want to drive from 2 separate processes that share a dataloader.
