Tensor with negative dimensions / overflow error using Accelerate #352

Open

frutiemax92 opened this issue Jul 6, 2024 · 0 comments
When I try to use internlm/internlm-xcomposer2-vl-1_8b on 2 GPUs, I get an error from Accelerate at the usual line:
model = accelerator.prepare(model)

This is the error:

[rank1]:     model = accelerator.prepare(model)
[rank1]:   File "C:\Users\lucas\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\accelerator.py", line 1274, in prepare
[rank1]:     result = tuple(
[rank1]:   File "C:\Users\lucas\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\accelerator.py", line 1275, in <genexpr>
[rank1]:     self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
[rank1]:   File "C:\Users\lucas\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\accelerator.py", line 1151, in _prepare_one
[rank1]:     return self.prepare_model(obj, device_placement=device_placement)
[rank1]:   File "C:\Users\lucas\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\accelerator.py", line 1403, in prepare_model
[rank1]:     model = torch.nn.parallel.DistributedDataParallel(
[rank1]:   File "C:\Users\lucas\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\parallel\distributed.py", line 812, in __init__
[rank1]:     self._ddp_init_helper(
[rank1]:   File "C:\Users\lucas\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\parallel\distributed.py", line 1152, in _ddp_init_helper
[rank1]:     self.reducer = dist.Reducer(
[rank1]: RuntimeError: Trying to create tensor with negative dimension -2146648064: [-2146648064]
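The magnitude of the negative dimension is suggestive: reinterpreting -2146648064 as if it had wrapped around in a signed 32-bit integer gives a size just over 2**31 (about 2 GiB), which would be consistent with an int32 overflow in DDP's bucket/size bookkeeping. This is only a guess from the number itself, not something the traceback confirms:

```python
# Reported negative dimension from the traceback.
reported = -2146648064

# If a size slightly above 2**31 were stored in a signed 32-bit
# integer, it would wrap to a negative value. Undo the wrap:
actual = reported + 2**32

print(actual)          # 2148319232, roughly 2.0 GiB
print(actual > 2**31)  # True: it does not fit in a signed int32
```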

I've seen code examples where a single model is loaded across 2 separate GPUs, but what I want is to run two simultaneous processes with the internlm/internlm-xcomposer2-vl-1_8b model. My setup has 2 RTX 4070 cards, which I want to drive from 2 separate processes that share a dataloader.
