RuntimeError: could not create an engine #2379

Open
aruhela opened this issue Jul 6, 2024 · 0 comments
Labels
bug Something isn't working

aruhela commented Jul 6, 2024

Hi Intel Team,

I am seeing a "could not create an engine" error when running the demo.py example from the "oneCCL Bindings for PyTorch Getting Started Sample". The code runs on a Sapphire node with 4 PVCs on a TACC system. Any suggestions for identifying the cause and fixing it?

(base) c551-003pvc$ mpirun -n 2 -l python demo.py -dev xpu
[0] Runing Iteration: 0 on device xpu:0
[0] Runing forward: 0 on device xpu:0
[0] Traceback (most recent call last):
[0] File "/scratch/05231/aruhela/demo.py", line 67, in <module>
[1] Runing Iteration: 0 on device xpu:1
[1] Runing forward: 0 on device xpu:1
[1] Traceback (most recent call last):
[1] File "/scratch/05231/aruhela/demo.py", line 67, in <module>
[0] res = model(input)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1] res = model(input)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0] return self._call_impl(*args, **kwargs)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1] return self._call_impl(*args, **kwargs)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0] return forward_call(*args, **kwargs)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1519, in forward
[1] return forward_call(*args, **kwargs)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1519, in forward
[0] else self._run_ddp_forward(*inputs, **kwargs)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1355, in _run_ddp_forward
[1] else self._run_ddp_forward(*inputs, **kwargs)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1355, in _run_ddp_forward
[0] return self.module(*inputs, **kwargs) # type: ignore[index]
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1] return self.module(*inputs, **kwargs) # type: ignore[index]
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0] return self._call_impl(*args, **kwargs)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1] return self._call_impl(*args, **kwargs)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0] return forward_call(*args, **kwargs)
[0] File "/scratch/05231/aruhela/demo.py", line 26, in forward
[1] return forward_call(*args, **kwargs)
[1] File "/scratch/05231/aruhela/demo.py", line 26, in forward
[0] return self.linear(input)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1] return self.linear(input)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0] return self._call_impl(*args, **kwargs)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1] return self._call_impl(*args, **kwargs)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0] return forward_call(*args, **kwargs)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
[1] return forward_call(*args, **kwargs)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
[0] return F.linear(input, self.weight, self.bias)
[0] RuntimeError: could not create an engine
[1] return F.linear(input, self.weight, self.bias)
[1] RuntimeError: could not create an engine
(base) c551-003pvc$
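
Because `mpirun -l` prefixes each line with its rank, the two tracebacks above are interleaved and hard to read. The following is a minimal sketch, using only the Python standard library, that splits such output into one list of lines per rank. The function name `split_by_rank` and the sample text are illustrative and not part of demo.py or oneCCL.

```python
import re
from collections import defaultdict

# Matches the "[rank] " prefix that `mpirun -l` puts in front of each line.
RANK_PREFIX = re.compile(r"^\[(\d+)\]\s?(.*)$")


def split_by_rank(log_text):
    """Return {rank: [lines]} in per-rank order; lines without a prefix are skipped."""
    per_rank = defaultdict(list)
    for line in log_text.splitlines():
        match = RANK_PREFIX.match(line)
        if match:
            per_rank[int(match.group(1))].append(match.group(2))
    return dict(per_rank)


if __name__ == "__main__":
    # A short excerpt in the same shape as the log above.
    sample = (
        "[0] return F.linear(input, self.weight, self.bias)\n"
        "[0] RuntimeError: could not create an engine\n"
        "[1] return F.linear(input, self.weight, self.bias)\n"
        "[1] RuntimeError: could not create an engine\n"
    )
    for rank, lines in sorted(split_by_rank(sample).items()):
        print(f"--- rank {rank} ---")
        print("\n".join(lines))
```

Saving the full `mpirun` output to a file and passing its contents to `split_by_rank` gives one contiguous traceback per rank, which makes it easier to confirm that both ranks fail at the same call.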

Notes: The oneAPI release is 2024.2.
Install command (AI Selector Tool):
conda install -c intel -c conda-forge --override-channels intel/label/oneapi::intel-extension-for-pytorch=2.1.20 intel/label/oneapi::pytorch=2.1.0 intel/label/oneapi::oneccl_bind_pt=2.1.200 intel/label/oneapi::torchvision=0.16.0 intel/label/oneapi::torchaudio=2.1.0 conda-forge::deepspeed=0.14.0 python=3.9
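
To confirm which versions actually ended up in the active environment after this install, a small standard-library check along the following lines may help. This is only a sketch: the distribution names in `PACKAGES` are assumptions (conda and pip may register them under different names), and `installed_versions` is a hypothetical helper, not part of any Intel tooling.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names; adjust them if the environment registers others.
PACKAGES = [
    "torch",
    "intel_extension_for_pytorch",
    "oneccl_bind_pt",
    "torchvision",
    "torchaudio",
    "deepspeed",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if not found."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not found'}")
```

Running it under the same interpreter that `mpirun` launches shows whether the versions requested in the install command are the ones being loaded.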

Thanks
Amit Ruhela

aruhela added the bug label Jul 6, 2024