'RuntimeError: GET was unable to find an engine to execute this computation' #43

VikasAmaraneni · 2024-04-12T19:20:45Z

Hello Everyone,
I'm using PyTorch 2.2.1 with CUDA 12.1 and Python 3.12.2, and I'm getting the following error:

'RuntimeError: RuntimeError Traceback (most recent call last)
Cell In[16], line 47
45 num_epochs = 10
46 for epoch in range(num_epochs):
---> 47 train_loss, train_time = train(model, train_loader, criterion, optimizer)
48 val_loss, val_accuracy, val_time = validate(model, val_loader, criterion)
49 print(f'Epoch {epoch+1}/{num_epochs}, Train Loss: {train_loss:.4f}, Train Time: {train_time:.2f}s, '
50 f'Val Loss: {val_loss:.4f}, Val Accuracy: {val_accuracy:.4f}, Val Time: {val_time:.2f}s')

Cell In[16], line 13, in train(model, train_loader, criterion, optimizer)
11 outputs = model(inputs)
12 loss = criterion(outputs, labels) # Calculate loss between model outputs and ground truth
---> 13 loss.backward()
14 optimizer.step()
15 running_loss += loss.item() * inputs.size(0) # Update running loss

File ~/.conda/envs/torchTest1/lib/python3.12/site-packages/torch/_tensor.py:522, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
512 if has_torch_function_unary(self):
513 return handle_torch_function(
514 Tensor.backward,
515 (self,),
(...)
520 inputs=inputs,
521 )
--> 522 torch.autograd.backward(
523 self, gradient, retain_graph, create_graph, inputs=inputs
524 )

File ~/.conda/envs/torchTest1/lib/python3.12/site-packages/torch/autograd/__init__.py:266, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
261 retain_graph = create_graph
263 # The reason we repeat the same comment below is that
264 # some Python versions print out the first line of a multi-line function
265 # calls in the traceback and some print out the last line
--> 266 Variable.execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
267 tensors,
268 grad_tensors,
269 retain_graph,
270 create_graph,
271 inputs,
272 allow_unreachable=True,
273 accumulate_grad=True,
274 )

RuntimeError: GET was unable to find an engine to execute this computation'

Originally posted by @VikasAmaraneni in ultralytics/ultralytics#4060 (comment)

shuyueW1991 · 2024-05-16T06:40:16Z

Hi there. I fixed a similar problem by installing matching versions of torch, torchvision, and torchaudio, as listed on the official PyTorch release page. One combination that works:
torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0
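
Before comparing against the release page, it can help to see which versions are actually installed in the active environment. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_versions` is hypothetical and not part of PyTorch.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "torchaudio")):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}=={version}")
```

If torch imports successfully, `torch.version.cuda` and `torch.backends.cudnn.version()` additionally report the CUDA and cuDNN versions it is using.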

VikasAmaraneni · 2024-05-17T16:19:59Z

Thank you so much, it worked.

shuyueW1991 · 2024-06-19T06:35:45Z

I ran into the problem again. I no longer think the real fix is matching the torch, torchvision, and torchaudio versions. The solution should be:

Run echo $LD_LIBRARY_PATH.
Go to the directory it prints.
Rename the problematic libcudnn_cnn_train.so.8 (or whichever library the error message names) so that it is kept only as a backup copy under a different name.
Now the loader no longer picks up CUDA/cuDNN libraries through this environment variable. The underlying reason is that torch ships its own CUDA/cuDNN libraries, and those are the ones that need to be loaded.
Done.
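
To find out which directory to look in, the search described above can be sketched as a small script. This is only an illustration under stated assumptions: it assumes the conflicting files are named `libcudnn*` and sit directly in a directory listed in `LD_LIBRARY_PATH`; the function name `find_system_cudnn` is hypothetical. It only lists files and does not rename anything.

```python
import glob
import os


def find_system_cudnn(ld_library_path=None):
    """Return libcudnn* files found in the directories listed in LD_LIBRARY_PATH."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        if not directory:
            # Skip empty entries such as a trailing ':'.
            continue
        hits.extend(sorted(glob.glob(os.path.join(directory, "libcudnn*"))))
    return hits


if __name__ == "__main__":
    # Each printed path is a candidate for the rename step.
    for path in find_system_cudnn():
        print(path)
```

An empty result suggests that no system cuDNN is reachable through `LD_LIBRARY_PATH`, in which case the cause is probably elsewhere.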
