
Error while running benchmark.py on GPU #185

Closed
brm738 opened this issue Apr 1, 2022 · 6 comments · Fixed by #221 or #353
Labels: Bug (Something isn't working)

Comments


brm738 commented Apr 1, 2022

Running benchmark.py on Nvidia A100 SXM4 produces an error:

```
  File "tools/benchmarking/benchmark.py", line 177, in distribute
    raise Exception(f"Error occurred while computing benchmark on device {job}") from exception
Exception: Error occurred while computing benchmark on device <Future at 0x7f05c18ac310 state=finished raised Exception>
```

Note: In my setup this error does not occur when running on CPU only, nor when running the patchcore model (on either CPU or GPU). It occurs only when running padim or cflow on GPU.

@samet-akcay samet-akcay added Benchmark Bug Something isn't working labels Apr 1, 2022
@samet-akcay samet-akcay modified the milestone: Backlog Apr 4, 2022
@ashwinvaidya17 ashwinvaidya17 added this to the Backlog milestone Apr 4, 2022
@samet-akcay samet-akcay modified the milestones: Backlog, v0.2.7 Apr 4, 2022
@ashwinvaidya17 (Collaborator)

Can you share the entire log?

brm738 (Author) commented Apr 7, 2022

```
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/data/home/epi/anomalib/tools/benchmarking/benchmark.py", line 137, in compute_on_gpu
    model_metrics = sweep(run_config, device, seed)
  File "/data/home/epi/anomalib/tools/benchmarking/benchmark.py", line 233, in sweep
    model_metrics = get_single_model_metrics(model_config=model_config, openvino_metrics=convert_openvino)
  File "/data/home/epi/anomalib/tools/benchmarking/benchmark.py", line 99, in get_single_model_metrics
    convert_to_openvino(model, openvino_export_path, model_config.model.input_size)
  File "/data/home/epi/anomalib/tools/benchmarking/utils/convert.py", line 28, in convert_to_openvino
    export_convert(model, input_size, onnx_path, export_path)
  File "/data/home/epi/anomalib/anomalib/deploy/optimize.py", line 73, in export_convert
    torch.onnx.export(
  File "/data/home/epi/env_Anomalib/lib/python3.8/site-packages/torch/onnx/__init__.py", line 275, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/data/home/epi/env_Anomalib/lib/python3.8/site-packages/torch/onnx/utils.py", line 88, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/data/home/epi/env_Anomalib/lib/python3.8/site-packages/torch/onnx/utils.py", line 689, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "/data/home/epi/env_Anomalib/lib/python3.8/site-packages/torch/onnx/utils.py", line 463, in _model_to_graph
    graph = _optimize_graph(graph, operator_export_type,
  File "/data/home/epi/env_Anomalib/lib/python3.8/site-packages/torch/onnx/utils.py", line 223, in _optimize_graph
    torch._C._jit_pass_onnx_graph_shape_type_inference(graph, params_dict, _export_onnx_opset_version)
RuntimeError: Exporting model exceed maximum protobuf size of 2GB. Please call torch.onnx.export with use_external_data_format=True.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "tools/benchmarking/benchmark.py", line 164, in distribute_over_gpus
    job.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
RuntimeError: Exporting model exceed maximum protobuf size of 2GB. Please call torch.onnx.export with use_external_data_format=True.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "tools/benchmarking/benchmark.py", line 258, in <module>
    distribute()
  File "tools/benchmarking/benchmark.py", line 194, in distribute
    distribute_over_gpus()
  File "tools/benchmarking/benchmark.py", line 166, in distribute_over_gpus
    raise Exception(f"Error occurred while computing benchmark on device {job}") from exc
Exception: Error occurred while computing benchmark on device <Future at 0x7efd9b83ae20 state=finished raised RuntimeError>
```
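The RuntimeError at the bottom of the inner traceback points at a possible workaround: torch.onnx.export in the torch 1.8–1.10 line accepts a use_external_data_format flag that writes the weights as external files next to the .onnx file instead of embedding them in the size-limited protobuf. A minimal sketch of how the export call in anomalib/deploy/optimize.py could pass that flag; the dummy-input shape, opset version, and tensor names below are illustrative assumptions, not anomalib's actual values:

```python
import torch

# Hypothetical variant of export_convert() from anomalib/deploy/optimize.py.
# Only the use_external_data_format flag is the point: it makes torch.onnx.export
# store weights as external data files, sidestepping the 2 GB protobuf limit.
def export_convert(model, input_size, onnx_path, export_path):
    height, width = input_size
    torch.onnx.export(
        model,                               # module to export
        torch.zeros((1, 3, height, width)),  # illustrative dummy input
        str(onnx_path),                      # external-data export needs a file path, not a buffer
        opset_version=11,                    # illustrative opset
        use_external_data_format=True,       # the fix suggested by the RuntimeError
        input_names=["input"],               # illustrative tensor names
        output_names=["output"],
    )
```

Whether OpenVINO's model optimizer in the benchmark's convert_to_openvino step can then consume the resulting external-data files would still need checking.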

@FranLucchini

I have the same issue while running on Colab Pro with a GPU.

@ashwinvaidya17 (Collaborator)

I think I linked the wrong issue. I haven't addressed this one yet. I am reopening it.

@ashwinvaidya17 (Collaborator)

Can you tell me which versions of torch and onnx you have on Colab? Also, how did you install anomalib? By cloning the repo?
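For reference, both versions can be read from a Python prompt or a Colab cell:

```python
import torch
import onnx  # assumes the onnx package is installed in the environment

print("torch:", torch.__version__)
print("onnx:", onnx.__version__)
```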

@samet-akcay samet-akcay modified the milestones: v0.3.0, v0.3.1 Apr 22, 2022
@ashwinvaidya17 (Collaborator)

Closing due to inactivity. Open again if you still see this issue.
