When moving the model to the GPU with module.to(...), the program hangs for an extremely long time (almost half an hour), even though the model is tiny and can be created on the GPU nearly instantly in Python.
This is the code snippet where the freeze occurs. If I wait long enough, execution eventually continues fine, so I think CUDA itself is OK.
import org.bytedeco.pytorch.*;
import org.bytedeco.pytorch.global.torch;

Module model = ...
DeviceGuard g = new DeviceGuard(new Device(torch.kCUDA()));
model.to(g.current_device()); // hangs here for a long time
Any help is appreciated!
The binaries are compiled for only a single GPU architecture, so the delay is most likely CUDA JIT-compiling PTX code for your GPU architecture. If you set the CUDA compute cache large enough, to something like 256 MB, it should only have to do that once. You could also build from source for your architecture, but we cannot easily build the binaries for more architectures on GitHub Actions, since the build would take longer than the hard limit of 6 hours, and I don't have the resources to maintain any additional infrastructure.
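For reference, a minimal sketch of checking that setting from Java, assuming the "compute cache" above is the CUDA JIT compilation cache controlled by the CUDA_CACHE_MAXSIZE environment variable, which has to be set before the process that loads CUDA starts:

// Assumption: CUDA_CACHE_MAXSIZE was exported before launching the JVM,
// e.g. CUDA_CACHE_MAXSIZE=268435456 java -jar app.jar  (268435456 bytes = 256 MB)
String max = System.getenv("CUDA_CACHE_MAXSIZE");
System.out.println("CUDA JIT cache limit: " + (max != null ? max + " bytes" : "driver default"));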
Another thing you could do is use the binaries from LibTorch itself. Extract them somewhere on your system, make sure their libraries can be found on your system PATH, and set the "org.bytedeco.javacpp.pathsFirst" system property to "true" before loading anything with JavaCPP.
Incidentally, that is also one way that you could make JavaCPP load the same libraries as DJL.
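For completeness, a minimal sketch of the property part of those steps, assuming the LibTorch libraries have already been extracted and put on the PATH; the property can also be passed as -Dorg.bytedeco.javacpp.pathsFirst=true on the command line, and in either case it has to take effect before any JavaCPP-backed class is loaded:

// Prefer libraries found on the system paths (e.g. the LibTorch ones on PATH)
// over the bundled ones; must run before any org.bytedeco.pytorch.* class loads.
System.setProperty("org.bytedeco.javacpp.pathsFirst", "true");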
/cc @frankfliu @stu1130