
Add initial OpenCV CUDA bindings #416

Merged: 10 commits into bytedeco:master on Dec 19, 2017

Conversation

@SamCarlberg (Contributor) commented May 5, 2017

Creates bindings for:

Header                    Class
opencv2/core/cuda.hpp     opencv_cuda
opencv2/cudaimgproc.hpp   opencv_cudaimgproc
opencv2/cudafilters.hpp   opencv_cudafilters

Addresses (but doesn't solve) #364, bytedeco/javacv#481

  • Bindings for opencv2/cuda.hpp, opencv2/cudaimgproc.hpp, opencv2/cudafilters.hpp
  • Updates cppbuild.sh to use all available cores (was hardcoded to 4)
  • Removes the generated opencv_core.Stream class; it was generated from a forward declaration in the header file, so it wasn't useful anyway

@saudet (Member) commented May 5, 2017

Looking good, thanks! There are a few things to fix before we can merge this, though:

  • Since CUDA is available only on some platforms, we need to list them in the annotations (a fuller preset sketch follows this list). Something like the following should work:
@Platform(value = {"linux-x86_64", "linux-arm64", "linux-ppc64le", "macosx-x86_64", "windows-x86_64"}, ...
  • And so -DWITH_CUDA=ON cannot be used on other platforms in the cppbuild.sh script.
  • The content of opencv_photo and opencv_shape changes; I wonder why...?
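
For illustration, a complete minimal preset class with that restriction might look like the sketch below; the class name, target package, and parsed header are assumptions for the example, not the final layout:

import org.bytedeco.javacpp.annotation.Platform;
import org.bytedeco.javacpp.annotation.Properties;
import org.bytedeco.javacpp.tools.InfoMap;
import org.bytedeco.javacpp.tools.InfoMapper;

// Sketch only: restrict the CUDA preset to the platforms that actually have CUDA.
@Properties(value = @Platform(
        value = {"linux-x86_64", "linux-arm64", "linux-ppc64le",
                 "macosx-x86_64", "windows-x86_64"},
        include = "opencv2/core/cuda.hpp"),
    target = "org.bytedeco.javacpp.opencv_cuda")
public class opencv_cuda implements InfoMapper {
    public void map(InfoMap infoMap) {
        // Parser hints (Info entries) for the CUDA headers would go here.
    }
}

The -DWITH_CUDA=ON switch in cppbuild.sh would then follow the same platform list.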

@SamCarlberg (Contributor, Author)

I think opencv_photo and opencv_shape changed because I removed the generated class for the forward declaration of cv::cuda::Stream in opencv_core. I guess I could change those presets to inherit from opencv_cuda to get the real class declaration.

@saudet (Member) commented May 6, 2017 via email

CUDA remains enabled for linux x86_64, linux arm64, linux ppc64le, mac os, windows x86_64

@SamCarlberg (Contributor, Author) commented May 6, 2017

I'm not too happy about stuff like this:

public native void detectMultiScale(@ByVal Mat image,
                                    @ByVal Mat objects);
public native void detectMultiScale(@ByVal UMat image,
                                    @ByVal UMat objects);
public native void detectMultiScale(@ByVal GpuMat image,
                                    @ByVal GpuMat objects);

that should really only accept GpuMat parameters like this:

public native void detectMultiScale(@ByVal GpuMat image,
                                    @ByVal GpuMat objects);

FWIW, this method is generated from this declaration in opencv2/cudaobjdetect.hpp:

virtual void detectMultiScale(InputArray image,
                              OutputArray objects,
                              Stream& stream = Stream::Null()) = 0;

But InputArray and OutputArray need to be aliased to Mat or UMat for some conversion code, as well as for the methods that copy mats between host memory and device memory (GpuMat::upload and GpuMat::download). What would be a good solution to this problem?
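
For context, a host/device round trip from Java would look roughly like this; a sketch that assumes the generated GpuMat ends up with upload/download overloads taking a Mat, with package names as in the current presets:

import static org.bytedeco.javacpp.opencv_core.CV_8UC1;
import org.bytedeco.javacpp.opencv_core.GpuMat;
import org.bytedeco.javacpp.opencv_core.Mat;

public class GpuMatRoundTrip {
    public static void main(String[] args) {
        Mat host = new Mat(480, 640, CV_8UC1); // image in host memory

        GpuMat device = new GpuMat();
        device.upload(host);                   // copy host -> device

        // ... CUDA-accelerated operations on `device` would go here ...

        Mat result = new Mat();
        device.download(result);               // copy device -> host
    }
}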

@saudet (Member) commented May 7, 2017

We can remove "Mat" and "UMat" from the redefinition, or do we need them somewhere? If it's just one place or two, we can use Info.javaText() for those.
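
If it's just a spot or two, something like this in the preset should do it; the wrapper class and the fully qualified name below are only examples of the pattern, not necessarily the member that needs patching:

import org.bytedeco.javacpp.tools.Info;
import org.bytedeco.javacpp.tools.InfoMap;
import org.bytedeco.javacpp.tools.InfoMapper;

public class DetectMultiScaleOverride implements InfoMapper {
    public void map(InfoMap infoMap) {
        // Replace the generated Mat/UMat/GpuMat overloads for this one member
        // with a hand-written GpuMat-only signature via Info.javaText().
        infoMap.put(new Info("cv::cuda::CascadeClassifier::detectMultiScale").javaText(
                "public native void detectMultiScale(@ByVal GpuMat image, @ByVal GpuMat objects);"));
    }
}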

BTW, the build is failing on Windows:

Failed to execute JavaCPP Builder: Could not parse "opencv2\cudafilters.hpp": File does not exist

@saudet (Member) commented May 11, 2017

Ouch, it looks like OpenCV builds for all these archs by default:

--     NVIDIA GPU arch:             20 30 35 37 50 52 60 61

Limiting that to, say, 30 will reduce compile time by a factor of 8. And we could also use CUDA_ARCH_PTX to create binaries that will still work on all architectures.

@saudet (Member) commented May 11, 2017

/cc @vb216 It looks like it's failing on AppVeyor because CUDA isn't installed when building OpenCV.

@SamCarlberg (Contributor, Author)

Limiting that to, say, 30 will reduce compile time by a factor of 8. And we could also use CUDA_ARCH_PTX to create binaries that will still work on all architectures.

That sounds good. I'll try tackling that later this week.

SamCarlberg added a commit to SamCarlberg/GRIP that referenced this pull request May 12, 2017
Currently requires custom CUDA bindings from javacpp-presets (see bytedeco/javacpp-presets#416)
Trying to use CUDA-accelerated operations without an NVIDIA GPU and NVIDIA drivers will probably crash the app
This should speed up compilation of the CUDA modules by about 6x and cut down the size of the output jar by about 60% (vs. all CUDA platforms being built)

@SamCarlberg (Contributor, Author)

21c4f4d only compiles the CUDA stuff for 3.0 + PTX. Everything should still work, but I had some weird issues with double frees by the PTX JIT compiler. Full dump here. I'm not sure how robust the PTX stuff is.

@saudet (Member) commented May 14, 2017

Thanks! Now the build finishes on time for Linux, but still not for Mac :/

@vb216 is looking into stages, yes. It won't work for Windows with AppVeyor though :(

So, recent versions of CUDA have issues compiling older PTX versions? That's unfortunate...

@SamCarlberg (Contributor, Author) commented May 14, 2017

Looks like the Mac build was cancelled because the log got too big from warnings about unused parameters and functions in the core and cudev modules (mostly from stubs, as far as I can tell).

And it's a problem with the PTX JIT: it caused that double free and probably a bunch of other issues I randomly get when the CUDA module is loaded. I never had any issues with the build my machine uses natively (sm_50, GTX 950M).

@saudet (Member) commented May 16, 2017

@vb216 What do you think? Should we merge this?

@vb216 (Member) commented May 16, 2017 via email

@saudet (Member) commented May 16, 2017

I see why some modules like opencv_photo change: we need to update their dependencies. In the case of opencv_photo, it's easy; we just need to replace opencv_imgproc.class with opencv_cudaimgproc.class. But others like opencv_stitching depend on modules that are not currently mapped:

$ readelf -d libopencv_stitching.so.3.2 | grep libopencv_cuda
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_cudafeatures2d.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_cudalegacy.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_cudawarping.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_cudaimgproc.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_cudafilters.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_cudaarithm.so.3.2]

We'll need to add those to the preload list so they get properly loaded...
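
Roughly along these lines, as an illustration only (library names taken from the readelf output above; the wrapper class and the version-suffix handling are placeholders):

import org.bytedeco.javacpp.annotation.Platform;
import org.bytedeco.javacpp.annotation.Properties;

// Placeholder class: the point is the preload list, which tells JavaCPP to
// load these CUDA modules before the libraries that depend on them.
@Properties(value = @Platform(
        preload = {"opencv_cudaarithm", "opencv_cudafilters", "opencv_cudaimgproc",
                   "opencv_cudawarping", "opencv_cudafeatures2d", "opencv_cudalegacy"}))
public class CudaPreloadSketch { }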

@saudet (Member) commented May 31, 2017

Not sure how I missed that, but when built with CUDA, it looks like all libraries become dependent on CUDA! Check the output of ldd libopencv_core.so | grep cu:

libopencv_cudev.so.3.2 => .../javacpp-presets/opencv/cppbuild/linux-x86_64/lib/libopencv_cudev.so.3.2 (0x00007fc1b5277000)
libcudart.so.8.0 => /usr/local/cuda/lib64/libcudart.so.8.0 (0x00007fc1b49bf000)
libnppc.so.8.0 => /usr/local/cuda/lib64/libnppc.so.8.0 (0x00007fc1b474f000)
libnppi.so.8.0 => /usr/local/cuda/lib64/libnppi.so.8.0 (0x00007fc1add2b000)
libnpps.so.8.0 => /usr/local/cuda/lib64/libnpps.so.8.0 (0x00007fc1ad335000)
libcufft.so.8.0 => /usr/local/cuda/lib64/libcufft.so.8.0 (0x00007fc1a44e6000)

We can't have that. How should we deal with this?

@SamCarlberg (Contributor, Author)

@saudet From my understanding, most modules will compile with CUDA support if the flag is set. If OpenCV loads the CUDA library only when a CUDA function is called (i.e., not at startup), like Java does, then I think it's okay. But if not, I think there would have to be separate builds for CUDA and non-CUDA versions.

@saudet (Member) commented Jun 30, 2017

All the libraries link statically with CUDA, so users would have to install it. It's ok for something like Caffe where most users use a GPU anyway, but that's not the case with OpenCV.

@jcfrk101 commented Aug 1, 2017

Has anyone decided how to proceed? A separate CUDA build or not?

@saudet (Member) commented Aug 1, 2017 via email

@saudet (Member) commented Sep 8, 2017

I've added an "extensions" feature to JavaCPP to allow us to have more than one set of binaries per platform: bytedeco/javacpp@35463a6. We still need to modify the pom.xml files of the presets a bit, but it should work as follows:

  1. Execute the build as usual, "mvn install"
  2. Re-execute once more, but with the new "extension" plugin property set to, say, "-cuda": "mvn install -Djavacpp.extension=-cuda". (This will build or pick up libraries in cppbuild/<platform>-cuda subdirectories.)

Furthermore, we should be able to create Maven profiles in the parent pom.xml to override the list of modules, one listing only the submodules that support, say, "cuda", another for, say, "avx512", etc.

At runtime, JavaCPP will look at the @Platform(extensions={"-cuda", "-avx512", ...}, ...) annotation value and try to load them one by one, falling back to the default one without an extension name only if all else fails. The idea is that users wanting the "-cuda" binaries would have to put that artifact on their class path manually; otherwise only the default one gets added by the "-platform" artifact.
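
On the user side that would boil down to something like the following (a sketch; the opencv_cudaimgproc target name is assumed from this PR, and the extension/fallback behavior is as described above):

import org.bytedeco.javacpp.Loader;
import org.bytedeco.javacpp.opencv_cudaimgproc;

public class LoadWithExtension {
    public static void main(String[] args) {
        // With the "-cuda" artifact on the class path, JavaCPP would try the
        // extension binaries first and only fall back to the default ones
        // if they cannot be loaded.
        Loader.load(opencv_cudaimgproc.class);
    }
}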

Thoughts?

/cc @akdeoras

@SamCarlberg (Contributor, Author)

That's probably the best solution, since OpenCV built with CUDA will break if loaded on a system that doesn't have CUDA support.

Side note: Travis has beta support for build stages now, so it should be possible to keep the Mac builds from timing out. https://docs.travis-ci.com/user/build-stages/

@saudet (Member) commented Sep 8, 2017

I seem to recall there was still an issue with using stages, and that's why we're not using them, @vb216? Also, it's not available on AppVeyor...

@vb216 (Member) commented Sep 9, 2017

Yeah, the problem is that there's no means of sharing output between build stages. There are issues open with Travis and AppVeyor on this, as a lot of people are keen on that functionality.

I'm thinking of uploading build output to S3, making it available to dependent builds, and then splitting out the long OpenCV chain.

@saudet merged commit 7443377 into bytedeco:master on Dec 19, 2017

@saudet (Member) commented Dec 19, 2017

Thanks for the contribution @SamCarlberg! Let me know if you encounter any problems using those functions.

@SamCarlberg deleted the opencv-cuda branch January 18, 2018 00:55
SamCarlberg added a commit to SamCarlberg/GRIP that referenced this pull request Apr 9, 2019
Currently requires custom CUDA bindings from javacpp-presets (see bytedeco/javacpp-presets#416)
Trying to use CUDA-accelerated operations without an NVIDIA GPU and NVIDIA drivers will probably crash the app