cv::cuda::transpose() limitation #22782
Comments
PR proposal: opencv/opencv_contrib#3371
It's possible the other data types didn't exist in NPP when this was implemented, or they could have been buggy or slow. What is the performance impact of using NPP for
I would think it should be OK. In the past NPP was notoriously bad under some circumstances (unnecessary start-up costs for some routines, wrong results for others), but I guess it should be OK for a transpose operation on SDK 11+. The only issue could be that those using SDK 9 or older may see a slowdown in going from If it was me, I would run the test cases before and after the change through Nvidia Nsight Systems to see if anything is off, or look at the results from the perf tests if they exist.
I have interesting results after a first performance investigation. The observations tend to show that nppiTranspose always performs better than gridTranspose() for an equivalent elemSize. My results:
After a few more tests, using However, this is still only tested on a GTX 750, which is limited to CUDA compute capability 5.0.
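The comparison above is made "for an equivalent elemSize", which works because a transpose only moves whole elements and never interprets their contents. A minimal CPU-side sketch of that idea follows, assuming tightly packed row-major data; `transposeBytes` and `transposeTyped` are hypothetical helpers written for illustration and are neither OpenCV nor NPP code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper: transposes a rows x cols image whose elements are
// elemSize bytes each. src is tightly packed row-major; the result is a
// packed cols x rows image. Only the element size matters, not the type.
std::vector<std::uint8_t> transposeBytes(const std::vector<std::uint8_t>& src,
                                         std::size_t rows, std::size_t cols,
                                         std::size_t elemSize) {
    if (src.size() != rows * cols * elemSize)
        throw std::invalid_argument("buffer size does not match rows*cols*elemSize");
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            // Element (r, c) of the source becomes element (c, r) of the result.
            std::memcpy(&dst[(c * rows + r) * elemSize],
                        &src[(r * cols + c) * elemSize], elemSize);
    return dst;
}

// Hypothetical typed wrapper: routes any trivially copyable element type
// through the byte-wise routine above, using sizeof(T) as the element size.
template <typename T>
std::vector<T> transposeTyped(const std::vector<T>& src,
                              std::size_t rows, std::size_t cols) {
    std::vector<std::uint8_t> bytes(src.size() * sizeof(T));
    std::memcpy(bytes.data(), src.data(), bytes.size());
    const std::vector<std::uint8_t> out =
        transposeBytes(bytes, rows, cols, sizeof(T));
    std::vector<T> dst(src.size());
    std::memcpy(dst.data(), out.data(), out.size());
    return dst;
}
```

With this sketch, a 16-bit single-channel image (element size 2) goes through exactly the same code path as an 8-bit or 32-bit one, which is the property that lets implementations be compared per elemSize rather than per pixel type.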
System Information
OpenCV 4.6.0
Windows 7 64 bits
Visual Studio 2019 (latest)
NVidia CUDA SDK (10.2)
Detailed description
As stated in the documentation, CV_16UC1 is currently not supported by cv::cuda::transpose().
Internally, it is limited by:
CV_Assert( elemSize == 1 || elemSize == 4 || elemSize == 8 );
However, the limitation is hard to understand.
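To make the effect of that check concrete, here is a small standalone sketch of which common pixel types it accepts. It assumes the usual relation elemSize = bytes per channel × number of channels; `passesTransposeAssert`, `elemSize` and `reportCases` are hypothetical names for illustration and are not taken from the OpenCV source.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for the condition quoted above (not OpenCV code).
constexpr bool passesTransposeAssert(std::size_t elemSize) {
    return elemSize == 1 || elemSize == 4 || elemSize == 8;
}

// Assumed relation: element size = bytes per channel * channel count.
constexpr std::size_t elemSize(std::size_t bytesPerChannel, std::size_t channels) {
    return bytesPerChannel * channels;
}

// Compile-time checks for the two cases discussed in this issue.
static_assert(passesTransposeAssert(elemSize(1, 1)), "CV_8UC1 is accepted");
static_assert(!passesTransposeAssert(elemSize(2, 1)), "CV_16UC1 is rejected");

struct Case {
    const char* name;
    std::size_t bytesPerChannel;
    std::size_t channels;
};

// Prints, for a few common types, whether the check would accept them.
void reportCases() {
    const Case cases[] = {
        {"CV_8UC1", 1, 1},  {"CV_16UC1", 2, 1}, {"CV_8UC3", 1, 3},
        {"CV_16UC2", 2, 2}, {"CV_32FC1", 4, 1}, {"CV_64FC1", 8, 1},
    };
    for (const Case& c : cases) {
        const std::size_t sz = elemSize(c.bytesPerChannel, c.channels);
        std::printf("%-9s elemSize=%zu %s\n", c.name, sz,
                    passesTransposeAssert(sz) ? "accepted" : "rejected");
    }
}
```

Calling `reportCases()` from a `main` function shows that CV_16UC1 (element size 2) is rejected, while CV_16UC2 (element size 4) is accepted, so the restriction is on element size rather than on bit depth.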
Currently, for (elemSize == 1), nppiTranspose_8u_C1R() is called.
However, nppiTranspose_16u_C1R() does exist (among others). I looked at old NPPI release notes, and it was already present in CUDA SDK 8.0 (https://docs.nvidia.com/cuda/archive/8.0/pdf/NPP_Library_Image_Support_And_Data_Exchange.pdf).
Thus, I don't understand why only nppiTranspose_8u_C1R() is used.
Steps to reproduce
Issue submission checklist