Theano fft experimental version #5
Conversation
You probably want to rewrite my modification to the README, as my English needs an upgrade :)
Theano fft experimental version
thanks Frederic! I will :)
@nouiz do I have to reinstall Theano, or is there a custom module that I have to install? Right now it gives me this: [error output omitted]
The current FFT-based implementation in Theano depends on PyCUDA, and (unless they've modified it in the meantime) scikits.cuda. So you will need those two packages to be able to run this.
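A quick way to confirm both dependencies are importable (a minimal sketch; the import names are the 2014-era ones, before scikits.cuda was renamed to scikit-cuda/skcuda):

```python
# Minimal dependency check for the FFT-based implementation.
# The scikits.cuda import name is the 2014-era one (an assumption; the
# package was later renamed to scikit-cuda, imported as skcuda).
import pycuda
import scikits.cuda

print(pycuda.VERSION_TEXT)
```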
@benanne could you help me modify the README.md in the Theano section to add additional setup instructions?
What kind of instructions do you mean? Just the added dependencies?
right now these are the instructions:
- Install pylearn2: [commands omitted]
- Launch the script: [command omitted]
I'm assuming I have to add install instructions for pycuda and scikit-learn, correct?
I see. That should suffice for the legacy kernels and the wrapped cuda-convnet code. For the FFT-based implementation you will indeed need pycuda / scikits.cuda (not scikit-learn) as dependencies.
cool, made some more progress, but it still errors out. Is there a specific version of pycuda/scikits.cuda that I require? [error output omitted]
I installed the latest versions listed on PyPI: https://pypi.python.org/pypi/pycuda
Yeah, that scikits.cuda version is too old, you'll need 0.5.0 at least. The cublasCgemmBatched wrapper was something I added when I worked on this. Sorry, I should have mentioned this earlier.
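One way to verify the installed version is new enough is to check for that wrapper directly (a sketch; the module path is the scikits.cuda 0.5.x one and is an assumption):

```python
# Check that the batched complex gemm wrapper the FFT-based conv relies
# on is present; it was only added in scikits.cuda 0.5.0.
import scikits.cuda.cublas as cublas

assert hasattr(cublas, 'cublasCgemmBatched'), \
    "scikits.cuda is too old; 0.5.0 or later is required"
```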
ok great, works now! thanks.
Just to confirm (so that this place isn't another war zone), the fft version looks to be about 2x faster than ConvElemwise for the first case. Does that sound about right?
Could be, I haven't tested the current implementation myself. It'll also depend on the input size a lot. I believe you're using 3 input feature maps at the moment - in my experience, the FFT-based version will be mostly beneficial when there are a lot of input feature maps, because this becomes the inner dimension of a batched dot product in the Fourier domain. Note that it will also have some overhead on the first run, because the FFT plan has to be created. Subsequent runs should be faster.
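To make the inner-dimension point concrete, here is a small numpy sketch of the Fourier-domain view (all shapes and sizes are made up; this is not Theano's actual code):

```python
# After FFT-ing inputs and filters, a convolution reduces to one complex
# matrix product per frequency bin, contracting over the input feature
# maps -- so more input maps means more work per gemm and better GPU use.
import numpy as np

batch, in_maps, out_maps = 64, 96, 128
fh, fw = 16, 9  # frequency-domain size (hypothetical)

rng = np.random.RandomState(0)
x_f = rng.randn(fh, fw, batch, in_maps) + 1j * rng.randn(fh, fw, batch, in_maps)
w_f = rng.randn(fh, fw, in_maps, out_maps) + 1j * rng.randn(fh, fw, in_maps, out_maps)

# Batched complex dot product over all fh*fw frequency bins; in_maps is
# the contracted (inner) dimension.
y_f = np.einsum('fgbi,fgio->fgbo', x_f, w_f)
print(y_f.shape)  # (16, 9, 64, 128)
```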
Indeed, it is very fast for the later layers. Full log here: [link omitted]
very cool :) I should mention though that the Gflop/s metric doesn't really make sense for the FFT implementation: it's not actually performing this many floating point operations, the FFT approach just needs fewer. 7 Tflop/s is actually more than the maximum that the Titan is capable of (about 4.5 Tflop/s). I suppose the same goes for the Toeplitz matrix approach that Caffe uses, it will also need a different number of flops for a given convolution. That said, it's still useful to see how many Gflop/s it is equivalent to compared to a naive implementation.
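As a back-of-the-envelope illustration of how the equivalent figure can exceed hardware peak (all numbers below are made up, not taken from the benchmark):

```python
# Flops a naive direct convolution would perform (one multiply + one add
# per kernel tap per output element).
batch, in_maps, out_maps = 128, 96, 128
out_h = out_w = 55
kh = kw = 11

direct_flops = 2.0 * batch * out_maps * in_maps * out_h * out_w * kh * kw

# If an FFT-based implementation finishes the same convolution in, say,
# 200 ms (hypothetical), the "equivalent" throughput is:
elapsed = 0.2  # seconds
print(direct_flops / elapsed / 1e12, "equivalent Tflop/s")  # ~5.8
# This can exceed the card's true peak (~4.5 Tflop/s on a Titan) because
# the FFT approach simply performs fewer actual floating point operations.
```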
I am changing the metrics as we speak :)
One last question before I add this entry into the table: where can I find the code, how usable is it in practice, and how is :backward() implemented?
You can find the code here: https://github.com/Theano/Theano/blob/master/theano/sandbox/cuda/fftconv.py

Regarding usability: afaik there are tests, but I don't know if anyone has tried using it 'in production'. The main problem with it is that it uses a lot of memory, so it isn't applicable for every use case. No free lunch! :)

By :backward() I assume you mean the gradient. The way this is implemented is as an optimization that replaces Theano's own ConvOp with the FFT-based one. Because this only happens in the optimization phase, the gradient has already been calculated at that point, so the convolutions that are part of the gradient are also replaced by their FFT versions automatically. In short, it does not have its own gradient implementation, but because of the way Theano works, this is not necessary. The gradient implementation of Theano's own ConvOp is reused.
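In practice that meant compiling with the conv_fft optimizations enabled, for example (a sketch assuming the sandbox API and optimizer names of that era):

```python
# Build an ordinary Theano convolution plus its gradient, then let the
# conv_fft_* graph optimizations swap every ConvOp -- including the ones
# inside the gradient -- for the FFT-based op at compile time.
import theano
import theano.tensor as T
from theano.tensor.nnet import conv

x = T.tensor4('x')
w = T.tensor4('w')
cost = conv.conv2d(x, w).sum()
grads = T.grad(cost, [x, w])  # still expressed with regular ConvOps here

mode = theano.compile.get_default_mode().including(
    'conv_fft_valid', 'conv_fft_full')
f = theano.function([x, w], grads, mode=mode)
```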
Awesome, I am going to add this in for now with the information you provided. Thank you. I have finished up the benchmark code for all the other libraries (except ccv) for backpropagation as well (i.e. calculating gradients w.r.t. the image and gradients w.r.t. the parameters). Would anyone from LISA Lab be willing to modify this test to add the backpropagation timings as well?
Added! And I made the table report numbers for 5 configurations. This module is currently the FASTEST!
Cool! I have a feeling that might change when you get to the backward pass ;)
I could help with making ccv work with your configuration. For larger kernel windows, FFT is the best, although AlexNet and MattNet have small kernels. Probably small kernels work better just because we have fewer samples, though.
Thanks, I will add the :backward numbers for all the modules this weekend (I've already finished it for ccn2, caffe and torch). I hope someone can change the Theano benchmark to incorporate the backward pass numbers as well.
@liuliu that would be great! The way the ccv benchmark works right now is a little hacky, and I didn't find the time to modify cwc-bench for each of the configurations. Thanks!
@soumith, yeah, looking closer at your layer configuration, it is a bit hard to get ccv's numbers because each layer's output doesn't match the next layer's input. Due to all the assertion checks for layer output/input consistency, you probably have to go all the way down to calling the actual CUDA kernel to get some numbers out.
Also, you probably want to run with some real data (such as an image) rather than allocated but uninitialized memory. I noticed that for all-zero regions, my TITAN card sometimes cheats and has a shorter running time.
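One simple guard against that pitfall (a sketch; the shapes are hypothetical) is to fill the benchmark inputs with random values before timing:

```python
# Fill inputs with non-zero random data so the GPU cannot short-circuit
# all-zero regions during the timed runs.
import numpy as np

rng = np.random.RandomState(1234)
images = rng.uniform(-1.0, 1.0, size=(128, 3, 128, 128)).astype(np.float32)
```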
I just saw the updated table results. I think you should keep the fft entry. Also, maybe it would be great to add a section with the limitations of each implementation. Also, what about a new column with the extra temporary memory needed by each implementation?
This adds the benchmark of the Theano fft experimental version. I also try to make it clearer that this is work in progress and which conclusions can't be inferred.