
Deep Learning Convolutional Neural Network (CNN) example implementation #162

Closed
SimLeek opened this issue Feb 24, 2021 · 17 comments

@SimLeek

SimLeek commented Feb 24, 2021

I didn't see them in the list of shaders, and searching "conv" and "convolution" in this repository didn't return much.

I have naive glsl shaders for convolutions (forwards and backwards), so I could convert those.
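
For reference, a minimal sketch of what a naive forward 2D convolution compute shader can look like in GLSL - this is not the linked shader; the buffer bindings, single-channel layout and hard-coded sizes are illustrative assumptions:

```glsl
#version 450
// Minimal sketch of a naive single-channel 2D convolution (valid padding).
// Bindings, layout and sizes are illustrative assumptions only.

layout(local_size_x = 16, local_size_y = 16) in;

layout(std430, binding = 0) readonly  buffer InImage  { float in_image[];  };
layout(std430, binding = 1) readonly  buffer Kernel   { float kernel[];    };
layout(std430, binding = 2) writeonly buffer OutImage { float out_image[]; };

// Hard-coded sizes for the sketch; a real shader would pass these in.
const int IN_W = 64, IN_H = 64;
const int K_W  = 3,  K_H  = 3;
const int OUT_W = IN_W - K_W + 1;
const int OUT_H = IN_H - K_H + 1;

void main() {
    ivec2 o = ivec2(gl_GlobalInvocationID.xy);
    if (o.x >= OUT_W || o.y >= OUT_H) return;

    float acc = 0.0;
    for (int ky = 0; ky < K_H; ++ky) {
        for (int kx = 0; kx < K_W; ++kx) {
            acc += in_image[(o.y + ky) * IN_W + (o.x + kx)]
                 * kernel[ky * K_W + kx];
        }
    }
    out_image[o.y * OUT_W + o.x] = acc;
}
```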

@axsaucedo
Member

Good morning @SimLeek - that's a very interesting question and suggestion. I am currently working on a revamp of the interface that will make it easier to interact with operations, and hence there will be an opportunity to start adding more "native" shaders provided by Kompute. If you have some example convolution shaders that would be awesome - if you can share some examples here we can explore how they could fit best in the repo. Thank you @SimLeek !

@SimLeek
Author

SimLeek commented Feb 24, 2021

Good morning @axsaucedo! And you're welcome.

I gutted a library for its GLSL functions and tested convolutions until they worked; see here:

Forward: https://github.com/SimLeek/gltensors/blob/master/gltensors/shaders/dense_conv_forward_2d.glsl
Backward: https://github.com/SimLeek/gltensors/blob/master/gltensors/shaders/dense_conv_2d_backward.glsl
Runner: https://github.com/SimLeek/gltensors/blob/master/gltensors/GLSLComputer.py
Test/Example: https://github.com/SimLeek/gltensors/blob/master/tests/test_single_layer_autoencoder.py

They're 2D only for now, are set up as GLSL compute shaders but not for Vulkan quite yet, and don't have strides and other options fully set up and tested. However, I'd be happy to add support for all of that and wire it in correctly if it helps get more machine learning onto Vulkan. (It might take at least a few days, though.)
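
As a companion to the forward sketch above, here is a hedged sketch of the backward pass with respect to the input: for a valid-mode forward pass it amounts to a "full" correlation of the output gradient with the kernel, with the indices reversed. Again, bindings and sizes are assumptions rather than the layout used in the linked shaders:

```glsl
#version 450
// Sketch of the input-gradient pass for the naive forward convolution above.
// Each input pixel accumulates grad_out values from every output it touched.

layout(local_size_x = 16, local_size_y = 16) in;

layout(std430, binding = 0) readonly  buffer GradOut { float grad_out[]; };
layout(std430, binding = 1) readonly  buffer Kernel  { float kernel[];   };
layout(std430, binding = 2) writeonly buffer GradIn  { float grad_in[];  };

const int IN_W = 64, IN_H = 64;
const int K_W  = 3,  K_H  = 3;
const int OUT_W = IN_W - K_W + 1;
const int OUT_H = IN_H - K_H + 1;

void main() {
    ivec2 i = ivec2(gl_GlobalInvocationID.xy);
    if (i.x >= IN_W || i.y >= IN_H) return;

    float acc = 0.0;
    for (int ky = 0; ky < K_H; ++ky) {
        for (int kx = 0; kx < K_W; ++kx) {
            int oy = i.y - ky;
            int ox = i.x - kx;
            if (oy >= 0 && oy < OUT_H && ox >= 0 && ox < OUT_W) {
                acc += grad_out[oy * OUT_W + ox] * kernel[ky * K_W + kx];
            }
        }
    }
    grad_in[i.y * IN_W + i.x] = acc;
}
```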

@axsaucedo
Member

That looks awesome! Yeah, that would be great - we currently only have one OpMult implementation that shows how to add a custom shader, but initially, for simplicity, you can try to get it working on the Python side, and I can then help port it to C++ as a native operation (or give you some pointers on where it would fit).

The only thing to mention is that currently Kompute doesn't support uniforms, but you should be able to do everything with buffers - I recently added Specialization Constants and I am just wrapping up functionality to add PushConstants, but it may be easiest to just use version 0.6.0 instead of master.

The easiest way to get something up and running quickly could be to try the functionality via the Colab notebooks: https://github.com/EthicalML/vulkan-kompute#interactive-notebooks--hands-on-videos
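
Since uniforms aren't available, one way the shape parameters could be fed to the shader is through an extra storage buffer of floats that gets cast inside the shader. A minimal sketch of that pattern (the parameter ordering and float-encoded sizes are assumptions, not a Kompute convention):

```glsl
#version 450
// Sketch of passing convolution parameters through a storage buffer instead
// of uniforms. The float-encoded parameter ordering is an arbitrary
// assumption for illustration.

layout(local_size_x = 16, local_size_y = 16) in;

layout(std430, binding = 0) readonly  buffer Params   { float params[];   };
layout(std430, binding = 1) readonly  buffer InImage  { float in_image[]; };
layout(std430, binding = 2) readonly  buffer Kernel   { float kernel[];   };
layout(std430, binding = 3) writeonly buffer OutImage { float out_image[];};

void main() {
    // Decode shape parameters from the buffer: [in_w, in_h, k_w, k_h].
    int in_w  = int(params[0]);
    int in_h  = int(params[1]);
    int k_w   = int(params[2]);
    int k_h   = int(params[3]);
    int out_w = in_w - k_w + 1;
    int out_h = in_h - k_h + 1;

    ivec2 o = ivec2(gl_GlobalInvocationID.xy);
    if (o.x >= out_w || o.y >= out_h) return;

    float acc = 0.0;
    for (int ky = 0; ky < k_h; ++ky)
        for (int kx = 0; kx < k_w; ++kx)
            acc += in_image[(o.y + ky) * in_w + (o.x + kx)]
                 * kernel[ky * k_w + kx];
    out_image[o.y * out_w + o.x] = acc;
}
```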

@SimLeek
Author

SimLeek commented Feb 24, 2021

Oh, yeah, the uniform vs buffer part is where it's not quite set up for Vulkan. I didn't think Vulkan supported uniforms.

Thanks for the tips. I'll read through those and try working on it tomorrow then (since it's just past midnight here).

@alexander-g
Contributor

@SimLeek I have generic 2D convolutions implemented in my vkJAX project. This includes padding, strides, dilated and transposed convolutions (and backwards, which is just a combination of those parameters). The values are checked for correctness against the JAX CPU implementation. It's enough to run ResNet50 inference; training should also work but isn't really tested yet. However, it's not optimized at all at the moment - in fact it's slower than JAX on CPU (i.e. BLAS/LAPACK). Speed tuning will come soon.
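
For readers following along, here is a hedged sketch of how padding, stride and dilation typically enter the index math of a direct 2D convolution - this is illustrative only and not taken from vkJAX:

```glsl
#version 450
// Sketch of a direct 2D convolution with padding, stride and dilation folded
// into the index math. Sizes are hard-coded for brevity and are assumptions.

layout(local_size_x = 16, local_size_y = 16) in;

layout(std430, binding = 0) readonly  buffer InImage  { float in_image[];  };
layout(std430, binding = 1) readonly  buffer Kernel   { float kernel[];    };
layout(std430, binding = 2) writeonly buffer OutImage { float out_image[]; };

const int IN_W = 64, IN_H = 64, K_W = 3, K_H = 3;
const int STRIDE = 2, PAD = 1, DILATION = 1;
const int OUT_W = (IN_W + 2 * PAD - DILATION * (K_W - 1) - 1) / STRIDE + 1;
const int OUT_H = (IN_H + 2 * PAD - DILATION * (K_H - 1) - 1) / STRIDE + 1;

void main() {
    ivec2 o = ivec2(gl_GlobalInvocationID.xy);
    if (o.x >= OUT_W || o.y >= OUT_H) return;

    float acc = 0.0;
    for (int ky = 0; ky < K_H; ++ky) {
        for (int kx = 0; kx < K_W; ++kx) {
            // Padding shifts the window, stride spaces the outputs,
            // dilation spaces the kernel taps.
            int iy = o.y * STRIDE - PAD + ky * DILATION;
            int ix = o.x * STRIDE - PAD + kx * DILATION;
            if (iy >= 0 && iy < IN_H && ix >= 0 && ix < IN_W) {
                acc += in_image[iy * IN_W + ix] * kernel[ky * K_W + kx];
            }
        }
    }
    out_image[o.y * OUT_W + o.x] = acc;
}
```

A transposed convolution can then be expressed as an ordinary convolution over a zero-upsampled input, which is why the backward pass can be treated as just a combination of those same parameters.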

@SimLeek
Author

SimLeek commented Feb 24, 2021

Looks like I might have some optimizations in mine. I think I'll also look into FFT and Winograd optimizations and write some tests to see when the different variations are better.
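
As a pointer to what the Winograd approach involves, here is a sketch of the 1D F(2,3) kernel, which produces two outputs of a 3-tap convolution with four multiplies instead of six; the 2D F(2x2, 3x3) variant used for 3x3 conv layers is the nested version of this. Layout and sizes are illustrative assumptions:

```glsl
#version 450
// Sketch of the 1D Winograd F(2,3) kernel: each invocation produces a tile
// of 2 adjacent outputs of a 3-tap convolution using 4 multiplies.

layout(local_size_x = 64) in;

layout(std430, binding = 0) readonly  buffer InSignal  { float in_sig[];  };
layout(std430, binding = 1) readonly  buffer Filter3   { float g[];       }; // 3 taps
layout(std430, binding = 2) writeonly buffer OutSignal { float out_sig[]; };

const int IN_LEN  = 1024;
const int OUT_LEN = IN_LEN - 2;   // "valid" 3-tap convolution

void main() {
    int tile = int(gl_GlobalInvocationID.x);
    int o = tile * 2;
    if (o + 1 >= OUT_LEN) return;

    float d0 = in_sig[o], d1 = in_sig[o + 1], d2 = in_sig[o + 2], d3 = in_sig[o + 3];

    // Filter transform (in practice precomputed once per filter).
    float G0 = g[0];
    float G1 = 0.5 * (g[0] + g[1] + g[2]);
    float G2 = 0.5 * (g[0] - g[1] + g[2]);
    float G3 = g[2];

    // Input transform, elementwise multiply, output transform.
    float m0 = (d0 - d2) * G0;
    float m1 = (d1 + d2) * G1;
    float m2 = (d2 - d1) * G2;
    float m3 = (d1 - d3) * G3;

    out_sig[o]     = m0 + m1 + m2;
    out_sig[o + 1] = m1 - m2 - m3;
}
```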

@axsaucedo
Member

@alexander-g that's awesome! I didn't know you had implemented conv2d, as well as quite a few other really cool ops - I would be really keen to explore how to help increase the speed, potentially initially by adding them pre-compiled as native operations. I am currently doing a significant re-write that will allow operations to be created outside of managers, so it will provide further functionality to create your own operations even from Python. I will have a look around for a good way to provide integration with further extensions and shaders.

@SimLeek in regards to the FFT and Winograd, that also sounds awesome - another contributor actually shared some insights about his project implementing a Vulkan FFT, and it would be awesome to explore how that could look implemented with Kompute (https://github.com/DTolm/VkFFT).

@SimLeek
Author

SimLeek commented Feb 25, 2021

Alright, I looked into this more. Making convolutions fast is much, much harder than I thought, and not much of the existing work is done in GLSL; however, there is a bit in OpenCL.

I've got a fairly large todo list now:

  • Set up a codebase / notebook to benchmark different convolution implementations. (Might want to set this up as its own repository depending on colab compute limits)
  • Set up our naive convolutions on the codebase to use as baselines
  • Modify Input Tensor and Kernel for faster convolution with Winograd or other modifications. [1, 2, 3, 4, 5, 6, 7]
  • Convert/set up OpenCL Blast for Vulkan (and just keep the license/citation with the used files). [library]
  • Test how much each different BLAS method speeds up each different convolution method (see the tiled matmul sketch at the end of this comment).
  • Or, implement one of the recent optimized matrix multiplication algorithms, either for convolutions and other special cases, or in general [1, 2].
  • Add FFT optimizations for the less usual cases when large kernels are used (>64) or there are many filters per layer, using the Vulkan FFT library [library, related]
  • Set up parameter tuning or in-depth benchmarks to determine when to switch between different convolution implementations. (This would likely run into colab compute limits)

Making the convolutions fast is pretty important. Unoptimized convolutions can be hundreds of times slower than optimized ones.
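
For reference on the BLAS-style direction, here is a minimal sketch of a tiled, shared-memory matrix multiply in GLSL - the building block behind im2col + GEMM convolution. Tile size and matrix dimensions are illustrative assumptions:

```glsl
#version 450
// Sketch of a tiled, shared-memory matrix multiply (C = A * B), row-major.
// Each workgroup stages TILE x TILE blocks of A and B into shared memory.

layout(local_size_x = 16, local_size_y = 16) in;

layout(std430, binding = 0) readonly  buffer MatA { float A[]; }; // M x K
layout(std430, binding = 1) readonly  buffer MatB { float B[]; }; // K x N
layout(std430, binding = 2) writeonly buffer MatC { float C[]; }; // M x N

const int M = 256, K = 256, N = 256;
const int TILE = 16;

shared float tileA[TILE][TILE];
shared float tileB[TILE][TILE];

void main() {
    int row = int(gl_GlobalInvocationID.y);
    int col = int(gl_GlobalInvocationID.x);
    int ly  = int(gl_LocalInvocationID.y);
    int lx  = int(gl_LocalInvocationID.x);

    float acc = 0.0;
    for (int t = 0; t < K; t += TILE) {
        // Stage one tile of A and B into shared memory (zero-pad out of range).
        tileA[ly][lx] = (row < M && t + lx < K) ? A[row * K + t + lx] : 0.0;
        tileB[ly][lx] = (t + ly < K && col < N) ? B[(t + ly) * N + col] : 0.0;
        barrier();

        for (int k = 0; k < TILE; ++k) {
            acc += tileA[ly][k] * tileB[k][lx];
        }
        barrier();
    }
    if (row < M && col < N) {
        C[row * N + col] = acc;
    }
}
```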

@axsaucedo
Member

@SimLeek that sounds really awesome! One thing I would be very keen to do is identify the key optimizations that can make Kompute a simpler way to approach some of these more complex use cases. I would love to hear your thoughts as you approach each one - please let me know if you run into any blockers or issues; I'm happy to provide pointers or extend the framework as required.

@SimLeek
Author

SimLeek commented Feb 28, 2021

@axsaucedo Sure! Right now I'm trying to use push constants and specialization constants, and seeing which would be better for multiple convolution passes (and whether they're actually supported on various GPUs). Is there support for those, or for accessing the mapping setup for those in Python?

Also, is there a way to keep the shader in memory as opposed to something like eval_async_algo_file_def? I see kp.Shader.compile_source in test_kompute.py, but that gives me a Cannot find reference 'Shader' in 'kp.py'.

Right now I'm planning on making a C++ project so I can test more of the core/lesser-known Vulkan commands.
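
For context on the shader side, this is roughly how the two mechanisms are declared in Vulkan GLSL: specialization constants are baked in when the pipeline is created, while push constants can change per dispatch. The field names and the split between the two below are illustrative assumptions, and how they are exposed through the Kompute Python API is a separate question:

```glsl
#version 450
// Sketch contrasting specialization constants and push constants for a
// convolution-style compute shader. Names and field choices are illustrative.

layout(local_size_x = 16, local_size_y = 16) in;

// Specialization constants: fixed when the pipeline is built (e.g. kernel size).
layout(constant_id = 0) const int K_W = 3;
layout(constant_id = 1) const int K_H = 3;

// Push constants: cheap to update between dispatches (e.g. per-layer shapes).
layout(push_constant) uniform PushConsts {
    int in_w;
    int in_h;
    int out_w;
    int out_h;
} pc;

layout(std430, binding = 0) readonly  buffer InImage  { float in_image[];  };
layout(std430, binding = 1) readonly  buffer Kernel   { float kernel[];    };
layout(std430, binding = 2) writeonly buffer OutImage { float out_image[]; };

void main() {
    ivec2 o = ivec2(gl_GlobalInvocationID.xy);
    if (o.x >= pc.out_w || o.y >= pc.out_h) return;

    float acc = 0.0;
    for (int ky = 0; ky < K_H; ++ky)
        for (int kx = 0; kx < K_W; ++kx)
            acc += in_image[(o.y + ky) * pc.in_w + (o.x + kx)]
                 * kernel[ky * K_W + kx];
    out_image[o.y * pc.out_w + o.x] = acc;
}
```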

@axsaucedo
Member

@SimLeek great questions - the answers are yes and yes, but not in version 0.5.2; the relevant extra functionality, including push constants, is being added via #164. More specifically:

I will be finishing the work for the new interface today, but it would be fantastic to hear your thoughts - you can try it yourself if you clone the branch and install it with pip install . from the top level of the repo. I would also be keen to get further insights on the best way to trigger the creation/update of push constants. Currently what I have is an OpAlgoPush operation that allows push constants to be updated; however, I am also exploring updating them with just youralgo.push(kp.Constant([1,2,3])).

Let me know your thoughts, very keen to hear what you think.

This is the main reason for the refactor in #164.

@unexploredtest
Contributor

unexploredtest commented Feb 28, 2021

Also, is there a way to keep the shader in memory as opposed to something like eval_async_algo_file_def? I see kp.Shader.compile_source in test_kompute.py, but that gives me a Cannot find reference 'Shader' in 'kp.py'

Oh sorry, I forgot (are you using the latest version from master?) to implement the new argument to the kp.Shader.compile_source function (why didn't the Python tests fail?), and thus kp.Shader.compile_source doesn't work (it segfaults in my case). I'm working on implementing the new argument.

@axsaucedo
Member

@aliPMPAINT it would be good to confirm whether he's using master or 0.5.2, as his error seems to be more of an import issue. Although you're saying that there is a segfault? I am currently running tests in the PR linked above and it seems to work - the integration tests are passing now, so that should address the fix. One thing to mention is that by default the Python installation uses the in-repo build of glslang as opposed to any installed one.

@unexploredtest
Contributor

unexploredtest commented Feb 28, 2021

Although you're saying that there is a segfault

Yeah, I now realize that it doesn't have to do with kp.Shader.compile_source. I get it when I import kp, but the code works fine and I only get a segfault at the end of each execution.

@axsaucedo
Member

@aliPMPAINT hmm, this sounds like an issue - if you can replicate it, can you open a GitHub issue? We can also continue the discussion there.

@axsaucedo
Member

As an update, I have merged #164, which introduces quite a lot of features, including support for push constants and specialization constants, so that will help with some of the discussions in this thread. Just as a heads up, I am also going to start exploring the development of an OpAlgoFactory class, with the initial objective of speeding up the work from @alexander-g on vkjax, by exploring how it can be made possible to provide both compiled shaders as well as allow for a one-time processing of shaders on initialisation, with storage in the same folder / home folder. I'll have a prototype soon, but any ideas are welcome, as performance optimizations will be the main focus on the road to 1.0.

@axsaucedo changed the title from "Are there any vulkan convolution implementations available?" to "Deep Learning Convolutional Neural Network (CNN) example implementation" on Sep 12, 2021
@axsaucedo
Member

We now have a (very basic) VGG7 example in the repo: VGG7 #227
