
Deep Learning Convolutional Neural Network (CNN) example implementation #162

Closed
SimLeek opened this issue Feb 24, 2021 · 17 comments

@SimLeek

SimLeek commented Feb 24, 2021

I didn't see them in the list of shaders, and searching "conv" and "convolution" in this repository didn't return much.

I have naive glsl shaders for convolutions (forwards and backwards), so I could convert those.
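
For reference, a minimal sketch of what a naive forward 2D convolution compute shader can look like in GLSL - this is not the linked shader; the buffer bindings, single-channel layout and hard-coded sizes are illustrative assumptions:

```glsl
#version 450
// Minimal sketch of a naive single-channel 2D convolution (valid padding).
// Bindings, layout and sizes are illustrative assumptions only.

layout(local_size_x = 16, local_size_y = 16) in;

layout(std430, binding = 0) readonly  buffer InImage  { float in_image[];  };
layout(std430, binding = 1) readonly  buffer Kernel   { float kernel[];    };
layout(std430, binding = 2) writeonly buffer OutImage { float out_image[]; };

// Hard-coded sizes for the sketch; a real shader would pass these in.
const int IN_W = 64, IN_H = 64;
const int K_W  = 3,  K_H  = 3;
const int OUT_W = IN_W - K_W + 1;
const int OUT_H = IN_H - K_H + 1;

void main() {
    ivec2 o = ivec2(gl_GlobalInvocationID.xy);
    if (o.x >= OUT_W || o.y >= OUT_H) return;

    float acc = 0.0;
    for (int ky = 0; ky < K_H; ++ky) {
        for (int kx = 0; kx < K_W; ++kx) {
            acc += in_image[(o.y + ky) * IN_W + (o.x + kx)]
                 * kernel[ky * K_W + kx];
        }
    }
    out_image[o.y * OUT_W + o.x] = acc;
}
```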

@axsaucedo
Member

Good morning @SimLeek - that's a very interesting question and suggestion. I am currently working on a revamp of the interface that will make it easier to interact with operations, and hence there will be an opportunity to start adding more "native" shaders provided by Kompute. If you have some example convolution shaders that would be awesome - if you can share some examples here we can explore how they could fit best in the repo. Thank you @SimLeek !

@SimLeek
Author

SimLeek commented Feb 24, 2021

Good morning @axsaucedo! And you're welcome.

I gutted a library for its GLSL functions and tested convolutions until they worked; see here:

Forward: https://github.com/SimLeek/gltensors/blob/master/gltensors/shaders/dense_conv_forward_2d.glsl
Backward: https://github.com/SimLeek/gltensors/blob/master/gltensors/shaders/dense_conv_2d_backward.glsl
Runner: https://github.com/SimLeek/gltensors/blob/master/gltensors/GLSLComputer.py
Test/Example: https://github.com/SimLeek/gltensors/blob/master/tests/test_single_layer_autoencoder.py

They're 2D only for now, are set up as GLSL compute shaders but not for Vulkan quite yet, and don't have strides and other options fully set up and tested. However, I'd be happy to add support for all of that and wire it in correctly if it helps get more machine learning onto Vulkan. (It might take at least a few days, though.)
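
As a companion to the forward sketch above, here is a hedged sketch of the backward pass with respect to the input: for a valid-mode forward pass it amounts to a "full" correlation of the output gradient with the kernel, with the indices reversed. Again, bindings and sizes are assumptions rather than the layout used in the linked shaders:

```glsl
#version 450
// Sketch of the input-gradient pass for the naive forward convolution above.
// Each input pixel accumulates grad_out values from every output it touched.

layout(local_size_x = 16, local_size_y = 16) in;

layout(std430, binding = 0) readonly  buffer GradOut { float grad_out[]; };
layout(std430, binding = 1) readonly  buffer Kernel  { float kernel[];   };
layout(std430, binding = 2) writeonly buffer GradIn  { float grad_in[];  };

const int IN_W = 64, IN_H = 64;
const int K_W  = 3,  K_H  = 3;
const int OUT_W = IN_W - K_W + 1;
const int OUT_H = IN_H - K_H + 1;

void main() {
    ivec2 i = ivec2(gl_GlobalInvocationID.xy);
    if (i.x >= IN_W || i.y >= IN_H) return;

    float acc = 0.0;
    for (int ky = 0; ky < K_H; ++ky) {
        for (int kx = 0; kx < K_W; ++kx) {
            int oy = i.y - ky;
            int ox = i.x - kx;
            if (oy >= 0 && oy < OUT_H && ox >= 0 && ox < OUT_W) {
                acc += grad_out[oy * OUT_W + ox] * kernel[ky * K_W + kx];
            }
        }
    }
    grad_in[i.y * IN_W + i.x] = acc;
}
```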

@axsaucedo
Member

That looks awesome! Yeah, that would be great - we currently only have one OpMult implementation that shows how to add a custom shader, but initially, for simplicity, you can try to get it working on the Python side, and I can then help port it to C++ as a native operation (or give you some pointers on where it would fit).

The only thing to mention is that currently Kompute doesn't support uniforms, but you should be able to do everything with buffers - I recently added Specialization Constants and I am just wrapping up functionality to add PushConstants, but it may be easiest to just use version 0.6.0 instead of master.

The easiest way to get something up and running quickly could be to try the functionality via the Colab notebooks: https://github.com/EthicalML/vulkan-kompute#interactive-notebooks--hands-on-videos
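
Since uniforms aren't available, one way the shape parameters could be fed to the shader is through an extra storage buffer of floats that gets cast inside the shader. A minimal sketch of that pattern (the parameter ordering and float-encoded sizes are assumptions, not a Kompute convention):

```glsl
#version 450
// Sketch of passing convolution parameters through a storage buffer instead
// of uniforms. The float-encoded parameter ordering is an arbitrary
// assumption for illustration.

layout(local_size_x = 16, local_size_y = 16) in;

layout(std430, binding = 0) readonly  buffer Params   { float params[];   };
layout(std430, binding = 1) readonly  buffer InImage  { float in_image[]; };
layout(std430, binding = 2) readonly  buffer Kernel   { float kernel[];   };
layout(std430, binding = 3) writeonly buffer OutImage { float out_image[];};

void main() {
    // Decode shape parameters from the buffer: [in_w, in_h, k_w, k_h].
    int in_w  = int(params[0]);
    int in_h  = int(params[1]);
    int k_w   = int(params[2]);
    int k_h   = int(params[3]);
    int out_w = in_w - k_w + 1;
    int out_h = in_h - k_h + 1;

    ivec2 o = ivec2(gl_GlobalInvocationID.xy);
    if (o.x >= out_w || o.y >= out_h) return;

    float acc = 0.0;
    for (int ky = 0; ky < k_h; ++ky)
        for (int kx = 0; kx < k_w; ++kx)
            acc += in_image[(o.y + ky) * in_w + (o.x + kx)]
                 * kernel[ky * k_w + kx];
    out_image[o.y * out_w + o.x] = acc;
}
```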

@SimLeek
Author

SimLeek commented Feb 24, 2021

Oh, yeah, the uniform vs buffer part is where it's not quite set up for Vulkan. I didn't think Vulkan supported uniforms.

Thanks for the tips. I'll read through those and try working on it tomorrow then (since it's just past midnight here).

@alexander-g
Contributor

@SimLeek I have generic 2D convolutions implemented in my vkJAX project. This includes padding, strides, dilated and transposed convolutions (and backwards, which is just a combination of those parameters). The values are checked for correctness against the JAX CPU implementation. It's enough to run ResNet50 inference; training should also work but isn't really tested yet. However, it's not optimized at all at the moment - in fact it's slower than JAX on CPU (i.e. BLAS/LAPACK). Speed tuning will come soon.
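
For readers following along, here is a hedged sketch of how padding, stride and dilation typically enter the index math of a direct 2D convolution - this is illustrative only and not taken from vkJAX:

```glsl
#version 450
// Sketch of a direct 2D convolution with padding, stride and dilation folded
// into the index math. Sizes are hard-coded for brevity and are assumptions.

layout(local_size_x = 16, local_size_y = 16) in;

layout(std430, binding = 0) readonly  buffer InImage  { float in_image[];  };
layout(std430, binding = 1) readonly  buffer Kernel   { float kernel[];    };
layout(std430, binding = 2) writeonly buffer OutImage { float out_image[]; };

const int IN_W = 64, IN_H = 64, K_W = 3, K_H = 3;
const int STRIDE = 2, PAD = 1, DILATION = 1;
const int OUT_W = (IN_W + 2 * PAD - DILATION * (K_W - 1) - 1) / STRIDE + 1;
const int OUT_H = (IN_H + 2 * PAD - DILATION * (K_H - 1) - 1) / STRIDE + 1;

void main() {
    ivec2 o = ivec2(gl_GlobalInvocationID.xy);
    if (o.x >= OUT_W || o.y >= OUT_H) return;

    float acc = 0.0;
    for (int ky = 0; ky < K_H; ++ky) {
        for (int kx = 0; kx < K_W; ++kx) {
            // Padding shifts the window, stride spaces the outputs,
            // dilation spaces the kernel taps.
            int iy = o.y * STRIDE - PAD + ky * DILATION;
            int ix = o.x * STRIDE - PAD + kx * DILATION;
            if (iy >= 0 && iy < IN_H && ix >= 0 && ix < IN_W) {
                acc += in_image[iy * IN_W + ix] * kernel[ky * K_W + kx];
            }
        }
    }
    out_image[o.y * OUT_W + o.x] = acc;
}
```

A transposed convolution can then be expressed as an ordinary convolution over a zero-upsampled input, which is why the backward pass can be treated as just a combination of those same parameters.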

@SimLeek
Author

SimLeek commented Feb 24, 2021

Looks like I might have some optimizations in mine. I think I'll also look into FFT and Winograd optimizations and write some tests to see when the different variations are better.
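
As a pointer to what the Winograd approach involves, here is a sketch of the 1D F(2,3) kernel, which produces two outputs of a 3-tap convolution with four multiplies instead of six; the 2D F(2x2, 3x3) variant used for 3x3 conv layers is the nested version of this. Layout and sizes are illustrative assumptions:

```glsl
#version 450
// Sketch of the 1D Winograd F(2,3) kernel: each invocation produces a tile
// of 2 adjacent outputs of a 3-tap convolution using 4 multiplies.

layout(local_size_x = 64) in;

layout(std430, binding = 0) readonly  buffer InSignal  { float in_sig[];  };
layout(std430, binding = 1) readonly  buffer Filter3   { float g[];       }; // 3 taps
layout(std430, binding = 2) writeonly buffer OutSignal { float out_sig[]; };

const int IN_LEN  = 1024;
const int OUT_LEN = IN_LEN - 2;   // "valid" 3-tap convolution

void main() {
    int tile = int(gl_GlobalInvocationID.x);
    int o = tile * 2;
    if (o + 1 >= OUT_LEN) return;

    float d0 = in_sig[o], d1 = in_sig[o + 1], d2 = in_sig[o + 2], d3 = in_sig[o + 3];

    // Filter transform (in practice precomputed once per filter).
    float G0 = g[0];
    float G1 = 0.5 * (g[0] + g[1] + g[2]);
    float G2 = 0.5 * (g[0] - g[1] + g[2]);
    float G3 = g[2];

    // Input transform, elementwise multiply, output transform.
    float m0 = (d0 - d2) * G0;
    float m1 = (d1 + d2) * G1;
    float m2 = (d2 - d1) * G2;
    float m3 = (d1 - d3) * G3;

    out_sig[o]     = m0 + m1 + m2;
    out_sig[o + 1] = m1 - m2 - m3;
}
```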

@axsaucedo
Member

@alexander-g that's awesome! I didn't know you had implemented conv2d, as well as quite a few other really cool ops - I would be really keen to explore how to help increase the speed, potentially initially by adding them pre-compiled as native operations. I am currently doing a significant re-write that will allow operations to be created outside of managers, so it will provide further functionality to create your own operations even from Python. I will have a look around for a good way to provide integration with further extensions and shaders.

@SimLeek in regards to the FFT and Winograd, that also sounds awesome - another contributor actually shared some insights about his project implementing a Vulkan FFT, and it would be awesome to explore how that could look implemented with Kompute (https://github.com/DTolm/VkFFT).

@SimLeek
Author

SimLeek commented Feb 25, 2021

Alright, I looked into this more. Making convolutions fast is much, much harder than I thought, and not much of the existing work is done in GLSL; however, there is a bit in OpenCL.

I've got a fairly large todo list now:

  • Set up a codebase / notebook to benchmark different convolution implementations. (Might want to set this up as its own repository depending on colab compute limits)
  • Set up our naive convolutions on the codebase to use as baselines
  • Modify Input Tensor and Kernel for faster convolution with Winograd or other modifications. [1, 2, 3, 4, 5, 6, 7]
  • Convert/set up OpenCL Blast for Vulkan (and just keep the license/citation with the used files). [library]
  • Test how much each different BLAS method speeds up each different convolution method (see the tiled matmul sketch at the end of this comment).
  • Or, implement one of the recent optimized matrix multiplication algorithms, either for convolutions and other special cases, or in general [1, 2].
  • Add FFT optimizations for the less usual cases when large kernels are used (>64) or there are many filters per layer, using the Vulkan FFT library [library, related]
  • Set up parameter tuning or in-depth benchmarks to determine when to switch between different convolution implementations. (This would likely run into colab compute limits)

Making the convolutions fast is pretty important. Unoptimized convolutions can be hundreds of times slower than optimized ones.
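
For reference on the BLAS-style direction, here is a minimal sketch of a tiled, shared-memory matrix multiply in GLSL - the building block behind im2col + GEMM convolution. Tile size and matrix dimensions are illustrative assumptions:

```glsl
#version 450
// Sketch of a tiled, shared-memory matrix multiply (C = A * B), row-major.
// Each workgroup stages TILE x TILE blocks of A and B into shared memory.

layout(local_size_x = 16, local_size_y = 16) in;

layout(std430, binding = 0) readonly  buffer MatA { float A[]; }; // M x K
layout(std430, binding = 1) readonly  buffer MatB { float B[]; }; // K x N
layout(std430, binding = 2) writeonly buffer MatC { float C[]; }; // M x N

const int M = 256, K = 256, N = 256;
const int TILE = 16;

shared float tileA[TILE][TILE];
shared float tileB[TILE][TILE];

void main() {
    int row = int(gl_GlobalInvocationID.y);
    int col = int(gl_GlobalInvocationID.x);
    int ly  = int(gl_LocalInvocationID.y);
    int lx  = int(gl_LocalInvocationID.x);

    float acc = 0.0;
    for (int t = 0; t < K; t += TILE) {
        // Stage one tile of A and B into shared memory (zero-pad out of range).
        tileA[ly][lx] = (row < M && t + lx < K) ? A[row * K + t + lx] : 0.0;
        tileB[ly][lx] = (t + ly < K && col < N) ? B[(t + ly) * N + col] : 0.0;
        barrier();

        for (int k = 0; k < TILE; ++k) {
            acc += tileA[ly][k] * tileB[k][lx];
        }
        barrier();
    }
    if (row < M && col < N) {
        C[row * N + col] = acc;
    }
}
```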

@axsaucedo
Member

@SimLeek that sounds really awesome! One thing I would be very keen to do is identify the key optimizations that can make Kompute a simpler way to approach some of these more complex use cases. I would love to hear your thoughts as you approach each one - please let me know if you run into any blockers or issues; I'm happy to provide pointers or extend the framework as required.

@SimLeek
Author

SimLeek commented Feb 28, 2021

@axsaucedo Sure! Right now I'm trying to use push constants and specialization constants, and seeing which would be better for multiple convolution passes (and whether they're actually supported on various GPUs). Is there support for those, or for accessing the mapping setup for those in Python?

Also, is there a way to keep the shader in memory as opposed to something like eval_async_algo_file_def? I see kp.Shader.compile_source in test_kompute.py, but that gives me a Cannot find reference 'Shader' in 'kp.py'.

Right now I'm planning on making a C++ project so I can test more of the core/lesser-known Vulkan commands.
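
For context on the shader side, this is roughly how the two mechanisms are declared in Vulkan GLSL: specialization constants are baked in when the pipeline is created, while push constants can change per dispatch. The field names and the split between the two below are illustrative assumptions, and how they are exposed through the Kompute Python API is a separate question:

```glsl
#version 450
// Sketch contrasting specialization constants and push constants for a
// convolution-style compute shader. Names and field choices are illustrative.

layout(local_size_x = 16, local_size_y = 16) in;

// Specialization constants: fixed when the pipeline is built (e.g. kernel size).
layout(constant_id = 0) const int K_W = 3;
layout(constant_id = 1) const int K_H = 3;

// Push constants: cheap to update between dispatches (e.g. per-layer shapes).
layout(push_constant) uniform PushConsts {
    int in_w;
    int in_h;
    int out_w;
    int out_h;
} pc;

layout(std430, binding = 0) readonly  buffer InImage  { float in_image[];  };
layout(std430, binding = 1) readonly  buffer Kernel   { float kernel[];    };
layout(std430, binding = 2) writeonly buffer OutImage { float out_image[]; };

void main() {
    ivec2 o = ivec2(gl_GlobalInvocationID.xy);
    if (o.x >= pc.out_w || o.y >= pc.out_h) return;

    float acc = 0.0;
    for (int ky = 0; ky < K_H; ++ky)
        for (int kx = 0; kx < K_W; ++kx)
            acc += in_image[(o.y + ky) * pc.in_w + (o.x + kx)]
                 * kernel[ky * K_W + kx];
    out_image[o.y * pc.out_w + o.x] = acc;
}
```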

@axsaucedo
Member

@SimLeek great questions - the answers are yes and yes, but not in version 0.5.2; the relevant extra functionality, including push constants, is being added via #164. More specifically:

I will be finishing the work for the new interface today, but it would be fantastic to hear your thoughts - you can try it yourself if you clone the branch and install it with pip install . from the top level of the repo. I would also be keen to get further insights on the best way to trigger the creation/update of push constants. Currently what I have is an OpAlgoPush operation that allows push constants to be updated; however, I am also exploring updating them with just youralgo.push(kp.Constant([1,2,3])).

Let me know your thoughts, very keen to hear what you think.

This is the main reason for the refactor in #164.

@unexploredtest
Contributor

unexploredtest commented Feb 28, 2021

Also, is there a way to keep the shader in memory as opposed to something like eval_async_algo_file_def? I see kp.Shader.compile_source in test_kompute.py, but that gives me a Cannot find reference 'Shader' in 'kp.py'

Oh sorry, I forgot (are you using the latest version from master?) to implement the new argument to the kp.Shader.compile_source function (why didn't the Python tests fail?), and thus kp.Shader.compile_source doesn't work (it segfaults in my case). I'm working on implementing the new argument.

@axsaucedo
Member

@aliPMPAINT it would be good to confirm whether he's using master or 0.5.2, as his error seems to be more of an import issue. Although you're saying that there is a segfault? I am currently running tests in the PR linked above and it seems to work - the integration tests are passing now, so that should address the fix. One thing to mention is that by default the Python installation uses the in-repo build of glslang as opposed to any installed one.

@unexploredtest
Contributor

unexploredtest commented Feb 28, 2021

Although you're saying that there is a segfault

Yeah, I now realize that it doesn't have to do with kp.Shader.compile_source. I get it when I import kp, but the code works fine and I only get a segfault at the end of each execution.

@axsaucedo
Member

@aliPMPAINT hmm, this sounds like an issue - if you can replicate it, can you open a GitHub issue? We can also continue the discussion there.

@axsaucedo
Member

As an update, I have merged #164, which introduces quite a lot of features, including support for push constants and specialization constants, so that will help with some of the discussions in this thread. Just as a heads up, I am also going to start exploring the development of an OpAlgoFactory class, with the initial objective of speeding up the work from @alexander-g on vkjax, by exploring how it can be made possible to provide both compiled shaders as well as allow for a one-time processing of shaders on initialisation, with storage in the same folder / home folder. I'll have a prototype soon, but any ideas are welcome, as performance optimizations will be the main focus on the road to 1.0.

@axsaucedo changed the title from "Are there any vulkan convolution implementations available?" to "Deep Learning Convolutional Neural Network (CNN) example implementation" on Sep 12, 2021
@axsaucedo
Member

We now have a (very basic) VGG7 example in the repo: VGG7 #227
