Split batched solver compilation #1629

MarcelKoch · 2024-06-24T08:46:31Z

This PR splits up the compilation of the batched solvers in order to reduce the compilation times. It splits up the instantiations of the kernel launches depending on the number of vectors in shared memory. This is based on the same CMake mechanism as for the csr and fbcsr kernels.
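A minimal sketch of the idea, assuming a launch function templated on the number of vectors held in shared memory. The names `launch_solver` and `dispatch` are hypothetical and are not taken from the Ginkgo sources. In a split build, each explicit instantiation would live in its own translation unit so that the instantiations compile in parallel; they are shown in one file here only to keep the example self-contained.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a kernel launch that is templated on the number
// of vectors kept in shared memory.
template <int n_shared>
int launch_solver(int num_vectors)
{
    // Returns how many vectors would have to stay in global memory.
    return num_vectors - n_shared;
}

// In a split build, each of these explicit instantiations would sit in a
// separate source file (for example one generated per value of n_shared).
template int launch_solver<0>(int);
template int launch_solver<1>(int);
template int launch_solver<2>(int);

// Runtime dispatch onto the compile-time variants.
int dispatch(int n_shared, int num_vectors)
{
    switch (n_shared) {
    case 0:
        return launch_solver<0>(num_vectors);
    case 1:
        return launch_solver<1>(num_vectors);
    case 2:
        return launch_solver<2>(num_vectors);
    default:
        throw std::invalid_argument("unsupported number of shared vectors");
    }
}
```

The dispatcher only needs declarations of the variants, so the expensive template bodies are compiled once per file rather than all in a single object file.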

upsj · 2024-06-27T19:44:19Z

This should have a huge impact. Excerpt from the HIP 5.14 debug build log:

6534.89 hip/CMakeFiles/ginkgo_hip.dir/solver/batch_bicgstab_kernels.hip.cpp.o

MarcelKoch · 2024-06-28T10:58:40Z

core/solver/batch_dispatch.hpp

+#define GKO_BATCH_INSTANTIATE_STOP(macro, ...)                          \
+    macro(__VA_ARGS__,                                                  \
+          ::gko::batch::solver::device::batch_stop::SimpleAbsResidual); \
+    template macro(                                                     \


The `template` here (and in the other macros below) could be removed if the value/index type instantiation macros accepted a variable number of arguments.

That doesn't work until C++20. Before C++20, invoking a macro declared as `(arg, ...)` requires at least two arguments.
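A simplified sketch of the pattern under discussion, assuming the caller writes the leading `template` keyword, so the macro has to spell out `template` for every instantiation after the first. All names here (`apply_stop`, `StopA`, `StopB`, `DECLARE_APPLY`, `INSTANTIATE_STOP`) are hypothetical stand-ins and not the Ginkgo macros.

```cpp
// Hypothetical stopping criteria, distinguished by a compile-time constant.
struct StopA {
    static constexpr int scale = 1;
};
struct StopB {
    static constexpr int scale = 2;
};

// Hypothetical function that is templated on the stopping criterion.
template <typename Stop, typename T>
T apply_stop(T x)
{
    return Stop::scale * x;
}

// Expands to the declaration part of an explicit instantiation.
#define DECLARE_APPLY(T, Stop) T apply_stop<Stop, T>(T)

// Appends each stop type to the forwarded arguments. The caller supplies
// the first `template`, so the second one must appear inside the macro.
// The invocation below passes one variadic argument, which is valid before
// C++20 as well.
#define INSTANTIATE_STOP(macro, ...) \
    macro(__VA_ARGS__, StopA);       \
    template macro(__VA_ARGS__, StopB)

// Explicitly instantiates apply_stop<StopA, int> and apply_stop<StopB, int>.
template INSTANTIATE_STOP(DECLARE_APPLY, int);
```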

pratikvn

In general, the idea looks good, but the pipelines are failing.

One argument against this approach is that readability and maintainability are seriously affected. The already complex batched code is now even more complex and annoying to read. Maybe we should see whether, instead of this split approach, we can do what Jacobi does: have fewer cases by default, and only have full instantiations when necessary.

cuda/solver/batch_bicgstab_kernels.cuh

MarcelKoch · 2024-07-05T12:59:55Z

IMO the Jacobi instantiation is more complex than what is here. The kernel and the instantiations are directly together, instead of being generated by CMake, which makes it easier to follow for me.
I also merged the two .cpp files per solver, perhaps that can simplify things a bit again.

But I agree that the batch system needs an overhaul in general.

pratikvn · 2024-07-15T08:32:56Z

An alternative approach: https://github.com/ginkgo-project/ginkgo/tree/batch-optim

MarcelKoch · 2024-07-15T10:53:43Z

> An alternative approach: https://github.com/ginkgo-project/ginkgo/tree/batch-optim

This seems to be quite orthogonal to this PR. With full optimizations enabled, there would be the same issue as before, so the fix from this PR is still needed. I don't see a reason why we should burden people that want the full optimizations enabled with those long compile times, for which we already have a fix available.
But we could add this into this PR.

MarcelKoch self-assigned this Jun 24, 2024

MarcelKoch force-pushed the split-batched-solver-compilation branch from 259f2c1 to 8c25a83 Compare June 24, 2024 11:24

MarcelKoch commented Jun 28, 2024

pratikvn requested changes Jul 5, 2024

cuda/solver/batch_bicgstab_kernels.cuh (outdated)

MarcelKoch force-pushed the split-batched-solver-compilation branch from 8c25a83 to 870ad69 Compare July 5, 2024 12:47

MarcelKoch force-pushed the split-batched-solver-compilation branch 3 times, most recently from ff4777c to d04f06c Compare July 8, 2024 10:46

MarcelKoch added 4 commits July 9, 2024 09:41

[batch] provide default index type for matrix device types

6cb7fd8

[batch] handle constness of index type same as value type

fe81b65

[batch] add macro to instantiate batched solver

0481324

[core] fix print formatting in (dumb) assert

de5346c

If the assert condition contained a `%`, weird things could happen: it was interpreted as a format specifier, which led to errors/warnings.
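A minimal sketch of this class of bug and its usual fix, not the actual Ginkgo assert code. The helper `format_assert_message` is hypothetical. The point is that the stringified condition is passed as an argument to a `%s` conversion instead of being used as (part of) the format string, so a `%` inside the condition is printed literally.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper that builds the message for a failed assertion.
std::string format_assert_message(const char* condition)
{
    char buffer[256];
    // The condition is an argument to "%s", never the format string itself,
    // so a '%' inside the condition is not treated as a format specifier.
    std::snprintf(buffer, sizeof(buffer), "Assertion `%s` failed", condition);
    return buffer;
}
```

With this shape, a condition such as `i % 2 == 0` appears unchanged in the message.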

MarcelKoch force-pushed the split-batched-solver-compilation branch from d04f06c to fa6d091 Compare July 9, 2024 07:42

MarcelKoch requested a review from pratikvn July 9, 2024 07:42

MarcelKoch added 4 commits July 10, 2024 09:32

[batch] split bicgstab compilation (hip)

6b5b15d

[batch] split bicgstab compilation (cuda)

b05def9

[batch] split cg compilation (hip)

e5bb9aa

[batch] split cg compilation (cuda)

e59ab55

MarcelKoch force-pushed the split-batched-solver-compilation branch from fa6d091 to e59ab55 Compare July 10, 2024 07:36

pratikvn mentioned this pull request Aug 5, 2024

Temporarily disable optimized batched solver instantiations #1652

Merged

pratikvn mentioned this pull request Aug 5, 2024

Enable batched optimizations and split solver instantiations. #1661

Open

MarcelKoch marked this pull request as draft August 9, 2024 09:31

MarcelKoch added the 1:ST:WIP This PR is a work in progress. Not ready for review. label Aug 15, 2024

MarcelKoch added this to the Ginkgo 1.9.0 milestone Aug 26, 2024
