Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add specific communicator for neighborhood communication #1588

Open
wants to merge 10 commits into
base: index-map-pgm
Choose a base branch
from

Conversation

MarcelKoch
Copy link
Member

@MarcelKoch MarcelKoch commented Apr 4, 2024

This PR adds a communicator that only handles neighborhood all-to-all communication. It implements the new interface collective_communicator, which provides different implementations for a selected set of collective mpi routines. Currently, this only includes the non-blocking all-to-all.

The communication uses a fixed pattern, i.e. the send/recv sizes are fixed when the neighborhood communicator is constructed. I would have liked to decouple that, but this would require some knowledge of how the sizes are stored at the interface level. If someone has an idea for that, please let me know.

This is the first part of splitting up #1546.

The neighborhood all-to-all has a bug in OpenMPI < v4.1.0, which makes it necessary to disable the neighborhood communicator in this case. As replacement, there is also a dense all-to-all communicator.

Todo:

  • documentation

PR Stack:

@MarcelKoch MarcelKoch self-assigned this Apr 4, 2024
@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. reg:testing This is related to testing. mod:core This is related to the core module. labels Apr 4, 2024
@MarcelKoch MarcelKoch requested a review from pratikvn April 4, 2024 10:41
@MarcelKoch MarcelKoch force-pushed the read-distributed-with-index-map branch from b42ab92 to 8f104fd Compare April 4, 2024 11:00
@MarcelKoch MarcelKoch modified the milestone: Ginkgo 1.8.0 Apr 5, 2024
@MarcelKoch MarcelKoch requested a review from upsj April 19, 2024 09:19
@MarcelKoch MarcelKoch mentioned this pull request Apr 19, 2024
7 tasks
@MarcelKoch MarcelKoch force-pushed the read-distributed-with-index-map branch from 8f104fd to a0824a8 Compare April 19, 2024 14:39
@MarcelKoch MarcelKoch force-pushed the read-distributed-with-index-map branch from a0824a8 to 8ad3f2f Compare April 22, 2024 11:11
@MarcelKoch MarcelKoch mentioned this pull request Apr 30, 2024
6 tasks
@MarcelKoch MarcelKoch force-pushed the read-distributed-with-index-map branch from 8ad3f2f to 26678b3 Compare April 30, 2024 13:41
@MarcelKoch MarcelKoch force-pushed the read-distributed-with-index-map branch from 26678b3 to 006d67d Compare April 30, 2024 15:20
@MarcelKoch MarcelKoch force-pushed the read-distributed-with-index-map branch from 006d67d to b295b11 Compare May 2, 2024 10:04
@MarcelKoch
Copy link
Member Author

Looks good to me. But there is an issue with OpenMPI that needs to be resolved. It seems that OpenMPI versions < 5.0.x still have this bug

I'm not sure what you mean here. I already have this workaround to use non-null pointers, which should fix the issue.

@pratikvn
Copy link
Member

The null test is still failing here. I thought that was because of the null arguments ?

@MarcelKoch
Copy link
Member Author

Thanks for bringing that up. I guess I've not looked really at the tests in quite a while.
Just a remark to the OpenMPI issue you linked, that one is for the persistent all-to-all-v call, which I'm not using (yet). So the issue shouldn't be relevant to the test failures.

@MarcelKoch
Copy link
Member Author

So right now, I'm leaning to just deactivate the neighborhood communicator if the openmpi version isn't sufficient. The CollectiveCommunicator interface is meant to provide different implementations of the all-to-all call, so using a dense all-to-all here feels odd. I think I would rather have a derived class that only does the dense all-to-all and use that instead.

@pratikvn
Copy link
Member

Yes, but the fix is missing in older versions of 4.0.x as well. I am not entirely sure which exact version of OpenMPI our container uses, but it is 4.0.x, and 4.0.7 is still missing the fix. I think the fix was added only in 4.1.x.

But in general, I agree. I think we can just disable the neighborhood communicator for older versions of OpenMPI.

Copy link
Member

@pratikvn pratikvn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good. Some missing tests and doc updates needed

core/distributed/dense_communicator.cpp Show resolved Hide resolved
include/ginkgo/core/distributed/dense_communicator.hpp Outdated Show resolved Hide resolved
- fix include guards
- update docs
- implement copy/move constructors/assignment with tests
- add equality test for collective communicators (needed for testing)
- always enable neighborhood comm, just throw if openmpi is too old

Co-authored-by: Pratik Nayak <pratik.nayak@kit.edu>
@MarcelKoch MarcelKoch requested review from upsj and removed request for upsj August 27, 2024 12:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:ready-for-review This PR is ready for review mod:core This is related to the core module. reg:build This is related to the build system. reg:testing This is related to testing.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants