Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds distributed row gatherer #1589

Open
wants to merge 8 commits into
base: neighborhood-communicator
Choose a base branch
from

Conversation

MarcelKoch
Copy link
Member

@MarcelKoch MarcelKoch commented Apr 4, 2024

This PR adds a distributed row gatherer. This operator essentially provides the communication required in our matrix apply.

Besides the normal apply (which is blocking), it also provides two asynchronous calls. One version has an additional workspace parameter which is used as send buffer. This version can be called multiple times without restrictions, if different workspaces are used for each call. The other version doesn't have a workspace parameter, and instead uses an internal buffer. As a consequence, this function can only be called a second time, if the request of the previous call has been waited on. Otherwise, this function will throw.

This is the second part of splitting up #1546.

It also introduces some intermediate changes, which could be extracted out beforehand:

PR Stack:

@MarcelKoch MarcelKoch self-assigned this Apr 4, 2024
@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. reg:testing This is related to testing. mod:core This is related to the core module. type:matrix-format This is related to the Matrix formats labels Apr 4, 2024
@MarcelKoch MarcelKoch requested a review from pratikvn April 4, 2024 10:49
@MarcelKoch MarcelKoch force-pushed the distributed-row-gatherer branch 2 times, most recently from 49557f1 to 4a79442 Compare April 5, 2024 08:18
@MarcelKoch MarcelKoch modified the milestone: Ginkgo 1.8.0 Apr 5, 2024
@MarcelKoch MarcelKoch requested a review from upsj April 19, 2024 09:20
@MarcelKoch MarcelKoch mentioned this pull request Apr 19, 2024
7 tasks
@MarcelKoch MarcelKoch force-pushed the distributed-row-gatherer branch 2 times, most recently from 98fa10a to 79de4c3 Compare April 19, 2024 16:19
@MarcelKoch
Copy link
Member Author

One issue that I have is the constructor. It takes a collective_communicator and an index_map. The index_map already defines the communication pattern, so the collective_communicator has to match that.
One option might be to have a virtual function like

std::unique_ptr<collective_communicator> create_with_same_type(communicator, index_map);

If I can't come up with anything better, I guess I will use that.

@pratikvn
Copy link
Member

Do we need to have the std::future setup for the release ? Can we remove that for now and just use a normal synchronous approach ? I think that is a significant change that maybe needs more thought and probably a separate PR.

@MarcelKoch MarcelKoch force-pushed the distributed-row-gatherer branch 2 times, most recently from 8ec2b02 to 58d06ed Compare July 10, 2024 13:03
Copy link
Member

@pratikvn pratikvn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments regarding synchronizations.

core/distributed/row_gatherer.cpp Outdated Show resolved Hide resolved
core/distributed/row_gatherer.cpp Outdated Show resolved Hide resolved
core/distributed/row_gatherer.cpp Outdated Show resolved Hide resolved
@MarcelKoch MarcelKoch force-pushed the neighborhood-communicator branch 2 times, most recently from c9d5643 to 8e3e932 Compare July 16, 2024 07:43
@MarcelKoch MarcelKoch requested a review from pratikvn July 16, 2024 07:43
@MarcelKoch MarcelKoch force-pushed the distributed-row-gatherer branch 2 times, most recently from 98dcc4f to 5e970e9 Compare July 18, 2024 17:00
Copy link
Member

@pratikvn pratikvn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice work! LGTM!

namespace distributed {


#if GINKGO_HAVE_OPENMPI_POST_4_1_X
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be PRE and the other way around ? I dont think there is a macro like this defined.

int is_inactive;
MPI_Status status;
GKO_ASSERT_NO_MPI_ERRORS(
MPI_Request_get_status(req_listener_, &is_inactive, &status));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maybe move this MPI function into mpi.hpp and create a wrapper for it ?


mutable array<char> send_workspace_;

mutable MPI_Request req_listener_{MPI_REQUEST_NULL};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be of type mpi::request ?

template <typename LocalIndexType>
void RowGatherer<LocalIndexType>::apply_impl(const LinOp* alpha, const LinOp* b,
const LinOp* beta, LinOp* x) const
GKO_NOT_IMPLEMENTED;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can also implement the advanced apply by replacing b_local->row_gather(idxs, buffer) by b_local->row_gather(alpha, idxs, beta, buffer) ?

@MarcelKoch MarcelKoch requested review from upsj and removed request for upsj August 27, 2024 12:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:ready-for-review This PR is ready for review mod:core This is related to the core module. reg:build This is related to the build system. reg:testing This is related to testing. type:matrix-format This is related to the Matrix formats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants