fix(ClientGoalHandle): Made mutex recursive to prevent deadlocks #2267
Merged
Alternative idea: do we need to hold the lock here? What if we hold the lock in a scope `{...}` only long enough to copy `feedback_callback_` to a local variable, then release the lock before calling the callback? I think that solves the issue and avoids the (probably small) overhead of using a recursive mutex.
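The scoped-copy idea could look roughly like this. This is only a sketch: the struct, member names, and callback signature are taken from the discussion, not from the actual rclcpp implementation.

```cpp
#include <functional>
#include <memory>
#include <mutex>

// Hypothetical model of "copy under lock, call outside lock".
struct GoalHandle {
  using FeedbackCallback = std::function<void(std::shared_ptr<GoalHandle>)>;

  void call_feedback_callback(std::shared_ptr<GoalHandle> shared_this) {
    FeedbackCallback cb;
    {
      std::lock_guard<std::mutex> guard(handle_mutex_);
      cb = feedback_callback_;  // copy while holding the lock
    }                           // lock released here
    if (cb) {
      cb(shared_this);  // callback may re-lock handle_mutex_ without deadlock
    }
  }

  std::mutex handle_mutex_;
  FeedbackCallback feedback_callback_;
};
```

Because the lock is released before the call, a plain `std::mutex` suffices and the callback can freely call locked methods on the handle; the trade-off, as discussed below, is that the handle's state can change while the callback runs.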
I'm not sure that will work. There are a couple of assumptions at work here:

- `shared_this` being passed in here is directly the same as `this`. That is a weird API, but OK. In that case, we can consider that everything that would need to be protected from concurrent access to `this` also applies to `shared_this`.
- `handle_mutex_` here is protecting all members of this object from concurrent modification/access.

Taken together, I think it means that we need to hold the mutex across the entire operation (since `feedback_callback_` is passed `shared_this`). Does that make sense? Or am I maybe missing something?
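For context on the deadlock the PR title refers to: if the handle's mutex is held across the user callback and the callback calls back into a method that takes the same mutex (for example `get_status()`), a plain `std::mutex` deadlocks, while a `std::recursive_mutex` lets the same thread re-lock. A minimal model, with illustrative names rather than the actual rclcpp code:

```cpp
#include <functional>
#include <mutex>

// Minimal model of the deadlock scenario: the handle holds its mutex
// across the user callback, and the callback calls back into a method
// that takes the same mutex.
struct Handle {
  int get_status() {
    // With std::mutex, this re-lock from the same thread would deadlock;
    // std::recursive_mutex allows it.
    std::lock_guard<std::recursive_mutex> guard(mutex_);
    return status_;
  }

  void call_feedback_callback(const std::function<void(Handle &)> & cb) {
    std::lock_guard<std::recursive_mutex> guard(mutex_);  // held across cb
    cb(*this);
  }

  std::recursive_mutex mutex_;
  int status_ = 2;  // e.g. "executing"
};
```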
If `shared_this` isn't `this`, then holding a lock on `this->handle_mutex_` isn't going to help, as the feedback callback is given `shared_this`, not `this`. I think we can ignore this, though, because there is a check that `shared_this.get() == this` above.

IIUC, holding the lock during the feedback's entire call would protect against a situation where the goal handle changed state between two feedback calls. Just getting an idea of what could change: all of the public methods only "read" data from the class, so the concern would be a "write" somewhere else at the same time as the feedback callback. "Writes" could come from the `Client`, as it's a friend of this class.

- I'm choosing to ignore `get_goal_id()` and `get_goal_stamp()`, as they're not even protected by locks. They assume `info_` is constant (side note: I think this could be enforced by adding `const` here).
- `get_status()` is protected by a lock. If we don't wrap the `feedback_callback_` in the lock, then it's possible the feedback callback could call `get_status()` twice and get different results (executing on one call, then complete on the next).
- `is_feedback_aware()` is protected by a lock and means "does this goal handle have a feedback callback?" If the callback is being executed, it already knows the answer and has no need to call it. I suppose it's possible the feedback callback could get set to `nullptr` while this one is running, and `is_feedback_aware()` might return `false` if this function called it, but I wasn't able to find any calls to `client_goal_handle->set_feedback_callback()` anywhere (side note: if it's private and no one calls it, can we delete it?).
- `is_result_aware()` is protected by a lock and means "does someone want the result of this goal?" It's possible that someone could write a feedback callback, give it access to the `Client`, and only ask for the result if the feedback reports a certain value. If they were using a multithreaded executor, put the `Client` into a reentrant callback group, and feedback messages came in fast enough, then two feedback callbacks could be executed simultaneously and both would call `client->async_get_result()`. There can only be one result callback, so the second feedback callback's would replace the first's.

Affecting decision making in the `feedback_callback` based on `is_result_aware()` seems like the only potential negative consequence to me, and it requires the user passing the `Client` into the callback some other way.

All that to say: I'm fine with the current approach, but I don't see any significant problems with releasing the lock during the feedback callback either. All holding the lock seems to do is make sure that two feedback callbacks on one goal handle can't run in parallel, regardless of the executor or callback group.
I was thinking the same thing; I guess we can just remove this. But was this originally meant to be exposed to the user application so it could change the feedback callback? I was not sure about that.
I think releasing the mutex is a bad idea. If a reentrant callback group is used, not holding the mutex could lead to strange behavior, like the `handle->get_status()` function changing its result in the middle of the feedback callback. I guess this is why the `call_feedback_callback` function exists in the first place.
@sloretz Thanks for the detailed analysis; that is very helpful. Here are a few comments:

To be clear, we are already holding the lock across the `feedback_callback_` right now. The difference in this PR is that we are switching that to a `recursive_mutex`. But your overall point stands that we could consider dropping the mutex before making that call.

Considering your points, I agree that, given the code today, we could probably drop the lock without too many negative consequences. But I think I still prefer to keep holding the lock for two reasons: […] `rclcpp`, particularly around parameters, so we also hold a lock around that for similar reasons.

So all of that is to say that I'm good with this PR as-is. I'm going to do a review here, and then we can run CI on it.