Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix callback group logic in executor #2476

Closed
wants to merge 2 commits into from

Conversation

mjcarroll
Copy link
Member

@mjcarroll mjcarroll commented Apr 2, 2024

Fixes #2474

CC: @jmachowinski

Copy link
Collaborator

@alsora alsora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it easy to create a unit-test that would fail without this PR and succeeds now?

Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjcarroll do we need to apply the same fix for, (where is irrelevant from #2142)

if (group_info && !group_info->can_be_taken_from().load()) {

callback_group is null means that entity is not valid. AFAIS, rclcpp does not use ready_executables but user would?

CC: @jmachowinski

@jmachowinski
Copy link
Contributor

jmachowinski commented Apr 2, 2024

Is it easy to create a unit-test that would fail without this PR and succeeds now?

Should be straight forward.
A custom executor,
Add a cbg
call collect,
then delete the cbg
Call get next executable.
This failed for me, I got a any executable with an empty cbg

@mjcarroll mjcarroll self-assigned this Apr 3, 2024
@mjcarroll mjcarroll force-pushed the mjcarroll/fix_cbg_condition branch from 7a009e0 to 16911a8 Compare April 3, 2024 15:40
@@ -0,0 +1,132 @@
// Copyright 2017 Open Source Robotics Foundation, Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C&P ?

@mjcarroll
Copy link
Member Author

@jmachowinski can you take a look at my test here? I can't get a scenario where the executable (in this case a timer) is valid, while the callback group is not.

In my case, even when resetting the callback group and the node, there are still 2 strong references to the callback group.

@mjcarroll mjcarroll force-pushed the mjcarroll/fix_cbg_condition branch from 16911a8 to b4f61ec Compare April 3, 2024 16:07
Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes lgtm, had a question about the tests

Also a general comment on the test, I'd love to see the use of spin() instead of spin_all(), or even spin_once() in a loop. And I've found that using a custom waitable that can be explicitly triggered rather than a timer can avoid unintentional race conditions and sleeps. Though we've been copying that custom waitable around a lot. It might be a good idea to create a new entity type that is like a TriggeredCallback class which would inherit from Waitable and would differ from a GuardCondition in that it has a callback to be called when the executor executes it, rather than when it is triggered (which the GuardCondition class already has). It would definitely be useful in tests and may even be useful for users, allowing them to manually schedule a callback to be run in the executor on demand.

If others agree I could work on that.

Comment on lines +105 to +108
// Add the callback group to the executor
auto executor = CustomExecutor();
executor.add_node(node);
executor.spin_all(std::chrono::milliseconds(10));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious why is this needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was to clear out all of the node startup transients.

@jmachowinski
Copy link
Contributor

And I've found that using a custom waitable that can be explicitly triggered rather than a timer can avoid unintentional race conditions and sleeps

In this special case, the timer is not racy, as we just need it to be ready. Also I would like to keep the timer, as it uncovered a bug causing an endless loop.

mjcarroll and others added 2 commits April 5, 2024 15:44
Signed-off-by: Michael Carroll <mjcarroll@intrinsic.ai>
Co-authored-by: Janosch Machowinski <j.machowinski@nospam.org>
Signed-off-by: Michael Carroll <mjcarroll@intrinsic.ai>
Co-authored-by: Janosch Machowinski <j.machowinski@nospam.org>
@mjcarroll mjcarroll force-pushed the mjcarroll/fix_cbg_condition branch from b4f61ec to 06f99cc Compare April 5, 2024 15:45
@mjcarroll
Copy link
Member Author

mjcarroll commented Apr 5, 2024

@jmachowinski I have incorporated your changes from: https://github.com/cellumation/rclcpp/tree/fix_cbg_condition

They have been squashed and I marekd you with co-authored-by

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@wjwwood wjwwood mentioned this pull request Apr 10, 2024
@jmachowinski
Copy link
Contributor

@mjcarroll I picked the test over to #2494

2494 also fixes the tests fails that came up in the CI.

I think we can close this PR in favor for 2494

@wjwwood
Copy link
Member

wjwwood commented Apr 11, 2024

Ok, let's go ahead and close this then. We can reopen it if needed, and I think the feed back was addressed in the new pr as far as I can tell.

For future reference, it's usually better to make prs to other prs rather than replacing them, though sometimes that doesn't work out due to needing to rebase. Or like me you mess up the pr or branch some how and have to close it 🙃.

@wjwwood wjwwood closed this Apr 11, 2024
@wjwwood wjwwood deleted the mjcarroll/fix_cbg_condition branch April 11, 2024 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug in Executor::get_next_ready_executable ?
6 participants