
Fix fuse optimizer unit test #369

Merged 2 commits into rolling from rolling-fix-optimizer-unit-test on May 1, 2024

Conversation

svwilliams
Contributor

Fix fuse optimizer unit test. The rclcpp::wait_for_message call was throwing the exception 'subscription already associated with a wait set'. I have no idea why. My instinct is that the node in this unit test uses a single-threaded executor with a dedicated spin thread, and that rclcpp::wait_for_message may also be trying to spin that same subscription? 🤷

I switched the test to use a normal subscriber + callback function. It's not the prettiest thing, but it's not terrible either. I also needed to add a delay between publishing pose_msg1 and pose_msg2. Again, I have no idea why this is needed. I have the publisher queue set to 5, and the pose subscriber has a default queue size of 10. I'm not sure why the first message is getting lost.
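
For reference, a minimal sketch of that delay workaround, assuming the test publishes two geometry_msgs PoseWithCovarianceStamped messages via a pose_publisher; the names and message type here are stand-ins, not the actual test code:

    #include <chrono>

    #include <geometry_msgs/msg/pose_with_covariance_stamped.hpp>
    #include <rclcpp/rclcpp.hpp>

    void publish_poses_with_delay(
      const rclcpp::Publisher<geometry_msgs::msg::PoseWithCovarianceStamped>::SharedPtr & pose_publisher,
      const geometry_msgs::msg::PoseWithCovarianceStamped & pose_msg1,
      const geometry_msgs::msg::PoseWithCovarianceStamped & pose_msg2)
    {
      pose_publisher->publish(pose_msg1);
      // Without this pause the first message was being lost, even though the
      // publisher queue depth is 5 and the subscriber default depth is 10.
      rclcpp::sleep_for(std::chrono::milliseconds(100));
      pose_publisher->publish(pose_msg2);
    }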

@ayrton04
Contributor

Looks like you're not the only one:

ros-controls/ros2_controllers#1101

Slightly different use case, but the net result is the same. Looking at other issues linked there, I see a comment from William Woodall:

ros2/rclcpp#2142 (comment)

It feels like there has to be some kind of dangling reference to the sub somewhere. You are using this overload:

https://github.com/ros2/rclcpp/blob/2cd8900dd20713f54a2a82d189a5e3372dca70be/rclcpp/include/rclcpp/wait_for_message.hpp#L96

The sub should go out of scope, but if something else is keeping a copy of the shared pointer, then it will obviously persist.

Your solution is obviously fine, but I wonder: if you used the other overload, which forces you to create the sub yourself, you could manually call reset() on the sub shared pointer within the loop every time. That shouldn't be necessary, but maybe it would work? If you did use that overload, you could also call use_count() on the sub pointer to see if something is maintaining a reference to it somewhere.
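
For illustration, a rough sketch of that suggestion, assuming a nav_msgs Odometry subscription on /odom (placeholder names, not the actual test code; note it could still hit the same wait-set conflict if the node is being spun elsewhere):

    #include <chrono>
    #include <iostream>
    #include <memory>

    #include <nav_msgs/msg/odometry.hpp>
    #include <rclcpp/rclcpp.hpp>
    #include <rclcpp/wait_for_message.hpp>

    bool wait_with_diagnostics(const rclcpp::Node::SharedPtr & node, nav_msgs::msg::Odometry & out)
    {
      // Create the subscription ourselves so we control its lifetime.
      auto sub = node->create_subscription<nav_msgs::msg::Odometry>(
        "/odom", 1, [](const std::shared_ptr<const nav_msgs::msg::Odometry>) {});

      const bool got_msg = rclcpp::wait_for_message(
        out, sub, node->get_node_options().context(), std::chrono::seconds(1));

      // If something else is keeping a copy of the shared pointer,
      // use_count() will be greater than 1 here.
      std::cout << "sub use_count after wait: " << sub.use_count() << std::endl;

      // Explicitly drop our reference before the next loop iteration.
      sub.reset();
      return got_msg;
    }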

@carlos-m159 left a comment

Sounds like wait_for_message should be creating its subscription with a callback group created with automatically_add_to_executor_with_node = false, so that a wait set can later be created without raising that exception?

  rclcpp::SubscriptionOptions opts;
  opts.callback_group =
    node->create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive, false);

  auto sub = node->create_subscription<MsgT>(
    topic, 1, [](const std::shared_ptr<const MsgT>) {}, opts);
  return wait_for_message<MsgT, Rep, Period>(
    out, sub, node->get_node_options().context(), time_to_wait);

@ayrton04
Contributor

Maybe? It's hard to know whether that test is throwing on the first call to wait_for_message or on a subsequent call. It's in a loop, so I am wondering if it's the second call that results in the error.

@svwilliams
Contributor Author

Just for fun I added some tracing print statements.
It looks like the exception is thrown on the very first call to wait_for_message().

    [test_fixed_lag_ignition-1] Waiting for message (0)...
    [test_fixed_lag_ignition-1] terminate called after throwing an instance of 'std::runtime_error'
    [test_fixed_lag_ignition-1]   what():  subscription already associated with a wait set
    [test_fixed_lag_ignition-1] Aborted (core dumped)

@svwilliams
Contributor Author

But as Tom/Carlos suggest:

  • Use the other overload that forces me to create the subscriber manually
  • Create that subscriber with the automatically_add_to_executor_with_node parameter (which is undocumented) set to false

  while ((odom_msg.header.stamp != rclcpp::Time(3, 0, RCL_ROS_TIME)) &&
    (node->now() < result_timeout))
  {
    // Put the subscription in a callback group that is NOT automatically
    // added to the node's executor, so wait_for_message can add it to its
    // own wait set without the 'already associated' exception.
    rclcpp::SubscriptionOptions opts;
    opts.callback_group =
      node->create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive, false);
    auto sub = node->create_subscription<nav_msgs::msg::Odometry>(
      "/odom", 1, [](const std::shared_ptr<const nav_msgs::msg::Odometry>) {}, opts);
    rclcpp::wait_for_message(odom_msg, sub, node->get_node_options().context(),
      std::chrono::seconds(1));
  }

This does work.

However, now that I see what is happening under the hood, I'm not in love with the loop creating a new subscriber every time. It seems like that could lead to missed/dropped messages if the external node happened to publish at just the wrong moment.

@svwilliams
Contributor Author

I think I'm going to leave the unit test as-is -- create a normal subscriber and wait for the expected message. This feels like it requires far less explanation of how the test works.

But I did realize I introduced a threading problem. The last commit adds a mutex to guard access to the stored odometry message.
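
Roughly this pattern, as a sketch; the names odom_mutex_ and latest_odom_ are stand-ins for whatever the test actually uses:

    #include <memory>
    #include <mutex>

    #include <nav_msgs/msg/odometry.hpp>
    #include <rclcpp/rclcpp.hpp>

    std::mutex odom_mutex_;
    nav_msgs::msg::Odometry latest_odom_;

    // A normal long-lived subscription; the callback stores the latest
    // message under the mutex so the test thread can poll it safely.
    auto make_odom_sub(const rclcpp::Node::SharedPtr & node)
    {
      return node->create_subscription<nav_msgs::msg::Odometry>(
        "/odom", 10,
        [](const std::shared_ptr<const nav_msgs::msg::Odometry> msg) {
          std::lock_guard<std::mutex> lock(odom_mutex_);
          latest_odom_ = *msg;
        });
    }

    // The test loop checks for the expected stamp under the same mutex.
    bool received_expected_stamp(const rclcpp::Time & expected)
    {
      std::lock_guard<std::mutex> lock(odom_mutex_);
      return rclcpp::Time(latest_odom_.header.stamp, RCL_ROS_TIME) == expected;
    }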

Base automatically changed from rolling-format-changes to rolling April 30, 2024 22:52
…hrowing an exception 'subscription already associated with a wait set'. Switched to a standard subscriber instead.
@svwilliams force-pushed the rolling-fix-optimizer-unit-test branch from 9d09337 to bd8e60c on April 30, 2024 22:54
@svwilliams
Contributor Author

Checking if adding a comment will trigger the upstream CI build to run

@svwilliams closed this Apr 30, 2024
@svwilliams reopened this Apr 30, 2024
@svwilliams closed this May 1, 2024
@svwilliams reopened this May 1, 2024
@svwilliams
Contributor Author

I don't understand. The last PR magically triggered CI jobs after I added a comment. This one refuses. Merging anyway.

@svwilliams merged commit f1a1f42 into rolling on May 1, 2024
@svwilliams deleted the rolling-fix-optimizer-unit-test branch May 1, 2024 03:13