Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested wall timers are not being executed on FastRTPS #606

Closed
AlexeyMerzlyakov opened this issue May 13, 2022 · 3 comments
Closed

Nested wall timers are not being executed on FastRTPS #606

AlexeyMerzlyakov opened this issue May 13, 2022 · 3 comments

Comments

@AlexeyMerzlyakov
Copy link

AlexeyMerzlyakov commented May 13, 2022

In the following snippet of code only outer wall timer1_ is being executed with FastRTPS while nested timer2_ lambda is never being called:

  TimerNode(std::string timer_node)
  : Node(timer_node)
  {
    callback_group_ = create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive, false);

    timer1_ = this->create_wall_timer(
      0s,
      [this]() -> void {
        timer1_->cancel();
        RCLCPP_INFO(get_logger(), "Hey from outer wall timer");
        timer2_ = this->create_wall_timer(
          0s,
          [this]() -> void {
            timer2_->cancel();
            RCLCPP_INFO(get_logger(), "Hey from nested wall timer");
          },
          callback_group_);
      });

    executor_ = std::make_shared<rclcpp::executors::SingleThreadedExecutor>();
    executor_->add_callback_group(callback_group_, get_node_base_interface());
    service_thread_ = std::make_unique<std::thread>([&]() {executor_->spin();});

    std::this_thread::sleep_for(std::chrono::milliseconds(1000));
  }

This code is creating service_thread_ and adding there a new custom callback group executor. Later, inside timer1_ lambda, new timer2_ is being created with this custom callback group, but never being executed when using large delays at the end of the constructor.
Our expectations that both timers: outer and nested to executed as it being when running e.g. on CycloneDDS. Outer timer should be executed within main thread, nested timer - inside service_thread_ after it was added. From this point it seems to be unusual behavior.

Bug report

Required Info:

  • Operating System:
    • Ubuntu 20.04
  • Installation type (confirmed on):
    • ROS2 Rolling built from sources
    • ROS2 Rolling installed from binaries
    • ROS2 Galactic installed from binaries
  • Version or commit hash:
  • DDS implementation:
    • Fast-RTPS
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

  1. Build attached (dev_ws.zip) simple repro-case workspace:
[leha@leha-PC dev_ws]$ colcon build --symlink-install
  1. Run the testcase with selected RMW implementation
[leha@leha-PC dev_ws]$ ./build/timer/timer_node

Expected behavior

Both outer and nested timers are being executed.

Actual behavior

Only outer timer is being executed for FastRTPS:

[leha@leha-PC dev_ws]$ RMW_IMPLEMENTATION=rmw_fastrtps_cpp ./build/timer/timer_node 
[INFO] [1652426582.979583067] [timer_node]: Hey from outer wall timer
^C[INFO] [1652426586.339303409] [rclcpp]: signal_handler(signum=2)
[leha@leha-PC dev_ws]$ RMW_IMPLEMENTATION=rmw_cyclonedds_cpp ./build/timer/timer_node
[INFO] [1652426591.814484323] [timer_node]: Hey from outer wall timer
[INFO] [1652426591.815313078] [timer_node]: Hey from nested wall timer
^C[INFO] [1652426593.976046693] [rclcpp]: signal_handler(signum=2)

Additional information

The similar to given in the testcase code is contained in Nav2 LifecycleManager constructor, which makes it inoperable for Nav2 code in the current revision. More details in ros-navigation/navigation2#2917

@afrixs
Copy link

afrixs commented May 13, 2022

Comparing to other similar scenarios we got a bit closer to the root cause of the problem: the guard condition being triggered inside the add_timer method does not reach all the executors, if the node has its callback groups assigned to multiple executors

@MiguelCompany
Copy link
Collaborator

MiguelCompany commented May 13, 2022

@AlexeyMerzlyakov @afrixs I think this may be related to ros2/rclcpp#1611, which seems to state the bug is in rclcpp, not in rmw implementations.

If this is the case, there PR ros2/rclcpp#1640 deals with the issue, but seems to have been pending for a long time and it now has conflicts.

If the root cause is that rclcpp is violating the contract in RMW API, as stated in this comment, then this issue must be closed.

Note that it may work on other RMW implementations that support waiting on the same entity using different wait sets in two or more threads concurrently, but is not safe according to the RMW API doc.

@AlexeyMerzlyakov
Copy link
Author

AlexeyMerzlyakov commented May 16, 2022

@afrixs, @MiguelCompany, thank you for the analysis of the root cause and reference to the origin documentation provided! I've checked the patch from ros2/rclcpp#1640 (slightly modified for back-port to the latest revision) and it works for the attached testcase. This confirms, that the problem is related to the ros2/rclcpp#1611 and not concerned with FastRTPS. Closing the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants