Get rid of service client nodes #816

mjeronimo · 2019-06-10T19:24:41Z

There are several nodes in the system that are created and used solely for interfacing to services. These "client nodes" are used because spin_until_future_complete cannot accept a node that is already on an executor. Instead, we can use future.wait_until instead of spin_until_future_complete. This will allow us to remove several of these client nodes in the system, which will hopefully ease discovery congestion.

mjeronimo · 2019-06-11T15:27:20Z

@orduno See #808 for some background about this issue and a quick code sample.

SteveMacenski · 2019-10-10T18:33:48Z

@mjeronimo is this still an issue?

crdelsey · 2019-10-14T19:39:48Z

It is still an issue. Having too many nodes impacts startup times and seems to overwhelm the current DDS implementations.

Possible fixes to this issue are:

Fix the DDS implementations to better handle many nodes.
Fix rcl to allow re-entrant spinning on the current node.
Fix rcl/rclcpp to allow continuations to be returned from service callbacks.
Re-architect our service calls to cleverly use callback groups and multi-threaded executors.
Re-architect nav2 to not wait for network activity in the context of a callback.

SteveMacenski · 2019-10-14T19:58:33Z

@crdelsey can you provide a few examples in the stack where we use "client nodes", other than costmap spinning, I had a hard time finding any

crdelsey · 2019-10-14T20:49:33Z

I think the big one is here in ServiceClient. https://github.com/ros-planning/navigation2/blob/7652518af52f257d097269309986943771d98683/nav2_util/include/nav2_util/service_client.hpp#L38

Then, there's another use in LifecycleManagerClient (https://github.com/ros-planning/navigation2/blob/a095038da9f4e3729679e5d20b39e1d95582fb16/nav2_lifecycle_manager/src/lifecycle_manager_client.cpp#L30).

There may be other places.

SteveMacenski · 2019-10-14T22:10:59Z

Got it, I didn't see that hidden in there. "Fix DDS" I feel like is a never ending battle, I'd say 2 or 3 seem good general long term fixes but 4 seems more practically in our control.
I'll need to do more reading on these topics

SteveMacenski · 2020-01-23T06:47:53Z

Any objection to using multithreaded executors? It seems that way we can have these all going at once with one node. Do you think there are applications here sensitive enough (or otherwise not well formatted) for multithreaded processing? Most places have mutex where we might want to use a multithreaded spinner. I wouldn't think it would be too much of a hassle to update the few places required.

shivaang12 · 2020-04-16T22:42:25Z

@mjeronimo Do you have any updates on this? Let me know even if you have any unfinished work on this.

SteveMacenski · 2020-04-17T03:58:02Z

@mjeronimo Shivang is taking a look at this topic. Carl mentioned that you had a branch and started to work on it that either may be done or had some good progress. Can you give us a run down on where that branch is and what still needs to be completed to support?

mjeronimo · 2020-04-17T15:45:07Z

@shivaang12 @SteveMacenski Sorry, guys I haven't been on github recently (we're using gitlab internally for my current project). Anyways, @orduno did the most exploration on this topic. He prototyped at least one alternative approach that, I believe, used multi-threaded executors. I've also brought up the topic directly to folks at Open Robotics and they agree that this is an issue (but I don't think there are any plans to allow re-entrant spinning on the current node anytime soon). In general, I think it's a good idea to not proliferate nodes like this because of the discovery traffic introduced and the possibility of dropping messages.

SteveMacenski · 2020-04-18T07:44:22Z

this is an issue (but I don't think there are any plans to allow re-entrant spinning on the current node anytime soon

Can confirm on the TSC side, doesn't seem likely for Foxy (or G-turtle... bigger fish to fry). On my wish list of things for OR people to work on that effect navigation users, this isn't high at the moment.

Awesome, hopefully Carlos can chime in. Ignoring Foxy, I'd like 1 server to 1 node mapping in the next LTS distribution. Multithreaded executors seems to be the lowest-effort clearest way to handle this, though that does create issues with the components API.

The question then becomes: do we value using components or minimizing nodes more? Currently, I think that answer is nodes from how inefficient it is to have multiple. However, that's improving by the week, so I think long term that answer is components.

BriceRenaudeau · 2021-03-08T13:56:49Z

I am currently making a PR to use callback groups for service clients instead of nodes #2216.

On the Rolling distribution, there are callback groups that can be attach to a node and added to executors. This allows to use of spin_until_future_complete() while the node is still spinning. It makes nodes multithreaded.

It can be used to execute synchronous service/action call in timer/subscriber callback.

SteveMacenski · 2021-03-08T20:56:30Z

#2216 (comment)

I think this comment gives a good summary of what should be done for each case.

SteveMacenski · 2021-03-16T23:05:50Z

Unfortunately, from deeper looking, each new executor creates a new context (unless one given on construction) therefore , with the new design https://design.ros2.org/articles/Node_to_Participant_mapping.html each context now maps to a DDS participant so this is really no different than just having multiple client nodes.

open question then: if we gave the node's context to a new separate executor_ on construction, is that the same DDS participant and also can be processed in parallel?

I think this probably deserves a dedicated redesign session to think about the executor and threading options we have available and go through our server list to see what makes the most sense in order to reduce the number of DDS participants and nodes. We should also analyze what actions+services+subscribers exist in each server and add them to separate callback groups if necessary so spinning has a deterministic output to processing the "right" information in each context (not-ROS-context, software one). Right now everywhere are just automatically added to one. Especially for planner / controller / waypoint servers, it may make sense to separate these callback groups and all processing in parallel or spinning a particular one.

The right answer is probably a mix of multi-threaded executors and multiple single-threaded executors depending on the server's needs. Also need to determine the context question above whether we can share that across multiple executor objects to run in parallel but not add additional participants on the network.

SteveMacenski · 2021-04-15T18:36:58Z

I spoke with Chris from OR @clalancette and he gave me the key to my understanding on this -- while a new context is constructed on a new executor, when none is given as an input, it uses the default context which stores a static variable to use https://github.com/ros2/rclcpp/blob/master/rclcpp/src/rclcpp/contexts/default_context.cpp#L23. Therefore, any new executor, node, etc will use that same context regardless, unless you give a context to the object on construction.

Thusly, we can continue this work but the question remains how necessary this is. Essentially, this is a refactoring effort now. We're working on turning N nodes into 1 node with N threads as a multi-threaded executor: "whats better: N single threaded exeuctors or 1 multi-threaded executor with N threads"? There's not a clear answer to that, but there shouldn't be insane performance differentials now.

It would be good to do this to refactor and make the best use of ROS2, but it is no longer a critical requirement. I believe we now have sufficient information to make decisions and execute on this task when someone has time to work on it.

BriceRenaudeau · 2021-05-10T15:22:45Z

I see several nodes that can still be removed.

WaypointFollower client_node: easy, need to use callbackgroup instead
LifecycleManager bond_client_node_: easy if we don't do waitUntilFormed
Costmap2DROS client_node_: needed but not used by layers and filters
Costmap2DROS rclcpp_node_: Difficult, needed to pub/sub TF

gezp · 2021-07-08T02:05:53Z

SteveMacenski · 2022-01-20T20:47:45Z

@gezp I saw another of your PRs was merged, is this now good to move forward on the AMCL / costmap PRs?

gezp · 2022-01-21T01:09:25Z

@SteveMacenski the upstream PR ros2/geometry2#447 is still blocked.

SteveMacenski · 2022-06-02T01:56:37Z

Final PR imminent to merge

mjeronimo added the 1 - High High Priority label Jun 10, 2019

mjeronimo added this to the D Turtle Release milestone Jun 10, 2019

crdelsey added 2 - Medium Medium Priority and removed 1 - High High Priority labels Jun 10, 2019

crdelsey assigned orduno Jun 10, 2019

orduno mentioned this issue Jun 27, 2019

Removing service client nodes from navfn #897

Closed

rotu mentioned this issue Jul 3, 2019

recoveries node registers multiple subnodes with same name #909

Closed

crdelsey modified the milestones: D Turtle Release, E Turtle Release Jul 8, 2019

orduno removed this from the E Turtle Release milestone Oct 28, 2019

SteveMacenski added the summer sprint label Apr 28, 2020

ruffsl mentioned this issue Aug 6, 2020

Update for Foxy ros-swg/turtlebot3_demo#34

Merged

SteveMacenski removed the summer sprint label Oct 8, 2020

SteveMacenski mentioned this issue Mar 4, 2021

ServiceClient use callback group #2216

Merged

SteveMacenski mentioned this issue Mar 17, 2021

Update BT nodes to use callback groups / executors to process #2251

Closed

11 tasks

BriceRenaudeau mentioned this issue Mar 24, 2021

Remove service nodes in lifecycle_manager #2267

Merged

6 tasks

BriceRenaudeau mentioned this issue Apr 29, 2021

Open option to use parent node in lifecycle Manger Client #2328

Merged

6 tasks

BriceRenaudeau mentioned this issue May 10, 2021

Update BT nodes to use callback groups / executors to process #2334

Merged

6 tasks

SteveMacenski mentioned this issue Jun 16, 2021

Allow separate naming of client action server nodes in bt navigator #2410

Merged

6 tasks

This was referenced Jul 8, 2021

Reduce nodes for nav2_waypoint_follower #2441

Merged

add bool arg use_default_callback_group for ServiceClient #2448

Closed

This was referenced Jul 16, 2021

Reduce map saver nodes #2454

Merged

Reduce lifecycle manager nodes #2456

Merged

Reduce lifecycle service client nodes #2469

Merged

This was referenced Jul 27, 2021

Reduce controller server nodes #2479

Merged

Reduce planner server nodes #2480

Merged

Reduce amcl_node nodes #2483

Merged

Reduce TransformListener nodes ros2/geometry2#440

Closed

gezp mentioned this issue Aug 4, 2021

Reduce costmap_2d_ros nodes #2489

Merged

6 tasks

SteveMacenski closed this as completed Jun 2, 2022

This was referenced Jun 2, 2022

remove LifecycleNode's argument use_rclcpp_node #2993

Merged

clear up params file #3004

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Get rid of service client nodes #816

Get rid of service client nodes #816

mjeronimo commented Jun 10, 2019

mjeronimo commented Jun 11, 2019

SteveMacenski commented Oct 10, 2019

crdelsey commented Oct 14, 2019

SteveMacenski commented Oct 14, 2019

crdelsey commented Oct 14, 2019

SteveMacenski commented Oct 14, 2019 •

edited

Loading

SteveMacenski commented Jan 23, 2020

shivaang12 commented Apr 16, 2020

SteveMacenski commented Apr 17, 2020

mjeronimo commented Apr 17, 2020

SteveMacenski commented Apr 18, 2020 •

edited

Loading

BriceRenaudeau commented Mar 8, 2021

SteveMacenski commented Mar 8, 2021

SteveMacenski commented Mar 16, 2021 •

edited

Loading

SteveMacenski commented Apr 15, 2021 •

edited

Loading

BriceRenaudeau commented May 10, 2021

gezp commented Jul 8, 2021 •

edited

Loading

SteveMacenski commented Jan 20, 2022

gezp commented Jan 21, 2022

SteveMacenski commented Jun 2, 2022

Get rid of service client nodes #816

Get rid of service client nodes #816

Comments

mjeronimo commented Jun 10, 2019

mjeronimo commented Jun 11, 2019

SteveMacenski commented Oct 10, 2019

crdelsey commented Oct 14, 2019

SteveMacenski commented Oct 14, 2019

crdelsey commented Oct 14, 2019

SteveMacenski commented Oct 14, 2019 • edited Loading

SteveMacenski commented Jan 23, 2020

shivaang12 commented Apr 16, 2020

SteveMacenski commented Apr 17, 2020

mjeronimo commented Apr 17, 2020

SteveMacenski commented Apr 18, 2020 • edited Loading

BriceRenaudeau commented Mar 8, 2021

SteveMacenski commented Mar 8, 2021

SteveMacenski commented Mar 16, 2021 • edited Loading

SteveMacenski commented Apr 15, 2021 • edited Loading

BriceRenaudeau commented May 10, 2021

gezp commented Jul 8, 2021 • edited Loading

SteveMacenski commented Jan 20, 2022

gezp commented Jan 21, 2022

SteveMacenski commented Jun 2, 2022

SteveMacenski commented Oct 14, 2019 •

edited

Loading

SteveMacenski commented Apr 18, 2020 •

edited

Loading

SteveMacenski commented Mar 16, 2021 •

edited

Loading

SteveMacenski commented Apr 15, 2021 •

edited

Loading

gezp commented Jul 8, 2021 •

edited

Loading