Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of service client nodes #816

Closed
mjeronimo opened this issue Jun 10, 2019 · 20 comments
Closed

Get rid of service client nodes #816

mjeronimo opened this issue Jun 10, 2019 · 20 comments
Assignees
Labels
2 - Medium Medium Priority

Comments

@mjeronimo
Copy link

There are several nodes in the system that are created and used solely for interfacing to services. These "client nodes" are used because spin_until_future_complete cannot accept a node that is already on an executor. Instead, we can use future.wait_until instead of spin_until_future_complete. This will allow us to remove several of these client nodes in the system, which will hopefully ease discovery congestion.

@mjeronimo mjeronimo added the 1 - High High Priority label Jun 10, 2019
@mjeronimo mjeronimo added this to the D Turtle Release milestone Jun 10, 2019
@crdelsey crdelsey added 2 - Medium Medium Priority and removed 1 - High High Priority labels Jun 10, 2019
@mjeronimo
Copy link
Author

@orduno See #808 for some background about this issue and a quick code sample.

@SteveMacenski
Copy link
Member

@mjeronimo is this still an issue?

@crdelsey
Copy link
Contributor

It is still an issue. Having too many nodes impacts startup times and seems to overwhelm the current DDS implementations.

Possible fixes to this issue are:

  1. Fix the DDS implementations to better handle many nodes.
  2. Fix rcl to allow re-entrant spinning on the current node.
  3. Fix rcl/rclcpp to allow continuations to be returned from service callbacks.
  4. Re-architect our service calls to cleverly use callback groups and multi-threaded executors.
  5. Re-architect nav2 to not wait for network activity in the context of a callback.

@SteveMacenski
Copy link
Member

@crdelsey can you provide a few examples in the stack where we use "client nodes", other than costmap spinning, I had a hard time finding any

@crdelsey
Copy link
Contributor

I think the big one is here in ServiceClient. https://github.com/ros-planning/navigation2/blob/7652518af52f257d097269309986943771d98683/nav2_util/include/nav2_util/service_client.hpp#L38

Then, there's another use in LifecycleManagerClient (https://github.com/ros-planning/navigation2/blob/a095038da9f4e3729679e5d20b39e1d95582fb16/nav2_lifecycle_manager/src/lifecycle_manager_client.cpp#L30).

There may be other places.

@SteveMacenski
Copy link
Member

SteveMacenski commented Oct 14, 2019

Got it, I didn't see that hidden in there. "Fix DDS" I feel like is a never ending battle, I'd say 2 or 3 seem good general long term fixes but 4 seems more practically in our control.
I'll need to do more reading on these topics

@orduno orduno removed this from the E Turtle Release milestone Oct 28, 2019
@SteveMacenski
Copy link
Member

Any objection to using multithreaded executors? It seems that way we can have these all going at once with one node. Do you think there are applications here sensitive enough (or otherwise not well formatted) for multithreaded processing? Most places have mutex where we might want to use a multithreaded spinner. I wouldn't think it would be too much of a hassle to update the few places required.

@shivaang12
Copy link
Collaborator

@mjeronimo Do you have any updates on this? Let me know even if you have any unfinished work on this.

@SteveMacenski
Copy link
Member

@mjeronimo Shivang is taking a look at this topic. Carl mentioned that you had a branch and started to work on it that either may be done or had some good progress. Can you give us a run down on where that branch is and what still needs to be completed to support?

@mjeronimo
Copy link
Author

@shivaang12 @SteveMacenski Sorry, guys I haven't been on github recently (we're using gitlab internally for my current project). Anyways, @orduno did the most exploration on this topic. He prototyped at least one alternative approach that, I believe, used multi-threaded executors. I've also brought up the topic directly to folks at Open Robotics and they agree that this is an issue (but I don't think there are any plans to allow re-entrant spinning on the current node anytime soon). In general, I think it's a good idea to not proliferate nodes like this because of the discovery traffic introduced and the possibility of dropping messages.

@SteveMacenski
Copy link
Member

SteveMacenski commented Apr 18, 2020

this is an issue (but I don't think there are any plans to allow re-entrant spinning on the current node anytime soon

Can confirm on the TSC side, doesn't seem likely for Foxy (or G-turtle... bigger fish to fry). On my wish list of things for OR people to work on that effect navigation users, this isn't high at the moment.

Awesome, hopefully Carlos can chime in. Ignoring Foxy, I'd like 1 server to 1 node mapping in the next LTS distribution. Multithreaded executors seems to be the lowest-effort clearest way to handle this, though that does create issues with the components API.

The question then becomes: do we value using components or minimizing nodes more? Currently, I think that answer is nodes from how inefficient it is to have multiple. However, that's improving by the week, so I think long term that answer is components.

@BriceRenaudeau
Copy link
Contributor

I am currently making a PR to use callback groups for service clients instead of nodes #2216.

On the Rolling distribution, there are callback groups that can be attach to a node and added to executors. This allows to use of spin_until_future_complete() while the node is still spinning. It makes nodes multithreaded.

It can be used to execute synchronous service/action call in timer/subscriber callback.

@SteveMacenski
Copy link
Member

#2216 (comment)

I think this comment gives a good summary of what should be done for each case.

@SteveMacenski
Copy link
Member

SteveMacenski commented Mar 16, 2021

Unfortunately, from deeper looking, each new executor creates a new context (unless one given on construction) therefore , with the new design https://design.ros2.org/articles/Node_to_Participant_mapping.html each context now maps to a DDS participant so this is really no different than just having multiple client nodes.

open question then: if we gave the node's context to a new separate executor_ on construction, is that the same DDS participant and also can be processed in parallel?


I think this probably deserves a dedicated redesign session to think about the executor and threading options we have available and go through our server list to see what makes the most sense in order to reduce the number of DDS participants and nodes. We should also analyze what actions+services+subscribers exist in each server and add them to separate callback groups if necessary so spinning has a deterministic output to processing the "right" information in each context (not-ROS-context, software one). Right now everywhere are just automatically added to one. Especially for planner / controller / waypoint servers, it may make sense to separate these callback groups and all processing in parallel or spinning a particular one.

The right answer is probably a mix of multi-threaded executors and multiple single-threaded executors depending on the server's needs. Also need to determine the context question above whether we can share that across multiple executor objects to run in parallel but not add additional participants on the network.

@SteveMacenski
Copy link
Member

SteveMacenski commented Apr 15, 2021

I spoke with Chris from OR @clalancette and he gave me the key to my understanding on this -- while a new context is constructed on a new executor, when none is given as an input, it uses the default context which stores a static variable to use https://github.com/ros2/rclcpp/blob/master/rclcpp/src/rclcpp/contexts/default_context.cpp#L23. Therefore, any new executor, node, etc will use that same context regardless, unless you give a context to the object on construction.

Thusly, we can continue this work but the question remains how necessary this is. Essentially, this is a refactoring effort now. We're working on turning N nodes into 1 node with N threads as a multi-threaded executor: "whats better: N single threaded exeuctors or 1 multi-threaded executor with N threads"? There's not a clear answer to that, but there shouldn't be insane performance differentials now.

It would be good to do this to refactor and make the best use of ROS2, but it is no longer a critical requirement. I believe we now have sufficient information to make decisions and execute on this task when someone has time to work on it.

@BriceRenaudeau
Copy link
Contributor

I see several nodes that can still be removed.

  • WaypointFollower client_node: easy, need to use callbackgroup instead
  • LifecycleManager bond_client_node_: easy if we don't do waitUntilFormed
  • Costmap2DROS client_node_: needed but not used by layers and filters
  • Costmap2DROS rclcpp_node_: Difficult, needed to pub/sub TF

@gezp
Copy link
Contributor

gezp commented Jul 8, 2021

I have searched all internal nodes which need be removed,and i will do this work in this summer. Any suggestions are welcome.

Remove rclcpp_node_ in LifecycleNode

Other independent cases

Internal nodes that need remain :

  • client_node_ in class BtActionServer
  • client_node_ in class Nav2Panel

@SteveMacenski
Copy link
Member

@gezp I saw another of your PRs was merged, is this now good to move forward on the AMCL / costmap PRs?

@gezp
Copy link
Contributor

gezp commented Jan 21, 2022

@SteveMacenski the upstream PR ros2/geometry2#447 is still blocked.

@SteveMacenski
Copy link
Member

Final PR imminent to merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - Medium Medium Priority
Projects
None yet
Development

No branches or pull requests

7 participants