rmw_publish can block #209

Open
cleitner opened this issue Mar 30, 2020 · 0 comments

Bug report

Required Info:

  • Operating System:
    • Debian and Ubuntu
  • Installation type:
    • From source
  • Version or commit hash:
    • Eloquent and Foxy HEAD
  • DDS implementation:
    • Fast-RTPS and CycloneDDS
  • Client library (if applicable):
    • Tested with rclpy, problem is based in RMW

Steps to reproduce issue

The fast (1 kHz) publisher and slow (10 Hz) subscriber
https://gist.github.com/cleitner/93decfa79a99a8a3b59a795df02b99e7
https://gist.github.com/cleitner/e5eb35f6bcf6639425c06c03dd13fbff
expose problems with buffer bloat and, more relevant here, with the rmw_publish interface.

As described in #176, the RMW implementations can cause rmw_publish to block when history is set to KEEP_ALL.

Calling the two scripts with

$ ./pub_back_pressure.py
$ ./sub_back_pressure.py

will sometimes exhibit blocking times > 20 ms in rmw_publish and, more often, RMW_RET_ERROR when the internal RESOURCE_LIMITS setting is hit.

...
Failed to publish at 10032 with Failed to publish: cannot publish data, at .../src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/rmw_publish.cpp:53, at .../src/ros2/rcl/rcl/src/rcl/publisher.c:290
Failed to publish at 10032 with Failed to publish: cannot publish data, at .../src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/rmw_publish.cpp:53, at .../src/ros2/rcl/rcl/src/rcl/publisher.c:290
Slowed down at 10032 for 41.456 ms
Slowed down at 10035 for 101.639 ms
Failed to publish at 10039 with Failed to publish: cannot publish data, at .../src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/rmw_publish.cpp:53, at .../src/ros2/rcl/rcl/src/rcl/publisher.c:290
Failed to publish at 10032 with Failed to publish: cannot publish data, at .../src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/rmw_publish.cpp:53, at .../src/ros2/rcl/rcl/src/rcl/publisher.c:290
...

With

$ RMW_IMPLEMENTATION=rmw_cyclonedds_cpp ./pub_back_pressure.py
$ RMW_IMPLEMENTATION=rmw_cyclonedds_cpp ./sub_back_pressure.py

the publisher blocks with (seemingly) random timeouts, but no errors:

...
Slowed down at 3205 for 21.131 ms
Slowed down at 3207 for 21.206 ms
Slowed down at 3209 for 209.976 ms
Slowed down at 3210 for 275.831 ms
Slowed down at 3211 for 173.350 ms
Slowed down at 3212 for 97.646 ms
Slowed down at 3213 for 43.757 ms
...

The subscriber crashes with OoM because CycloneDDS doesn't seem to limit the incoming buffer.
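
For reference, log lines like "Slowed down at ..." and "Failed to publish at ..." can be produced by a small timing wrapper around the publish call. This is only a sketch, not the code from the gists: `timed_publish`, its parameters and the 20 ms threshold are hypothetical names chosen here, and it assumes the client library reports a failed publish by raising an exception.

```python
import time


def timed_publish(publish, msg, seq, slow_ms=20.0, log=print):
    """Call publish(msg); log failures and calls that block longer than slow_ms.

    `publish` is any callable taking one message (e.g. a publisher's
    publish method). Returns (ok, elapsed_ms).
    """
    start = time.perf_counter()
    try:
        publish(msg)
        ok = True
    except Exception as exc:  # assumption: a failed publish raises
        log(f"Failed to publish at {seq} with {exc}")
        ok = False
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if elapsed_ms > slow_ms:
        log(f"Slowed down at {seq} for {elapsed_ms:.3f} ms")
    return ok, elapsed_ms
```

Calling this once per cycle of a 1 kHz timer is enough to see both effects: blocking shows up as elapsed time well above the 1 ms period, and errors are reported separately from blocking.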

Expected behavior

At least consistent behavior regarding returning errors or blocking.

IMHO the correct result would be an indication that the call would have needed to block, akin to the taken parameter of rmw_take. This would allow pure polling operation of rmw_publish, e.g. in a cyclic realtime context.

No crashing in the subscriber with default settings.

Actual behavior

rmw_publish blocks when some unknown internal (and external) limit is hit. Fast-RTPS also returns RMW_RET_ERROR without having experienced a real error.

The CycloneDDS subscriber can be crashed with OoM if the publisher exceeds the subscriber's processing bandwidth.

Additional information

The write operation of the DDS DataWriter interface has an OUT_OF_RESOURCES return code, and the RELIABILITY QoS specifies a max_blocking_time parameter, which would preferably be set to 0.
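
To illustrate what a blocking time of 0 would mean for the caller, here is a toy model in plain Python. It is not a DDS API: `ToyWriter`, `acknowledge` and the string return codes are made up for this sketch, and a bounded `queue.Queue` stands in for a reliable KEEP_ALL history with a resource limit. The toy reports a full history as TIMEOUT in both cases; which return code a real implementation uses is not modelled here.

```python
import queue

RETCODE_OK = "OK"            # illustrative strings, not real DDS constants
RETCODE_TIMEOUT = "TIMEOUT"


class ToyWriter:
    """Toy stand-in for a reliable KEEP_ALL writer with a bounded history."""

    def __init__(self, max_samples, max_blocking_time):
        # max_samples must be > 0; queue.Queue treats 0 as "unbounded".
        self._history = queue.Queue(maxsize=max_samples)
        self._max_blocking_time = max_blocking_time  # seconds

    def write(self, sample):
        try:
            if self._max_blocking_time > 0:
                # Wait for a free slot, but no longer than the limit.
                self._history.put(sample, timeout=self._max_blocking_time)
            else:
                # Blocking time 0: never wait, report a full history at once.
                self._history.put_nowait(sample)
        except queue.Full:
            return RETCODE_TIMEOUT
        return RETCODE_OK

    def acknowledge(self):
        """Model a reader acknowledging the oldest sample, freeing one slot."""
        self._history.get_nowait()
```

With `max_blocking_time=0.0` the caller gets an immediate answer when the history is full and can decide for itself whether to retry, drop the sample or slow down.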

The interface of rmw_publish could be made symmetric to rmw_take:

rmw_ret_t
rmw_publish(
  const rmw_publisher_t * publisher,
  const void * ros_message,
  bool * published,
  rmw_publisher_allocation_t * allocation);
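
How a cyclic caller might use such a published flag can be sketched in plain Python. Everything here is hypothetical and only models the proposed semantics: `ToyPublisher`, `try_publish`, `drain` and `run_cycles` do not exist in rmw or rclpy, and the return value is modelled as a `(ret, published)` tuple by analogy with `rmw_take` and its `taken` flag.

```python
from collections import deque

RMW_RET_OK = 0  # mirrors the rmw return code for illustration only


class ToyPublisher:
    """Toy model of the proposed interface; not an rmw implementation."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._pending = deque()  # samples not yet delivered

    def try_publish(self, msg):
        """Return (ret, published); a full buffer is not an error."""
        if len(self._pending) >= self._capacity:
            return RMW_RET_OK, False  # would have had to block
        self._pending.append(msg)
        return RMW_RET_OK, True

    def drain(self, n):
        """Model the subscriber consuming up to n samples."""
        for _ in range(min(n, len(self._pending))):
            self._pending.popleft()


def run_cycles(pub, cycles, drain_every):
    """Publish once per cycle without ever blocking; count skipped samples."""
    skipped = 0
    for seq in range(cycles):
        ret, published = pub.try_publish(seq)
        if ret != RMW_RET_OK:
            raise RuntimeError("real publish error")
        if not published:
            skipped += 1  # caller decides: drop, retry next cycle, ...
        if seq % drain_every == drain_every - 1:
            pub.drain(1)  # slow subscriber takes one sample
    return skipped
```

The point of the sketch is that the loop's cycle time stays bounded: back pressure becomes visible to the caller as `published == False` instead of as a blocked call or a spurious error.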