Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🧑‍🌾 connext test regressions in test_subscription_topic_statistics #2588

Closed
Crola1702 opened this issue Jul 24, 2024 · 13 comments · Fixed by #2650
Closed

🧑‍🌾 connext test regressions in test_subscription_topic_statistics #2588

Crola1702 opened this issue Jul 24, 2024 · 13 comments · Fixed by #2650
Assignees
Labels
more-information-needed Further information is required

Comments

@Crola1702
Copy link
Contributor

Crola1702 commented Jul 24, 2024

Bug report

Required Info:

  • Operating System:
    • Linux Noble
  • Installation type:
    • Source
  • Version or commit hash:
    • Rolling, Jammy
  • DDS implementation:
    • RTI Connext
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

  1. Run a build in Rci__nightly-connext_ubuntu_noble_amd64
  2. See test regressions

Additional information

Reference build: Rci__nightly-connext_ubuntu_noble_amd64#94

Test regressions:

Log output:

[ RUN      ] TestSubscriptionTopicStatisticsFixture.test_receive_stats_for_message_no_header
RTI Connext DDS Non-commercial license is for academic, research, evaluation and personal use only. USE FOR COMMERCIAL PURPOSES IS PROHIBITED. See RTI_LICENSE.TXT for terms. Download free tools at rti.com/ncl. License issued to Non-Commercial User license@rti.com For non-production use only.
Expires on 00-jan-00 See www.rti.com for more information.
/tmp/ws/src/ros2/rclcpp/rclcpp/test/rclcpp/topic_statistics/test_subscription_topic_statistics.cpp:355: Failure
Expected equality of these values:
  kNumExpectedMessageAgeMessages
    Which is: 4
  message_age_count
    Which is: 0

/tmp/ws/src/ros2/rclcpp/rclcpp/test/rclcpp/topic_statistics/test_subscription_topic_statistics.cpp:356: Failure
Expected equality of these values:
  kNumExpectedMessagePeriodMessages
    Which is: 4
  message_period_count
    Which is: 8

[  FAILED  ] TestSubscriptionTopicStatisticsFixture.test_receive_stats_for_message_no_header (8331 ms)

...

[ RUN      ] TestSubscriptionTopicStatisticsFixture.test_receive_stats_include_window_reset
RTI Connext DDS Non-commercial license is for academic, research, evaluation and personal use only. USE FOR COMMERCIAL PURPOSES IS PROHIBITED. See RTI_LICENSE.TXT for terms. Download free tools at rti.com/ncl. License issued to Non-Commercial User license@rti.com For non-production use only.
Expires on 00-jan-00 See www.rti.com for more information.
/tmp/ws/src/ros2/rclcpp/rclcpp/test/rclcpp/topic_statistics/test_subscription_topic_statistics.cpp:404: Failure
Expected: (stats_point.data) >= (message_age_offset), actual: 100.035641 vs 1000

/tmp/ws/src/ros2/rclcpp/rclcpp/test/rclcpp/topic_statistics/test_subscription_topic_statistics.cpp:404: Failure
Expected: (stats_point.data) >= (message_age_offset), actual: 99.960970000000003 vs 1000

[  FAILED  ] TestSubscriptionTopicStatisticsFixture.test_receive_stats_include_window_reset (8331 ms)

Failures are pointing to this source code: https://github.com/ros2/rclcpp/blob/rolling/rclcpp/test/rclcpp/topic_statistics/test_subscription_topic_statistics.cpp

Crola1702 added a commit to osrf/buildfarm-tools that referenced this issue Jul 24, 2024
- Reports from: ros2/rclcpp#2588

Signed-off-by: Crola1702 <cristobal.arroyo@ekumenlabs.com>
@fujitatomoya
Copy link
Collaborator

this problem can be reproducible with current rolling today, as reported this only happens with https://github.com/ros2/rmw_connextdds.

source timestamp is assigned by RTI Connext to libstatistic allocator, and confirmed that topic statistics on the subscription publishes the data for both message_age and message_period. there seems to be no problem at this statistics publisher process.

for (auto & msg : msgs) {
publisher_->publish(msg);
}

on the other hand, MetricsMessageCallback is only called with source name message_period.

void MetricsMessageCallback(const MetricsMessage & msg)
{
std::unique_lock<std::mutex> ulock{mutex_};
++num_messages_received_;
received_messages_.push_back(msg);
if (num_messages_received_ >= number_of_messages_to_receive_) {
PromiseSetter::SetPromise();
}
}

that fails the test since it expects the same number of messages for each source name.
it just looks like source name message_age metric message is never delivered to MetricsMessageSubscriber during this test.
i would like to hear from RTI developer for this behavior.

IMO, assumption that the same exact number of messages are delivered to each source name would be bad, since there is no guarantee that multiple source name messages come in order, that i think depends on the RMW implementation.

@mjcarroll
Copy link
Member

i would like to hear from RTI developer for this behavior.

@asorbini any chance that you or someone else from the RTI team may be able to shed some light here?

@MichaelOrlov
Copy link
Contributor

@clalancette You mentioned to assign this issue to you and add label more information needed.

@asorbini A friendly ping for comments about this issue.

@fujitatomoya
Copy link
Collaborator

@trubio-rti
CC: @asorbini

this comes up with priority in this week's PMC meeting.
according to #2588 (comment), it just cannot receive the data from RTI connextdds from rmw perspective. could you taka a look at this why we are missing the data? (this only happens with RTI connextdds, but cyclone nor fastdds.)

@trubio-rti
Copy link

@fujitatomoya: @fgallegosalido is the one working now on the Connext RMW

@fujitatomoya
Copy link
Collaborator

@trubio-rti thanks! @fgallegosalido let us know if you need any help.

@fgallegosalido
Copy link

@fujitatomoya we are trying to reproduce the issue. Could you clarify which Connext version is being used or is it just the one in the CI? Thanks.

@fujitatomoya
Copy link
Collaborator

@fgallegosalido that is 6.0.1.25-1.

@fgallegosalido
Copy link

I was able to reproduce it. Working on it

@fgallegosalido
Copy link

After a long investigation, we've found out that the issue was produced by the KEEP_LAST QoS on the topic statistics publisher. Basically, as types are unkeyed in ROS2, when two samples were sent so close to each other, the second sample would arrive to the publisher queue before the first one could be sent, therefore discarding the first one.

Changing the QoS in the test to use KEEP_ALL for the statistics publisher (in the class SubscriberWithTopicStatistics), fixes both tests (haven't had a failure in 100 runs). Basically change the constructor of the class SubscriberWithTopicStatistics by setting the history QoS to keep_all:

auto options = rclcpp::SubscriptionOptions();
options.topic_stats_options.state = rclcpp::TopicStatisticsState::Enable;
options.topic_stats_options.publish_period = publish_period;
options.topic_stats_options.qos.keep_all();

@fujitatomoya
Copy link
Collaborator

@fgallegosalido thanks for the analysis, i will try and confirm it right now.

@fujitatomoya
Copy link
Collaborator

@fgallegosalido i would like to confirm the connextdds behavior, can you check the comments in #2650?

@fujitatomoya
Copy link
Collaborator

@Crola1702 do we need to backport #2650 to downstream distributions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
more-information-needed Further information is required
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants