Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_time sleep_until_basic_steady sometimes times out on aarch64 #1829

Closed
sloretz opened this issue Nov 24, 2021 · 3 comments · Fixed by #1830
Closed

test_time sleep_until_basic_steady sometimes times out on aarch64 #1829

sloretz opened this issue Nov 24, 2021 · 3 comments · Fixed by #1830
Assignees

Comments

@sloretz
Copy link
Contributor

sloretz commented Nov 24, 2021

Bug report

Required Info:

  • Operating System:
    • Ubuntu Focal aarch64
  • Installation type:
    • CI jobs
  • Version or commit hash:
    • latest
  • DDS implementation:
    • n/a
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

colcon test --event-handlers console_direct+ --packages-select rclcpp --ctest-args -R test_time$

Expected behavior

The test should always pass

Actual behavior

Sometimes on aarch64 the test [ RUN ] TestClockSleep.sleep_until_basic_steady times out.

Additional information

Happened on this job: https://ci.ros2.org/view/nightly/job/nightly_linux-aarch64_release/1760/testReport/junit/projectroot.test/rclcpp/test_time/

Probably introduced by #1814

Maybe related to this gcc bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41861

I'd like to figure out if _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT is set on the aarch64 machines or not. IIUC if that is not defined then condition_variable may use the system clock, in which case a time jump of the system clock could cause sleeping on the steady clock to be affected.

https://github.com/gcc-mirror/gcc/blob/e1d4359264585acc8210ba60abb6dfb15bf1fa57/libstdc%2B%2B-v3/include/std/condition_variable#L71-L75

Maybe condition_variable isn't the thing to use?

@sloretz sloretz self-assigned this Nov 24, 2021
@sloretz
Copy link
Contributor Author

sloretz commented Nov 24, 2021

_GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT is not defined in amd64 and aarch64 builds. That means sleeping on a steady clock can be affected by system time jumps - which is not ideal. I'm not sure there's a good fix for this. Waiting for gcc > 10 (which should happen in Ubuntu 22.04) seems like the best option.

https://ci.ros2.org/job/ci_linux/15660/console
https://ci.ros2.org/job/ci_linux-aarch64/10338/console

12:06:37 79: _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT is not defined
12:06:38 79: Using GCC version 9.3.0

I'm still not sure why this test only sometimes hangs on aarch64. Maybe the steady clock resolution is a lot higher and the conversion to the system clock experiences overflow?

@sloretz
Copy link
Contributor Author

sloretz commented Nov 25, 2021

Very similar discussion in ros2/ros2#1080

@sloretz
Copy link
Contributor Author

sloretz commented Nov 25, 2021

The problem seems to be Clock::now() when used with RCL_STEADY_TIME has a different epoch than std::chrono::steady_clock. It looks like this is because rcutils uses CLOCK_MONOTONIC_RAW (related issue ros2/rcutils#43) while chrono uses CLOCK_MONOTONIC

If I synchronize the clocks in Clock::sleep_until() when using steady time, then the issue goes away (https://ci.ros2.org/job/ci_linux-aarch64/10344/console)

   if (this_clock_type == RCL_STEADY_TIME) {
    // Synchronize because RCL steady clock epoch might differ from chrono::steady_clock epoch
    const Time rcl_entry = now();
    const std::chrono::steady_clock::time_point chrono_entry = std::chrono::steady_clock::now();
    const Duration delta_t = until - rcl_entry;
    const std::chrono::steady_clock::time_point chrono_until =
      chrono_entry + std::chrono::nanoseconds(delta_t.nanoseconds());

    // XXX vvv
#ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
    std::cerr << "_GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT is defined\n";
#else
    std::cerr << "_GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT is not defined\n";
#endif


#ifdef _GLIBCXX_USE_CLOCK_MONOTONIC
    std::cerr << "_GLIBCXX_USE_CLOCK_MONOTONIC is defined\n";
#else
    std::cerr << "_GLIBCXX_USE_CLOCK_MONOTONIC is not defined\n";
#endif


#ifdef _GLIBCXX_USE_CLOCK_GETTIME_SYSCALL
    std::cerr << "_GLIBCXX_USE_CLOCK_GETTIME_SYSCALL is defined\n";
#else
    std::cerr << "_GLIBCXX_USE_CLOCK_GETTIME_SYSCALL is not defined\n";
#endif

#ifdef CLOCK_MONOTONIC_RAW
    std::cerr << "CLOCK_MONOTONIC_RAW is defined\n";
#else
    std::cerr << "CLOCK_MONOTONIC_RAW is not defined\n";
#endif

    std::cerr << "Using GCC version "  << __GNUC__ << "."  << __GNUC_MINOR__ << "."  << __GNUC_PATCHLEVEL__  << "\n";


    typedef std::chrono::steady_clock _Clock;
    typedef std::chrono::system_clock __clock_t;
    const auto __atime = chrono_until;

	  const typename _Clock::time_point __c_entry = _Clock::now();
	  const __clock_t::time_point __s_entry = __clock_t::now();
	  const auto __delta = __atime - __c_entry;
	  const auto __s_atime = __s_entry + __delta;
    std::cerr << "__atime: " << __atime.time_since_epoch().count() << "\n";
    std::cerr << "__c_entry: " << __c_entry.time_since_epoch().count() << "\n";
    std::cerr << "__s_entry: " << __s_entry.time_since_epoch().count() << "\n";
    std::cerr << "__delta: " << __delta.count() << "\n";
    std::cerr << "__s_atime: " << __s_atime.time_since_epoch().count() << "\n";
    // XXX ^^^

    // loop over spurious wakeups but notice shutdown
    std::unique_lock lock(impl_->clock_mutex_);
    while (now() < until && context->is_valid()) {
      cv.wait_until(lock, chrono_until);
    }
16:38:55 79: [ RUN      ] TestClockSleep.sleep_until_basic_steady
16:38:55 79: _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT is not defined
16:38:55 79: _GLIBCXX_USE_CLOCK_MONOTONIC is defined
16:38:55 79: _GLIBCXX_USE_CLOCK_GETTIME_SYSCALL is not defined
16:38:55 79: CLOCK_MONOTONIC_RAW is defined
16:38:55 79: Using GCC version 9.3.0
16:38:55 79: __atime: 22974212450763212
16:38:55 79: __c_entry: 22974212150812909
16:38:55 79: __s_entry: 1637800735753372958
16:38:55 79: __delta: 299950303
16:38:55 79: __s_atime: 1637800736053323261
16:38:56 79: [       OK ] TestClockSleep.sleep_until_basic_steady (315 ms)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant