Improve time sync between simulation and controller #36

Closed
chapulina opened this issue Oct 26, 2021 · 5 comments

Comments

@chapulina
Contributor

MBARI's controller has some built-in timeouts that issue critical errors if certain data is not available in time. For example, AHRS will fail like this if we unpause simulation and then take too long to start the controller:

2021-10-26T03:59:47.169Z,1635220787.169 [AHRS_M2](FAULT): Failed to initialize within timeout.
2021-10-26T03:59:47.169Z,1635220787.169 [AHRS_M2] Communications Fault, FailCount= 1
2021-10-26T03:59:47.169Z,1635220787.169 [AHRS_M2](ERROR): Communications Fault
2021-10-26T03:59:47.169Z,1635220787.169 [MissionManager](ERROR): No startup, active, or default mission!
2021-10-26T03:59:47.169Z,1635220787.169 [CBIT](ERROR): Communications Fault in component: AHRS_M2
2021-10-26T03:59:47.169Z,1635220787.169 [CBIT](CRITICAL): Communications Fault in component: AHRS_M2

Possible solutions:

  • I think the ideal scenario would be to run the controller in lock-step with simulation. Simulation controls the time, so it could be the authority of time for the controller, which I don't think is happening right now.
  • The second-best option I can think of would be to set all of the controller's internal timestamps with respect to the time when simulation was initialized. So whenever the controller receives a state, it immediately subtracts the initialization time from the current sim time and uses the result.
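To make the second option concrete, here is a minimal sketch of the subtraction it describes. `RelativeSimClock` and its members are hypothetical names made up for illustration and are not part of the controller code; times are plain seconds.

```cpp
// Hypothetical helper, not from the LRAUV code base: converts absolute
// simulation time into time elapsed since the first state was received.
class RelativeSimClock {
 public:
  // Latches the first sim time seen as the initialization time, then
  // returns every sim time relative to it (in seconds).
  double Update(double simTimeSec) {
    if (!initialized_) {
      initSec_ = simTimeSec;
      initialized_ = true;
    }
    return simTimeSec - initSec_;
  }

 private:
  bool initialized_ = false;
  double initSec_ = 0.0;
};
```

With this, a controller that starts when simulation is already at 120 s would see its first state at 0 s and the next one, half a second later, at 0.5 s.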

I saw some code upstream that's supposed to set the slate's time based on timeIgn, but it's commented out, and I'm not sure why. Any hints, @mabelzhang / @braanan?

    // Write time to Slate, so that latest state is recorded with latest time
    //timeIgnWriter_->write( Units::SECOND, results_.timeIgn_ );

@braanan
Collaborator

braanan commented Oct 26, 2021

Hey @chapulina, we already took care of lockstep in one of the previous dev rounds, so the lrauv-app wall time should sync with Ign time every control cycle. The lrauv-app wall time is set from a callback that's subscribed to Ignition's world /stats channel (see implementation in WorldStatHandler.cpp).

The /stats listener is integrated into the main control loop at the thread handler level (see Handler.cpp).

If I had to guess, I'd say the timeout is triggered because you're running Ign faster than real time before starting the lrauv-app. The way things are set up now, the app runs one control cycle (which starts the timers with the real wall time, etc.) and only then reaches the time sync, which happens at the end of the control cycle. Once the time is synced, the lrauv-app clock jumps forward to match Ign, and in the next control cycle the timeouts expire because of the jump.
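The sequence above can be sketched as a toy model. Everything here (`AppClock`, `StartupTimeout`, both functions, and the numbers in the usage note) is hypothetical and only illustrates the ordering problem; it is not how lrauv-app implements its timers.

```cpp
// Hypothetical stand-in for the application clock; not an lrauv-app type.
struct AppClock {
  double nowSec = 0.0;
};

// Hypothetical startup watchdog: latches the clock value at construction
// and reports expiry once more than limitSec has elapsed on that clock.
class StartupTimeout {
 public:
  StartupTimeout(const AppClock& clock, double limitSec)
      : clock_(clock), startSec_(clock.nowSec), limitSec_(limitSec) {}

  bool Expired() const { return clock_.nowSec - startSec_ > limitSec_; }

 private:
  const AppClock& clock_;
  double startSec_;
  double limitSec_;
};

// Timer starts on the unsynced clock, the sync at the end of the first
// cycle jumps the clock, and the timer is checked in the second cycle.
bool ExpiresAfterLateSync(double unsyncedSec, double simSec, double limitSec) {
  AppClock clock;
  clock.nowSec = unsyncedSec;
  StartupTimeout timeout(clock, limitSec);  // first cycle starts the timer
  clock.nowSec = simSec;                    // end of first cycle: sync jump
  return timeout.Expired();                 // second cycle: check
}

// Same check, but with the sync applied before the timer is started.
bool ExpiresAfterEarlySync(double simSec, double cycleSec, double limitSec) {
  AppClock clock;
  clock.nowSec = simSec;                    // sync first
  StartupTimeout timeout(clock, limitSec);  // timer starts on synced clock
  clock.nowSec = simSec + cycleSec;         // one control cycle later
  return timeout.Expired();
}
```

With a 5 s limit, `ExpiresAfterLateSync(0.0, 300.0, 5.0)` reports an expired timer even though only one cycle has run, while `ExpiresAfterEarlySync(300.0, 0.5, 5.0)` does not.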

Probably a few ways to address this, but I don't think adjusting the timeout in the driver will help us here, because if enough time passes in Ign we'll still violate the timeout. Maybe we should try to time sync at startup before the control thread kicks off?

Hope that made sense... I'll take a closer look in the AM.

Thanks!

@mabelzhang
Collaborator

> if we unpause simulation, and take too long to start the controller

That case has never been taken care of, and I think it is a lot more complicated than fixing the AHRS timeout. Something like what Ben described above is probably the better approach. It digs into very MBARI-specific code and how they set up their control loop. Ben has the best knowledge if we want to do that.

@chapulina
Contributor Author

Thanks for the pointers, @braanan and @mabelzhang! I understand the time sync a bit more now. I poked around the code and if I understand correctly, the controller is (guaranteed to be?) running at a much lower rate than the simulation, and blocks waiting for the next simulation time. I also saw that it handles pause well, which I didn't know until now!

> Maybe we should try to time sync at startup before the control thread kicks off?

Yup, it sounds like the missing piece is the initial sync.

@braanan
Collaborator

braanan commented Oct 27, 2021

OK, I updated the lrauv-app to wait for initial time sync from Ign before running the first control cycle. PR is here: https://bitbucket.org/mbari/lrauv-application/pull-requests/319/add-ign-time-sync-on-startup

These changes should resolve any issues with synchronous components like the AHRS_M2 but might not work for async components. I'm looking into that next.
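For readers without access to the PR, the general idea of blocking until the first sim time arrives can be sketched as below. This is not the code from the PR: `InitialTimeSync`, `OnSimTime`, and `WaitForFirstSync` are hypothetical names, and the sketch assumes the sim time is delivered by a subscriber callback running on another thread.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical gate that holds the control loop until a first simulation
// time has been received; not an lrauv-app class.
class InitialTimeSync {
 public:
  // Intended to be called from the (assumed) stats subscriber callback.
  void OnSimTime(double simTimeSec) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      simTimeSec_ = simTimeSec;
      synced_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until a sim time has arrived or maxWait elapses. Returns true
  // on sync and writes the latest sim time to simTimeSec; returns false
  // on timeout and leaves simTimeSec untouched.
  bool WaitForFirstSync(std::chrono::milliseconds maxWait,
                        double& simTimeSec) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (!cv_.wait_for(lock, maxWait, [this] { return synced_; })) {
      return false;
    }
    simTimeSec = simTimeSec_;
    return true;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool synced_ = false;
  double simTimeSec_ = 0.0;
};
```

A control thread would call `WaitForFirstSync` once before its first cycle and set its clock from the returned value, so that any timers started in that cycle are already on simulation time.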

Thanks

@chapulina
Contributor Author

@braanan's PR fixed it for me! I'm removing the notes about the need to keep simulation paused in #149.

@chapulina transferred this issue from another repository Nov 2, 2021
@caguero mentioned this issue Nov 2, 2021