Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validation to initialization configuration commands #27459

Conversation

lashVN
Copy link
Contributor

@lashVN lashVN commented Jul 5, 2024

Currently, all configuration commands sent to the VN unit are not validated for a response, leading to difficult-to-diagnose errors when a command doesn't go through or is rejected by the VN. All commands send to the unit (particularly setup commands) should have their acceptance confirmation checked before the driver begins to use the data.

@lashVN lashVN closed this Jul 5, 2024
@lashVN lashVN force-pushed the Add_validation_to_initialization_configuration_commands branch from 70c0c87 to a346323 Compare July 5, 2024 16:35
@lashVN lashVN reopened this Jul 5, 2024
@lashVN lashVN marked this pull request as ready for review July 5, 2024 17:51
@lashVN lashVN force-pushed the Add_validation_to_initialization_configuration_commands branch 2 times, most recently from 18239c8 to 3c8e78e Compare July 5, 2024 22:14
@IamPete1
Copy link
Member

IamPete1 commented Jul 5, 2024

I'm not a fan of the formatting changes in the last commit. Its significantly worse in some places. It also hides the core functional changes in this PR, and makes the history more complex if we ever need to look back. If your very keen you could revert the cases where its worse and open it in a new PR with only those changes. Then you can run size_compare_branches to prove there is no difference to compiled output.

@lashVN lashVN force-pushed the Add_validation_to_initialization_configuration_commands branch from 3c8e78e to 646f637 Compare July 5, 2024 23:45
@lashVN
Copy link
Contributor Author

lashVN commented Jul 6, 2024

@IamPete1

Thanks for taking a look. Which spots are you talking about? I generally tried to come up with a clangformat from what I found in the guide and looking around the codebase. Alternatively, if there is a different clangformat I should use, please let me know.

Most of the changes seem to be

  1. Preceeding underscore for private members
  2. Correcting brace newline convention
  3. Pointer alignment
  4. Assignment operator alignment
  5. operator and parenthesis whitespace
  6. Statement newline wrapping (from what I could tell the codebase largely has a 100 character column limit? That's smaller than I tend to prefer, but it looked to be used)
  7. Case label tabbing (happy to revert, just noticed style guide accepts unindented case labels)

Copy link
Member

@IamPete1 IamPete1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Highlighting where some the formatting changes have made the code harder to read.

Comment on lines 491 to 492
state.gyro =
Vector3f{pkt1.uncompAngRate[0], pkt1.uncompAngRate[1], pkt1.uncompAngRate[2]};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worse.

Comment on lines 498 to 499
state.quat = Quaternion{pkt1.quaternion[3], pkt1.quaternion[0], pkt1.quaternion[1],
pkt1.quaternion[2]};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, the wrap here is worse.

Comment on lines 505 to 507
state.location =
Location{int32_t(pkt1.positionLLA[0] * 1.0e7), int32_t(pkt1.positionLLA[1] * 1.0e7),
int32_t(pkt1.positionLLA[2] * 1.0e2), Location::AltFrame::ABSOLUTE};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worse.

Comment on lines 579 to 581
state.origin =
Location{int32_t(pkt2.GPS1posLLA[0] * 1.0e7), int32_t(pkt2.GPS1posLLA[1] * 1.0e7),
int32_t(pkt2.GPS1posLLA[2] * 1.0e2), Location::AltFrame::ABSOLUTE};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worse.

Comment on lines 610 to 611
state.quat =
Quaternion{pkt.quaternion[3], pkt.quaternion[0], pkt.quaternion[1], pkt.quaternion[2]};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worse.

Comment on lines 670 to 677
AP::logger().WriteStreaming(
"EAH3", "TimeUS,Temp,Pres,MX,MY,MZ,AX,AY,AZ,GX,GY,GZ,Q1,Q2,Q3,Q4", "sdPGGGoooEEE----",
"F000000000000000", "Qfffffffffffffff", AP_HAL::micros64(), pkt.temp, pkt.pressure * 1e3,
use_uncomp ? pkt.uncompMag[0] : pkt.mag[0], use_uncomp ? pkt.uncompMag[1] : pkt.mag[1],
use_uncomp ? pkt.uncompMag[2] : pkt.mag[2], state.accel[0], state.accel[1], state.accel[2],
state.gyro[0], state.gyro[1], state.gyro[2], state.quat[0], state.quat[1], state.quat[2],
state.quat[3]);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much worse.

Comment on lines 808 to 811
const float vel_gate = 5;
const float pos_gate = 5;
const float hgt_gate = 5;
const float mag_var = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worse.

Comment on lines 91 to 92
uint8_t term_offset; // offset within the _term buffer where the next character should be
// placed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worse.

@IamPete1
Copy link
Member

IamPete1 commented Jul 6, 2024

Thanks for taking a look. Which spots are you talking about? I generally tried to come up with a clangformat from what I found in the guide and looking around the codebase. Alternatively, if there is a different clangformat I should use, please let me know.

We do have a guide, but we don't enforce all of it strictly on new code. Its generally left up to the person writing the code what they prefer. Unless the formatting is particularity offensive then its better to just leave it rather than having lots of code churn. As such not all libraries use the same pattern, underscores for private members for example.

I'm not saying that we would not accept a formatting improvement for this library but it should be in a separate PR to any functional changes.

@lashVN
Copy link
Contributor Author

lashVN commented Jul 6, 2024

@IamPete1

Thanks for pointing those out.

A lot of those I definitely agree to be worse. I'm happy to resolve those (locally I'd just disable the line wrapping in my clangformat specifiers) and proceed with the merge. If you'd still prefer though, I can pull the formatting changes altogether and open a new PR for the rest of the formatting changes.

Edit:

Sorry, I just reread your comment and it sounds like it clearly should be option 2.

@lashVN lashVN force-pushed the Add_validation_to_initialization_configuration_commands branch 2 times, most recently from 0f39760 to e918727 Compare July 6, 2024 01:02
Comment on lines 378 to 379
if (strncmp(nmea.term, message_to_send, nmea.term_offset) !=
0) { // ignore sync byte in comparison
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (strncmp(nmea.term, message_to_send, nmea.term_offset) !=
0) { // ignore sync byte in comparison
if (strncmp(nmea.term, message_to_send, nmea.term_offset) != 0) {
// ignore sync byte in comparison

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

complete

return false;
} else if (strncmp(nmea.term, "VNERR", nmea.term_offset) == 0) {
AP_HAL::panic("VectorNav received VN error response");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Panicing is not a great idea, it effectively just gives up. Would you expect to see this in normal operation? In which case it should trigger a pre-arm. Or is it only if there is some edge case and very unlikely? I which case you could throw a internal error.

Would it be worth trying a reset of the vector nav to see if we could recover?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@IamPete1

The only cases I'd expect to see an error in response is

  1. If something is catastrophically wrong with the VN sensor (hardware failure, etc)
  2. If the code was incorrectly edited and sent a command that isn't acceptable to the sensor
  3. If the AP-configured IMU Rate (via get_rate()) is set too high, we will reply with insufficient baudrate VNERR

I don't think any of those cases would be resolved by resetting the sensor, but I agree especially since (3) is very viable with normal use panicing may not be the not the correct response. Given the failure cases, what would you suggest instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, in that case, I would print the error specific (GCS_SEND_TEXT) so the user will have some feedback. Then you need to make sure that they can't arm. Hopefully the existing health checks are sufficient, if not then update those so they give a "init failed" error.

Really there are three things that you need to acheave:

  • Not prevent the rest of the code running.
  • Give the user a clue so they can track down how to resolve
  • Prevent the vehicle flying in a unsafe state

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed a change to resolve this issue.

In this change, the system will continue to send the message to the VectorNav every READ_REQUEST_RETRY_MS (500ms). The pre-arm check should continue to show as a failure, because the system will never pass initialization (assuming the VNERR is never resolved) THis is consistent with previous behavior: block initialization and retry until a valid response is received. It will result in the received error message being sent to GCS every 500ms.

libraries/AP_ExternalAHRS/AP_ExternalAHRS_VectorNav.cpp Outdated Show resolved Hide resolved
libraries/AP_ExternalAHRS/AP_ExternalAHRS_VectorNav.cpp Outdated Show resolved Hide resolved
Expands wait_register_response (and renames to send_command_blocking)
to accept any arbitrary command, and uses the new method for all sent
commands
…her than just active port

This change prevents a commonly seen baudrate overflow error on the unused port.
Also pause asynchronous communications during initial configuration.
@lashVN lashVN force-pushed the Add_validation_to_initialization_configuration_commands branch from e918727 to d4722b1 Compare July 6, 2024 20:00
Print received VNERR response to console and continually retry sending the configuration message, instead of panicing
Add clarity to send_command_blocking by improving comment and directly returning instead of breaking
Copy link
Contributor

@peterbarker peterbarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to pass CI. ALSO needs to not access an uninitalised string on failure:
image

Please also see https://github.com/peterbarker/ardupilot/pull/new/pr/vectornav-run-command-format on how run_command could be done rather more neatly. The method would need to be marked as taking varargs, however. I'm still not really happy with the way that command is passed into decode, but I think you'd agree the varargs message_format is rather nicer.

request_sent = now;
}

int16_t nbytes = uart->available();
while (nbytes-- > 0) {
char c = uart->read();
if (decode(c)) {
if (nmea.error_response && nmea.sentence_done) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be problematic as the driver evolves. There's an assumption in here that whatever is in the pipeline to/from the device is the command we're currently trying to execute.

That may not be the case outside of the initialisation phase of the driver.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterbarker

I don't think I understand this comment.

I agree that the driver assumes that a blocking call doesn't have an interrupting call. Is that what you mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was getting at here is that your UART might contain incoming data that isn't related to the command you just sent. If you parse a message out of that buffer you may attribute it to the command you sent rather than a previous command.

It's not an easy thing to resolve (flushing the input buffer isn't actually sufficient as there may be outstanding commands on the way t into the VectorNav.

If you're not fussed with this I don't particular mind; it's not a major safety issue for the vehicle.

case 1:
if (nmea.error_response) {
GCS_SEND_TEXT(MAV_SEVERITY_CRITICAL, "VectorNav received VNERR code: %s", nmea.term);
} else if (strncmp(nmea.term, message_to_send + 6,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else if (strncmp(nmea.term, message_to_send + 6,
} else if (strncmp(nmea.term, MIN(message_to_send + 6, strlen(message_to_send))

I'm not a fan of the structure of this driver, and I haven't even really understood it yet. But the code that's currently in here looks fall-out-of-sky dangerous here. It's probably not possible to get in here with strlen(message_to_send) less than 6, but if we do get here then bad things may happen.

Copy link
Contributor Author

@lashVN lashVN Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterbarker

It should never be possible to hit this point with length < 6 (All responses must begin with "VNXXX," as we validated in the prior term, and to hit the term==1, there has to have been something received after the comma)

I'm also not clear that it would be particularly dangerous, would it not just worst case compare to unutilized memory allocated to message_to_send? Of course I'm assuming message_to_send capacity is greater than 6 + nmea.term_offset

default:
return false;
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deserves a comment - we ignore any terms past the third?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prior iteration of this driver only received responses up to 3 terms, meaning it should mark any response it saw above 3 as invalid. This will now receive 5 or 6 term responses, so it shouldn't invalidate the responses longer than 3 terms.

libraries/AP_ExternalAHRS/AP_ExternalAHRS_VectorNav.cpp Outdated Show resolved Hide resolved
"VNERR" does not match beginning of message_to_send, so have to prevent it from returning due to the string compare. Also must prevent exiting the decode before the sentence has completed so that we can go on to print the error code term.
Fix bug preventing disabling of ASCII measurements
Instead of snprintf the command to message_to_send then executing run_command, allow run_command to accept a string and format spec
Add check to ensure message_to_send is greater than length 6 before attempting to read past 6
…d commands

Aside from the RRG,1 call, the simulation should reply with exactly what was received as a receipt confirmation to allow the driver to proceed past init
@lashVN lashVN force-pushed the Add_validation_to_initialization_configuration_commands branch from e00f620 to a8370f3 Compare July 16, 2024 05:56
Copy link
Contributor

@peterbarker peterbarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

request_sent = now;
}

int16_t nbytes = uart->available();
while (nbytes-- > 0) {
char c = uart->read();
if (decode(c)) {
if (nmea.error_response && nmea.sentence_done) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was getting at here is that your UART might contain incoming data that isn't related to the command you just sent. If you parse a message out of that buffer you may attribute it to the command you sent rather than a previous command.

It's not an easy thing to resolve (flushing the input buffer isn't actually sufficient as there may be outstanding commands on the way t into the VectorNav.

If you're not fussed with this I don't particular mind; it's not a major safety issue for the vehicle.

libraries/SITL/SIM_VectorNav.h Outdated Show resolved Hide resolved
@peterbarker
Copy link
Contributor

@IamPete1 are you happy for this to be merged in its current state?

@IamPete1
Copy link
Member

@IamPete1 are you happy for this to be merged in its current state?

Yeah, looks OK.

Move receive_buf to method scope allocation rather than class scope
@tridge tridge merged commit 3f00280 into ArduPilot:master Jul 17, 2024
93 checks passed
@lashVN lashVN deleted the Add_validation_to_initialization_configuration_commands branch July 29, 2024 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants