Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix read timeout for split message responses #75

Closed
wants to merge 1 commit into from

Conversation

Fabbi
Copy link

@Fabbi Fabbi commented Sep 20, 2022

This fixes an sendTimeout issue caused by split responses.

When the second part of a message was >= 8 bytes (so almost always I would assume) it would be handled as new message and potentially thrown away.

Should fix #71 and #52

@CLAassistant
Copy link

CLAassistant commented Sep 20, 2022

CLA assistant check
All committers have signed the CLA.

@dinesharjani
Copy link
Contributor

dinesharjani commented Sep 20, 2022

Checking. Give me a minute. But thanks for the submission!

@dinesharjani
Copy link
Contributor

@Fabbi I don't think this is a proper fix - the culprit, following your PR, has to be the guard statement in McuMgrBleTransportWriteState's received(), which is the one triggering the 'open'. So, this is the function not properly detecting that the chunked response has been received.

Furthermore, the McuMgrResponse.getExpectedLength(scheme: .ble, responseData: data) call is returning a bad result? Meaning we're expecting more or less bytes and that's why the validation state does not continue?

@Fabbi
Copy link
Author

Fabbi commented Sep 20, 2022

In my case the McuMgrResponse.getExpedtedLength did work as intended but the following bytes were interpreted as another message because data.readMcuMgrHeaderSequenceNumber() always returns a number as long as data.count >= 8.
But those missing bytes came in raw with no real header!

received() can't be the Problem as it get's called from McuMgrBleTransport+CBPerip[heralDelegate:peripheral with the wrong sequenceNumber

I agree, that McuMgrResponse.getExpectedLength seems to miss some implementation and thus can trigger bad behaviour there, but that's the case before and after the PR. This PR only fixes the sendTimeout when messages are split.

I was able to upgrade my firmware with this code active.

@dinesharjani
Copy link
Contributor

Let me think about it a bit better.

By the way - would you be able to provide me with a firmware file to flash so that I may test and see this happening myself?

@dinesharjani
Copy link
Contributor

@Fabbi do you see something like "OOD Packet: Received Seq No. 4 instead of expected Seq No. 2" in the logs when sendTimeout happens?

@Fabbi
Copy link
Author

Fabbi commented Sep 20, 2022

Let me think about it a bit better.

By the way - would you be able to provide me with a firmware file to flash so that I may test and see this happening myself?

I'm afraid not, as it's a proprietary bin of a client.

@Fabbi do you see something like "OOD Packet: Received Seq No. 4 instead of expected Seq No. 2" in the logs when sendTimeout happens?

No, the timeout doesn't happen on upload but on receiving data from the ImageManager.list command.

That's the log I get on the current version.
I stepped through the code and ended up in the methods I changed in this PR.

dfu  -︎  Upgrade started with 1 images using 'Test And Confirm' mode
dfu  -︎  Requesting device capabilities...
default  -︎  Sending read command (Group: default, seq: 0, ID: McuMgrParameters): nil
transport  -︎  Connecting...
transport  -︎  Peripheral connected
transport  -︎  Discovering services...
transport  -︎  Services discovered: 8D53DC1D-1DB7-4CD3-868B-8A527460AA84
transport  -︎  Discovering characteristics...
transport  -︎  Characteristics discovered: DA2E7828-FBCE-4E01-AE9E-261174997C48
transport  -︎  Enabling notifications...
transport  -︎  Notifications enabled
transport  -︎  Device ready
transport  -︎  -> 0x0000000100000006a0
transport  -︎  <- 0x0100000600000006bf62726308ff
default  -︎  Response (Group: default, seq: 0, ID: 6): {"rc" : 8}
dfu  -︎  Device capabilities not supported
dfu  -︎  Sending Image List command...
---------------------------------------------------------------------
image  -︎  Sending read command (Group: image, seq: 0, ID: State): nil
---------------------------------------------------------------------
transport  -︎  -> 0x0000000100010000a0
transport  -︎  Sending the request timed out.
image  -︎  Request (Group: image, seq: 0) failed: Sending the request timed out.)
dfu  -︎  Sending the request timed out.

the Sending read command (Group: image, seq: 0, ID: State): nil (highlighted above) results in McuMgrBleTransportWriteState.state[0] with totalChunkSize: 259.
The first time McuMgrBleTransportWriteState.received(0, ...) is invoked with 144 bytes. Then another 15 bytes come in which will be interpreted as new Message with sequenceNumber=116, so McuMgrBleTransportWriteState.received(116, <15bytes>) is invoked which throws everything away without error because self.state[116] == nil.

@dinesharjani
Copy link
Contributor

@Fabbi thanks for the feedback.

I was able to reproduce the issue, I think. So I've been working on properly understanding the cause. I have an idea that might help, but I'm not sure.

In any case, while I'm working on it - you have your own fix that works for you, so at least your customer doesn't have the issue. Let me see if I can make good progress today.

@dinesharjani
Copy link
Contributor

Hi @Fabbi - can you check out the master branch and see if the issue keeps happening ?

Thanks.

@dinesharjani
Copy link
Contributor

Implemented here #84

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stuck at "validate" step on 1.2.5, works on 1.1.0
3 participants