Insteon: ALDB_i2 Link Scan, Link Data Received out of Order May Cause Queue to Stall #258

Closed
krkeegan opened this issue Sep 26, 2013 · 0 comments

This issue was discovered by @pmatis.

The following sequence of events appears to occur while scanning an ALDB_i2 link table:

  1. MH issues a read command for link address 0FE7
  2. ACK Received
  3. Link Data for 0FE7 is received
  4. MH issues a read command for link address 0FDF
  5. ACK Received
  6. Link Data for 0FE7 is received <-- duplicate out of order packet
    6a. Because of the long delay, this packet is not caught as a duplicate packet
  7. This message is passed to the ALDB_i2 Link parser which perceives this as a corrupt response.
  8. The parser then tries to queue a request to read link address 0FDF again
  9. MH catches this as an attempt to queue a command already in the queue and ignores it
  10. Another ACK is received
    10a. It is unclear what this is in response to
  11. The ALDB_i2 Link parser ignores the ACK because it can't correlate it to a sent message
  12. Link Data for 0FDF is finally received
  13. The ALDB_i2 Link parser ignores the link data, claiming that an ACK was not received.
  14. At this point, the queue for the device stalls and nothing else happens.
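
Step 6a can be illustrated with a minimal sketch of a time-window duplicate filter. This is an assumption about how duplicate detection might work in general, not a description of the actual MisterHouse code; the class `DuplicateFilter`, its `window` parameter, and the 1.0 second value are all hypothetical.

```python
class DuplicateFilter:
    """Hypothetical filter: flags a message as a duplicate only if an
    identical message was seen within `window` seconds."""

    def __init__(self, window):
        self.window = window
        self.last_seen = {}  # message -> time it was last seen

    def is_duplicate(self, message, now):
        previous = self.last_seen.get(message)
        self.last_seen[message] = now
        return previous is not None and (now - previous) <= self.window


# The 1.0 s window is an arbitrary illustrative value.
dup_filter = DuplicateFilter(window=1.0)
first = dup_filter.is_duplicate("link data 0FE7", now=0.0)  # first copy
quick = dup_filter.is_duplicate("link data 0FE7", now=0.5)  # repeat inside the window
late = dup_filter.is_duplicate("link data 0FE7", now=5.0)   # repeat after a long delay
```

Under these assumptions, only the quick repeat is flagged. The late copy passes through to the parser, which matches the behaviour described in steps 6 and 7.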

Quick Diagnosis:

  1. At step 8, I don't think the parser should be trying to queue a new message request. Instead, the parser should simply not acknowledge the response; the message handler would then retry the message in its normal course of action.
  2. Steps 11 and 13: it is unclear to me why the parser first claims that it cannot correlate the ACK to any sent message, but then claims that an ACK it was expecting was never received.
  3. It is also unclear why the queue timer is being cleared and never reset. This is what causes the entire process to stall.
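
The behaviour suggested in item 1 might look roughly like the sketch below: the parser only reports whether it accepted a response, and a timeout drives the retry. The class `MessageHandler`, its methods, and the retry limit are hypothetical names chosen for illustration and do not reflect the real MisterHouse API.

```python
class MessageHandler:
    """Hypothetical handler that retries the pending command on timeout."""

    def __init__(self, max_retries=3):
        self.max_retries = max_retries
        self.retries_left = 0
        self.pending = None
        self.sent = []  # log of every transmission, including retries

    def send(self, command):
        self.pending = command
        self.retries_left = self.max_retries
        self.sent.append(command)

    def on_response(self, accepted):
        # The parser reports True only for a valid, expected response.
        # A rejected response leaves the pending command untouched.
        if accepted:
            self.pending = None

    def on_timeout(self):
        # Nothing acknowledged the pending command in time: send it again.
        if self.pending is not None and self.retries_left > 0:
            self.retries_left -= 1
            self.sent.append(self.pending)


handler = MessageHandler()
handler.send("read 0FDF")
handler.on_response(accepted=False)  # stale 0FE7 data, parser rejects it
handler.on_timeout()                 # normal retry path resends the read
handler.on_response(accepted=True)   # 0FDF data finally arrives
```

In this sketch the parser never touches the queue itself, so a bad response cannot collide with the duplicate-command check described in step 9.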
@ghost assigned krkeegan Sep 26, 2013
krkeegan added a commit to krkeegan/misterhouse that referenced this issue Sep 27, 2013
on_read_write_aldb now returns a 1/0 corresponding to whether the current message should be cleared.

When a bad message arrived, on_read_write_aldb attempted to requeue the message that was currently pending.  However, _process_message did not clear the pending message until after this routine was run.  As a result, a new message was not queued because it was duplicative, but then the current message was cleared.  This resulted in stalling the message queue.

Fixes bug hollie#258
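
The stall described in this commit message, and the effect of returning 1/0, can be modelled with a small sketch. It is a simplified Python illustration under stated assumptions, not the Perl code from the commit: `DeviceQueue`, `parse_buggy`, `process_buggy`, `parse_fixed`, and `process_fixed` are hypothetical stand-ins for the roles played by `on_read_write_aldb` and `_process_message`.

```python
class DeviceQueue:
    """Hypothetical per-device queue that suppresses duplicate commands."""

    def __init__(self):
        self.pending = None
        self.queue = []

    def enqueue(self, command):
        if command == self.pending or command in self.queue:
            return False  # already pending or queued, so it is ignored
        self.queue.append(command)
        return True

    def send_next(self):
        self.pending = self.queue.pop(0) if self.queue else None
        return self.pending


def parse_buggy(q, response_ok):
    # Old behaviour: on a bad response, try to requeue the pending command.
    if not response_ok:
        q.enqueue(q.pending)  # ignored, because it duplicates the pending command


def process_buggy(q, response_ok):
    parse_buggy(q, response_ok)
    q.pending = None  # cleared unconditionally, after the parser has run


def parse_fixed(q, response_ok):
    # New behaviour: return 1 if the pending command should be cleared, else 0.
    return 1 if response_ok else 0


def process_fixed(q, response_ok):
    if parse_fixed(q, response_ok):
        q.pending = None


buggy = DeviceQueue()
buggy.enqueue("read 0FDF")
buggy.send_next()
process_buggy(buggy, response_ok=False)

fixed = DeviceQueue()
fixed.enqueue("read 0FDF")
fixed.send_next()
process_fixed(fixed, response_ok=False)
```

After a bad response, the buggy queue has nothing pending and nothing queued, so nothing is ever sent again. The fixed queue still holds `read 0FDF` as pending, so the normal retry path can resend it.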
pmatis pushed a commit to pmatis/misterhouse that referenced this issue Sep 27, 2013
on_read_write_aldb now returns a 1/0 corresponding to whether the current message should be cleared.

When a bad message arrived, on_read_write_aldb attempted to requeue the message that was currently pending.  However, _process_message did not clear the pending message until after this routine was run.  As a result, a new message was not queued because it was duplicative, but then the current message was cleared.  This resulted in stalling the message queue.

Fixes bug hollie#258
krkeegan added a commit to krkeegan/misterhouse that referenced this issue Sep 28, 2013
Missed one instance in which the queued message should not be cleared.
Should not be cleared on an unhandled mem action either.

Further Fix to hollie#258