Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use provision watcher logic for discovered devices #42

Merged
merged 2 commits into from
Jul 19, 2021

Conversation

ajcasagrande
Copy link
Contributor

Closes #39

Signed-off-by: Anthony Casagrande anthony.j.casagrande@intel.com

PR Checklist

Please check if your PR fulfills the following requirements:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

What is the current behavior?

A workaround for adding devices directly to core-metadata due to legacy bugs in EdgeX's device-sdk-go.

Issue Number:

#39

What is the new behavior?

Use the updated provision watcher code provided in EdgeX Hanoi.

Does this PR introduce a breaking change?

  • No

New Imports

  • No

edgexfoundry#39

Signed-off-by: Anthony Casagrande <anthony.j.casagrande@intel.com>
@codecov-commenter
Copy link

codecov-commenter commented Jun 24, 2021

Codecov Report

Merging #42 (688b535) into main (b1162a6) will increase coverage by 0.27%.
The diff coverage is 82.60%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #42      +/-   ##
==========================================
+ Coverage   40.77%   41.05%   +0.27%     
==========================================
  Files          21       21              
  Lines        9094     9081      -13     
==========================================
+ Hits         3708     3728      +20     
+ Misses       4211     4181      -30     
+ Partials     1175     1172       -3     
Impacted Files Coverage Δ
internal/driver/driver.go 23.81% <0.00%> (+1.30%) ⬆️
internal/driver/discover.go 65.61% <85.71%> (+1.40%) ⬆️
internal/driver/types.go 80.00% <100.00%> (ø)
internal/retry/retry.go 93.12% <0.00%> (ø)
internal/llrp/reader.go 41.24% <0.00%> (+1.10%) ⬆️
internal/driver/types_string.go 72.72% <0.00%> (+9.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b1162a6...688b535. Read the comment docs.

Copy link
Member

@lenny-goodell lenny-goodell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but will let @tonyespy & @siggiskulason have final say

Copy link
Member

@tonyespy tonyespy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, the changes look good to me, however I do want to raise a few points that came up during this review.

This PR addresses issue #39, which in turn lists four issues in the device-sdk-go which were fixed:

I just wanted to note that 598 is listed twice, and double check whether there's actually a fourth issue, or whether the list should've only had three issues?

I'll also point out that none of the issue descriptions for the device-sdk-go issues include version numbers, so I had to do a bit of detective work to figure out which version the SDK actually had these bugs.

Finally, the locking issue (edgexfoundry/device-sdk-go#609) is a bit puzzling. I believe the original design was correct and that releasing the lock when the SDK read a list of devices from the channel, was intentional. Discovery() was never meant to be a long-running operation that trickled up devices over a long period of time, it was intended to be called and result in a batch of discovered devices to be returned. That said, it clearly wasn't well-documented in the device-sdk-go's godoc for the ProtocolDiscovery.Discover() method:

// Discover triggers protocol specific device discovery, asynchronously
// writes the results to the channel which is passed to the implementation
// via ProtocolDriver.Initialize(). The results may be added to the device service
// based on a set of acceptance criteria (i.e. Provision Watchers).

One could argue that details about locking/synchronization in shouldn't be exposed in a public API, however there's no guidance to developers as to the fact that only a single instance of discovery can be running at a given time and that the Discover() method must write data to the channel in order signal completion of the operation. If this had been clearly communicated, there would've been no reason to re-factor the SDK to use a wrapper function to release the lock on completion of the Discovery() method. The original implementation was much cleaner/correct Go IMHOP.

And finally, I'll mention that when the SDK was re-factored to "fix" this issue, the godoc still wasn't updated. I think it would be a good idea for us to do so... @iain-anderson @cloudxxx8

@ajcasagrande
Copy link
Contributor Author

@tonyespy A few comments:

  1. Apologies for the bugs not having versions. They were all logged against the current main branches at the time of creation, which was Geneva/pre-Hanoi release.

    1. device-sdk-go does not have a template for New Issues. Maybe its time to add one?
  2. You are correct in 598 being listed twice. The list was scraped from the source code and that bug 598 was mentioned twice as it affected two pieces of code.

  3. As far as the discovery code goes, that is multi-faceted, and I will reply in another comment.

@ajcasagrande
Copy link
Contributor Author

@tonyespy I can't comment on the re-factored version as I did not code-review it, but I can attest that the original code was not very pragmatic Go IMHO.

As a developer I ran into many issues implementing the Discovery method due to the SDK (not just related to the locking), at which point I looked to the source to see why my code was not working as expected. From an outsider looking in, I was confused by the implementation as it did not seem to make sense.

  1. Ask yourself which of these statements sound like what the SDK wants to accomplish:

    1. I want subsequent calls to Discover() to be ignored as long as the first call to Discover() is still running.
    2. I want subsequent calls to Discover() to be ignored unless I have received data on a long-lived semi-global deviceCh that I gave them upon Initialization?
  2. Why would you release a lock inside of a select statement that is nowhere near where the lock was obtained (or even same go package)? There is zero traceability and opens up the gates for resource leaking and/or deadlocks, etc.

  3. Discovery ... was intended to be called and result in a batch of discovered devices to be returned ...

    If the SDK wanted me to pass them as a batch why doesn't the Discover() method just expect a return value of DiscoveredDevice slice? Why would I be given a deviceCh at time of Initialize() if the SDK did not want me to use it to asynchronously pass discovered devices the same way i use asyncCh to send Readings?

  4. Discovery() was never meant to be a long-running operation that trickled up devices over a long period of time ...

    In the specific case of this service, even if the discovery only takes 2 seconds and I discover a device in the first 50 milliseconds, why do I have to wait the whole 2 seconds to notify an asynchronous channel that I have found a device? This means needlessly buffering up an array when I could just immediately pass it to the channel.

    As an aside, I do think there are use-cases where a long-running discovery is a good idea. In internal discussions we even contemplated the idea of discovery being its own container/process itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

todo(edgex): Align with latest EdgeX Hanoi provision watcher code
4 participants