Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core-metadata] continuously logs error if support-notifications isn't running #2524

Closed
tonyespy opened this issue Apr 27, 2020 · 13 comments
Closed
Assignees
Labels
1-low priority denoting isolated changes bug Something isn't working core-services hanoi Hanoi release help wanted Extra attention is needed
Milestone

Comments

@tonyespy
Copy link
Member

🐞 Bug Report

By definition, the support-* services are optional, however if support-notifications is disabled, core-metadata continuously logs an error about a missing endpoint every 10-15s.

This was originally discovered in the snap (which disabled support-notifications by default) built from my latest snap development branch.

I tried reproducing with docker, however I'm getting redis AUTH failures with the latest docker-compose-nexus-redis.yml file from developer-scripts.

Affected Services

The issue is located in: core-metadata

Is this a regression?

Maybe

Description and Minimal Reproduction

If support-notifications is not running, core-metadata logs the following error every 10-15s:

Apr 27 17:40:27 xen edgexfoundry.core-metadata[4831]: unable to get service endpoint for edgex-support-notifications: no matching service endpoint found

🌍 Your Environment

Deployment Environment:
Snap installed on Ubuntu 16.04 Desktop

EdgeX Version:
Geneva built from master

@tonyespy tonyespy added the bug Something isn't working label Apr 27, 2020
@AnthonyMBonafide AnthonyMBonafide added 1-low priority denoting isolated changes core-services geneva labels Apr 28, 2020
@AnthonyMBonafide AnthonyMBonafide added this to the EdgeX Geneva milestone Apr 28, 2020
@akramtexas
Copy link
Contributor

Investigating this issue, starting with setting up Snap build, installed on Ubuntu 16.04 Desktop in a Virtual Box. Will report findings 👍

@akramtexas
Copy link
Contributor

  • The Deployment Environment was Snap installed on Ubuntu 16.04 Desktop, but after reaching out to @tonyespy, got it clarified that we are pretty sure that the bug has nothing to do with the snap.
  • Focusing for now on running the latest docker-compose.yaml after commenting-out the entry for support-notifications.

@tonyespy
Copy link
Member Author

espy@xen$ pwd
~/edgexfoundry/developer-scripts/releases/nightly-build/compose-files
espy@xen$ git show HEAD
commit acba6e17684a167e6efd3a55305b0fcd364470a3
Merge: 9917225 6e0b15c
Author: Lenny Goodell 44779287+lenny-intel@users.noreply.github.com
Date: Tue Apr 28 09:20:11 2020 -0700
Merge pull request #266 from lenny-intel/RemoveOverrides
refact: Remove unneeded overrides for DB username & password
espy@xen$ git diff

diff --git a/releases/nightly-build/compose-files/docker-compose-nexus-redis.yml b/releases/nightly-build/compose-files/docker-compose-nexus-redis.yml
index fc4ce69..60d25bc 100644
--- a/releases/nightly-build/compose-files/docker-compose-nexus-redis.yml
+++ b/releases/nightly-build/compose-files/docker-compose-nexus-redis.yml
@@ -333,24 +333,6 @@ services:
   depends_on:
      - logging
- notifications:
-  image: nexus3.edgexfoundry.org:10004/docker-support-notifications-go:master
-  ports:
-   - "48060:48060"
-  container_name: edgex-support-notifications
-  hostname: edgex-support-notifications
-  networks:
-   - edgex-network
-  environment:
-   <<: *common-variables
-   Service_Host: edgex-support-notifications
-   SecretStore_TokenFile: /tmp/edgex/secrets/edgex-support-notifications/secrets-token.json
-  volumes:
-   - /tmp/edgex/secrets/ca:/tmp/edgex/secrets/ca:ro,z
-   - /tmp/edgex/secrets/edgex-support-notifications:/tmp/edgex/secrets/edgex-support-notifications:ro,z
-  depends_on:
-   - logging

espy@xen$ docker-compose -f docker-compose-nexus-redis.yml up
.
.
.
edgex-core-metadata | unable to get service endpoint for edgex-support-notifications: no matching service endpoint found

^^ using the latest developer-scripts

I also reproduced it using docker-compose-nexus-redis-no-secty.yml.

@akramtexas
Copy link
Contributor

akramtexas commented Apr 30, 2020

  • Even with support-notifications not running, I see repeatedly that core-metadata stays up and not log the error message reported in this issue:
level=INFO ts=2020-04-30T18:21:42.8114369Z app=edgex-core-metadata source=database.go:146 msg="Database connected"
level=INFO ts=2020-04-30T18:21:42.8204501Z app=edgex-core-metadata source=telemetry.go:86 msg="Telemetry starting"
level=INFO ts=2020-04-30T18:21:42.8206424Z app=edgex-core-metadata source=httpserver.go:89 msg="Web server starting (edgex-core-metadata:48081)"
level=INFO ts=2020-04-30T18:21:42.8207578Z app=edgex-core-metadata source=message.go:50 msg="Service dependencies resolved..."
level=INFO ts=2020-04-30T18:21:42.8208429Z app=edgex-core-metadata source=message.go:51 msg="Starting edgex-core-metadata master "
level=INFO ts=2020-04-30T18:21:42.8209097Z app=edgex-core-metadata source=message.go:55 msg="This is the EdgeX Core Metadata Microservice"
level=INFO ts=2020-04-30T18:21:42.8211112Z app=edgex-core-metadata source=message.go:58 msg="Service started in: 6.1459512s"
  • I monitored for 20 minutes while core-metadata stayed up and did not log any error message.

@akramtexas
Copy link
Contributor

akramtexas commented Apr 30, 2020

  • Adding that docker-compose will evidently not pull the new instances of the master images, so we must first run the docker-compose -f docker-compose.yml pull command, which resolved the issue I had previously seen:
level=WARN ts=2020-04-30T18:03:16.9625073Z app=edgex-core-metadata source=config.go:82 msg="could not get configuration from Registry: the Configuration service (Consul) doesn't contain configuration for edgex/core/1.0/edgex-core-metadata/"
  • The error message reported in this issue (i.e. how core-metadata continuously logs error if support-notifications isn't running) will occur when support-notifications hasn't registered with Consul before the dependent service (core-metadata) is pinging Consul for its status. This may occur, but then go away once the dependency, i.e. the service (support-notifications) has registered.

@akramtexas
Copy link
Contributor

So I do not see the issue when using docker-compose-nexus-redis-no-secty.yml. Similarly, I also do not see the issue when using docker-compose-nexus-redis.yml either:

level=INFO ts=2020-04-30T19:08:29.5354558Z app=edgex-core-metadata source=secret.go:73 msg="Attempting to create secretclient"
level=WARN ts=2020-04-30T19:08:29.8141518Z app=edgex-core-metadata source=secret.go:82 msg="Retryable failure while creating SecretClient: HTTP response with status code 403, message: failed to lookup token"
level=INFO ts=2020-04-30T19:08:30.8151405Z app=edgex-core-metadata source=secret.go:67 msg="Reading secret store configuration and authentication token"
level=INFO ts=2020-04-30T19:08:30.8188405Z app=edgex-core-metadata source=secret.go:73 msg="Attempting to create secretclient"
level=INFO ts=2020-04-30T19:08:30.8722054Z app=edgex-core-metadata source=secret.go:77 msg="Created SecretClient"
level=INFO ts=2020-04-30T19:08:30.8775565Z app=edgex-core-metadata source=database.go:146 msg="Database connected"
level=INFO ts=2020-04-30T19:08:30.885495Z app=edgex-core-metadata source=telemetry.go:86 msg="Telemetry starting"
level=INFO ts=2020-04-30T19:08:30.8856528Z app=edgex-core-metadata source=httpserver.go:89 msg="Web server starting (edgex-core-metadata:48081)"
level=INFO ts=2020-04-30T19:08:30.8868427Z app=edgex-core-metadata source=message.go:50 msg="Service dependencies resolved..."
level=INFO ts=2020-04-30T19:08:30.8887011Z app=edgex-core-metadata source=message.go:51 msg="Starting edgex-core-metadata master "
level=INFO ts=2020-04-30T19:08:30.8887629Z app=edgex-core-metadata source=message.go:55 msg="This is the EdgeX Core Metadata Microservice"
level=INFO ts=2020-04-30T19:08:30.8888073Z app=edgex-core-metadata source=message.go:58 msg="Service started in: 16.2716966s"

@tonyespy Please feel free to close out this issue after you've reviewed my findings above and are satisfied 👍 Thanks!

@tonyespy
Copy link
Member Author

@akramtexas so you're saying you if you comment out support-notifications in either of those compose files and then bring everything up you're not seeing the same log message that I see? In other words, you can't reproduce the issue?

@akramtexas
Copy link
Contributor

@akramtexas so you're saying you if you comment out support-notifications in either of those compose files and then bring everything up you're not seeing the same log message that I see? In other words, you can't reproduce the issue?

Yes @tonyespy that is correct 👍

@tonyespy
Copy link
Member Author

tonyespy commented Apr 30, 2020

But did you disable support-notifications as part of your test (ie. so it never starts)?

Yes, I sure did ✅ I had disabled support-notifications so that it never started.

@tonyespy
Copy link
Member Author

OK, then we still have a mystery to solve... I'll look at this again later.

@tonyespy
Copy link
Member Author

tonyespy commented May 7, 2020

Alright, I finally had some more cycles to look into this, and once again this was super easy to reproduce using any of the latest nightly-build docker-compose files. Here's how I reproduced:

  • install the docker snap on an Ubuntu 18.04 Desktop system
  • edit your docker-compose file of choice and disable support-notifications (my edited file can be found here)
  • bring EdgeX up (e.g. docker-compose -f docker-compose-nexus-redis.yml up)
  • watch the log output and notice that eventually, core-metadata will periodically log the same message as described in the original issue description:

Apr 27 17:40:27 xen edgexfoundry.core-metadata[4831]: unable to get service endpoint for edgex-support-notifications: no matching service endpoint found

Furthermore after examining the core-metadata code, the BootstrapHandler adds a NotificationsHandler. A quick review of the metadata rest functions show that this client is used by rest_device.go and rest_deviceservice.go. My suspicion is that the metadata service may be using notifications to invoke device service callbacks when devices are added/updated/removed? I checked device-sdk-go and its callback handler only supports callbacks for devices, device profiles, and provwatchers, so the device-service notifications must be for some other service?

The other mystery was why does metadata continuously log that the notifications endpoint can't be found. @tsconn23 mentioned something about our client implementations implementing health checks, which I finally got a chance to review this morning. Apparently when a new client instance is set up, a goroutine is spawned to monitor the state of the endpoint.

I'd like this mechanism to be included in the Hanoi discussion about the SMA, which also hard-codes a set of required dependencies and implements its own health check logic (see issue #2486). As I understand it, we now have at least three overlapping health check mechanisms:

  • Consul's health checks
  • SMA (if running) health checks
  • health checks for every service client

As these are all implemented using REST calls, there's an awful lot of HTTP traffic happening to implement all of this logic which consumes compute resources, and in turn has a downward drag on the overall performance of the system. This might be OK for a cloud-based deployment, but on real devices, all this overhead can have a big impact. We have a service registry (Consul), why not use it for what it was designed for an monitor required endpoints using Consul? AFAIK, it's possible to get setup watchers in order to get notified when service health checks start failing...

@tsconn23 tsconn23 modified the milestones: EdgeX Geneva, EdgeX Hanoi May 7, 2020
@tsconn23 tsconn23 added hanoi Hanoi release and removed geneva labels Jun 17, 2020
@tsconn23 tsconn23 added the help wanted Extra attention is needed label Jun 17, 2020
@lenny-goodell
Copy link
Member

@tonyespy , I think this issue will go away when we remove Client Monitoring. Agree?

@FelixTing
Copy link
Member

FelixTing commented Nov 4, 2020

This issue has been resolved by #2595.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1-low priority denoting isolated changes bug Something isn't working core-services hanoi Hanoi release help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

7 participants