[core-metadata] continuously logs error if support-notifications isn't running #2524

tonyespy · 2020-04-27T22:17:00Z

🐞 Bug Report

By definition, the support-* services are optional, however if support-notifications is disabled, core-metadata continuously logs an error about a missing endpoint every 10-15s.

This was originally discovered in the snap (which disabled support-notifications by default) built from my latest snap development branch.

I tried reproducing with docker, however I'm getting redis AUTH failures with the latest docker-compose-nexus-redis.yml file from developer-scripts.

Affected Services

The issue is located in: core-metadata

Is this a regression?

Maybe

Description and Minimal Reproduction

If support-notifications is not running, core-metadata logs the following error every 10-15s:

Apr 27 17:40:27 xen edgexfoundry.core-metadata[4831]: unable to get service endpoint for edgex-support-notifications: no matching service endpoint found

🌍 Your Environment

Deployment Environment:
Snap installed on Ubuntu 16.04 Desktop

EdgeX Version:
Geneva built from master

The text was updated successfully, but these errors were encountered:

akramtexas · 2020-04-29T15:37:23Z

Investigating this issue, starting with setting up Snap build, installed on Ubuntu 16.04 Desktop in a Virtual Box. Will report findings 👍

akramtexas · 2020-04-29T18:58:38Z

The Deployment Environment was Snap installed on Ubuntu 16.04 Desktop, but after reaching out to @tonyespy, got it clarified that we are pretty sure that the bug has nothing to do with the snap.
Focusing for now on running the latest docker-compose.yaml after commenting-out the entry for support-notifications.

tonyespy · 2020-04-29T19:23:29Z

espy@xen$ pwd
~/edgexfoundry/developer-scripts/releases/nightly-build/compose-files
espy@xen$ git show HEAD
commit acba6e17684a167e6efd3a55305b0fcd364470a3
Merge: 9917225 6e0b15c
Author: Lenny Goodell 44779287+lenny-intel@users.noreply.github.com
Date: Tue Apr 28 09:20:11 2020 -0700
Merge pull request #266 from lenny-intel/RemoveOverrides
refact: Remove unneeded overrides for DB username & password
espy@xen$ git diff

diff --git a/releases/nightly-build/compose-files/docker-compose-nexus-redis.yml b/releases/nightly-build/compose-files/docker-compose-nexus-redis.yml
index fc4ce69..60d25bc 100644
--- a/releases/nightly-build/compose-files/docker-compose-nexus-redis.yml
+++ b/releases/nightly-build/compose-files/docker-compose-nexus-redis.yml
@@ -333,24 +333,6 @@ services:
   depends_on:
      - logging
- notifications:
-  image: nexus3.edgexfoundry.org:10004/docker-support-notifications-go:master
-  ports:
-   - "48060:48060"
-  container_name: edgex-support-notifications
-  hostname: edgex-support-notifications
-  networks:
-   - edgex-network
-  environment:
-   <<: *common-variables
-   Service_Host: edgex-support-notifications
-   SecretStore_TokenFile: /tmp/edgex/secrets/edgex-support-notifications/secrets-token.json
-  volumes:
-   - /tmp/edgex/secrets/ca:/tmp/edgex/secrets/ca:ro,z
-   - /tmp/edgex/secrets/edgex-support-notifications:/tmp/edgex/secrets/edgex-support-notifications:ro,z
-  depends_on:
-   - logging

espy@xen$ docker-compose -f docker-compose-nexus-redis.yml up
.
.
.
edgex-core-metadata | unable to get service endpoint for edgex-support-notifications: no matching service endpoint found

^^ using the latest developer-scripts

I also reproduced it using docker-compose-nexus-redis-no-secty.yml.

akramtexas · 2020-04-30T18:48:58Z

Even with support-notifications not running, I see repeatedly that core-metadata stays up and not log the error message reported in this issue:

level=INFO ts=2020-04-30T18:21:42.8114369Z app=edgex-core-metadata source=database.go:146 msg="Database connected"
level=INFO ts=2020-04-30T18:21:42.8204501Z app=edgex-core-metadata source=telemetry.go:86 msg="Telemetry starting"
level=INFO ts=2020-04-30T18:21:42.8206424Z app=edgex-core-metadata source=httpserver.go:89 msg="Web server starting (edgex-core-metadata:48081)"
level=INFO ts=2020-04-30T18:21:42.8207578Z app=edgex-core-metadata source=message.go:50 msg="Service dependencies resolved..."
level=INFO ts=2020-04-30T18:21:42.8208429Z app=edgex-core-metadata source=message.go:51 msg="Starting edgex-core-metadata master "
level=INFO ts=2020-04-30T18:21:42.8209097Z app=edgex-core-metadata source=message.go:55 msg="This is the EdgeX Core Metadata Microservice"
level=INFO ts=2020-04-30T18:21:42.8211112Z app=edgex-core-metadata source=message.go:58 msg="Service started in: 6.1459512s"

I monitored for 20 minutes while core-metadata stayed up and did not log any error message.

akramtexas · 2020-04-30T19:01:18Z

Adding that docker-compose will evidently not pull the new instances of the master images, so we must first run the docker-compose -f docker-compose.yml pull command, which resolved the issue I had previously seen:

level=WARN ts=2020-04-30T18:03:16.9625073Z app=edgex-core-metadata source=config.go:82 msg="could not get configuration from Registry: the Configuration service (Consul) doesn't contain configuration for edgex/core/1.0/edgex-core-metadata/"

The error message reported in this issue (i.e. how core-metadata continuously logs error if support-notifications isn't running) will occur when support-notifications hasn't registered with Consul before the dependent service (core-metadata) is pinging Consul for its status. This may occur, but then go away once the dependency, i.e. the service (support-notifications) has registered.

akramtexas · 2020-04-30T19:12:36Z

So I do not see the issue when using docker-compose-nexus-redis-no-secty.yml. Similarly, I also do not see the issue when using docker-compose-nexus-redis.yml either:

level=INFO ts=2020-04-30T19:08:29.5354558Z app=edgex-core-metadata source=secret.go:73 msg="Attempting to create secretclient"
level=WARN ts=2020-04-30T19:08:29.8141518Z app=edgex-core-metadata source=secret.go:82 msg="Retryable failure while creating SecretClient: HTTP response with status code 403, message: failed to lookup token"
level=INFO ts=2020-04-30T19:08:30.8151405Z app=edgex-core-metadata source=secret.go:67 msg="Reading secret store configuration and authentication token"
level=INFO ts=2020-04-30T19:08:30.8188405Z app=edgex-core-metadata source=secret.go:73 msg="Attempting to create secretclient"
level=INFO ts=2020-04-30T19:08:30.8722054Z app=edgex-core-metadata source=secret.go:77 msg="Created SecretClient"
level=INFO ts=2020-04-30T19:08:30.8775565Z app=edgex-core-metadata source=database.go:146 msg="Database connected"
level=INFO ts=2020-04-30T19:08:30.885495Z app=edgex-core-metadata source=telemetry.go:86 msg="Telemetry starting"
level=INFO ts=2020-04-30T19:08:30.8856528Z app=edgex-core-metadata source=httpserver.go:89 msg="Web server starting (edgex-core-metadata:48081)"
level=INFO ts=2020-04-30T19:08:30.8868427Z app=edgex-core-metadata source=message.go:50 msg="Service dependencies resolved..."
level=INFO ts=2020-04-30T19:08:30.8887011Z app=edgex-core-metadata source=message.go:51 msg="Starting edgex-core-metadata master "
level=INFO ts=2020-04-30T19:08:30.8887629Z app=edgex-core-metadata source=message.go:55 msg="This is the EdgeX Core Metadata Microservice"
level=INFO ts=2020-04-30T19:08:30.8888073Z app=edgex-core-metadata source=message.go:58 msg="Service started in: 16.2716966s"

@tonyespy Please feel free to close out this issue after you've reviewed my findings above and are satisfied 👍 Thanks!

tonyespy · 2020-04-30T19:23:08Z

@akramtexas so you're saying you if you comment out support-notifications in either of those compose files and then bring everything up you're not seeing the same log message that I see? In other words, you can't reproduce the issue?

akramtexas · 2020-04-30T19:26:20Z

@akramtexas so you're saying you if you comment out support-notifications in either of those compose files and then bring everything up you're not seeing the same log message that I see? In other words, you can't reproduce the issue?

Yes @tonyespy that is correct 👍

tonyespy · 2020-04-30T19:28:48Z

But did you disable support-notifications as part of your test (ie. so it never starts)?

Yes, I sure did ✅ I had disabled support-notifications so that it never started.

tonyespy · 2020-04-30T19:32:39Z

OK, then we still have a mystery to solve... I'll look at this again later.

tonyespy · 2020-05-07T14:55:41Z

Alright, I finally had some more cycles to look into this, and once again this was super easy to reproduce using any of the latest nightly-build docker-compose files. Here's how I reproduced:

install the docker snap on an Ubuntu 18.04 Desktop system
edit your docker-compose file of choice and disable support-notifications (my edited file can be found here)
bring EdgeX up (e.g. docker-compose -f docker-compose-nexus-redis.yml up)
watch the log output and notice that eventually, core-metadata will periodically log the same message as described in the original issue description:

Apr 27 17:40:27 xen edgexfoundry.core-metadata[4831]: unable to get service endpoint for edgex-support-notifications: no matching service endpoint found

Furthermore after examining the core-metadata code, the BootstrapHandler adds a NotificationsHandler. A quick review of the metadata rest functions show that this client is used by rest_device.go and rest_deviceservice.go. My suspicion is that the metadata service may be using notifications to invoke device service callbacks when devices are added/updated/removed? I checked device-sdk-go and its callback handler only supports callbacks for devices, device profiles, and provwatchers, so the device-service notifications must be for some other service?

The other mystery was why does metadata continuously log that the notifications endpoint can't be found. @tsconn23 mentioned something about our client implementations implementing health checks, which I finally got a chance to review this morning. Apparently when a new client instance is set up, a goroutine is spawned to monitor the state of the endpoint.

I'd like this mechanism to be included in the Hanoi discussion about the SMA, which also hard-codes a set of required dependencies and implements its own health check logic (see issue #2486). As I understand it, we now have at least three overlapping health check mechanisms:

Consul's health checks
SMA (if running) health checks
health checks for every service client

As these are all implemented using REST calls, there's an awful lot of HTTP traffic happening to implement all of this logic which consumes compute resources, and in turn has a downward drag on the overall performance of the system. This might be OK for a cloud-based deployment, but on real devices, all this overhead can have a big impact. We have a service registry (Consul), why not use it for what it was designed for an monitor required endpoints using Consul? AFAIK, it's possible to get setup watchers in order to get notified when service health checks start failing...

lenny-goodell · 2020-06-19T20:52:19Z

@tonyespy , I think this issue will go away when we remove Client Monitoring. Agree?

FelixTing · 2020-11-04T03:33:50Z

This issue has been resolved by #2595.

tonyespy added the bug Something isn't working label Apr 27, 2020

AnthonyMBonafide assigned akramtexas Apr 28, 2020

AnthonyMBonafide added 1-low priority denoting isolated changes core-services geneva labels Apr 28, 2020

AnthonyMBonafide added this to the EdgeX Geneva milestone Apr 28, 2020

tsconn23 modified the milestones: EdgeX Geneva, EdgeX Hanoi May 7, 2020

tsconn23 added hanoi Hanoi release and removed geneva labels Jun 17, 2020

tsconn23 unassigned akramtexas Jun 17, 2020

tsconn23 added the help wanted Extra attention is needed label Jun 17, 2020

cloudxxx8 assigned FelixTing Nov 3, 2020

cloudxxx8 closed this as completed Nov 4, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[core-metadata] continuously logs error if support-notifications isn't running #2524

[core-metadata] continuously logs error if support-notifications isn't running #2524

tonyespy commented Apr 27, 2020

akramtexas commented Apr 29, 2020

akramtexas commented Apr 29, 2020

tonyespy commented Apr 29, 2020

akramtexas commented Apr 30, 2020 •

edited

Loading

akramtexas commented Apr 30, 2020 •

edited

Loading

akramtexas commented Apr 30, 2020

tonyespy commented Apr 30, 2020

akramtexas commented Apr 30, 2020

tonyespy commented Apr 30, 2020 •

edited by akramtexas

Loading

tonyespy commented Apr 30, 2020

tonyespy commented May 7, 2020

lenny-goodell commented Jun 19, 2020

FelixTing commented Nov 4, 2020 •

edited

Loading

[core-metadata] continuously logs error if support-notifications isn't running #2524

[core-metadata] continuously logs error if support-notifications isn't running #2524

Comments

tonyespy commented Apr 27, 2020

🐞 Bug Report

Affected Services

Is this a regression?

Description and Minimal Reproduction

🌍 Your Environment

akramtexas commented Apr 29, 2020

akramtexas commented Apr 29, 2020

tonyespy commented Apr 29, 2020

akramtexas commented Apr 30, 2020 • edited Loading

akramtexas commented Apr 30, 2020 • edited Loading

akramtexas commented Apr 30, 2020

tonyespy commented Apr 30, 2020

akramtexas commented Apr 30, 2020

tonyespy commented Apr 30, 2020 • edited by akramtexas Loading

tonyespy commented Apr 30, 2020

tonyespy commented May 7, 2020

lenny-goodell commented Jun 19, 2020

FelixTing commented Nov 4, 2020 • edited Loading

akramtexas commented Apr 30, 2020 •

edited

Loading

akramtexas commented Apr 30, 2020 •

edited

Loading

tonyespy commented Apr 30, 2020 •

edited by akramtexas

Loading

FelixTing commented Nov 4, 2020 •

edited

Loading