Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestFleetMode is failing #6308

Closed
barkbay opened this issue Jan 11, 2023 · 5 comments
Closed

TestFleetMode is failing #6308

barkbay opened this issue Jan 11, 2023 · 5 comments

Comments

@barkbay
Copy link
Contributor

barkbay commented Jan 11, 2023

TestFleetMode is failing in cloud-on-k8s-e2e-tests-snapshot-versions:

13:31:25  {"Time":"2023-01-04T12:31:25.72786109Z","Action":"output","Package":"github.com/elastic/cloud-on-k8s/v2/test/e2e/agent","Test":"TestFleetMode/ES_data_should_pass_validations","Output":"    --- FAIL: TestFleetMode/ES_data_should_pass_validations (1800.00s)\n"}
13:31:25  {"Time":"2023-01-04T12:31:25.727873411Z","Action":"fail","Package":"github.com/elastic/cloud-on-k8s/v2/test/e2e/agent","Test":"TestFleetMode/ES_data_should_pass_validations","Elapsed":1800}
13:31:25  {"Time":"2023-01-04T12:31:25.727880965Z","Action":"fail","Package":"github.com/elastic/cloud-on-k8s/v2/test/e2e/agent","Test":"TestFleetMode","Elapsed":1893.41}
13:31:25  {"Time":"2023-01-04T12:31:25.72789171Z","Action":"output","Package":"github.com/elastic/cloud-on-k8s/v2/test/e2e/agent","Output":"FAIL\n"}
@barkbay
Copy link
Contributor Author

barkbay commented Jan 11, 2023

I'm running all the Agent tests right now using ECK 2.6.0 to open a PR soonish. I might also need to disable TestFleetAgentWithoutTLS which is failing for no obvious reasons:

image

@barkbay
Copy link
Contributor Author

barkbay commented Jan 11, 2023

TestAgentVersionUpgradeToLatest8x was eventually ✅

image

@thbkrkr
Copy link
Contributor

thbkrkr commented Jan 17, 2023

I took a quick look to see if this is similar to:

I see the same pattern: "enrollment and shutdown" but then the agent is restarted.

>  k -n e2e-mercury logs test-agent-fleet-fs-nlnt-agent-696889fb48-8qj7g | grep '{' | jq .message | less
"Fleet Server - Starting"
"Updating running component model"
"Running on policy with Fleet Server integration: eck-fleet-server; missing config fleet.agent.id (expected during bootstrap process)"
"Unit state changed fleet-server-default-fleet-server (STARTING->DEGRADED): Running on policy with Fleet Server integration: eck-fleet-server; missing config fleet.agent.id (expected during bootstrap process)"
"Unit state changed fleet-server-default (STARTING->DEGRADED): Running on policy with Fleet Server integration: eck-fleet-server; missing config fleet.agent.id (expected during bootstrap process)"
"Elastic Agent successfully enrolled"
"Running on policy with Fleet Server integration: eck-fleet-server; missing config fleet.agent.id (expected during bootstrap process)"
"applying new components data"
"force load on unknown policyId"
"Elastic Agent successfully enrolled"
"Ignore policy that has not passed through coordinator"
"Ignore policy that has not passed through coordinator"
"Elastic Agent successfully enrolled"
"Fleet Server - Running on policy with Fleet Server integration: eck-fleet-server; missing config fleet.agent.id (expected during bootstrap process)"
"Starting enrollment to URL: https://test-agent-fleet-fs-nlnt-agent-http.e2e-mercury.svc:8220/"
"applying new components data"
"Elastic Agent successfully enrolled"
"Elastic Agent has been enrolled; start Elastic Agent"
"service.HandleSignals invoked stop function. Shutting down"
"Shutting down Elastic Agent and sending last events..."
"On signal"
"Shutting down completed."
...
"Stats endpoint (127.0.0.1:6791) finished: accept tcp 127.0.0.1:6791: use of closed network connection"
"APM instrumentation disabled"
"Gathered system information"
"Detected available inputs and outputs"
"Capabilities file not found in /usr/share/elastic-agent/capabilities.yml"
"Determined allowed capabilities"
"Parsed configuration and determined agent is managed by Fleet"
"Starting stats endpoint"
...

Looking at log matching *filebeat*, there is a first panic closing a closed channel and then it restarts and reports:

"Filebeat is unable to load the ingest pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the ingest pipelines or are using Logstash pipelines, you can ignore this warning."
"Loading Inputs: 0"
# all filebeat log messages
> k -n e2e-mercury logs test-agent-fleet-fs-nlnt-agent-696889fb48-8qj7g | grep 'filebeat' | jq .message
"Home path: [/usr/share/elastic-agent/data/elastic-agent-b79a5d/components] Config path: [/usr/share/elastic-agent/data/elastic-agent-b79a5d/components] Data path: [/usr/share/elastic-agent/state/data/run/filestream-monitoring] Logs path: [/usr/share/elastic-agent/data/elastic-agent-b79a5d/components/logs]"
"Beat ID: d36ea9e6-dd3e-47ba-82e9-e4c3f9e9ce6d"
"Output reload is enabled, the beat will restart as needed on change of output config"
"Set gc percentage to: 100"
"running under elastic-agent, per-beat lockfiles disabled"
"Starting stats endpoint"
"Syscall filter successfully installed"
"Beat info"
"Build info"
"Go runtime info"
"Host info"
"Process info"
"Setup Beat: filebeat; Version: 8.6.0"
"Output is configured through Central Management"
"Metrics endpoint listening on: /usr/share/elastic-agent/state/data/tmp/filestream-monitoring.sock (configured: unix:///usr/share/elastic-agent/state/data/tmp/filestream-monitoring.sock)"
"Beat name: test-agent-fleet-fs-nlnt-agent-696889fb48-8qj7g"
"Enabled modules/filesets: "
"Filebeat is unable to load the ingest pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the ingest pipelines or are using Logstash pipelines, you can ignore this warning."
"Starting metrics logging every 30s"
"filebeat start running."
"Finished loading transaction log file for '/usr/share/elastic-agent/state/data/run/filestream-monitoring/registry/filebeat'. Active transaction id=0"
"Filebeat is unable to load the ingest pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the ingest pipelines or are using Logstash pipelines, you can ignore this warning."
"States Loaded from registrar: 0"
"Loading Inputs: 0"
"Loading and starting Inputs completed. Enabled inputs: 0"
"elasticsearch url: http://localhost:9200"
"error when check request status for getting IMDSv2 token: http request status 405. No token in the metadata request will be used."
"Input 'filestream' starting"
"add_kubernetes_metadata: kubernetes env detected, with version: v1.24.5-gke.600"
"kubernetes: Node gke-thb-dev-default-pool-b2e32065-fzrr discovered by in cluster pod node query"
"add_cloud_metadata: hosting provider type detected as gcp, metadata={\"cloud\":{\"account\":{\"id\":\"elastic-cloud-dev\"},\"availability_zone\":\"europe-west6-b\",\"instance\":{\"id\":\"7118450474260642427\",\"name\":\"gke-thb-dev-default-pool-b2e32065-fzrr\"},\"machine\":{\"type\":\"n1-standard-8\"},\"project\":{\"id\":\"elastic-cloud-dev\"},\"provider\":\"gcp\",\"service\":{\"name\":\"GCE\"}},\"orchestrator\":{\"cluster\":{\"name\":\"thb-dev\",\"url\":\"https://10.111.176.2\"}}}"
"Connecting to backoff(elasticsearch(http://localhost:9200))"
"Error dialing dial tcp 127.0.0.1:9200: connect: connection refused"
"Failed to connect to backoff(elasticsearch(http://localhost:9200)): Get \"http://localhost:9200\": dial tcp 127.0.0.1:9200: connect: connection refused"
"Attempting to reconnect to backoff(elasticsearch(http://localhost:9200)) with 1 reconnect attempt(s)"
"Error dialing dial tcp 127.0.0.1:9200: connect: connection refused"
"beat is restarting because output changed"
"Stopping filebeat"
"Stopping 1 runners ..."
"Reader was closed. Closing."
"Reader was closed. Closing."
"panic: close of closed channel"
"goroutine 216 [running]:"
"github.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata.(*cache).stop(...)"
"github.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata/cache.go:97"
"github.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).Close(0xc000dee700?)"
"github.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata/kubernetes.go:311 +0x4f"
"github.com/elastic/beats/v7/libbeat/processors.Close(...)"
"github.com/elastic/beats/v7/libbeat/processors/processor.go:58"
"github.com/elastic/beats/v7/libbeat/publisher/processing.(*group).Close(0xc00016ac28?)"
"github.com/elastic/beats/v7/libbeat/publisher/processing/processors.go:95 +0x159"
"github.com/elastic/beats/v7/libbeat/processors.Close(...)"
"github.com/elastic/beats/v7/libbeat/processors/processor.go:58"
"github.com/elastic/beats/v7/libbeat/publisher/processing.(*group).Close(0x0?)"
"github.com/elastic/beats/v7/libbeat/publisher/processing/processors.go:95 +0x159"
"github.com/elastic/beats/v7/libbeat/processors.Close(...)"
"github.com/elastic/beats/v7/libbeat/processors/processor.go:58"
"github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*client).Close.func1()"
"github.com/elastic/beats/v7/libbeat/publisher/pipeline/client.go:167 +0x2df"
"sync.(*Once).doSlow(0x55e550adf090?, 0x0?)"
"sync/once.go:68 +0xc2"
"sync.(*Once).Do(...)"
"sync/once.go:59"
"github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*client).Close(0xc000dfec60?)"
"github.com/elastic/beats/v7/libbeat/publisher/pipeline/client.go:148 +0x59"
"github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*Pipeline).runSignalPropagation(0xc000a90fb8?)"
"github.com/elastic/beats/v7/libbeat/publisher/pipeline/pipeline.go:363 +0x226"
"created by github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*Pipeline).registerSignalPropagation.func1"
"github.com/elastic/beats/v7/libbeat/publisher/pipeline/pipeline.go:315 +0x96"

"Home path: [/usr/share/elastic-agent/data/elastic-agent-b79a5d/components] Config path: [/usr/share/elastic-agent/data/elastic-agent-b79a5d/components] Data path: [/usr/share/elastic-agent/state/data/run/filestream-monitoring] Logs path: [/usr/share/elastic-agent/data/elastic-agent-b79a5d/components/logs]"
"Beat ID: d36ea9e6-dd3e-47ba-82e9-e4c3f9e9ce6d"
"Output reload is enabled, the beat will restart as needed on change of output config"
"Set gc percentage to: 100"
"running under elastic-agent, per-beat lockfiles disabled"
"Starting stats endpoint"
"Syscall filter successfully installed"
"Metrics endpoint listening on: /usr/share/elastic-agent/state/data/tmp/filestream-monitoring.sock (configured: unix:///usr/share/elastic-agent/state/data/tmp/filestream-monitoring.sock)"
"Beat info"
"Build info"
"Go runtime info"
"Host info"
"Process info"
"Setup Beat: filebeat; Version: 8.6.0"
"Output is configured through Central Management"
"Beat name: test-agent-fleet-fs-nlnt-agent-696889fb48-8qj7g"
"Enabled modules/filesets: "
"Filebeat is unable to load the ingest pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the ingest pipelines or are using Logstash pipelines, you can ignore this warning."
"Starting metrics logging every 30s"
"filebeat start running."
"Finished loading transaction log file for '/usr/share/elastic-agent/state/data/run/filestream-monitoring/registry/filebeat'. Active transaction id=4"
"Filebeat is unable to load the ingest pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the ingest pipelines or are using Logstash pipelines, you can ignore this warning."
"States Loaded from registrar: 0"
"Loading Inputs: 0"
"Loading and starting Inputs completed. Enabled inputs: 0"
"Non-zero metrics in the last 30s"
"Non-zero metrics in the last 30s"
"Non-zero metrics in the last 30s"
"Non-zero metrics in the last 30s"
"Non-zero metrics in the last 30s"
"Non-zero metrics in the last 30s"
"Non-zero metrics in the last 30s"
"Non-zero metrics in the last 30s"
"Non-zero metrics in the last 30s"
"Non-zero metrics in the last 30s"
"Non-zero metrics in the last 30s"
"Non-zero metrics in the last 30s"

@thbkrkr
Copy link
Contributor

thbkrkr commented Jan 23, 2023

#6331 (comment):

There is at least one bug we know of in 8.6.0 that will cause Filebeat not to be able to write events basically at random: elastic/elastic-agent#2086. You will see output unit has no config in the agent logs if this is the case.

output unit has no config is present in the fleet-server pod log (8 occurences in e2e-rjx72-mercury/pod/test-agent-fleet-fs-95sg-agent-8fd957f5-qgjrc/logs.txt in the eck-diagnostic attached to the failing build).

It should the proof that TestFleetModefailed due to:

@thbkrkr
Copy link
Contributor

thbkrkr commented Jan 31, 2023

Closing because TestFleetMode seems resolved in 8.6.1.

I created a new issue for TestFleetAgentWithoutTLSwhich is still failing but for another reason.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants