[receiver/hostmetrics] Network scraper unit test failure #27034

Closed
crobert-1 opened this issue Sep 20, 2023 · 6 comments

@crobert-1
Member

Component(s)

receiver/hostmetrics

Describe the issue you're reporting

Unit tests are failing in unrelated PR:

--- FAIL: TestScrape (0.04s)
    --- FAIL: TestScrape/Standard (0.01s)
        network_scraper_test.go:232: 
            	Error Trace:	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/networkscraper/network_scraper_test.go:232
            	            				/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/networkscraper/network_scraper_test.go:201
            	Error:      	Not equal: 
            	            	expected: 12
            	            	actual  : 13
            	Test:       	TestScrape/Standard
    --- FAIL: TestScrape/Standard_with_direction_removed (0.01s)
        network_scraper_test.go:232: 
            	Error Trace:	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/networkscraper/network_scraper_test.go:232
            	            				/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/networkscraper/network_scraper_test.go:201
            	Error:      	Not equal: 
            	            	expected: 12
            	            	actual  : 13
            	Test:       	TestScrape/Standard_with_direction_removed
    --- FAIL: TestScrape/Validate_Start_Time (0.01s)
        network_scraper_test.go:232: 
            	Error Trace:	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/networkscraper/network_scraper_test.go:232
            	            				/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/networkscraper/network_scraper_test.go:201
            	Error:      	Not equal: 
            	            	expected: 12
            	            	actual  : 13
            	Test:       	TestScrape/Validate_Start_Time
    --- FAIL: TestScrape/Include_Filter_that_matches_nothing (0.01s)
        network_scraper_test.go:232: 
            	Error Trace:	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/networkscraper/network_scraper_test.go:232
            	            				/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/networkscraper/network_scraper_test.go:201
            	Error:      	Not equal: 
            	            	expected: 12
            	            	actual  : 13
            	Test:       	TestScrape/Include_Filter_that_matches_nothing
    --- FAIL: TestScrape/Conntrack_error_ignored_if_metric_disabled (0.01s)
        network_scraper_test.go:232: 
            	Error Trace:	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/networkscraper/network_scraper_test.go:232
            	            				/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/networkscraper/network_scraper_test.go:201
            	Error:      	Not equal: 
            	            	expected: 12
            	            	actual  : 13
            	Test:       	TestScrape/Conntrack_error_ignored_if_metric_disabled
FAIL

CI test failure

crobert-1 added the "needs triage" and "receiver/hostmetrics" labels on Sep 20, 2023
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions
Contributor

Pinging code owners for receiver/hostmetrics: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@crobert-1
Member Author

crobert-1 commented Nov 1, 2023

Summary

I was able to investigate this, and I believe the failing check is invalid.

Analysis

Here's the failing check:

func assertNetworkConnectionsMetricValid(t *testing.T, metric pmetric.Metric) {
	...
	assert.Equal(t, 12, metric.Sum().DataPoints().Len())

This asserts that gopsutil returned 12 unique network connection states (e.g. LISTEN, ESTABLISHED, etc.) in the test environment's history. gopsutil gets connections by running lsof on the local system, parsing the output line by line, and returning all of the parsed connection information. The expected value of 12 in the failing check comes from our internal definitive list of states.

From the lsof man page (note that ESTABLISHED is listed twice, which still leaves 13 distinct examples of valid states):

              State names vary with UNIX dialects, so it's not possible
              to provide a complete list.  Some common TCP state names
              are: CLOSED, IDLE, BOUND, LISTEN, ESTABLISHED, SYN_SENT,
              SYN_RCDV, ESTABLISHED, CLOSE_WAIT, FIN_WAIT1, CLOSING,
              LAST_ACK, FIN_WAIT_2, and TIME_WAIT.

The test failed because there was an unexpected state in the GitHub runner's history of connections. Since we don't know what the state was, it may or may not have been valid. Based on what the man page says, I believe it's invalid to require exactly 12 states that match our internal list.
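To make the failure mode concrete, here is a minimal sketch of how a single unlisted state could turn 12 data points into 13. It assumes the counts map is seeded from a fixed list and then incremented per observed connection status. `knownStates` and `countStates` are hypothetical names for illustration, and the state names are not copied from the repository.

```go
package main

import "fmt"

// knownStates is a hypothetical stand-in for the fixed internal list of
// 12 states; the names are illustrative only.
var knownStates = []string{
	"CLOSE_WAIT", "CLOSE", "CLOSING", "DELETE", "ESTABLISHED", "FIN_WAIT_1",
	"FIN_WAIT_2", "LAST_ACK", "LISTEN", "SYN_SENT", "SYN_RECV", "TIME_WAIT",
}

// countStates seeds one zero entry per known state, then tallies the
// observed statuses. A status outside knownStates adds a new key, which
// is how the map can grow past the expected size.
func countStates(observed []string) map[string]int64 {
	counts := make(map[string]int64, len(knownStates))
	for _, s := range knownStates {
		counts[s] = 0
	}
	for _, s := range observed {
		counts[s]++
	}
	return counts
}

func main() {
	// "BOUND" appears in the lsof man page but not in knownStates.
	observed := []string{"LISTEN", "ESTABLISHED", "ESTABLISHED", "BOUND"}
	counts := countStates(observed)
	fmt.Println(len(knownStates), len(counts)) // prints: 12 13
}
```

If the scraper emits one data point per map entry, an assertion on exactly 12 data points would fail in this situation even though nothing is necessarily wrong with the data.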

Proposed Solution

  1. I think the best option here is to simply remove the check, since it's invalid, and also remove the referenced definitive list of states. Valid states returned from lsof vary across OSs, and the man page itself says it's impossible to know all possible states. The only usage at this point is to set up a map of connection states and their counts. We can do this programmatically without relying on a preset list of expectations.
  2. The alternative is to log the unexpected state when the test fails so we can see whether it's actually valid or invalid. If valid, we can then manually add it to the expected list. If invalid, we'd need to file an issue against gopsutil or lsof.
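Both options can be sketched roughly as follows. This is only an illustration under assumptions: `tallyStates` and `unexpectedStates` are hypothetical helpers, not functions from the scraper or from gopsutil, and the inputs are made-up status strings.

```go
package main

import (
	"fmt"
	"sort"
)

// tallyStates builds the state-to-count map from the observed statuses
// alone, with no preset list of expected states (option 1).
func tallyStates(observed []string) map[string]int64 {
	counts := make(map[string]int64)
	for _, s := range observed {
		counts[s]++
	}
	return counts
}

// unexpectedStates returns, in sorted order, every state present in counts
// that is missing from expected, so a failing test can report it (option 2).
func unexpectedStates(counts map[string]int64, expected []string) []string {
	known := make(map[string]struct{}, len(expected))
	for _, s := range expected {
		known[s] = struct{}{}
	}
	var extra []string
	for s := range counts {
		if _, ok := known[s]; !ok {
			extra = append(extra, s)
		}
	}
	sort.Strings(extra)
	return extra
}

func main() {
	counts := tallyStates([]string{"LISTEN", "ESTABLISHED", "ESTABLISHED", "BOUND"})
	extra := unexpectedStates(counts, []string{"LISTEN", "ESTABLISHED"})
	fmt.Println(len(counts), extra) // prints: 3 [BOUND]
}
```

In a test, the slice returned by `unexpectedStates` could be included in the failure message (for example via testify's `assert.Empty(t, extra, "unexpected connection states")`), which would show whether the extra state is a legitimate one for that OS.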

crobert-1 added the "bug" label and removed the "needs triage" label on Nov 1, 2023
Contributor

github-actions bot commented Jan 1, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the "Stale" label on Jan 1, 2024
crobert-1 added the "flaky test" label and removed the "Stale" label on Jan 2, 2024
Contributor

github-actions bot commented Mar 4, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the "Stale" label on Mar 4, 2024
Contributor

github-actions bot commented May 3, 2024

This issue has been closed as inactive because it has been stale for 120 days with no activity.

github-actions bot closed this as not planned on May 3, 2024