Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporter/awsemf] NaN values are not handled properly #26267

Closed
bryan-aguilar opened this issue Aug 29, 2023 · 1 comment · Fixed by #26344
Closed

[exporter/awsemf] NaN values are not handled properly #26267

bryan-aguilar opened this issue Aug 29, 2023 · 1 comment · Fixed by #26344
Assignees
Labels
bug Something isn't working exporter/awsemf awsemf exporter priority:p2 Medium

Comments

@bryan-aguilar
Copy link
Contributor

Component(s)

exporter/awsemf

What happened?

Description

Metrics with NaN values will cause EMF Exporter JSON Marshaling to fail. This will cause the entire EMF Log to be discarded along with any other metrics it contains. This is not a desirable behavior and EMF Exporter should handle NaN values gracefully. Cloudwatch EMF does not support NaN values.

I would suggest as solution that metrics with a NaN value are dropped.

Steps to Reproduce

Scrape a prometheus endpoint that exports a NaN Value

Here is a unit test to confirm that an error is thrown during JSON Marshalling. This test belongs in emf_exporter_test.go

    func TestConsumeMetricsWithOutputDestinationNaN(t *testing.T) {
    	ctx, cancel := context.WithCancel(context.Background())
    	defer cancel()
    	factory := NewFactory()
    	expCfg := factory.CreateDefaultConfig().(*Config)
    	expCfg.Region = "us-west-2"
    	expCfg.MaxRetries = 0
    	expCfg.OutputDestination = "stdout"
    	exp, err := newEmfExporter(expCfg, exportertest.NewNopCreateSettings())
    	assert.Nil(t, err)
    	assert.NotNil(t, exp)
     
    	md := generateTestMetrics(testMetric{
    		metricNames:  []string{"metric_1"},
    		metricValues: [][]float64{{math.NaN()}},
    	})
    	require.NoError(t, exp.pushMetricsData(ctx, md))
    	require.NoError(t, exp.shutdown(ctx))
    }

Expected Result

All valid metrics are exported in EMF log.

Actual Result

Entire EMF log is discareded.

Collector version

v0.83.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

@bryan-aguilar bryan-aguilar added bug Something isn't working needs triage New item requiring triage priority:p2 Medium exporter/awsemf awsemf exporter and removed needs triage New item requiring triage labels Aug 29, 2023
@bryan-aguilar bryan-aguilar self-assigned this Aug 29, 2023
@bryan-aguilar
Copy link
Contributor Author

I'm going to take a stab at this today.

mx-psi pushed a commit that referenced this issue Sep 19, 2023
**Description:** Metrics with NaN values for float types would cause the
EMF Exporter to error out during JSON Marshaling. This PR introduces a
change to drop metric values that contain NaN.

**Link to tracking Issue:** Fixes #26267 

**Testing:** Added unit tests at several different points with varying
levels of specificity. Unit tests are quite verbose in this example but
I have followed the format of existing tests while doing very little
refactoring.

**Documentation:** Update README
jmsnll pushed a commit to jmsnll/opentelemetry-collector-contrib that referenced this issue Nov 12, 2023
**Description:** Metrics with NaN values for float types would cause the
EMF Exporter to error out during JSON Marshaling. This PR introduces a
change to drop metric values that contain NaN.

**Link to tracking Issue:** Fixes open-telemetry#26267 

**Testing:** Added unit tests at several different points with varying
levels of specificity. Unit tests are quite verbose in this example but
I have followed the format of existing tests while doing very little
refactoring.

**Documentation:** Update README
TylerHelmuth pushed a commit that referenced this issue Nov 16, 2023
**Description:** I have observed some behavior on a personal collector
deployment where the EMF Exporter is still returning errors for `NaN`
json marshalling. This was in a prometheus -> emf exporter metrics
pipeline.

I could not find the specific NaN value in the metrics when
troubleshooting the error. I curled the `/metrics` endpoint and also
tried using the logging exporter to try to get more information. I could
not find where the NaN value was coming from so I took another look into
the unit tests and found some possible code paths in which NaNs could
slip though.

**Link to tracking Issue:** Original issue
#26267

**Testing:** Added more unit tests. The summary unit tests got a slight
refactor for two reasons. So I could get ride of the unnecessary
typecasting and so that we could more easily test out different
combinations of quantile values.

I have also added a few more histogram unit tests to just verify that
all combinations of NaN values are being checked on their own.
graphaelli pushed a commit to graphaelli/opentelemetry-collector-contrib that referenced this issue Nov 17, 2023
…try#28894)

**Description:** I have observed some behavior on a personal collector
deployment where the EMF Exporter is still returning errors for `NaN`
json marshalling. This was in a prometheus -> emf exporter metrics
pipeline.

I could not find the specific NaN value in the metrics when
troubleshooting the error. I curled the `/metrics` endpoint and also
tried using the logging exporter to try to get more information. I could
not find where the NaN value was coming from so I took another look into
the unit tests and found some possible code paths in which NaNs could
slip though.

**Link to tracking Issue:** Original issue
open-telemetry#26267

**Testing:** Added more unit tests. The summary unit tests got a slight
refactor for two reasons. So I could get ride of the unnecessary
typecasting and so that we could more easily test out different
combinations of quantile values.

I have also added a few more histogram unit tests to just verify that
all combinations of NaN values are being checked on their own.
RoryCrispin pushed a commit to ClickHouse/opentelemetry-collector-contrib that referenced this issue Nov 24, 2023
…try#28894)

**Description:** I have observed some behavior on a personal collector
deployment where the EMF Exporter is still returning errors for `NaN`
json marshalling. This was in a prometheus -> emf exporter metrics
pipeline.

I could not find the specific NaN value in the metrics when
troubleshooting the error. I curled the `/metrics` endpoint and also
tried using the logging exporter to try to get more information. I could
not find where the NaN value was coming from so I took another look into
the unit tests and found some possible code paths in which NaNs could
slip though.

**Link to tracking Issue:** Original issue
open-telemetry#26267

**Testing:** Added more unit tests. The summary unit tests got a slight
refactor for two reasons. So I could get ride of the unnecessary
typecasting and so that we could more easily test out different
combinations of quantile values.

I have also added a few more histogram unit tests to just verify that
all combinations of NaN values are being checked on their own.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working exporter/awsemf awsemf exporter priority:p2 Medium
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant