Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove nil-zero metrics and linux-exclusive metrics from Metricbeat #21457

Merged

Conversation

fearful-symmetry
Copy link
Contributor

@fearful-symmetry fearful-symmetry commented Oct 2, 2020

What does this PR do?

This PR has two main parts:

  • If a metric is currently reporting a "null zero" (that is, the metric doesn't exist for an OS, but we report a 0 instead of nothing) for a given metric on a given OS, add some conditional logic so we don't report it.
  • If we're reporting a group of metrics that is highly OS-specific, add that metric group to the linux module, and don't report it if we're running under fleet. This is because we want to move some of these metrics, but don't want to break metricbeat. As Fleet is beta, we want to introduce some needed changes while we can.

Also note, I'm actively working on this PR, so it might be a tad rough right now, but I want eyes on it.

Where do we go from here?

  • A large part of this PR is essentially a holdover until we can make breaking changes to reported fields in metricbeat. Once we hit 8.0, we can remove the IsAgent logic we have to prevent breaking field changes in metricbeat.
  • Once this is merged, the additional linux metricsets will be added to the Fleet integration.

Open Questions

  • We don't want to pull metrics out from under user's feet. To accomplish this, any linux-exclusive metrics we're reporting in system now have their own module in linux. After this PR, we'll add these new linux modules to the Linux integration within fleet. From the perspective of metricbeat, we're now reporting identical data in two places: in the default system module, and in new metricsets within the linux integration. Is this the best route? Is there a better way to handle metrics we're trying to relocate?

Why is it important?

This is the result of a lot of work between @mukeshelastic and I, with two goals:

  • Create a leaner system Integration for fleet that contains only cross-compatible datasets
  • Clean up the metricsets so we don't report "null zeros" on operating systems where a given metric doesn't exist.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • Find a way to run this under Agent
  • Test with the rest of the elastic stack
  • Test under Windows

How to test this PR locally

  • Pull PR and build via mage package to generate a system-specific build object. You can use the PLATFORMS environment variable to constrain to a given OS/arch pair like linux/amd64.
  • Running as metricbeat (no Fleet stack), test a few things depending on your OS:
    • Linux:
      • Make sure the cpu, memory, diskio, process, filesystem, fsstat fields haven't changed relative to data.json
      • Test the new memory and iostat linux metricsets for all the expected fields.
    • Windows:
      • Using the spreadsheet linked above, check the results of the cpu, memory, diskio, process, filesystem, andfsstat metricsets to make sure all the fields reported as Incorrect 0 value? on the spreadsheet are no longer reported.
  • If you can somehow connect your custom build to Agent/Fleet, check to make sure the IsAgent constraints are working and we're not reporting additional data.

@fearful-symmetry fearful-symmetry self-assigned this Oct 2, 2020
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Oct 2, 2020
@fearful-symmetry fearful-symmetry added the Team:Integrations Label for the Integrations team label Oct 2, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Oct 2, 2020
@elasticmachine
Copy link
Collaborator

elasticmachine commented Oct 2, 2020

💚 Build Succeeded

Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Pull request #21457 updated]

  • Start Time: 2020-10-06T16:42:47.631+0000

  • Duration: 94 min 57 sec

Test stats 🧪

Test Results
Failed 0
Passed 16294
Skipped 1347
Total 17641

@fearful-symmetry
Copy link
Contributor Author

Alright, looks like I need to fix the python tests...

@fearful-symmetry fearful-symmetry requested review from narph and removed request for kvch October 2, 2020 16:28
Copy link
Contributor

@faec faec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have the broader context to evaluate the migration approach so I hope other reviewers chime in, but taking the business logic as a given, the content of the change looks good to me

@fearful-symmetry fearful-symmetry requested a review from a team October 5, 2020 14:35
Copy link
Contributor

@kaiyan-sheng kaiyan-sheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the part to make sure host.cpu.pct exists.

@ph
Copy link
Contributor

ph commented Oct 6, 2020

@fearful-symmetry I've looked a core logic seems OK to me, but I am not the most familiar with the metric modules. @exekias or @andrewkroh might be good to ping.

@fearful-symmetry
Copy link
Contributor Author

New diskio document with Agent:

    "system": {
      "diskio": {
        "name": "sda2",
        "read": {
          "time": 27139320,
          "bytes": 109501018112,
          "count": 2889700
        },
        "write": {
          "count": 74365389,
          "time": 272761008,
          "bytes": 1864660139008
        },
        "io": {
          "time": 48258392
        }
      }

New memory document with Agent:

  "system": {
      "memory": {
        "swap": {
          "total": 7897870336,
          "used": {
            "pct": 0.0085,
            "bytes": 67371008
          },
          "free": 7830499328,
          "in": {
            "pages": 11339
          },
          "out": {
            "pages": 43449
          },
          "readahead": {
            "pages": 6855,
            "cached": 4852
          }
        },
        "total": 15620788224,
        "used": {
          "pct": 0.9865,
          "bytes": 15409446912
        },
        "free": 211341312,
        "actual": {
          "free": 9764274176,
          "used": {
            "pct": 0.3749,
            "bytes": 5856514048
          }
        }
      }
    },

@fearful-symmetry fearful-symmetry merged commit aed4831 into elastic:master Oct 6, 2020
fearful-symmetry added a commit to fearful-symmetry/beats that referenced this pull request Oct 6, 2020
…lastic#21457)

* refactor metricbeat to remove nil-zero metrics and linux-exclusive metrics

* update xpack docs

* fix non-linux diskstat builds

* fix linux test builds

* fix python tests

* move windows files for disk performance

* properly fix test_drop_fields

* try to fix different system test

* mage fmt

* fix windows filesystem tests

* fix platform test

* add changelog

(cherry picked from commit aed4831)
fearful-symmetry added a commit that referenced this pull request Oct 6, 2020
…e metrics from Metricbeat (#21597)

* Remove nil-zero metrics and linux-exclusive metrics from Metricbeat (#21457)

* refactor metricbeat to remove nil-zero metrics and linux-exclusive metrics

* update xpack docs

* fix non-linux diskstat builds

* fix linux test builds

* fix python tests

* move windows files for disk performance

* properly fix test_drop_fields

* try to fix different system test

* mage fmt

* fix windows filesystem tests

* fix platform test

* add changelog

(cherry picked from commit aed4831)

* fix fields
v1v added a commit to v1v/beats that referenced this pull request Oct 8, 2020
…ci-build-label-support

* upstream/master: (60 commits)
  Skip publisher flaky tests (elastic#21657)
  backport: add 7.10 branch (elastic#21635)
  [CI: Packaging] fix: push ubi8 images too (elastic#21621)
  Docker build resiliance with a retry (elastic#21587)
  Fix flaky FSWatch/FSScanner tests (elastic#21625)
  chore: add versions 7.1x (elastic#21670)
  [Elastic Agent] Reload fleet.kibana.hosts from policy change (elastic#21599)
  Fix cyberark/corepas pipeline (elastic#21643)
  Add openstack ssl provider in add_cloud_metadata (elastic#21590)
  Add fips_enabled into all aws filesets (elastic#21626)
  [Filebeat S3] Change log.file.path to be nested object (elastic#21624)
  [CI] Change notification channel (elastic#21559)
  Add `add_observer_metadata` `geo.name` to Quickstart (elastic#21501)
  [E2E Tests] fix: set versions ony for PRs (elastic#21608)
  Fix badger build in 386 (elastic#21613)
  docs: Update timestamp.asciidoc (elastic#20395)
  Remove nil-zero metrics and linux-exclusive metrics from Metricbeat (elastic#21457)
  [Metricbeat] Use timestamp from CloudWatch for events (elastic#21498)
  [Filebeat][S3 Input] Add support for FIPS endpoints (elastic#21585)
  [Ingest Manager] Use new form of fleet API paths (elastic#21478)
  ...
v1v added a commit to v1v/beats that referenced this pull request Oct 8, 2020
…012-2.0

* upstream/master: (110 commits)
  Skip publisher flaky tests (elastic#21657)
  backport: add 7.10 branch (elastic#21635)
  [CI: Packaging] fix: push ubi8 images too (elastic#21621)
  Docker build resiliance with a retry (elastic#21587)
  Fix flaky FSWatch/FSScanner tests (elastic#21625)
  chore: add versions 7.1x (elastic#21670)
  [Elastic Agent] Reload fleet.kibana.hosts from policy change (elastic#21599)
  Fix cyberark/corepas pipeline (elastic#21643)
  Add openstack ssl provider in add_cloud_metadata (elastic#21590)
  Add fips_enabled into all aws filesets (elastic#21626)
  [Filebeat S3] Change log.file.path to be nested object (elastic#21624)
  [CI] Change notification channel (elastic#21559)
  Add `add_observer_metadata` `geo.name` to Quickstart (elastic#21501)
  [E2E Tests] fix: set versions ony for PRs (elastic#21608)
  Fix badger build in 386 (elastic#21613)
  docs: Update timestamp.asciidoc (elastic#20395)
  Remove nil-zero metrics and linux-exclusive metrics from Metricbeat (elastic#21457)
  [Metricbeat] Use timestamp from CloudWatch for events (elastic#21498)
  [Filebeat][S3 Input] Add support for FIPS endpoints (elastic#21585)
  [Ingest Manager] Use new form of fleet API paths (elastic#21478)
  ...
@andresrc andresrc added test-plan Add this PR to be manual test plan test-plan-added This PR has been added to the test plan labels Oct 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Team:Integrations Label for the Integrations team test-plan Add this PR to be manual test plan test-plan-added This PR has been added to the test plan v7.10.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants