Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Metricbeat] Expand docker/blkio data #16638

Merged
merged 5 commits into from
Mar 6, 2020

Conversation

fearful-symmetry
Copy link
Contributor

@fearful-symmetry fearful-symmetry commented Feb 26, 2020

What does this PR do?

See #13798. This passes along additional cgroup metrics already gathered but not used by metricbeat. Those new fields are wait_time, service_time and queued.

I'm still not 100% sure about this. We treat the new time fields the same way we do everything else--we sum them and treat them as a total. I'm not sure if this is a way to go, if we should be using average time instead, or if there's a different field type we should be using of long. Also, I'm having trouble finding docker hosts where any of these new fields are even populated in the underlying docker stats, which isn't helping things.

How to test this PR locally

Pull down on a macos/linux docker host, run TestData

@fearful-symmetry fearful-symmetry added enhancement needs_backport PR is waiting to be backported to other branches. Team:Integrations Label for the Integrations team labels Feb 26, 2020
@fearful-symmetry fearful-symmetry requested review from unixsurfer and a team February 26, 2020 20:30
@fearful-symmetry fearful-symmetry self-assigned this Feb 26, 2020
@unixsurfer
Copy link

@fearful-symmetry what do you mean by we sum them and treat them as a total ?

According to the documentation of blkio-controller Linux kernel returns the total time for all IO operations for blkio.io_wait_time and blkio.io_service_time metrics, so I don't think we need to do any aggregation. Am I missing something here?

@fearful-symmetry
Copy link
Contributor Author

fearful-symmetry commented Feb 27, 2020

@unixsurfer docker reports these stats per device/driver, hence why we also have major and minor numbers:

{"major":259,"minor":2,"op":"Sync","value":4813430784},
{"major":259,"minor":1,"op":"Sync","value":4806131712}

Metricbeat sums these down to a total per op.

@jsoriano
Copy link
Member

jsoriano commented Feb 27, 2020

Well, if these times are quantities of time spent on something, I think it is ok to sum them, the same as we sum the bytes read or written.

@jsoriano
Copy link
Member

I'm having trouble finding docker hosts where any of these new fields are even populated in the underlying docker stats, which isn't helping things.

@fearful-symmetry they are not being populated in my system, on what distro do you see these metrics? maybe they require cgroups v2?

{"major":259,"minor":2,"op":"Sync","value":4813430784},
{"major":259,"minor":1,"op":"Sync","value":4806131712}

If these metrics are only exposed on the Sync op we will need to modify getNewStats to take this into account. Though for this metric probably the Total is enough.

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codewise it looks good to me, some possibly pending things:

  • Confirm that these metrics are reported at least with "op" Total, so they are counted in getNewStats.
  • Document in the description of the fields what are the requirements for these metrics to work. cgroups v2?
  • Changelog entry for the new fields.

@fearful-symmetry
Copy link
Contributor Author

@jsoriano

hey are not being populated in my system, on what distro do you see these metrics? maybe they require cgroups v2?

Those metrics are from a different field, I'm just going on documentation and code, which suggests they're all the same. It appears that the time fields are broken down the same as the other fields.

Document in the description of the fields what are the requirements for these metrics to work. cgroups v2?

Yah, I have...no idea.

@unixsurfer
Copy link

Well, if these times are quantities of time spent on something, I think it is ok to sum them, the same as we sum the bytes read or written.

To sum them per device? IMHO, we can simply return the time per operation and don't return the sum for all operations.
If we sum them then a huge spike in Write operation will be hidden and we wouldn't be able to see it.

@fearful-symmetry fearful-symmetry requested a review from a team March 2, 2020 15:02
Copy link
Contributor

@exekias exekias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good, any chance you can update existing tests to include the changes? It will also need a changelog

@zube zube bot closed this Mar 5, 2020
@zube zube bot removed the [zube]: In Review label Mar 5, 2020
@zube zube bot closed this Mar 5, 2020
@zube zube bot reopened this Mar 5, 2020
@zube zube bot closed this Mar 5, 2020
@zube zube bot reopened this Mar 5, 2020
@zube zube bot closed this Mar 5, 2020
@zube zube bot reopened this Mar 5, 2020
@fearful-symmetry
Copy link
Contributor Author

jenkins, test this

@fearful-symmetry fearful-symmetry merged commit ee2d287 into elastic:master Mar 6, 2020
fearful-symmetry added a commit to fearful-symmetry/beats that referenced this pull request Mar 6, 2020
* initial commit of blkio enhancements

* add test coverage

* add changelog

(cherry picked from commit ee2d287)
@fearful-symmetry fearful-symmetry added v7.7.0 and removed needs_backport PR is waiting to be backported to other branches. labels Mar 6, 2020
fearful-symmetry added a commit that referenced this pull request Mar 9, 2020
)

* [Metricbeat] Expand docker/blkio data (#16638)

* initial commit of blkio enhancements

* add test coverage

* add changelog

(cherry picked from commit ee2d287)

* fix changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Team:Integrations Label for the Integrations team v7.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants