Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log structured monitoring metrics #5915

Merged

Conversation

andrewkroh
Copy link
Member

@andrewkroh andrewkroh commented Dec 19, 2017

  • Use structured logging for the metrics that are periodically logged.

  • Add beat.info.uptime.ms to the list of gauges so that the total value is always reported rather than a difference.

  • Made a change to ensure that only non-zero counter values are reported at shutdown (this was bug introduced in my last refactoring). Note that zero-value gauges are reported which kind of makes the "Total non-zero metrics" message misleading.

Log samples:

2017-12-19T11:57:21.086-0500	INFO	[monitoring]	log/log.go:79	Starting metrics logging every 3s
2017-12-19T11:57:24.087-0500	INFO	[monitoring]	log/log.go:106	Non-zero metrics in the last 3s	{"monitoring": {"metrics": {"beat":{"info":{"uptime":{"ms":3004}},"memstats":{"gc_next":4194304,"memory_alloc":2298976,"memory_total":12648056}},"libbeat":{"config":{"module":{"running":0}},"output":{"type":"elasticsearch"},"pipeline":{"clients":1,"events":{"active":1,"published":1,"total":1}}},"metricbeat":{"file_integrity":{"file":{"events":1,"success":1}}}}}}
2017-12-19T11:57:25.305-0500	INFO	[monitoring]	log/log.go:114	Total non-zero metrics	{"monitoring": {"metrics": {"beat":{"info":{"uptime":{"ms":4222}},"memstats":{"gc_next":4194304,"memory_alloc":3336376,"memory_total":20548168}},"libbeat":{"config":{"module":{"running":0}},"output":{"type":"elasticsearch"},"pipeline":{"clients":0,"events":{"active":1,"published":1,"total":1}}}}}}
2017-12-19T11:57:25.305-0500	INFO	[monitoring]	log/log.go:115	Uptime: 4.226912514s
2017-12-19T11:57:25.305-0500	INFO	[monitoring]	log/log.go:92	Stopping metrics logging.

To show the key structure more clearly here is a pretty-printed version of the logged JSON metrics.

{
  "monitoring": {
    "metrics": {
      "beat": {
        "info": {
          "uptime": {
            "ms": 4222
          }
        },
        "memstats": {
          "gc_next": 4194304,
          "memory_alloc": 3336376,
          "memory_total": 20548168
        }
      },
      "libbeat": {
        "config": {
          "module": {
            "running": 0
          }
        },
        "output": {
          "type": "elasticsearch"
        },
        "pipeline": {
          "clients": 0,
          "events": {
            "active": 1,
            "published": 1,
            "total": 1
          }
        }
      }
    }
  }
}

- Use structured logging for the metrics that are periodically logged.

- Add beat.info.uptime.ms to the list of gauges so that the total value is always reported rather than a difference.

- Made a change to ensure that only non-zero counter values are reported at shutdown (this was bug introduced in my last refactoring). Note that zero-value gauges are reported which kind of makes the "Total non-zero metrics" message misleading.

Log samples:

```
2017-12-18T13:07:50.311-0500	INFO	[monitoring]	log/log.go:80	Starting metrics logging every 5s
2017-12-18T13:07:55.312-0500	INFO	[monitoring]	log/log.go:107	Non-zero metrics in the last 5s	{"monitoring": {"metrics": {"beat.info.uptime.ms": 5018, "beat.memstats.gc_next": 5089280, "beat.memstats.memory_alloc": 2587160, "beat.memstats.memory_total": 190951808, "libbeat.config.module.running": 4, "libbeat.config.module.starts": 4, "libbeat.config.reloads": 1, "libbeat.output.type": "elasticsearch", "libbeat.pipeline.clients": 8, "libbeat.pipeline.events.active": 41, "libbeat.pipeline.events.filtered": 1, "libbeat.pipeline.events.published": 41, "libbeat.pipeline.events.retry": 82, "libbeat.pipeline.events.total": 42, "metricbeat.docker.info.events": 1, "metricbeat.docker.info.success": 1, "metricbeat.system.cpu.events": 1, "metricbeat.system.cpu.success": 1, "metricbeat.system.filesystem.events": 5, "metricbeat.system.filesystem.success": 5, "metricbeat.system.fsstat.events": 1, "metricbeat.system.fsstat.success": 1, "metricbeat.system.load.events": 1, "metricbeat.system.load.success": 1, "metricbeat.system.memory.events": 1, "metricbeat.system.memory.success": 1, "metricbeat.system.network.events": 20, "metricbeat.system.network.success": 20, "metricbeat.system.process.events": 10, "metricbeat.system.process.success": 10, "metricbeat.system.process_summary.events": 1, "metricbeat.system.process_summary.success": 1, "metricbeat.system.uptime.events": 1, "metricbeat.system.uptime.success": 1}}}
2017-12-18T13:07:58.156-0500	INFO	[monitoring]	log/log.go:115	Total non-zero metrics	{"monitoring": {"metrics": {"beat.info.uptime.ms": 7862, "beat.memstats.gc_next": 5089280, "beat.memstats.memory_alloc": 2621032, "beat.memstats.memory_total": 190985680, "libbeat.config.module.running": 4, "libbeat.config.module.starts": 4, "libbeat.config.reloads": 1, "libbeat.output.type": "elasticsearch", "libbeat.pipeline.clients": 8, "libbeat.pipeline.events.active": 41, "libbeat.pipeline.events.filtered": 1, "libbeat.pipeline.events.published": 41, "libbeat.pipeline.events.retry": 82, "libbeat.pipeline.events.total": 42, "metricbeat.docker.info.events": 1, "metricbeat.docker.info.success": 1, "metricbeat.system.cpu.events": 1, "metricbeat.system.cpu.success": 1, "metricbeat.system.filesystem.events": 5, "metricbeat.system.filesystem.success": 5, "metricbeat.system.fsstat.events": 1, "metricbeat.system.fsstat.success": 1, "metricbeat.system.load.events": 1, "metricbeat.system.load.success": 1, "metricbeat.system.memory.events": 1, "metricbeat.system.memory.success": 1, "metricbeat.system.network.events": 20, "metricbeat.system.network.success": 20, "metricbeat.system.process.events": 10, "metricbeat.system.process.success": 10, "metricbeat.system.process_summary.events": 1, "metricbeat.system.process_summary.success": 1, "metricbeat.system.uptime.events": 1, "metricbeat.system.uptime.success": 1}}}
2017-12-18T13:07:58.156-0500	INFO	[monitoring]	log/log.go:116	Uptime: 7.867012418s
2017-12-18T13:07:58.156-0500	INFO	[monitoring]	log/log.go:93	Stopping metrics logging.
```
Instead of using flattened key names, write the metrics in "nested" format. There is less redundancy and it will be more machine friendly.

Sample logs:

```
2017-12-19T11:57:21.086-0500	INFO	[monitoring]	log/log.go:79	Starting metrics logging every 3s
2017-12-19T11:57:24.087-0500	INFO	[monitoring]	log/log.go:106	Non-zero metrics in the last 3s	{"monitoring": {"metrics": {"beat":{"info":{"uptime":{"ms":3004}},"memstats":{"gc_next":4194304,"memory_alloc":2298976,"memory_total":12648056}},"libbeat":{"config":{"module":{"running":0}},"output":{"type":"elasticsearch"},"pipeline":{"clients":1,"events":{"active":1,"published":1,"total":1}}},"metricbeat":{"file_integrity":{"file":{"events":1,"success":1}}}}}}
2017-12-19T11:57:25.305-0500	INFO	[monitoring]	log/log.go:114	Total non-zero metrics	{"monitoring": {"metrics": {"beat":{"info":{"uptime":{"ms":4222}},"memstats":{"gc_next":4194304,"memory_alloc":3336376,"memory_total":20548168}},"libbeat":{"config":{"module":{"running":0}},"output":{"type":"elasticsearch"},"pipeline":{"clients":0,"events":{"active":1,"published":1,"total":1}}}}}}
2017-12-19T11:57:25.305-0500	INFO	[monitoring]	log/log.go:115	Uptime: 4.226912514s
2017-12-19T11:57:25.305-0500	INFO	[monitoring]	log/log.go:92	Stopping metrics logging.
```
@andrewkroh
Copy link
Member Author

@ruflin I'm going back on what I said and have "unflattened" the key names.

@ruflin ruflin merged commit d03da24 into elastic:master Dec 20, 2017
@ruflin
Copy link
Member

ruflin commented Dec 20, 2017

@andrewkroh I rebased and merged as I assume that is what you had in mind.

One thing I don't like about having multiple commits in 1 PR is that CI only tests the last commit and shows the others as just green. So in case the second commit is reverted later it's not clear if the first commit without the second would work.

@andrewkroh andrewkroh deleted the feature/libbeat/use-structured-logger branch April 19, 2018 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants