
Report more metrics to monitor K8s task runner #14771

Merged: 10 commits merged into apache:master on Aug 16, 2023

Conversation

@YongGang (Contributor) commented Aug 7, 2023

Description

Report one new metric to monitor the K8s task runner:

  • k8s/peon/startup/time: reports how long the peon pod takes to start up

Also implement two existing metrics: taskSlot/used/count and taskSlot/idle/count.

Together with the earlier #14698 change, we can answer the following questions about the K8s task runner's status:

  • How many tasks/peons are running in K8s right now? Use taskSlot/used/count.
  • How long does K8s take to start a peon job? Use k8s/peon/startup/time.
  • How long does the pod actually run before finishing? Use task/run/time.
  • How long is a task pending before it starts running in K8s? Use task/pending/time.

Release note

Report the k8s/peon/startup/time metric for K8s-based ingestion.


Key changed/added classes in this PR
  • KubernetesPeonClient: report the k8s/peon/startup/time metric when the pod starts up (see the sketch below).
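
As a rough illustration of the emission pattern (a minimal sketch, not the actual patch: it assumes Druid's ServiceMetricEvent/ServiceEmitter API, and the class name, helper shape, and "taskId" dimension key are placeholders):

import org.apache.druid.java.util.emitter.service.ServiceEmitter;
import org.apache.druid.java.util.emitter.service.ServiceMetricEvent;

// Minimal sketch of emitting the peon startup time with the task id as a dimension.
public class PeonStartupMetricsSketch
{
  private final ServiceEmitter emitter;

  public PeonStartupMetricsSketch(ServiceEmitter emitter)
  {
    this.emitter = emitter;
  }

  public void emitPeonStartupTime(String taskId, long startupTimeMillis)
  {
    ServiceMetricEvent.Builder builder = ServiceMetricEvent.builder();
    builder.setDimension("taskId", taskId);
    emitter.emit(builder.build("k8s/peon/startup/time", startupTimeMillis));
  }
}

The reported value is roughly the wall-clock time from submitting the peon job until the pod reports a pod IP, as measured in launchPeonJobAndWaitForStart (see the diff further down).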

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@@ -370,13 +370,13 @@ public Optional<ScalingStats> getScalingStats()
   @Override
   public Map<String, Long> getIdleTaskSlotCount()
   {
-    return Collections.emptyMap();
+    return ImmutableMap.of(WORKER_CATEGORY, Long.valueOf(config.getCapacity() - tasks.size()));
Contributor Author

This value could be negative; e.g., -6 means 6 tasks are queued in the thread pool but haven't been scheduled to run in K8s yet.

Contributor

The other implementations of getIdleTaskSlotCount do not use negative numbers to indicate being over capacity. I think we should follow the same pattern here.

Suggested change
-return ImmutableMap.of(WORKER_CATEGORY, Long.valueOf(config.getCapacity() - tasks.size()));
+return ImmutableMap.of(WORKER_CATEGORY, Math.max(0L, Long.valueOf(config.getCapacity() - tasks.size())));

Contributor Author

In the K8s task runner the task lifecycle differs from the other runner types: a task only enters pending status once the K8s runner has already submitted the job, whereas in other runners a task is set to pending immediately after the runner picks it up.
So in other runners we can check the number of pending tasks to see whether the cluster is under-provisioned, but in the K8s task runner we have no visibility into that, which is why I reuse the taskSlot/idle/count metric to indicate it.

Or we could make the task lifecycle in the K8s task runner align more closely with the other runners, e.g. a task queued in the thread pool is in pending status, and once K8s submits the job it moves to running status.

Contributor

For the K8sTaskRunner, the metric should track the number of pods that are in the waitUntilCondition in launchPeonJobAndWaitForStart. That is what I believe is the equivalent of a pending task with the K8sTaskRunner.

.waitUntilCondition(pod -> {
  if (pod == null) {
    return false;
  }
  return pod.getStatus() != null && pod.getStatus().getPodIP() != null;
}, howLong, timeUnit);

We could also add a dimension to report if the pod timed out waiting to start.
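
If we did want to surface that, a hypothetical way to count pods currently stuck in that wait (not something this patch does; the Supplier below stands in for the waitUntilCondition call above) might look like:

import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: track how many peon pods are currently waiting for a pod IP,
// which is roughly the K8s-runner equivalent of a "pending" task.
public class PeonStartupWaitTracker
{
  private final AtomicInteger podsWaitingToStart = new AtomicInteger(0);

  public <T> T trackWait(Supplier<T> waitForPodStart)
  {
    podsWaitingToStart.incrementAndGet();
    try {
      return waitForPodStart.get();  // e.g. the waitUntilCondition(...) call shown above
    }
    finally {
      podsWaitingToStart.decrementAndGet();
    }
  }

  public int currentlyWaiting()
  {
    // Value a monitor could report periodically as a pending/startup gauge.
    return podsWaitingToStart.get();
  }
}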

@suneet-s (Contributor) left a comment

Request changes because:

  • The worker categories are inconsistent across the metrics reported from the k8sTaskRunner.
  • I do not think idle tasks should report a negative number, since none of the other task runners does anything like this. If we want to show the number of tasks that should be running but are not, I think we need a new metric: a task can be added to the tasks map in Druid but not yet started for many reasons, and it would be good to have visibility into that.
  • Please add docs for the new metrics in docs/development/extensions-contrib/k8s-jobs.md

@@ -370,13 +370,13 @@ public Optional<ScalingStats> getScalingStats()
   @Override
   public Map<String, Long> getIdleTaskSlotCount()
   {
-    return Collections.emptyMap();
+    return ImmutableMap.of(WORKER_CATEGORY, Long.valueOf(config.getCapacity() - tasks.size()));
   }

   @Override
   public Map<String, Long> getUsedTaskSlotCount()
   {
-    return Collections.emptyMap();
+    return ImmutableMap.of(WORKER_CATEGORY, Long.valueOf(Math.min(config.getCapacity(), tasks.size())));
Contributor

There can be a delay between tasks being added to the tasks map and when they are actually running. Can you explain how an operator should think about using this metric?

Contributor Author

As long as we're not over capacity, I think the delay is minimal (within a second or so; we consider the task slot used once the peon job is submitted/started). So an operator can use this metric to decide whether more resources are needed, e.g. if it is consistently equal to config.getCapacity().

@@ -69,12 +85,14 @@ public Pod launchPeonJobAndWaitForStart(Job job, long howLong, TimeUnit timeUnit
     }, howLong, timeUnit);
     long duration = System.currentTimeMillis() - start;
     log.info("Took task %s %d ms for pod to startup", jobName, duration);
+    emitK8sPodMetrics(job, "peon/startup/time", duration);
Contributor

I think it would be good to preface the metric with k8s so that it is clear that the metric only applies to peons started by the kubernetes task runner.

I also like adding the unit to the end of the metric name to make it easier for operators to understand what unit the metric is reported in.

Suggested change
-emitK8sPodMetrics(job, "peon/startup/time", duration);
+emitK8sPodMetrics(job, "k8s/peon/startup/timeMillis", duration);

Contributor Author

I like the idea of prefixing with k8s and will do that. But I see that other metrics don't end with a unit (e.g. task/run/time), so I wonder whether we should align with the existing ones.

@@ -87,6 +105,8 @@ public JobResponse waitForPeonJobCompletion(K8sTaskId taskId, long howLong, Time
         howLong,
         unit
     );
+    long duration = System.currentTimeMillis() - start;
+    emitK8sPodMetrics(job, "peon/running/time", duration);
Contributor

How is this different from the task/run/time metric?

Contributor

I think you would probably want to include the startup time as well; maybe it makes sense to emit this in KubernetesPeonLifecycle instead?

Contributor Author

Removed; the task/run/time metric will include both job startup and running time.

  {
    this.clientApi = clientApi;
    this.namespace = namespace;
    this.debugJobs = debugJobs;
    this.adapter = adapter;
    this.emitter = emitter;
  }

  public Pod launchPeonJobAndWaitForStart(Job job, long howLong, TimeUnit timeUnit)
Contributor

The job object that is being passed in here is calculated by using the adapter to convert a task to the Job in KubernetesTaskRunner#doTask. And then later we are using the adapter to convert it back to a task.

Would it be cleaner to just pass in the Task object here instead?

Contributor

I am also hoping to deprecate toTask(Job) in this PR: https://github.com/apache/druid/pull/14802/files, to allow us to pull tasks instead of storing them as an env variable.

@YongGang
Contributor Author

> Request changes because:
>
> • The worker categories are inconsistent across the metrics reported from the k8sTaskRunner.
> • I do not think idle tasks should report a negative number, since none of the other task runners does anything like this. If we want to show the number of tasks that should be running but are not, I think we need a new metric: a task can be added to the tasks map in Druid but not yet started for many reasons, and it would be good to have visibility into that.
> • Please add docs for the new metrics in docs/development/extensions-contrib/k8s-jobs.md

Thanks for the review; I think I've addressed all of them.

@@ -117,7 +117,7 @@ protected KubernetesPeonLifecycle(
    * @return
    * @throws IllegalStateException
    */
-  protected synchronized TaskStatus run(Job job, long launchTimeout, long timeout) throws IllegalStateException
+  protected synchronized TaskStatus run(Job job, Task task, long launchTimeout, long timeout) throws IllegalStateException
@suneet-s (Contributor) commented Aug 15, 2023

This is a bit of a strange function definition, as the PeonLifecycle is already scoped to a task. Passing in both a task and a job is confusing, because they should refer to the same task. How about a function definition like this, and have the constructor accept an adapter so that the adapter can be used in this method? (A rough sketch of that shape follows the suggestion below.)

Suggested change
-protected synchronized TaskStatus run(Job job, Task task, long launchTimeout, long timeout) throws IllegalStateException
+protected synchronized TaskStatus run(long launchTimeout, long timeout) throws IllegalStateException
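
For illustration, the suggested shape might look roughly like this (a sketch with placeholder types standing in for Druid's Task/TaskStatus, the K8s Job, and the task adapter; the real interfaces differ in their details):

// Placeholder types for the sketch only; the real Druid/fabric8 types are richer.
interface Task { String getId(); }
interface Job { }
interface TaskAdapter { Job fromTask(Task task); }
class TaskStatus
{
  static TaskStatus success(String taskId) { return new TaskStatus(); }
}

class KubernetesPeonLifecycleSketch
{
  private final Task task;
  private final TaskAdapter adapter;

  KubernetesPeonLifecycleSketch(Task task, TaskAdapter adapter)
  {
    this.task = task;        // the lifecycle is already scoped to exactly one task
    this.adapter = adapter;  // adapter injected via the constructor, as suggested
  }

  protected synchronized TaskStatus run(long launchTimeout, long timeout)
  {
    Job job = adapter.fromTask(task);  // derive the Job here instead of passing it in
    // ... launch the job, wait for the pod to start, and join it, as in the real class ...
    return TaskStatus.success(task.getId());
  }
}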

Contributor Author

Updated, though not exactly as you suggested, per my previous comment. I basically reverted my change to the method definition.

  }

-  public Pod launchPeonJobAndWaitForStart(Job job, long howLong, TimeUnit timeUnit)
+  public Pod launchPeonJobAndWaitForStart(Job job, Task task, long howLong, TimeUnit timeUnit)
Contributor

Suggested change
-public Pod launchPeonJobAndWaitForStart(Job job, Task task, long howLong, TimeUnit timeUnit)
+public Pod launchPeonJobAndWaitForStart(Task task, long howLong, TimeUnit timeUnit)

Contributor Author

Agreed, this is better, but I found that a change like this makes the integration test DruidPeonClientIntegrationTest hard to write: once job creation is pushed into the underlying class, there is no easy way to construct a Job to suit our testing purposes. So I decided to keep the interface as it is.

@suneet-s (Contributor) left a comment

Looks nice! I have one suggestion on changing the interface to KubernetesPeonLifecycle and KubernetesPeonClient to try and make it a little easier to follow.

Can you update the PR title and description to reflect the latest state of the patch?

@YongGang changed the title from "Report pod running metrics to monitor K8s task runner" to "Report more metrics to monitor K8s task runner" on Aug 15, 2023
@YongGang
Contributor Author

> Looks nice! I have one suggestion on changing the interface to KubernetesPeonLifecycle and KubernetesPeonClient to try and make it a little easier to follow.
>
> Can you update the PR title and description to reflect the latest state of the patch?

Done; I've also replied above explaining why some of the interfaces were not updated as you suggested.

@suneet-s merged commit 3954685 into apache:master on Aug 16, 2023
74 checks passed
@YongGang deleted the more-metrics branch on August 27, 2023 20:42
@LakshSingla added this to the 28.0 milestone on Oct 12, 2023