
User job writing too many logs will cause disk pressure #4694

Closed
Binyang2014 opened this issue Jul 10, 2020 · 8 comments · Fixed by #4702

@Binyang2014
Contributor

PAI keeps user job logs under /var/log/pai.
If a user job writes too many logs, it will cause disk pressure on the machine.

We need to:

  1. Make the log path configurable, so we can store user logs on a large disk
  2. Investigate how to kill such offending jobs
Binyang2014 self-assigned this Jul 10, 2020
@fanyangCS
Contributor

Related to #3765 and #3340

@fanyangCS
Contributor

@Binyang2014, please first make sure the OpenPAI service pods have a higher QoS class than job pods. In some cases the service pods get evicted.

@Binyang2014
Contributor Author

> @Binyang2014, please first make sure the OpenPAI service pods have a higher QoS class than job pods. In some cases the service pods get evicted.

We may need to mark these pods as critical to achieve this: https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
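For example, a service DaemonSet could reference one of the built-in critical priority classes. A minimal sketch (the DaemonSet name and image here are placeholders, and the built-in system-* classes may be restricted to the kube-system namespace depending on the cluster version):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-manager-ds            # illustrative service name
  namespace: kube-system          # built-in critical classes may only be usable here
spec:
  selector:
    matchLabels:
      app: log-manager
  template:
    metadata:
      labels:
        app: log-manager
    spec:
      # Marks the pod as node-critical so it is scheduled before, and evicted after, ordinary pods
      priorityClassName: system-node-critical
      containers:
      - name: log-manager
        image: log-manager:latest   # hypothetical image
```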

@Binyang2014
Contributor Author

Checked the QoS classes. Currently, the job-exporter and node-exporter QoS classes are Burstable, the log-manager QoS class is BestEffort, and the user job pod QoS class is Guaranteed.
For k8s node eviction, BestEffort pods are evicted first. Guaranteed and Burstable pods whose usage is beneath requests are evicted last. (https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/?spm=a2c65.11461447.0.0.6a497eafo7oGQp#evicting-end-user-pods)

Among Guaranteed and Burstable pods whose resource usage does not exceed their requests, eviction is ordered by pod priority.

For this case, the resource is disk, and none of the pods claim requests for disk. So the eviction order is ranked by pod priority, then by resource usage.
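For reference, the QoS class is derived purely from the containers' resource requests and limits; a minimal sketch of the three cases (pod names and resource values are hypothetical):

```yaml
# Guaranteed: every container sets limits, with requests equal to limits (how the user job pods end up Guaranteed)
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-example
spec:
  containers:
  - name: app
    image: busybox
    resources:
      requests: {cpu: "1", memory: 1Gi}
      limits:   {cpu: "1", memory: 1Gi}
---
# Burstable: some requests/limits are set but they do not all match (job-exporter, node-exporter)
apiVersion: v1
kind: Pod
metadata:
  name: burstable-example
spec:
  containers:
  - name: app
    image: busybox
    resources:
      requests: {cpu: 100m, memory: 128Mi}
---
# BestEffort: no requests or limits at all (log-manager); evicted first under node pressure
apiVersion: v1
kind: Pod
metadata:
  name: besteffort-example
spec:
  containers:
  - name: app
    image: busybox
```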
We can get the eviction order from the log:

```
eviction manager: must evict pod(s) to reclaim ephemeral-storage
eviction_manager.go:362] eviction manager: pods ranked for eviction: user-job-pod, job-exporter-zgf4j_default(e9e0a4a5-660a-493b-ac4b-a95f8977867a),
nginx-proxy-prodk80bg000012_kube-system(6d20234d7a8eda76fb23d52d6f743b77),
log-manager-ds-9tflc_default(67956c0d-ba89-4aa9-a14e-ed2fbbcd915e),
k8s-host-device-plugin-daemonset-kqh67_kube-system(8853e257-8f09-46ca-b372-2632cb94eea5),
blobfuse-flexvol-installer-vrqdg_kube-system(dd5a903a-9a38-4f27-b3d8-00bed955c9e9),
node-exporter-2ks4s_default(326bb034-a011-4960-baef-6eb6fa7c9f24),
kube-proxy-l7gzb_kube-system(82351f89-30b4-4a20-b679-0a15f213b999),
nvidia-device-plugin-daemonset-j2f8z_kube-system(9d98eaed-cc87-43c4-a514-82c147833843)
```

Since the user job usually consumes more disk, it will be evicted first. But because we use hostPath for the log folder, evicting the user job does not reclaim the space (the logs written to the hostPath directory survive the eviction), so it does not solve the problem. Then kubelet continues to evict PAI service pods.

To leverage the k8s eviction policy to avoid disk pressure, we'd better not store job logs on each host.
It's better to use the default k8s log mechanism, then use fluentd to ship logs to a centralized storage server. (https://kubernetes.io/docs/concepts/cluster-administration/logging/#cluster-level-logging-architectures)
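A minimal sketch of the node-level logging-agent pattern from that doc (the agent name, image, and mounts are placeholders rather than OpenPAI's actual setup; the fluentd output config pointing at the storage server is assumed to live elsewhere):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-log-agent        # hypothetical name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd-log-agent
  template:
    metadata:
      labels:
        app: fluentd-log-agent
    spec:
      containers:
      - name: fluentd
        image: fluentd:latest    # placeholder image
        # Reads the container logs that the kubelet/runtime already writes under /var/log
        # and forwards them to the centralized backend configured in the fluentd config.
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
```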

@Binyang2014
Contributor Author

Also, our log-manager is misconfigured: it does not rotate the logs according to size, only according to time. After reconfiguring the log-manager and fixing some bugs, this issue can be mitigated.
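For illustration, a size-based rotation stanza could look like the following (the ConfigMap wrapper, file path, and numbers are hypothetical, not the actual log-manager configuration):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: log-manager-logrotate    # hypothetical name
data:
  logrotate.conf: |
    # Rotate PAI job logs by size rather than time, keeping a bounded number of archives
    /var/log/pai/*/*.log {
        size 100M
        rotate 5
        copytruncate
        compress
        missingok
    }
```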

@fanyangCS
Contributor

@Binyang2014
Can we update the deployment script to avoid future misconfiguration? (At least we should update the documentation.)

Moreover, it seems we need the following:

  1. Set the QoS class of the job pod to the "lowest" (BestEffort); set the QoS class of log-manager to Burstable (same as other OpenPAI services)
  2. To avoid evicting the wrong job pods, we still need a watchdog to kill the offending job pod
  3. Leverage the k8s log mechanism

Anything more?

@Binyang2014
Contributor Author

Since the QoS class is assigned by k8s according to the pod's resource requests/limits, we can't change the job pod QoS class (changing the request/limit values may affect the scheduler). We can keep the job pod QoS class as Guaranteed, and use pod priority to control the eviction rank.

We can do the following:

  1. Set the QoS class of log-manager to Burstable, and fix some logrotate config errors.
  2. Give PAI service pods higher priority than job pods, to make sure they will not be evicted before job pods (see the sketch after this list).
  3. Add a watchdog to kill the offending job pod, or leverage the k8s log mechanism. If we leverage the k8s log mechanism, k8s will help us kill the offending job.
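A minimal sketch of item 2 using a PriorityClass (the class name and value are hypothetical; job pods would simply keep the default priority or use a lower-valued class):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: pai-service-priority     # hypothetical name
value: 1000000                   # higher than the default of 0 used by job pods
globalDefault: false
description: "Priority for OpenPAI service pods so they are evicted after job pods."

# Each OpenPAI service pod template would then reference it, e.g.:
#   spec:
#     priorityClassName: pai-service-priority
```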

@Binyang2014
Contributor Author

Closed, this issue is already fixed.
