Improve "Operator crashes on startup with OOMKilled" docs section #5836

Merged 5 commits on Jul 27, 2022


ppf2
Member

@ppf2 ppf2 commented Jun 28, 2022

Add an example of an OOMKilled event and share best practice on setting memory limits: recommend setting spec.containers[].resources.limits.memory and spec.containers[].resources.requests.memory to the same value.

cc: @kunisen
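
As a minimal sketch of the recommendation, the container fragment below sets the memory request and the memory limit to the same value. The container name `manager` and the `1Gi` figure are assumptions chosen for illustration, not values taken from this PR; size them from observed usage.

```yaml
# Illustrative fragment of a Pod template (spec.containers[]).
# The container name and the 1Gi value are assumptions, not recommendations.
containers:
  - name: manager
    resources:
      requests:
        memory: 1Gi   # same value as the limit below
      limits:
        memory: 1Gi
```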

Recommend setting `spec.containers[].resources.limits.memory` and `spec.containers[].resources.requests.memory` to the same value as a best practice.

@kunisen
@ppf2 ppf2 changed the title Update common-problems.asciidoc Add note to set spec.containers[].resources.limits.memory and spec.containers[].resources.requests.memory to the same value as a best practice Jun 28, 2022
@botelastic botelastic bot added the triage label Jun 28, 2022
Adding example containerStatuses section for an OOMKilled event.
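
For readers unfamiliar with what such an event looks like, here is a hedged sketch of a `containerStatuses` entry for a container terminated by the OOM killer. The field names follow the Kubernetes Pod status schema; the container name, restart count, and timestamps are made up for illustration and are not copied from the docs change.

```yaml
# Illustrative excerpt of the status section of `kubectl get pod <pod> -o yaml`.
containerStatuses:
  - name: manager              # assumed container name
    ready: false
    restartCount: 3            # illustrative
    lastState:
      terminated:
        reason: OOMKilled      # the container was killed for exceeding available memory
        exitCode: 137          # 128 + 9 (SIGKILL)
        startedAt: "2022-06-28T10:00:00Z"   # illustrative
        finishedAt: "2022-06-28T10:00:05Z"  # illustrative
    state:
      waiting:
        reason: CrashLoopBackOff
```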
@ppf2 ppf2 changed the title Add note to set spec.containers[].resources.limits.memory and spec.containers[].resources.requests.memory to the same value as a best practice Add example of an OOMKilled event and best practice on setting memory limits Jun 28, 2022
@thbkrkr thbkrkr added the >docs Documentation label Jun 28, 2022
@botelastic botelastic bot removed the triage label Jun 28, 2022
@barkbay
Contributor

barkbay commented Jun 29, 2022

@pebrc, may I ask your opinion on whether we should also explicitly set that for the operator?

For the JVM it is indeed a best practice because:

  • The JVM sets the heap size according to the limit, not the request
  • Elasticsearch is started with -XX:+AlwaysPreTouch

Therefore, memory pages of an Elasticsearch Pod are immediately mapped, up to the limit. Said differently, if the request is lower than the limit, an Elasticsearch Pod may be a candidate for eviction as soon as it is started. (Update: AlwaysPreTouch only applies to the heap, not to native memory. It still means that a significant amount of memory is already mapped at startup.)
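
For the Elasticsearch case, the point above could be applied through the `podTemplate` of an ECK-managed cluster. The following is only a sketch: the resource name, version, node set layout, and `4Gi` value are illustrative assumptions.

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart            # hypothetical name
spec:
  version: 8.3.2              # illustrative version
  nodeSets:
    - name: default
      count: 1
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              resources:
                requests:
                  memory: 4Gi   # equal to the limit, so the Pod is not scheduled with less than it maps
                limits:
                  memory: 4Gi   # per the comment above, the heap is sized from the limit
```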

The situation is a bit different for the operator since it is not running in a virtual machine with a fixed amount of memory mapped. There may be memory usage spikes, but memory may eventually be returned to the operating system. Hence, having different values might make sense. That being said, we can indeed remind the user that, to avoid any disruption, it is recommended to:

  • set the request and the limit to the same value (or just set the limit without any request)
  • monitor the RSS memory to adjust the allocated resources
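
The parenthetical option in the first bullet relies on Kubernetes behavior: when a container specifies a memory limit but no memory request, and nothing else applies a default request, Kubernetes uses the limit as the request. A sketch with an assumed value:

```yaml
# Only the limit is set; the memory request then defaults to the same value.
resources:
  limits:
    memory: 1Gi   # illustrative value, adjust after monitoring RSS
```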

ppf2 and others added 2 commits July 4, 2022 18:05
Co-authored-by: Michael Morello <michael.morello@elastic.co>
Co-authored-by: Michael Morello <michael.morello@elastic.co>
@ppf2
Member Author

ppf2 commented Jul 5, 2022

thx for the suggestions!

Contributor

@barkbay barkbay left a comment

LGTM

@thbkrkr thbkrkr added the v2.4.0 label Jul 7, 2022
Contributor

@alaudazzi alaudazzi left a comment

Left a minor editing suggestion. Otherwise LGTM.

Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>
@thbkrkr thbkrkr changed the title Add example of an OOMKilled event and best practice on setting memory limits Improve "Operator crashes on startup with OOMKilled" docs section Jul 25, 2022
@thbkrkr thbkrkr merged commit ea3b503 into main Jul 27, 2022
@thbkrkr thbkrkr deleted the ppf2-limits branch August 9, 2022 08:38
fantapsody pushed a commit to fantapsody/cloud-on-k8s that referenced this pull request Feb 7, 2023
Add an example of an OOMKilled event that crashes the operator and share best practice on setting memory requests and limits to the same value.

Co-authored-by: Michael Morello <michael.morello@elastic.co>
Co-authored-by: Thibault Richard <thbkrkr@users.noreply.github.com>
Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>
Labels
>docs Documentation v2.4.0
5 participants