Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libvirt: Add Terraform variables for memory/CPU, bump master to 4GiB #785

Merged
merged 1 commit into from
Dec 5, 2018

Conversation

cgwalters
Copy link
Member

@cgwalters cgwalters commented Dec 4, 2018

My main dev environment is a Lenovo P50 with 64GB of RAM - I got
it specifically to run some large VMs (and/or many VMs) specifically
with OpenShift in mind.

Increasing RAM on my master to 8GB is a very noticeable speed improvement
and I think reliabilty; before I saw the apiserver be OOMKilled
sometimes, and kswapd0 was constantly doing writeback.

These variables aren't bubbled all the way up to the documented
installer config, but one can now do e.g.:

$ env TF_VAR_libvirt_master_memory=8192 ./bin/openshift-install create cluster ...

Previously:

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 4, 2018
@ashcrow
Copy link
Member

ashcrow commented Dec 4, 2018

This is great ❤️

One ask though would be to have a follow on to document it in user docs.

@rphillips
Copy link
Contributor

Thoughts on bumping the default to 4096?

@cgwalters
Copy link
Member Author

cgwalters commented Dec 4, 2018

Thoughts on bumping the default to 4096?

TL;DR: yes

Longer 2¢ - libvirt is primarily a developer target; we should maintain a "functional baseline" for it.

That said, I can imagine people wanting to use it on a "real" server to kick the tires a bit more seriously before e.g. switching to using OpenStack or bare metal.

For developers, 16GB laptops are common I believe. If we target the one master, one worker case which is still the default, I think 4GiB should be fine.

Now, one issue is that right now workers and masters get the same amount of RAM. That seems definitely worth fixing. (edit: nevermind, I am not sure why I thought that, clearly workers use the cluster API and are 2GiB today)

@cgwalters
Copy link
Member Author

Updated to default to 4GiB.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 4, 2018
@ashcrow
Copy link
Member

ashcrow commented Dec 4, 2018

/lgtm cancel

(I shouldn't approve installer changes directly IMHO)

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Dec 4, 2018
@wking
Copy link
Member

wking commented Dec 4, 2018

(I shouldn't approve installer changes directly IMHO)

/lgtm is for any org member. Obviously you can make your own decisions about when to use it, but I have no problem with anyone dropping /lgtm if they want to. Some more on the expected /lgtm vs. /approve semantics over here.

@@ -32,3 +32,15 @@ variable "tectonic_libvirt_master_ips" {
type = "list"
description = "the list of desired master ips. Must match tectonic_master_count"
}

variable "tectonic_libvirt_memory" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're moving away from "tectonic" (#644). Can you use master_memory or some such here? And master_vcpu (or whatever to match your memory variable) below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

master_memory and not libvirt_master_memory?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not going to launch a libvirt-and-AWS-at-the-same-time cluster, so I don't see a need for platforms in Terraform variable names. But we certainly have plenty of existing names like that, so I'm fine with you going that way if you like.

@cgwalters cgwalters changed the title libvirt: Add (undocumented) Terraform variables for memory/CPU libvirt: Add Terraform variables for memory/CPU, bump master to 4GiB Dec 5, 2018
@abhinavdahiya
Copy link
Contributor

Is there is a specific process that is using up all of that memory. I would expect a little bit more investigation before we bump the RAM again.

firefox/chrome can eat up quite a bit of GBs so not much room left :P

@wking
Copy link
Member

wking commented Dec 5, 2018

Is there is a specific process that is using up all of that memory. I would expect a little bit more investigation before we bump the RAM again.

I've seen folks pointing fingers at Prometheus, but haven't measured myself. @crawford may be able to clarify.

And while I'm in favor of trimming memory back, I like stability too ;). Unfortunately, it's hard to get time allocated for "claw back memory", e.g. see @sjenning promising to get us back under 2GB once we dropped kube-core-operator (which happened in #420). :p

@cgwalters
Copy link
Member Author

firefox/chrome can eat up quite a bit of GBs so not much room left :P

This is an interesting topic actually - I don't have a good link to hand but at least a few years ago Firefox basically defaulted to using at least ⅓ of available RAM or so. More recently came along "tab idling" where tabs you haven't used in a while get basically torn down except for their URL. Part of implementing this is ensuring that the browser serializes TextArea data (like the one I'm typing in now!) so you don't lose data.

This is a bit like pod idling - but without CRIU your app needs to be storing its state on a PV (or be stateless).

But there's a latency tradeoff here - which is why browsers use RAM. (Also, it's not browsers that use RAM mostly, it's websites)

@cgwalters
Copy link
Member Author

Updated 🆕

My main dev environment is a Lenovo P50 with 64GB of RAM - I got
it specifically to run some large VMs (and/or many VMs) specifically
with OpenShift in mind.

First, default masters to 4096 MiB since we are seeing a default
install be overloaded.

And for me, increasing RAM on my master to 8GB is a *very* noticeable speed improvement
and I think reliabilty; before I saw the apiserver be `OOMKilled`
sometimes, and `kswapd0` was constantly doing writeback.

These variables aren't bubbled all the way up to the documented
installer config, but one can now do e.g.:

```
$ env TF_VAR_libvirt_master_memory=8192 TF_VAR_libvirt_master_vcpu=4 ./bin/openshift-install create cluster --dir=osiris
```

Previously:

 - openshift#408
 - openshift#163
@cgwalters
Copy link
Member Author

Updated again to fix descriptions.

@wking
Copy link
Member

wking commented Dec 5, 2018

/lgtm

On the, ah, wildly optimistic side, tunables could be used by folks trying to push memory back down too ;).

@crawford
Copy link
Contributor

crawford commented Dec 5, 2018

/refresh

@crawford
Copy link
Contributor

crawford commented Dec 5, 2018

Mirroring @wking's lgtm. It looks like the bot missed the message.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 5, 2018
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, crawford, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@richm
Copy link

richm commented Dec 7, 2018

@openshift/sig-logging

wking added a commit to wking/openshift-installer that referenced this pull request Dec 10, 2018
This reverts commit a230376
(data/aws: Switch to m4.large, 2018-11-29, openshift#765), now that we've
bumped our t3.medium limits in the CI and dev accounts to cover our
expected loads.  Moving from m4.large to t3.medium also reduces memory
from 8 GiB [1] to 4 GiB [2], but after a recent run of end-to-end
tests, the master consumption was:

  $ free -m
                total        used        free      shared  buff/cache   available
  Mem:           7980        2263         794           8        4922        5259
  Swap:             0           0           0

so 4 GiB should be sufficient.  And it also matches our libvirt setup
since e59513f (libvirt: Add Terraform variables for memory/CPU, bump
master to 4GiB, 2018-12-04, openshift#785).

[1]: https://aws.amazon.com/ec2/instance-types/#General_Purpose
[2]: https://aws.amazon.com/ec2/instance-types/t3/#Product_Details
@mykaul
Copy link
Contributor

mykaul commented Jan 20, 2019

Are those vars documented somewhere?
For example, TF_VAR_libvirt_master_vcpu ?

@wking
Copy link
Member

wking commented Jan 20, 2019

It's just documented in this PR at the moment, but the coming origin rebase should help with both memory and CPU and the planned CPU-reservation adjustments will help with CPU.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants