[Merged by Bors] - Implement resource requests and limits for OPA pods #347

12 changes: 9 additions & 3 deletions CHANGELOG.md
@@ -4,16 +4,22 @@ All notable changes to this project will be documented in this file.

## [Unreleased]

### Added

- CPU and memory limits are now configurable ([#347]).

[#347]: https://github.com/stackabletech/opa-operator/pull/347

## [0.10.0] - 2022-09-06

### Changed

- Include chart name when installing with a custom release name ([#313], [#314]).
- `operator-rs` `0.15.0` -> `0.22.0` ([#315]).

[#313]: https://github.com/stackabletech/trino-operator/pull/313
[#314]: https://github.com/stackabletech/trino-operator/pull/314
[#315]: https://github.com/stackabletech/trino-operator/pull/315
[#313]: https://github.com/stackabletech/opa-operator/pull/313
[#314]: https://github.com/stackabletech/opa-operator/pull/314
[#315]: https://github.com/stackabletech/opa-operator/pull/315

## [0.9.0] - 2022-06-30

60 changes: 60 additions & 0 deletions deploy/crd/opacluster.crd.yaml

Large diffs are not rendered by default.

60 changes: 60 additions & 0 deletions deploy/helm/opa-operator/crds/crds.yaml

Large diffs are not rendered by default.

60 changes: 60 additions & 0 deletions deploy/manifests/crds.yaml

Large diffs are not rendered by default.

63 changes: 63 additions & 0 deletions docs/modules/ROOT/pages/usage.adoc
@@ -78,3 +78,66 @@ servers:
    default:
      config: {}
----

=== Storage for data volumes

The OPA Operator currently does not support using https://kubernetes.io/docs/concepts/storage/persistent-volumes[PersistentVolumeClaims] for internal storage.

=== Memory requests

You can request a certain amount of memory for each individual role group as shown below:

[source,yaml]
----
servers:
  roleGroups:
    default:
      config:
        resources:
          memory:
            limit: '2Gi'
----

In this example, each OPA container in the `default` role group is limited to 2 GiB of memory. To be precise, this limit applies to the container running OPA, but not to any sidecar containers (e.g. the BundleBuilder) that are part of the pod.

A general rule of thumb for sizing OPA's memory is described https://www.openpolicyagent.org/docs/latest/policy-performance/#resource-utilization[in the OPA documentation].

For more details regarding Kubernetes memory requests and limits see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/[Assign Memory Resources to Containers and Pods].
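
Judging by the kuttl test assertions added in this PR, the configured memory limit appears to be applied as both the request and the limit of the `opa` container. A sketch (not generated output) of the resulting container resources for the example above:

[source,yaml]
----
# Assumption based on the kuttl assertions in this PR:
# the configured memory limit is used for both the request and the limit.
containers:
  - name: opa
    resources:
      requests:
        memory: 2Gi
      limits:
        memory: 2Gi
----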

=== CPU requests

Similar to memory resources, you can also configure CPU requests and limits, as shown below:

[source,yaml]
----
servers:
  roleGroups:
    default:
      config:
        resources:
          cpu:
            max: '500m'
            min: '250m'
----
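
Here `min` and `max` appear to map to the Kubernetes CPU request and limit of the `opa` container, respectively (again judging by the kuttl assertions in this PR). A sketch of the resulting container resources for the example above:

[source,yaml]
----
# Assumption: min -> CPU request, max -> CPU limit on the opa container.
containers:
  - name: opa
    resources:
      requests:
        cpu: 250m
      limits:
        cpu: 500m
----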

=== Defaults

If nothing is specified, the operator will automatically set the following default values for resources:

[source,yaml]
----
servers:
  roleGroups:
    default:
      config:
        resources:
          cpu:
            min: '200m'
            max: "2"
          memory:
            limit: '2Gi'
----

WARNING: The default values are _most likely_ not sufficient to run a proper cluster in production. Please adapt according to your requirements.

For more details regarding Kubernetes CPU limits see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/[Assign CPU Resources to Containers and Pods].
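
As a quick sanity check (not part of this diff), the effective resources can be read back from the rolegroup DaemonSet. The name `opa-server-default` below is an assumption for a cluster named `opa` with the `default` role group, following the `<cluster>-server-<rolegroup>` naming visible in the kuttl tests of this PR:

[source,bash]
----
kubectl -n <namespace> get daemonset opa-server-default \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="opa")].resources}'
----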
76 changes: 70 additions & 6 deletions rust/crd/src/lib.rs
@@ -1,8 +1,14 @@
use serde::{Deserialize, Serialize};
use stackable_operator::kube::CustomResource;
use stackable_operator::product_config_utils::{ConfigError, Configuration};
use stackable_operator::role_utils::Role;
use stackable_operator::schemars::{self, JsonSchema};
use stackable_operator::{
    commons::resources::{CpuLimits, MemoryLimits, NoRuntimeLimits, Resources},
    config::merge::Merge,
    k8s_openapi::apimachinery::pkg::api::resource::Quantity,
    kube::CustomResource,
    product_config_utils::{ConfigError, Configuration},
    role_utils::Role,
    role_utils::RoleGroupRef,
    schemars::{self, JsonSchema},
};
use std::collections::BTreeMap;
use strum::{Display, EnumIter, EnumString};

@@ -31,9 +37,31 @@ pub struct OpaSpec {
    pub version: Option<String>,
}

#[derive(Clone, Debug, Default, Deserialize, Eq, JsonSchema, PartialEq, Serialize)]
#[derive(Clone, Debug, Default, Deserialize, Eq, Merge, JsonSchema, PartialEq, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct OpaConfig {}
pub struct OpaStorageConfig {}

#[derive(Clone, Debug, Default, Deserialize, JsonSchema, PartialEq, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct OpaConfig {
    pub resources: Option<Resources<OpaStorageConfig, NoRuntimeLimits>>,
}

impl OpaConfig {
    fn default_resources() -> Resources<OpaStorageConfig, NoRuntimeLimits> {
        Resources {
            cpu: CpuLimits {
                min: Some(Quantity("200m".to_owned())),
                max: Some(Quantity("2".to_owned())),
            },
            memory: MemoryLimits {
                limit: Some(Quantity("2Gi".to_owned())),
                runtime_limits: NoRuntimeLimits {},
            },
            storage: OpaStorageConfig {},
        }
    }
}

impl Configuration for OpaConfig {
    type Configurable = OpaCluster;
@@ -97,4 +125,40 @@ impl OpaCluster {
            self.metadata.namespace.as_ref()?
        ))
    }

    /// Retrieve and merge resource configs for role and role groups
    pub fn resolve_resource_config_for_role_and_rolegroup(
        &self,
        role: &OpaRole,
        rolegroup_ref: &RoleGroupRef<OpaCluster>,
    ) -> Option<Resources<OpaStorageConfig, NoRuntimeLimits>> {
        // Initialize the result with all default values as baseline
        let conf_defaults = OpaConfig::default_resources();

        let role = match role {
            OpaRole::Server => &self.spec.servers,
        };

        // Retrieve role resource config
        let mut conf_role: Resources<OpaStorageConfig, NoRuntimeLimits> =
            role.config.config.resources.clone().unwrap_or_default();

        // Retrieve rolegroup specific resource config
        let mut conf_rolegroup: Resources<OpaStorageConfig, NoRuntimeLimits> = role
            .role_groups
            .get(&rolegroup_ref.role_group)
            .and_then(|rg| rg.config.config.resources.clone())
            .unwrap_or_default();

        // Merge more specific configs into default config
        // Hierarchy is:
        // 1. RoleGroup
        // 2. Role
        // 3. Default
        conf_role.merge(&conf_defaults);
        conf_rolegroup.merge(&conf_role);

        tracing::debug!("Merged resource config: {:?}", conf_rolegroup);
        Some(conf_rolegroup)
    }
}
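
The merge direction above can be easy to misread: `self` keeps its values and the argument only fills in fields that are still unset, so the role group wins over the role, which wins over the defaults. A rough sketch of that precedence as an in-crate test (illustrative only; it assumes `Merge` fills only `None` fields and that the `Resources` fields are public, as the construction above suggests):

#[cfg(test)]
mod resource_merge_sketch {
    use super::*;
    use stackable_operator::{
        commons::resources::{CpuLimits, MemoryLimits, NoRuntimeLimits, Resources},
        config::merge::Merge,
        k8s_openapi::apimachinery::pkg::api::resource::Quantity,
    };

    #[test]
    fn rolegroup_overrides_win_over_defaults() {
        // A role group config that only overrides the CPU maximum.
        let mut rolegroup: Resources<OpaStorageConfig, NoRuntimeLimits> = Resources {
            cpu: CpuLimits {
                min: None,
                max: Some(Quantity("3".to_owned())),
            },
            memory: MemoryLimits {
                limit: None,
                runtime_limits: NoRuntimeLimits {},
            },
            storage: OpaStorageConfig {},
        };

        // Merging the defaults in fills the gaps but keeps the explicit override.
        rolegroup.merge(&OpaConfig::default_resources());
        assert_eq!(rolegroup.cpu.max, Some(Quantity("3".to_owned())));
        assert_eq!(rolegroup.cpu.min, Some(Quantity("200m".to_owned())));
        assert_eq!(rolegroup.memory.limit, Some(Quantity("2Gi".to_owned())));
    }
}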
14 changes: 12 additions & 2 deletions rust/operator-binary/src/controller.rs
@@ -3,8 +3,9 @@
use crate::built_info::PKG_VERSION;
use crate::discovery::{self, build_discovery_configmaps};
use snafu::{OptionExt, ResultExt, Snafu};
use stackable_opa_crd::{OpaCluster, OpaRole, APP_NAME};
use stackable_opa_crd::{OpaCluster, OpaRole, OpaStorageConfig, APP_NAME};
use stackable_operator::builder::SecurityContextBuilder;
use stackable_operator::commons::resources::{NoRuntimeLimits, Resources};
use stackable_operator::k8s_openapi::api::core::v1::{
    EmptyDirVolumeSource, HTTPGetAction, Probe, ServiceAccount,
};
@@ -112,6 +113,8 @@ pub enum Error {
    ProductConfigTransform {
        source: stackable_operator::product_config_utils::ConfigError,
    },
    #[snafu(display("failed to resolve and merge resource config for role and role group"))]
    FailedToResolveResourceConfig,
}
type Result<T, E = Error> = std::result::Result<T, E>;

@@ -191,8 +194,13 @@ pub async fn reconcile_opa(opa: Arc<OpaCluster>, ctx: Arc<Ctx>) -> Result<Action
            role_group: rolegroup_name.to_string(),
        };

        let resources = opa
            .resolve_resource_config_for_role_and_rolegroup(&OpaRole::Server, &rolegroup)
            .context(FailedToResolveResourceConfigSnafu)?;

        let rg_configmap = build_server_rolegroup_config_map(&rolegroup, &opa)?;
        let rg_daemonset = build_server_rolegroup_daemonset(&rolegroup, &opa, rolegroup_config)?;
        let rg_daemonset =
            build_server_rolegroup_daemonset(&rolegroup, &opa, rolegroup_config, &resources)?;
        let rg_service = build_rolegroup_service(&opa, &rolegroup)?;

        client
@@ -375,6 +383,7 @@ fn build_server_rolegroup_daemonset(
    rolegroup_ref: &RoleGroupRef<OpaCluster>,
    opa: &OpaCluster,
    server_config: &HashMap<PropertyNameKind, BTreeMap<String, String>>,
    resources: &Resources<OpaStorageConfig, NoRuntimeLimits>,
) -> Result<DaemonSet> {
    let opa_version = opa_version(opa)?;
    let image = format!("docker.stackable.tech/stackable/opa:{}", opa_version);
@@ -400,6 +409,7 @@ fn build_server_rolegroup_daemonset(
        .add_env_vars(env)
        .add_container_port(APP_PORT_NAME, APP_PORT.into())
        .add_volume_mount("config", "/stackable/config")
        .resources(resources.clone().into())
        .readiness_probe(Probe {
            initial_delay_seconds: Some(5),
            period_seconds: Some(10),
44 changes: 44 additions & 0 deletions tests/templates/kuttl/resources/01-assert.yaml
@@ -0,0 +1,44 @@
---
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
commands:
  - script: kubectl -n $NAMESPACE rollout status daemonset opa-server-resources-from-role --timeout 301s
    timeout: 300
  - script: kubectl -n $NAMESPACE rollout status daemonset opa-server-resources-from-role-group --timeout 301s
    timeout: 300
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: opa-server-resources-from-role
spec:
  template:
    spec:
      containers:
        - name: opa
          resources:
            requests:
              cpu: 100m
              memory: 1Gi
            limits:
              cpu: "1"
              memory: 1Gi
        - name: opa-bundle-builder
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: opa-server-resources-from-role-group
spec:
  template:
    spec:
      containers:
        - name: opa
          resources:
            requests:
              cpu: 300m
              memory: 3Gi
            limits:
              cpu: "3"
              memory: 3Gi
        - name: opa-bundle-builder
25 changes: 25 additions & 0 deletions tests/templates/kuttl/resources/01-install-opa.yaml.j2
@@ -0,0 +1,25 @@
---
apiVersion: opa.stackable.tech/v1alpha1
kind: OpaCluster
metadata:
  name: opa
spec:
  version: {{ test_scenario['values']['opa-latest'] }}
  servers:
    config:
      resources:
        cpu:
          min: 100m
          max: "1"
        memory:
          limit: 1Gi
    roleGroups:
      resources-from-role: {}
      resources-from-role-group:
        config:
          resources:
            cpu:
              min: 300m
              max: "3"
            memory:
              limit: 3Gi
6 changes: 6 additions & 0 deletions tests/test-definition.yaml
@@ -4,7 +4,13 @@ dimensions:
    values:
      - 0.37.2-stackable0.2.0
      - 0.41.0-stackable0.1.0
  - name: opa-latest
    values:
      - 0.41.0-stackable0.1.0
tests:
  - name: smoke
    dimensions:
      - opa
  - name: resources
    dimensions:
      - opa-latest