Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: Disable Builder SA #1558

Closed

Conversation

adambkaplan
Copy link
Contributor

Enhancement proposal to provide cluster configuration options that disable the generation of the builder service account. Today, OCP generates the builder service account in every namespace for clusters that enable the "Build" capability. Long-living clusters that upgraded from OCP 4.13 or earlier are not able to turn off the "Build" capability, but may have otherwise found ways to disable builds on OCP through RBAC controls. In these instances, the builder service account represents a potential security threat, which cannot be mitigated without overrides that bring the cluster into an unsupported state.

Turning off the builder service account will have significant impact on developer experience, which can be mitigated through accurate documentation and the use of internal developer platforms like RHDH (powered by Backstage). The proposal outlines these requirements as well as requiring builds to "fail fast" if the referenced or implied service account does not exist.

@openshift-ci openshift-ci bot requested review from mfojtik and stlaz February 6, 2024 18:40
Copy link
Contributor

openshift-ci bot commented Feb 6, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from adambkaplan. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@adambkaplan
Copy link
Contributor Author

/assign @soltysh

Assigning to Maciej per @bparees while he is on "sabbatical".

/cc @coreydaley @sayan-biswas @apoorvajagtap @siamaksade

Adding Pipeline Integrations team members for review.

enhancements/builds/disable-builder-sa.md Outdated Show resolved Hide resolved
enhancements/builds/disable-builder-sa.md Show resolved Hide resolved
enhancements/builds/disable-builder-sa.md Outdated Show resolved Hide resolved
@dhellmann
Copy link
Contributor

#1555 is changing the enhancement template in a way that will cause the header check in the linter job to fail for existing PRs. If this PR is merged within the development period for 4.16 you may override the linter if the only failures are caused by issues with the headers (please make sure the markdown formatting is correct). If this PR is not merged before 4.16 development closes, please update the enhancement to conform to the new template.

Copy link

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll also need to update the template to match Doug's recent changes

enhancements/builds/disable-builder-sa.md Show resolved Hide resolved
enhancements/builds/disable-builder-sa.md Outdated Show resolved Hide resolved
adambkaplan and others added 3 commits February 19, 2024 15:13
Enhancement proposal to provide cluster configuration options that
disable the generation of the `builder` service account. Today,
OCP generates the `builder` service account in every namespace for
clusters that enable the "Build" capability. Long-living clusters
that upgraded from OCP 4.13 or earlier are not able to turn off the
"Build" capability, but may have otherwise found ways to disable
builds on OCP through RBAC controls. In these instances, the
`builder` service account represents a potential security threat,
which cannot be mitigated without overrides that bring the cluster
into an unsupported state.

Turning off the `builder` service account will have significant
impact on developer experience, which can be mitigated through
accurate documentation and the use of internal developer platforms
like RHDH (powered by Backstage). The proposal outlines these
requirements as well as requiring builds to "fail fast" if the
referenced or implied service account does not exist.

Signed-off-by: Adam Kaplan <adam.kaplan@redhat.com>
Typo in "configuring"

Co-authored-by: Maciej Szulik <soltysh@gmail.com>
- Add context around long-lived cluster upgrading to OCP 4.14,
  particularly for large fleets.
- Add details related to SLI metrics for builds. The current metrics
  for BuildConfig do not provide reliable, long-running data.
  Addressing this is out of scope.
- Add requirement that we verify builds can succeed if admins/platform
  teams bring their own service account for builds.
@adambkaplan adambkaplan force-pushed the service-accounts-mgmt branch 2 times, most recently from 7510960 to 3560b95 Compare February 19, 2024 21:03
Use runtime validation by the ocm-o cluster operator to handle
defaulting of the new `spec.builderServiceAccount` field. This allows
us flexibility to change the default behavior of OCP in the future.
There is a minor cost to implementing this in the operator, instead of
CRD defaulting mechanisms.

The proposal was also updated to use the new enhancement template.
Document that OpenShift already has mitigations in place for this
feature for large clusters. Openshift-controller-manager already has
QPS limits for each individual controller (10 QPS), which implies an
upper bound of 40 objects created per second (each reconcile can create
a service account, role binding, and 2 Secrets for internal registry
auth tokens). At scale, this contributes small amounts of storage
(40 MiB max per 10k namespaces) and will not flood the cluster with
object creation.
Final copy edit of the enhancement proposal.
Copy link
Contributor

openshift-ci bot commented Feb 20, 2024

@adambkaplan: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@adambkaplan
Copy link
Contributor Author

/close

Per review from @deads2k: this feature will leave the Build API system in a half-broken state. At scale, cluster admins will disable the builder service account generation without understanding the consequences and required actions on their part. We are better served with a stronger feature sets that are less prone to user error/impact, such as:

  • Enabling cluster admins to rebuild clusters with the Build capability disabled
  • Allowing cluster admins to turn off the entire Build API system gracefully:
    • Blocking creation of new BuildConfig, Build, etc. API objects
    • Informing admins of the requirement to delete BuildConfig, Build, etc. objects (the operator can't do this because it is a destructive, irrevocable operation).
    • Cleaning up all Build related assets once the respective BuildConfig, Build, etc. objects are removed.

The half-broken state is exacerbated by the poor metrics we have for BuildConfigs - admins have no reliable mechanisms to create an SLO for Builds and be properly alerted when builds fail due to missing service accounts.

@openshift-ci openshift-ci bot closed this Feb 21, 2024
Copy link
Contributor

openshift-ci bot commented Feb 21, 2024

@adambkaplan: Closed this PR.

In response to this:

/close

Per review from @deads2k: this feature will leave the Build API system in a half-broken state. At scale, cluster admins will disable the builder service account generation without understanding the consequences and required actions on their part. We are better served with a stronger feature sets that are less prone to user error/impact, such as:

  • Enabling cluster admins to rebuild clusters with the Build capability disabled
  • Allowing cluster admins to turn off the entire Build API system gracefully:
  • Blocking creation of new BuildConfig, Build, etc. API objects
  • Informing admins of the requirement to delete BuildConfig, Build, etc. objects (the operator can't do this because it is a destructive, irrevocable operation).
  • Cleaning up all Build related assets once the respective BuildConfig, Build, etc. objects are removed.

The half-broken state is exacerbated by the poor metrics we have for BuildConfigs - admins have no reliable mechanisms to create an SLO for Builds and be properly alerted when builds fail due to missing service accounts.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants