Run rbac createresources for namespaces concurrently to avoid slowness #2218

Merged
merged 2 commits into from
Jul 30, 2024

Conversation

jkhelil
Member

@jkhelil jkhelil commented Jun 19, 2024

Changes

  • Process RBAC resource creation for OpenShift namespace reconciliation concurrently, using a worker pool, to avoid slowness on clusters with a large number of namespaces
  • Avoid blocking the reconciliation loop when errors occur during namespace RBAC resource reconciliation for OpenShift
  • Add unit tests for createResources in rbac.go
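
The first two bullets can be illustrated with a minimal worker pool sketch. This is not the code from the PR: `reconcileNamespace` and `reconcileAll` are hypothetical names, and the per-namespace work is reduced to a stub. It only shows the shape of the idea, which is a fixed number of workers draining a channel and recording failures instead of stopping at the first one.

```go
package main

import (
	"fmt"
	"sync"
)

// reconcileNamespace is a hypothetical stand-in for the per-namespace RBAC work.
func reconcileNamespace(ns string) error {
	if ns == "" {
		return fmt.Errorf("empty namespace name")
	}
	return nil
}

// reconcileAll fans the namespaces out to a fixed number of workers and returns
// the failures keyed by namespace. A failure in one namespace does not stop the others.
func reconcileAll(namespaces []string, workers int) map[string]error {
	if workers < 1 {
		workers = 1 // at least one worker is needed to drain the jobs channel
	}
	jobs := make(chan string)
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		failed = map[string]error{}
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for ns := range jobs {
				if err := reconcileNamespace(ns); err != nil {
					mu.Lock() // the map is shared, so writes are guarded
					failed[ns] = err
					mu.Unlock()
				}
			}
		}()
	}
	for _, ns := range namespaces {
		jobs <- ns
	}
	close(jobs) // lets the workers' range loops end
	wg.Wait()
	return failed
}

func main() {
	failed := reconcileAll([]string{"team-a", "", "team-b"}, 2)
	fmt.Println("failed namespaces:", len(failed))
}
```

The number of workers bounds how many namespaces are processed at once, which is what keeps the load on the API server predictable.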

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

See the contribution guide for more details.

Release Notes

@tekton-robot tekton-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jun 19, 2024
@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 19, 2024
@jkandasa
Member

/retest

@tekton-robot tekton-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 21, 2024
@tekton-robot
Contributor

The following is the coverage report on the affected files.
Say /test pull-tekton-operator-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/openshift/tektonconfig/common.go 0.0% 54.8% 54.8
pkg/reconciler/openshift/tektonconfig/rbac.go 0.0% 43.4% 43.4

@jkhelil
Member Author

jkhelil commented Jun 21, 2024

@jkandasa @piyush-garg can you LGTM, please?

@jkhelil
Member Author

jkhelil commented Jun 26, 2024

@jkandasa @piyush-garg Please have a look at my PR.

Member

@jkandasa jkandasa left a comment

@jkhelil Thanks for the PR
LGTM

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 2, 2024
@jkandasa
Member

jkandasa commented Jul 3, 2024

@piyush-garg @vdemeester can you please review?

@@ -393,54 +396,73 @@ func (r *rbac) createResources(ctx context.Context) error {
return err
}

var wg sync.WaitGroup
sem := make(chan struct{}, rbacMaxConcurrentCalls) // Semaphore with buffer size of rbacMaxConcurrentCalls
Member

Why do we need this if we use sync.WaitGroup?

Member Author

@jkhelil jkhelil Jul 3, 2024

@vdemeester I am using a semaphore to cap the number of concurrent goroutines and avoid overloading the API server, since processResourcesForSingleNamespace makes several calls to the API server.
I think there is no need for the WaitGroup; removing it as it is useless.

My bad, we do need the WaitGroup to ensure that createResources doesn't return before the goroutines finish their work,
and we use the semaphore to cap the number of concurrent goroutines.

@jkhelil
Member Author

jkhelil commented Jul 3, 2024

/test pull-tekton-operator-build-tests


// if No error add `openshift-pipelines.tekton.dev/namespace-reconcile-version` label to namespace
// so that rbac won't loop on it again
if err == nil {
Contributor

err will be nil if an earlier function call failed but the last one succeeded, right?

Member Author

Yeah, thank you for catching this @piyush-garg, proposing a fix

Member Author

@jkhelil jkhelil Jul 5, 2024

It seems there might be a data race when using the shared logger stored in the context:
logger := logging.FromContext(ctx) returns a zap logger that is shared between goroutines. Fortunately, the issue showed up in one of the tests:
https://storage.googleapis.com/tekton-prow/pr-logs/pull/tektoncd_operator/2218/pull-tekton-operator-unit-tests/1809201707154935808/build-log.txt

I am now creating a logger for each goroutine.

@jkhelil jkhelil force-pushed the SRVKP-2279_2 branch 2 times, most recently from 094536f to 1aeb77c Compare July 5, 2024 13:44
@tekton-robot
Contributor

The following is the coverage report on the affected files.
Say /test pull-tekton-operator-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/openshift/tektonconfig/common.go 0.0% 54.8% 54.8
pkg/reconciler/openshift/tektonconfig/rbac.go 0.0% 42.4% 42.4

@jkhelil
Member Author

jkhelil commented Jul 5, 2024

/test pull-tekton-operator-build-tests

jkhelil repeated /test pull-tekton-operator-build-tests in five more comments between Jul 5 and Jul 7, 2024.

@jkhelil
Member Author

jkhelil commented Jul 8, 2024

/test pull-tekton-operator-build-tests

@piyush-garg
Contributor

/retest

}
} else {
logger.Errorf("failed to reconcile namespace %s, %v", ns.Name, err)
Contributor

Just a small thought: don't we need to requeue the request, or something similar, so it is processed again?

Member Author

True. I've added an error channel that captures the error and returns it to the main createResources function. createResources returns the error to the caller, and the request is then requeued.
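
The error channel approach described here might look roughly like the sketch below. The names `createResourcesSketch` and `reconcileOne` are hypothetical, and the real function also limits concurrency and labels namespaces, which is omitted. The point is only that goroutines report failures on a buffered channel and the function returns a non-nil error so that the caller can requeue.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// reconcileOne is a hypothetical per-namespace step that fails for one name.
func reconcileOne(ns string) error {
	if ns == "broken" {
		return fmt.Errorf("failed to reconcile namespace %s", ns)
	}
	return nil
}

// createResourcesSketch runs every namespace concurrently, collects failures on
// a buffered channel, and returns them so the caller can requeue the request.
func createResourcesSketch(namespaces []string) error {
	// One slot per goroutine, so a send never blocks even if nobody is reading yet.
	errCh := make(chan error, len(namespaces))
	var wg sync.WaitGroup
	for _, ns := range namespaces {
		wg.Add(1)
		go func(ns string) {
			defer wg.Done()
			if err := reconcileOne(ns); err != nil {
				errCh <- err
			}
		}(ns)
	}
	wg.Wait()
	close(errCh) // safe: all senders have finished

	var errs []error
	for err := range errCh {
		errs = append(errs, err)
	}
	return errors.Join(errs...) // nil when every namespace succeeded
}

func main() {
	err := createResourcesSketch([]string{"team-a", "broken", "team-b"})
	fmt.Println("requeue needed:", err != nil)
}
```

Healthy namespaces are still processed when one of them fails, and the returned error is what signals the reconciler to try again later.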

Contributor

aah ohk fine

@@ -0,0 +1,179 @@
package tektonconfig
Contributor

add license header

@tekton-robot
Contributor

The following is the coverage report on the affected files.
Say /test pull-tekton-operator-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/openshift/tektonconfig/common.go 0.0% 54.8% 54.8
pkg/reconciler/openshift/tektonconfig/rbac.go 0.0% 43.1% 43.1

@tekton-robot
Contributor

The following is the coverage report on the affected files.
Say /test pull-tekton-operator-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/openshift/tektonconfig/common.go 0.0% 54.8% 54.8
pkg/reconciler/openshift/tektonconfig/rbac.go 0.0% 45.0% 45.0

@Diliz

Diliz commented Jul 25, 2024

Changes

  • Process rbac resource creation for openshift namespace reconciliation concurrently to avoid slowness on clusters with high number of namespace
  • Avoid blocking the reconciliation loop on error, when errors happen on namespace rbac resource reconciliation for openshift

I am in this situation, thanks for this pull request 👍

@tekton-robot
Contributor

The following is the coverage report on the affected files.
Say /test pull-tekton-operator-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/openshift/tektonconfig/common.go 0.0% 54.8% 54.8
pkg/reconciler/openshift/tektonconfig/rbac.go 0.0% 45.2% 45.2

@piyush-garg
Contributor

/approve

cc @jkandasa

Member

@jkandasa jkandasa left a comment

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 30, 2024
@tekton-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jkandasa, piyush-garg, vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jkandasa,piyush-garg,vdemeester]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot merged commit ea1a142 into tektoncd:main Jul 30, 2024
8 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
6 participants