
remove cert-manager as dependency #727

Merged
merged 3 commits into from
Jul 19, 2024

eguzki
Contributor

@eguzki eguzki commented Jul 3, 2024

What

While the earlier change #680 removed the cert-manager operator as a dependency and added the cert-manager API as a dependency instead, this PR removes cert-manager as a dependency of the operator entirely.

Additionally, the TLSPolicy status reports an understandable error when cert-manager is not available in the cluster.

Why

cert-manager is, in fact, a direct dependency of the Kuadrant controller (for the TLSPolicy) and an indirect one for Authorino (required for the webhook deployment). Unfortunately, declaring cert-manager as a dependency, either as an operator or as an API, creates issues in OpenShift that we cannot solve on our side.

The issue comes from the RH build of the cert-manager operator. This operator, only available for OpenShift, provides the same APIs as the upstream community version of the cert-manager operator. These overlapping APIs were an issue when Kuadrant depended on the operator: when the OpenShift cluster had the RH build of the cert-manager operator installed, the Kuadrant requirement could not be met because the upstream cert-manager could not be installed. In #680 we fixed this by changing the dependency type from operator level to API level.

Unfortunately, there are still issues. The main one is that the RH build of cert-manager cannot be installed in the same namespace as the Kuadrant operator, because the Kuadrant operator is cluster-wide while the RH cert-manager is a namespace-scoped operator. An OperatorGroup does not allow operators with different scopes to be installed in the same namespace.
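The scope conflict can be illustrated with a simplified model of OLM install modes. In OLM, a CSV declares which install modes it supports (`OwnNamespace`, `SingleNamespace`, `MultiNamespace`, `AllNamespaces`), and an OperatorGroup with no target namespaces selects all namespaces. The Go types and the `Compatible` helper below are a hypothetical sketch, not OLM's implementation, and the install modes assigned to each CSV are assumptions based on the description above.

```go
package main

import "fmt"

// InstallModeType uses the install mode names that OLM CSVs declare.
type InstallModeType string

const (
	OwnNamespace    InstallModeType = "OwnNamespace"
	SingleNamespace InstallModeType = "SingleNamespace"
	MultiNamespace  InstallModeType = "MultiNamespace"
	AllNamespaces   InstallModeType = "AllNamespaces"
)

// CSV is a simplified stand-in for a ClusterServiceVersion: a name plus the
// install modes it declares as supported.
type CSV struct {
	Name      string
	Supported map[InstallModeType]bool
}

// OperatorGroup is a simplified stand-in. OLM expects at most one OperatorGroup
// per namespace, so every operator installed in that namespace shares it.
type OperatorGroup struct {
	Namespace        string
	TargetNamespaces []string // empty means "all namespaces"
}

// Mode derives the install mode implied by the OperatorGroup's targets.
func (og OperatorGroup) Mode() InstallModeType {
	switch {
	case len(og.TargetNamespaces) == 0:
		return AllNamespaces
	case len(og.TargetNamespaces) > 1:
		return MultiNamespace
	case og.TargetNamespaces[0] == og.Namespace:
		return OwnNamespace
	default:
		return SingleNamespace
	}
}

// Compatible reports whether a CSV supports the mode of the given OperatorGroup.
func Compatible(csv CSV, og OperatorGroup) bool {
	return csv.Supported[og.Mode()]
}

func main() {
	// Namespace K: a single cluster-wide OperatorGroup, as Kuadrant needs.
	og := OperatorGroup{Namespace: "kuadrant-system"}

	// Assumption: Kuadrant supports AllNamespaces; the RH build supports only
	// namespace-scoped modes.
	kuadrant := CSV{Name: "kuadrant-operator.v0.8.0",
		Supported: map[InstallModeType]bool{AllNamespaces: true}}
	rhCertManager := CSV{Name: "cert-manager-operator.v1.13.1",
		Supported: map[InstallModeType]bool{OwnNamespace: true, SingleNamespace: true}}

	for _, csv := range []CSV{kuadrant, rhCertManager} {
		fmt.Printf("%s in %s OperatorGroup: compatible=%v\n", csv.Name, og.Mode(), Compatible(csv, og))
	}
}
```

Under these assumptions only one of the two CSVs can run in namespace K, which matches the failure described here.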

Even if the RH cert-manager is installed in another namespace A, when Kuadrant is installed in some namespace K, OLM tries to install yet another RH cert-manager operator instance in namespace K, which fails for the reasons explained above.

Extracted from the InstallPlan resource:

clusterServiceVersionNames:             
- dns-operator.v0.0.0                   
- limitador-operator.v0.9.0             
- cert-manager-operator.v1.13.1         
- authorino-operator.v0.11.1            
- kuadrant-operator.v0.8.0 

My guess (just a guess) is that OLM thinks that, even if the GVK exists, nobody is watching for it in the specified namespace, hence it tries to install an operator that fulfills the need in that namespace.

Interestingly, even if the community cert-manager is installed before Kuadrant, OLM still tries to install the RH cert-manager in the same namespace as Kuadrant. I am not sure OLM is behaving as it should; we might have found a bug here. If we re-add the cert-manager APIs as dependencies, we need to verify that, when the upstream operator is pre-installed, the RH build operator is not installed in OpenShift.

Read the public Slack channel discussion about this topic here.

The Plan

The plan now is to make the cert-manager API a documented requirement. The prerequisite for Kuadrant to work as expected is that the cert-manager APIs are in place in the cluster.

The RH build of cert-manager is planning a cluster-wide scope feature, tracked here: openshift/cert-manager-operator#188.

Once a release of the RH cert-manager that supports all namespaces is available, we can move back to having it as a dependency. One test we will need to do after bringing the cert-manager dependency back is to verify that, when the upstream operator is pre-installed, the RH build operator is not installed in OpenShift. Created issue #729 for this.

Verification steps

  • Create a kind cluster
make kind-create-cluster
  • Deploy the OLM system
make install-olm
  • Deploy the Gateway API CRDs (Istio does not need to be installed for these verification steps)
make gateway-api-install
  • Deploy the Kuadrant operator using OLM. The catalog has already been created for you from the changes in this branch, but feel free to create your own catalog.
make deploy-catalog CATALOG_IMG=quay.io/kuadrant/kuadrant-operator-catalog:fix-cert-manager-deps

The operators should be installed correctly. Wait until the CSVs report that all the operators succeeded.

❯ kubectl get clusterserviceversions.operators.coreos.com 
NAME                        DISPLAY              VERSION   REPLACES                         PHASE
authorino-operator.v0.0.0   Authorino Operator   0.0.0                                      Succeeded
dns-operator.v0.0.0         DNS Operator         0.0.0                                      Succeeded
kuadrant-operator.v0.0.0    Kuadrant Operator    0.0.0     kuadrant-operator.v0.0.0-alpha   Succeeded
limitador-operator.v0.0.0   Limitador            0.0.0                                      Succeeded

This is the key change of this PR: cert-manager has NOT been installed.
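The two checks in this step (every CSV reaching `Succeeded`, and no cert-manager CSV present) can be scripted rather than eyeballed. Below is a minimal Go sketch that parses the table printed by `kubectl get clusterserviceversions.operators.coreos.com`. It assumes NAME is the first and PHASE the last whitespace-separated column (DISPLAY may contain spaces and REPLACES may be empty) and that any cert-manager CSV name starts with `cert-manager`, as `cert-manager-operator.v1.13.1` does in the InstallPlan excerpt. `checkCSVs` is a hypothetical helper, not part of the repository.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// checkCSVs parses `kubectl get clusterserviceversions` table output and returns
// the CSVs not in phase Succeeded, plus whether any cert-manager CSV is present.
func checkCSVs(output string) (notSucceeded []string, hasCertManager bool) {
	sc := bufio.NewScanner(strings.NewReader(output))
	header := true
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		if header { // skip the NAME ... PHASE header row
			header = false
			continue
		}
		name, phase := fields[0], fields[len(fields)-1]
		if phase != "Succeeded" {
			notSucceeded = append(notSucceeded, name)
		}
		if strings.HasPrefix(name, "cert-manager") {
			hasCertManager = true
		}
	}
	return notSucceeded, hasCertManager
}

// sample is the output shown in the verification steps.
const sample = `NAME                        DISPLAY              VERSION   REPLACES                         PHASE
authorino-operator.v0.0.0   Authorino Operator   0.0.0                                      Succeeded
dns-operator.v0.0.0         DNS Operator         0.0.0                                      Succeeded
kuadrant-operator.v0.0.0    Kuadrant Operator    0.0.0     kuadrant-operator.v0.0.0-alpha   Succeeded
limitador-operator.v0.0.0   Limitador            0.0.0                                      Succeeded`

func main() {
	pending, certManager := checkCSVs(sample)
	fmt.Println("not succeeded:", pending, "cert-manager installed:", certManager)
}
```

Parsing the default table keeps the sketch dependency-free; a JSONPath query via `kubectl get ... -o jsonpath` would avoid relying on column layout.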

Let's test the TLSPolicy in this cert-manager-less scenario:

  • Create a namespace:
kubectl create namespace my-gateways
  • Create an ingress gateway
kubectl -n my-gateways apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: prod-web
spec:
  gatewayClassName: istio
  listeners:
    - allowedRoutes:
        namespaces:
          from: All
      name: api
      hostname: "*.toystore.local"
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - name: toystore-local-tls
            kind: Secret
EOF
  • Create a Kuadrant TLSPolicy to configure TLS:
kubectl apply -n my-gateways -f - <<EOF
apiVersion: kuadrant.io/v1alpha1
kind: TLSPolicy
metadata:
  name: prod-web
spec:
  targetRef:
    name: prod-web
    group: gateway.networking.k8s.io
    kind: Gateway
  issuerRef:
    group: cert-manager.io
    kind: Issuer
    name: selfsigned-issuer
EOF

The TLSPolicy controller should be disabled, with the following error log:

❯ k logs deployment/kuadrant-operator-controller-manager -n kuadrant-system | grep TLSPolicy
{"level":"error","ts":"2024-07-15T09:38:19Z","logger":"kuadrant-operator.tlspolicy","msg":"CertManager CRD was not installed","group":"cert-manager.io","kind":"Certificate","version":"v1","stacktrace":"github.com/kuadrant/kuadrant-operator/pkg/library/gatewayapi.IsCertManagerInstalled\n\t/workspace/pkg/library/gatewayapi/utils.go:162\ngithub.com/kuadrant/kuadrant-operator/controllers.(*TLSPolicyReconciler).SetupWithManager\n\t/workspace/controllers/tlspolicy_controller.go:205\nmain.main\n\t/workspace/main.go:212\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"}
{"level":"info","ts":"2024-07-15T09:38:19Z","logger":"kuadrant-operator.tlspolicy","msg":"TLSPolicy controller disabled. CertManager was not found"}

The TLSPolicy instance status should be empty, as the controller is not enabled. There is an open issue, #730, for reporting an understandable error when cert-manager is not available in the cluster.

codecov bot commented Jul 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.03%. Comparing base (ece13e8) to head (ef04624).
Report is 135 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #727      +/-   ##
==========================================
+ Coverage   80.20%   83.03%   +2.82%     
==========================================
  Files          64       77      +13     
  Lines        4492     5978    +1486     
==========================================
+ Hits         3603     4964    +1361     
- Misses        600      669      +69     
- Partials      289      345      +56     
Flag                       Coverage Δ
bare-k8s-integration       4.55% <ø> (?)
controllers-integration    72.95% <ø> (?)
gatewayapi-integration     11.07% <ø> (?)
integration                ?
istio-integration          56.52% <ø> (?)
unit                       33.23% <ø> (+3.20%) ⬆️

Flags with carried forward coverage won't be shown.

Components            Coverage Δ
api/v1beta1 (u)       71.42% <ø> (ø)
api/v1beta2 (u)       92.17% <94.11%> (+0.75%) ⬆️
pkg/common (u)        88.13% <ø> (-0.70%) ⬇️
pkg/istio (u)         73.88% <ø> (-0.03%) ⬇️
pkg/log (u)           94.73% <ø> (ø)
pkg/reconcilers (u)   ∅ <ø> (∅)
pkg/rlptools (u)      83.33% <ø> (+3.87%) ⬆️
controllers (i)       82.25% <84.37%> (+5.45%) ⬆️

see 40 files with indirect coverage changes

@eguzki eguzki requested a review from a team July 3, 2024 17:32
@maleck13
Collaborator

maleck13 commented Jul 4, 2024

This makes sense to me @eguzki I guess once a release that supports all namespaces is available we can move back to having it as a dependency?

Collaborator

@maleck13 maleck13 left a comment

/lgtm

@mikenairn might be worth a look over also?

@mikenairn
Member

Seems fine, but should we not consider just adding this as a pattern for all policies that have a dependency on CRDs not provided by the Kuadrant operator, i.e. all of them? I know we control the other operators currently required, so it is less likely to be an issue, but a consistent message on any policy that can't work due to missing CRDs might be beneficial, no? Presumably this could also be an issue for these operators if they happened to be installed on the cluster already; at least, it seems we can't be sure what OLM is going to do.

@eguzki
Contributor Author

eguzki commented Jul 4, 2024

This makes sense to me @eguzki I guess once a release that supports all namespaces is available we can move back to having it as a dependency?

Yes @maleck13! That's the plan. I missed adding this to the description. (Actually, I did, but it was lost in the ether.) Updating the description, thanks.

@eguzki
Contributor Author

eguzki commented Jul 4, 2024

Added verification steps. I like doing that, if only for the record and future reference.

@eguzki eguzki marked this pull request as ready for review July 4, 2024 08:26
@eguzki eguzki added kind/bug Something isn't working size/small labels Jul 4, 2024
@eguzki
Contributor Author

eguzki commented Jul 4, 2024

Once this PR is merged, @maleck13, can you add the new "documentation requirement"? Or give me a hint about where I should add it.

@azgabur

azgabur commented Jul 4, 2024

I can verify this fixes the automatic installation of the RH cert-manager on OpenShift.
The next step should be updating documentation (https://docs.kuadrant.io/dev/kuadrant-operator/doc/install/install-openshift/) with the extra step of installing cert-manager manually. But this can be done in separate PR.

Member

@mikenairn mikenairn left a comment

Changes look fine, but as pointed out in chat, some of the changes here might be an issue #715

@eguzki
Contributor Author

eguzki commented Jul 4, 2024

Changes look fine, but as pointed out in chat, some of the changes here might be an issue #715

Indeed, if the TLSPolicy controller starts watching non-existent types, the controller needs to be disabled, and therefore the status would not be populated. I proposed an independent, common status controller to fulfill that need in issue #730.

@jasonmadigan
Member

makes sense

but a consistent message on any policy that can't work due to missing CRDs might be beneficial no?

Could be done as a separate issue, but I think so, yes. If someone installs our operator on some Kubernetes distro with a UI and doesn't install the components that provide the other CRDs we need for certain policies, we don't really surface that well right now. It's easy for users to end up with a broken install with little idea why.

@maleck13
Collaborator

maleck13 commented Jul 4, 2024

I think it should probably go in here https://docs.kuadrant.io/0.8.0/kuadrant-operator/doc/install/install-openshift/

@@ -243,9 +261,9 @@ spec:
storage:
redis-cached:
configSecretRef:
name: redis-config
EOF
Contributor Author

My IDE does not like trailing spaces :) And me neither.

@eguzki
Contributor Author

eguzki commented Jul 4, 2024

@maleck13 kindly asking for a review of the last commit, which updates the OpenShift installation doc.

@eguzki eguzki mentioned this pull request Jul 4, 2024

Install one of the different flavours of the Cert-Manager.

#### Install community version of the cert-manager
Collaborator

do we want to specify a version or min version?

Contributor Author

We did not specify version or require min version until now.

Is there really a min version of cert-manager that kuadrant supports? @mikenairn ?

Contributor Author

Actually, we did when the dependency was based on the operator, not on the API:

From https://github.com/Kuadrant/kuadrant-operator/pull/680/files#diff-14e7feda952796eb377d3d14ab302dccca03b212ce86cfddea587f95a8351cebL16-L17

- type: olm.package
  value:
    packageName: cert-manager
    version: "1.14.2"

Adding that.

Member

The version of cert-manager we are currently testing with, at least from this repo, is "1.12.1" https://github.com/Kuadrant/kuadrant-operator/blob/main/config/dependencies/cert-manager/kustomization.yaml#L2. Not sure what min version we really support, but I'd probably just say that version.
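If the docs state 1.12.1 as the minimum cert-manager version, a preflight check could compare it against the installed version. Below is a minimal stdlib-only Go sketch; `atLeast` and `parse` are hypothetical helpers that only handle plain `MAJOR.MINOR.PATCH` strings with an optional leading `v` and reject pre-release suffixes, so a real check would more likely use a proper semver library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse splits "v1.14.2" or "1.14.2" into numeric components.
// Pre-release/build suffixes (e.g. "-rc1") are rejected in this sketch.
func parse(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("unexpected version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether version v is greater than or equal to minimum.
func atLeast(v, minimum string) (bool, error) {
	a, err := parse(v)
	if err != nil {
		return false, err
	}
	b, err := parse(minimum)
	if err != nil {
		return false, err
	}
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] > b[i], nil
		}
	}
	return true, nil // equal versions satisfy the minimum
}

func main() {
	const minCertManager = "1.12.1" // version tested in this repo, per the comment above
	for _, v := range []string{"1.12.1", "v1.14.2", "1.11.0"} {
		ok, err := atLeast(v, minCertManager)
		fmt.Println(v, ok, err)
	}
}
```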

Contributor Author

done

@maleck13
Collaborator

maleck13 commented Jul 4, 2024

Looks good. Minor comment but not critical

@eguzki
Contributor Author

eguzki commented Jul 9, 2024

rebased to fix conflicts with current main and added cert-manager min version to doc.

Please review at your leisure @mikenairn @maleck13


return kuadrant.AcceptedCondition(tlsPolicy, specErr)
}

func (r *TLSPolicyReconciler) enforcedCondition(ctx context.Context, tlsPolicy *v1alpha1.TLSPolicy, targetNetworkObject client.Object) *metav1.Condition {
Member

Is this still relevant since #715? The controller no longer gets started if the cert-manager CRDs are missing, so it should never get here.

Contributor Author

Yeah, reverting the status reporting changes, as they are useless. Pending tasks for proper reporting are captured in #730.

Contributor Author

done

Collaborator

@didierofrivia didierofrivia left a comment

This is actually the way we've chosen to go with the Helm charts too. The same namespacing issue affects Helm chart dependencies, so cert-manager will be explicitly stated as a dependency in the docs.

@eguzki
Contributor Author

eguzki commented Jul 15, 2024

@mikenairn some changes were reverted. The status logic does not change. Looking for review here

@eguzki eguzki merged commit c8d02d0 into main Jul 19, 2024
26 checks passed
@eguzki eguzki deleted the fix-cert-manager-deps branch July 19, 2024 08:23
Labels
kind/bug Something isn't working size/small
Projects
Status: Done
6 participants