Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defaults & Overrides (RFC 0009) #58

Merged
merged 16 commits into from
Apr 10, 2024
Merged

Defaults & Overrides (RFC 0009) #58

merged 16 commits into from
Apr 10, 2024

Conversation

guicassolato
Copy link
Contributor

RFC: Defaults & Overrides (Epic issue: Kuadrant/kuadrant-operator#431)

Proposes the extension of the Kuadrant Policy APIs so we support use cases of Defaults & Overrides (D/O) for Inherited Policies, including the following base use cases:

  • Defaults: policies set at a lower level in the hierarchy supersede ones set (as "defaults") at a higher level, or "more specific beats less specific"
  • Overrides: policies set at a higher level in the hierarchy supersede ones set at the lower levels, or "less specific beats more specific"

As well as the following derivative cases:

  • Merged defaults: default policy rules that are merged into the more specific policies (as opposed to an atomic less specific set of rules that is activated only when another more specific one is absent)
  • Merged overrides: override policy rules that are merged into the more specific policies (as opposed to an atomic less specific set of rules that is activated fully replacing another more specific one that is present)
  • Constraints: specialization of an override that rather than declaring concrete values, specify constraints for values – typically numeric values and regular patterns (e.g. limited sets) – declared at the lower levels, that is used to "clip" the requested specific values within the boundaries dictated by the constraint, in an override fashion – e.g.: min value, max value, in operator.
  • Deactivation: specialization that completes a merge default use case by allowing lower level policies to disable ("deactivate") individual defaults set a higher level (as opposed to superseding those defaults with actual more specific policy rules with meaning)

Out of scope:

  • Requirements: high level policies that declare requirements to be fulfilled by more specific (lower level) policies without specifying concrete default or override values or constraints. E.g.: "an authentication policy must be enforced, but none is provided by default."

Affected APIs:

  • AuthPolicy
  • RateLimitPolicy

Non-affected APIs, while these are considered Direct Policies, i.e. with no hierarchical effect:

  • DNSPolicy
  • TLSPolicy

@thomasmaas
Copy link

Couple of questions:

  1. GEP Examples use singular default and override. The examples in this pr use plural. On purpose?
  2. GEP Examples seem to always use override and default keywords while in the examples in this PR all policies targeting a http route don't use any keyword (and are thus direct policies?). On purpose?

@guicassolato
Copy link
Contributor Author

Couple of questions:

1. [GEP Examples](https://gateway-api.sigs.k8s.io/geps/gep-713/#inherited-policy-attachment-its-all-about-the-defaults-and-overrides) use singular `default` and `override`. The examples in this pr use plural. On purpose?

2. GEP Examples seem to always use `override` and `default` keywords while in the examples in this PR all policies targeting a http route don't use any keyword (and are thus direct policies?). On purpose?

No, it was not intentional, but because I've actually seen the GEP using both ways. Maybe not in the YAML examples, but throughout the text. For example, where it says:

Settings in overrides stanzas will win over the same setting in a defaults stanza.

Either way, thanks for pointing out. Happy yo change to singular form.

@guicassolato
Copy link
Contributor Author

guicassolato commented Feb 16, 2024

Even in the examples actually. Try searching for "overrides:" @thomasmaas. You'll find plural form, for example, in this section of the GEP: https://gateway-api.sigs.k8s.io/geps/gep-713/#inherited-policy-attachment

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one I found odd. I wonder is the remove really needed? Finding it hard to reconcile

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that said I am also struggling to put my finger on why this in particular jumped out at me. I think it is the "look back" that gives me pause.

| As a Platform Engineer, when configuring a Gateway, I want to set policy rules (parts of a policy) for all routes linked to my Gateway, that cannot be individually replaced by more specific ones(*), but only expanded with additional more specific rules(\*). | OR | **gateway-override-policy-rule** |
| As an Application Developer, when managing an application, I want to set a policy for my application, that fully replaces any default policy that may exist for the application at the level of the Gateway, without having to know about the existence of the default policy. | D | **route-replace-policy** |
| As an Application Developer, when managing an application, I want to expand a default set of policy rules set for my application at the level of the gateway, without having to refer to those existing rules by name. | D/O | **route-add-policy-rule** |
| As an Application Developer, when managing an application, I want to deactivate an individual default rule set for my application at the level of the gateway. | RR | **route-disable-policy-rule** |
Copy link
Collaborator

@maleck13 maleck13 Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be clear I am not against this, I am just struggling a little to see the flow for the user. I imagine it will have to be three steps

  1. create a policy without a remove in place
  2. see my effective policy and figure out I need to remove something
  3. add a remove clause to my policy and re-deploy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. create a policy without a remove in place
  2. see my effective policy and figure out I need to remove something
  3. add a remove clause to my policy and re-deploy

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind defaults are meant to have a base line in place. If a platform engineer adds a default, the message to the app developer is; do something here but I've got your back if you don't. If the app developer can just come in and deactivate it, I guess the case can be made that the app developer did consider but rejected. But is that in the spirit of the spec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This use case completes the gateway-default-policy-rule one.

Say a PE sets at Gateway level default rules A, B and C. Strategy is merge. Then an App Dev sets at Route level: A' and D, where A' replaces A. The effective policy will be: A', B, C, D.

Now, imagine the App Dev is OK with B, but does not want/need C. We need to support App Dev specifying A', D, -C; otherwise there is no way to have the effective policy without C.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could it be expressed via the property IE "C": {}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to provide actual examples of how kinky this can get...

In an AuthPolicy, there is already a way to "nullify" (skip) authentication. That way is by declaring an authentication rule with the anonymous method. In fact, by default, an AuthPolicy missing the authentication block or with the block set to an empty object is interpreted as a shortcut to anonymous access. E.g.:

authentication:
  "my-authn-rule":
    anonymous: {}

or

authentication: {}

Obviously, one cannot nullify a limit definition in a RateLimitPolicy in the same way. Current validation won't allow it.

Even in the AuthPolicy, there's no exact equivalent for rules in the other blocks (authorization, metadata, etc.) One can nullify the entire block of rules (by setting it to an empty object) because they are not required block, but the individual rules in the block, have each one its own way to effectively make the rule a no-op and sometimes no way at all do it.

An authorization rule can be "nullified" by modifying the method to Rego:

authorization:
  "my-authz-rule":
    rego: allow = true

A metadata rule cannot be nullified today.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This use case completes the gateway-default-policy-rule one.

Say a PE sets at Gateway level default rules A, B and C. Strategy is merge. Then an App Dev sets at Route level: A' and D, where A' replaces A. The effective policy will be: A', B, C, D.

Now, imagine the App Dev is OK with B, but does not want/need C. We need to support App Dev specifying A', D, -C; otherwise there is no way to have the effective policy without C.

Deactivation/Nullification sounds like a edge/special case that is an artefact of the merge strategy and also the granualarity problem. If the Gateway used the atomic strategy, this probably wouldn't be needed I'm guessing but the Route level would never end up with A', D (from route) and B (from gateway assumming they want this and not C) from inheritance 🤔

I feel like the uncertainty from this is while this is a valid use case, there is a bit of contridiction with the concept of defaults (i.e. this field should not be empty / ignored / have a value) and inheritance, only to later allow nullification of inherited fields/values 🤷 This might bring to question whether we view defaults as just "recommendation for this field to have this value" set by the P.E. and overrides as "this field must be this value". Nullification feels like an override (to ignore field) in this case 🤔 Anyway, just a few thoughts

Copy link
Contributor Author

@guicassolato guicassolato Mar 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might bring to question whether we view defaults as just "recommendation for this field to have this value" set by the P.E. and overrides as "this field must be this value".

I see as more than recommendation because a default also sets the value when a more specific one is not defined. So, I little bit more than "can you please accept this value?" Yet, not to be confused with setting a requirement though, which is another thing. But overall, yes, a default implies a choice (which an override doesn't.)

One who sets a default gives the option for that value to be modified. What are the limits of that modification? Could somebody modify the value to the point that it becomes a no-op? There's nothing in the semantics of a default that says otherwise IMO.

Therefore, I see no contradiction with what a default means, because I don't think it means the field should not be empty. It only means that a value was provided and that it can be changed; until it's changed, the value is effective.

I'm not denying the use case for setting a value that cannot be removed. I'm just saying that IMO that is not what a default is.

A value that cannot be removed can actually be one of 2 different cases:

  1. the case where the value not only cannot be removed, but cannot be changed either – this is basically the definition of an override;
  2. the case of a field that must be set, though without specifying a concrete default value with it – I've called this use case "requirement" before, which was removed from the scope of D/O.

I can see PEs defining requirements AND defaults. Nullifying/deactivating a rule may not violate a default, but still violate a requirement.

I have no doubt deactivation is a valid use case – as well as requirements may be.

Thinking one may really need to deactivate a default inherited from a higher level is a fair assumption. By definition, what's set at the higher level is more generic. These rules (say, at the Gateway) will be inherited by all special cases beneath (say, all routes.) This doesn't leave much room for exception.

But exceptions do exist. Without a way to remove the inherited default, the owner of the lower policy would have to request to the owner of the higher policy for a special condition to be added at the higher policy, so the default wouldn't take effect only in that particular case.

I see this as leaking details that belong to the specialisation level upwards into the generalisation layer. This should not happen.

Another point in favour of supporting deactivation is that configuring a system in a way that no user (or only specific users) can use this feature is easy. The way I imagine it, it won't require any additional code. Whereas the contrary – i.e. not supporting deactivations in our API and yet finding a way for users to be able unset a default – is not easy and perhaps not even possible.

Deactivation/Nullification sounds like a edge/special case that is an artefact of the merge strategy and also the granualarity problem

Correct. For the atomic strategy, no rule from a higher policy would be merged if a lower policy is defined.

As for the granularity, if we end up supporting levels of granularity lower that the level of named policy rules, then users might ask for deactivation/nullification of those yet lower values as well. Luckily, the design proposed for deactivation already accounts for that.


I still have a couple TODOs in the RFC regarding deactivation, including bringing these POVs and concerns to the text. I also would like to rename to unset. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good points Gui, I agree with them. This sounds good to me. I'm happy to have this in and renaming it to unset 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a subsection explaining better the concept of "unsetting defaults" and implementation pushed to Tier 3.

@youngnick
Copy link

Reading over this, and one relevant change that I'm working on in splitting out Direct Attached Policy and Inherited Policy into their own GEPs is this:

I'm changing the definition of Direct and Inherited Policies a little bit.

Direct Policies now target only one object and only affect that object. Before there was some leeway where you could have specified a policy without defaults or overrides at, for example, the Gateway level, and had that affect Routes, but it still basically counted as a Direct Policy.

Now, Inherited Policies are any policy that has effects outside of the resource that it's attached to, regardless of whether there are defaults or overrides stanzas present.

I don't think that anything I'm seeing in this doc contradicts that, but just so you know.

I also really like the use of the word "constraints" for the class of settings that don't directly default or override, but place bounds around possible values. If you don't mind, I'm going to steal that and put it in the Inherited Policy GEP once it's split out.

Comment on lines +102 to +114
kind: AuthPolicy
metadata:
name: gw-policy
spec:
targetRef:
kind: Gateway
defaults:
rules:
authentication:
"a": {…}
authorization:
"b": {…}
strategy: atomic

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend moving to using lists of named subobjects instead of maps for the rules sections:

kind: AuthPolicy
metadata:
  name: gw-policy
spec:
  targetRef:
    kind: Gateway
  defaults:
    rules:
      authentication:
        - name: "a"
          {...}
      authorization:
        - name: "b"
          {...}
    strategy: atomic

This is straight from the Kubernetes API conventions, https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#lists-of-named-subo[…]preferred-over-maps, and as it says "maintains the invariant that all JSON/YAML keys are fields in API objects"

kubernetes/kubernetes#2004 goes into some more detail of why this is probably better.

Copy link
Contributor Author

@guicassolato guicassolato Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually walked the opposite direction, replacing lists with maps not too long ago. It was our (possibly naive?) attempt to prepare for supporting merges between policies that would acknowledge each of those named policy rules individually.

We knew that, with lists, a merge would have to pick either one entire list of rules or the other, but never mix and combine two lists. Something about that was recommended in GEP-713, I recon.

To support our use cases, we found no way but to treat those user-defined values (names of rules within a policy) as keys, not as value. Using maps felt then the lesser of two evils perhaps.

Any suggestion that could help us solve the issue? Could we still merge lists picking elements of both sides?

Thanks for the references!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trick is to treat the list as what the apiserver calls a listMapType: although the objects are stored as a list, all code that handles them treats them as a map with the name as the key. (This is actually how Conditions and Listeners work already).

Then the apiserver can use the Strategic merge type and handle a lot of this for you.

In the CRD definitions, this is done with:

	// +listType=map
	// +listMapKey=name

on the containing field (the []Type field).

Doing things this way also gives a big advantage in that the list is ordered, so you can assume that the listed order is important if you want. A map is explicitly randomly ordered, so you'll need to do something else to sort the keys.

To be clear, this is a case of all the solutions having bad points. But personally, I think that the listType=map has the least onerous tradeoffs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added as a future possibility as it's not a blocker.


All the examples in the RFC are based on API DESIGN 1.

## API DESIGN 1 - `strategy` field - _RECOMMENDED_

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting design - I'm curious to see how it works in practice.

It feels to me like there are a lot of moving parts here, and it may be difficult for users, particularly new users, to understand how these interactions work.

In Gateway API itself, I've recommended we stick with something more approaching the merge strategy, and try not to use structs that can't be compared on a per-field basis, to try to keep the interaction complexity as low as possible. I explicitly did not include a switch to change the mode (as the strategy field does) because I'm worried about combinatorial interaction complexity.

To paraphrase Spider-man, with great complexity comes great potential for confusion.

I suspect that it will be easy to build a system of Policies that require a UI or similar to understand the resultant policy, which doesn't seem ideal. But conversely, calculating the resultant policy and putting it on a status somewhere creates a large fanout update problem.

So yes, I'm a little worried about how complex this will end up, but not enough to tell you to stop. :)

Copy link
Contributor Author

@guicassolato guicassolato Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @youngnick. This is good feedback. Thank you so much!

Indeed, managing the complexity implied by these proposed designs poses a challenge. Even with this option, which I consider to be the safest among the 5.

I suppose communication between users is key, and it has to happen one way or another. One who declares an intent will want to know if that request got affected by something in the middle. This works both ways, for one who sets a policy, say, at the Gateway level, as well as for one who does it at the level of the route.

I guess our bet here is that we will be able to handle that communication inside the cluster, via status stanzas of the resources, provided that the proposed API design is still fairly conservative in the sense that the level of atomicity of the declarations within a policy has been kept relatively high – i.e. either the entire policy or the level of what we call "named policy rules," which is just one level down. (We have no plans to support merges lower than that for now.)

With the named policy rules, we can report statuses such as, for example:

Enforced Reason Message
true
true PartiallyEnforced The following policy rules were overridden by gw-ns/gw-policy: a, b
true EnforcedWithAdditions The following default policy rules were added by gw-ns/gw-policy: c, d
true PartiallyEnforcedWithAdditions The following policy rules were overridden by gw-ns/gw-policy: a, b; the following default policy rules were added by gw-ns/gw-policy: c, d
false Overridden Policy fully overridden by gw-ns/gw-policy

We could consider something even more conservative and, say, support completely atomic defaults or overrides declarations only (i.e. no merges at all!) In this case, however, one who wants to "cherry pick" individual policy rules to override, at the lower level or the higher, but not necessarily an entire policy, would have to learn about those rules (offline?) and, I guess, duplicate in their policies the rules to keep? This seems to work OK-ish for replacing higher defaults at a lower level, but can you imagine the contrary, i.e. overriding, at the higher level, only parts of lower level policies? Keeping consistency between those duplicated rules would be already a problem, but, on top of that, I see also potential leakage of lower (more specific) definitions upwards (into the less specific level,) while users try to workaround the API to achieve that combinatorial interaction that we otherwise refused to incorporate by design.

IOW, people will do merges one way or another. I personally think it's better if we try to control them (to an extent,) and provide users with some framework to follow. (A decision I could easily regret – acknowledging your shared Spider-man wisdom 🙂)

I promise to give this a few more rounds before committing to anything. Your concerns are very well justified.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IOW, people will do merges one way or another. I personally think it's better if we try to control them (to an extent,) and provide users with some framework to follow.

I agree, this is a good, pragmatic point.

I think that as long as the design thinks carefully through all the discoverability concerns (as in https://gateway-api.sigs.k8s.io/geps/gep-713/#user-discoverability ), then it's worth trying things and seeing what works.

@alexsnaps alexsnaps added kind/epic RFC Request For Comments labels Feb 19, 2024
@guicassolato guicassolato self-assigned this Feb 19, 2024
@alexsnaps
Copy link
Member

alexsnaps commented Feb 20, 2024

With the proposal of adding support for CEL expressions in when clauses, and if we really open policies to CEL in general, I'm wondering if we couldn't address much of this proposal (strategy, constraints, ...) with supporting any node in the policy tree to be either T or fn(T): T, where fn would be a CEL expression(s).

In order to achieve that we'd need a way to "escape" into such a CEL expression... I'm unsure if this has been done before or if there are well established patterns to do so, but I'll just use the cel: prefix to a string to identify such "escaping" in the quick examples below. An atomic merge strategy would be the default... in order to merge, e.g. add a key/value to a map, one would escape to cel and "just" add the entry to the map: map_typed_key: cel:value["a"] = my_added_value_for_a... We'd define the binding's name for the node upfront (in this example value), but could be anything. And this example my_added_value_for_a would need to be properly constructed to be a valid entry value for this
map. We could force a return to the function, or "just" reuse the binding passed in (in this case, value) to reassign to the policy's tree node.

Now that entry could be conditionally added, e.g. only if it is missing. A CEL expression could also validate that all entries in a map e.g. for rate limits are within certain bounds and clip/clamp it accordingly... effectively acting as constraints on the policy.

While I think this simplifies, I also fear it might open the door to too many complex cases, given the power CEL would add at any given node of the tree that expresses a Policy. Arguably it would only be evaluated once, when a policy is created/modified. Also, we could absolutely restrict what a CEL expression can and cannot do initially and only slowly expand the usage further, with new releases as the need/usecases emerge.

tl;dr I'm torn... this feels like a simple enough idea to cover all of the usecases one can think of, but also feels like this is too much power. But I couldn't talk myself out of it for the last week thinking about it. Anything I miss that makes this a real bad idea?

addendum: I think this would also "solve" (push on the user) a way to merge lists...

@guicassolato
Copy link
Contributor Author

I'm wondering if we couldn't address much of this proposal (strategy, constraints, ...) with supporting any node in the policy tree to be either T or fn(T): T, where fn would be a CEL expression(s).

Thanks, @alexsnaps. Now that I understand the idea better, I'll try to capture it among the options by updating DESIGN OPTION 3.

The pros you've highlighted very well. I have nothing to add.

I agree with you and also believe we can workaround the con of possibly causing "unmergeable objects" to merge – not only for lists but for maps too! – by pushing on the users. They would be the ones defining the final output of the functions.

Imagining the maps of policy rules re-implemented as lists, an "atomic" set of default authentication rules could look like this:

spec:
  defaults:
    rules:
      authentication:
      - name: x
        value: 1
      - name: y
        value: 2

This would be a case of a value declared as T, not as fn(T): T yet.

I still have one problem with it though. What is the key and what is the value? What if we the policy declared an authorization block as well? Would authentication and authorization be 2 keys or would rules be the one and only key (and authentication and authorization part of the value?)

Maybe assuming the values are always the scalars and the lists, and that the strategy is always a merge...

An equivalent example to the above, but with dynamic value, i.e. using fn(T): T instead of T:

spec:
  defaults:
    rules:
      authentication: |
        cel:[Authentication{name: "x", value: 1},Authentication{name: "y", value: 2}]

Of course, this second example shows the need for having a lib that defines a protobuf message Authentication.

A default that adds an authentication rule if it's not defined in the target would probably have to be declared as an override, I reckon:

spec:
  overrides:
    rules:
      authentication: |
        cel:self.exists_one(a, a.name == 'x') ? self : self + [Authentication{name: "x", value: 3}]

I still have my doubts about the potential "implementation nightmare." We would have to redefine tons of API types, cannot simply reuse current ones (from the APIs without D/O), plus protobuf messages for all of those, etc.

To reduce the burden, though at the expense of loosing things up a bit, we could define CEL function extensions that deal with the dynamic types. Found a few of those in https://tekton.dev/docs/triggers/cel_expressions/ (e.g. parseJSON(), parseYAML.)

My last thought here regards the cognitive complexity for one who reads out a policy to mentally work out all the function outputs. I think this is a point you've raised offline as well, @alexsnaps, but please correct me if I'm wrong.

@youngnick
Copy link

I am strongly -1 on any Kubernetes object using an inline function as a value. Sure, this is super flexible, but, as mentioned, both the cognitive overhead for human creators and readers of the policy, and implementation overhead for whoever's implementing the policy, seem terrifyingly complex to me.

To put this another way, I don't think that adding the ability to have what may end up as Turing-complete fields (if they aren't already, I haven't looked at the CEL spec for a while) is going to go well.

@alexsnaps
Copy link
Member

My last thought here regards the cognitive complexity for one who reads out a policy to mentally work out all the function outputs. I think this is a point you've raised offline as well, @alexsnaps, but please correct me if I'm wrong.

I am strongly -1 on any Kubernetes object using an inline function as a value. Sure, this is super flexible, but, as mentioned, both the cognitive overhead for human creators and readers of the policy, and implementation overhead for whoever's implementing the policy, seem terrifyingly complex to me.

Yes, that was my point @guicassolato . I think the cognitive overhead might be "a little much" indeed, to say the least. If we have a tool (e.g. gwctl) that could give a user the effective policy on any object would almost be a necessity, and yet probably not even be enough to then have some reason about that how that actually came together by "just reading" the policies themselves with CEL interleaved in them.

From an implementer point of view, i.e. literally mine, I actually think this is a very elegant and actually pretty simple (not easy necessarily tho) to go about it. But don't get me wrong, I'm NOT using this as an argument to actually make it happen. In my experience, this thinking has led to more damage than good in my perspective. The one interesting aspect tho, is how much would it take away if we do not do it? And further more is it reversible (in Kuadrant obviously, not talking about any GEP here)?

To put this another way, I don't think that adding the ability to have what may end up as Turing-complete fields (if they aren't already, I haven't looked at the CEL spec for a while) is going to go well.

That, we could control. Just as validation limits the complexity of CEL expressions. Not that I believe (again), that this justifies the feature... but leaves it up for consideration.

tl;dr while I think there is much that scares me (as pointed out by all) and makes me feel very uncomfortable still, it seems to me like this is actually the "simplest" proposal (vs. complex, i.e. has the less things complected). And I still can't make up my mind for or against it... I'll be rereading the latest GEPs proposal in the meanwhile... 😞

This was referenced Feb 27, 2024
@guicassolato
Copy link
Contributor Author

I was about to replace DESIGN OPTION 3 (when conditions based on CEL expressions) with the alternative described by @alexsnaps, based on full power of CEL expressions not only to establish conditions but to determine the value to set as a default or as an override as well (mentioned at yesterday’s community call.) But, instead, I will add that as yet another alternative design, thus adding to 6 different options.

Because I’ve changed this a couple times already, including reordering the options and changing their assigned numbers, I will now abandon the numbers altogether and refer by the options unique name.

@guicassolato
Copy link
Contributor Author

guicassolato commented Mar 6, 2024

I still have a few TODOs on this:

  • Add note about targeting a portion of an object (with and without named route rules, with and without targetRef.sectionName)
    • With named route rules and with targetRef.sectionName: remove routeSelectors, enable multiple policies targeting same object as long as at different sections
    • With named route rules and without targetRef.sectionName: refactor routeSelectors to use named route rules
    • Without named route rule: do nothing
    • Add a note about changing maps of policy rules back to lists (and the importance of this for performing merges)
  • Add a note about matching the policy spec closer to the route spec
  • Add a note about allowing N:1 policy-target relationship (for App Devs setting D/O between them, for multiple Platform Engineers setting different policy rules on the same Gateway)
  • Rename strategy, atomic and merge keywords
  • Rename deactivations' remove keyword (as unset?)
  • Add a note about changing a default versus unsetting a default (i.e. users may want to set defaults that cannot be removed)
  • Provide examples as text (if just as nice as now)
  • Add a flow diagram of the implementation logic
  • Postpone deactivations to Tier 3
  • Add implementation details on status and effective policy

@maleck13
Copy link
Collaborator

maleck13 commented Mar 8, 2024

@guicassolato I think proceeding with the strategy based approach initially makes sense. Do you need anything further here? Not sure I should approve at this point or wait for an update?

@guicassolato
Copy link
Contributor Author

@guicassolato I think proceeding with the strategy based approach initially makes sense. Do you need anything further here? Not sure I should approve at this point or wait for an update?

@maleck13, I'm still pending on a few changes here. Just pushed one 🙂

Early next week, I'll finish those and mark the PR for review. No need to approve in the meantime, though your comments are still welcome ofc.

@guicassolato guicassolato force-pushed the rfc/do branch 3 times, most recently from b15bbcf to d3433ce Compare March 8, 2024 15:14
Prior art: Only Kuadrant itself and orientative examples provided by Gateway API (spec and gwctl)

Unresolved questions:
- Policy spec resembling more the target spec

Future possibilities:
- N:1 policy-target relationship
- Route rule `name` and `targetRef.sectionName`
- Use listMapType instead of maps of policy rules
@guicassolato guicassolato marked this pull request as ready for review March 11, 2024 18:14
Comment on lines 74 to 85
- the bare policy rules block without further qualification as a default or override set of rules – e.g. the `rules` field in a Kuadrant AuthPolicy, the `limits` field in a RateLimitPolicy.

Typically, one will specify either `defaults` and/or `overrides`, or just the bare set of policy rules block without further qualification as neither defaults nor overrides.

In case two or more of these fields are specified, they are processed in the following order:
1. `defaults`
2. bare set of policy rules without qualification (treated indistinctively as another block of defaults)
3. `overrides`

Supporting specifying the bare set of policy rules at the first level of the spec, as well and simultaneously along with `defaults` and `overrides` blocks, is a strategy that aims to provide:
1. more natural usability, especially for those who write policies attached to the lowest level of the hierarchy supported; as well as
2. backward compatibility for policies that did not support explicit D/O and later on have moved to doing so.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this and while playing around with the potential implementation, if we are keeping the bare set of policy rules, it might make sense to have validation to prevent the usage of both the bare set of policy rules and defaults at the same time (i.e., mutually exclusive) 🤔

Otherwise, I feel like this adds another layer of merging that the implementation would need to decide upon depending on the strategy used for the effective policy 🤔

I'm somewhat undecided on whether it should be just bare rules or defaults, or keeping both. I feel default seems unnecessary if it acts as an intrinsic default anyway. Or are we using the keyword default as an indicator that the policy is an Inherited policy? One argument from the other side could be that having both fields can be seen as confusing also, a user can interpret that the bare rules will be unaffected from Inheritrance 🤷 Anyway, just some thoughts 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make sense to have validation to prevent the usage of both the bare set of policy rules and defaults at the same time (i.e., mutually exclusive)

I like that. It simplifies processing the policy – one less "layer of merging" as you said – and it gives users some guidance for them to also write policies that are easier to read and to reason about.

One who first created a policy declaring a bare set of rules (i.e. implicit defaults) and later wants to add an overrides block to it can easily add the defaults keyword on top of the existing rules, fix the indentation, and add the overrides part. Not a big deal. I personally prefer that than having complicated policies that mix two ways to express the same thing.

Or are we using the keyword default as an indicator that the policy is an Inherited policy?

No, we are not. Our inherited policies are inherited policies regardless of whether the rules are explicitly grouped within D/O blocks or not. In fact, we'll label the policy CRDs as inherited so there's no doubt about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Please, let me know if it looks better now. Thanks!

Comment on lines 334 to 337
| Enforced | True | "Enforced" | "Policy has been successfuly enforced" |
| | True | "EnforcedWithAdditions" | "Policy has been successfuly enforced. The following defaults have been added by <policy-ns/policy-name>: x, y" |
| | True | "PartiallyEnforced" | "Policy has been successfuly enforced. The following rules have been overridden by <policy-ns/policy-name>: a, b" |
| | True | "PartiallyEnforcedWithAdditions" | "Policy has been successfuly enforced. The following rules have been overridden by <policy-ns/policy-name>: a, b; the following defaults have been added by <policy-ns/policy-name>: x, y" |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about adding these 2 extra reasons to cover the inheritance ("addition"), at the lower level, of policy rules defined at the upper levels.

An alternative, to not deviate from RFC 0004, could be working with only "Enforced" and "PartiallyEnforced" and add the details of the additions to the message. I.e.:

Type Status Reason Message
Enforced True "Enforced" "Policy has been successfuly enforced[. The following defaults have been added by <policy-ns/policy-name>: x, y]"
True "PartiallyEnforced" "Policy has been successfuly enforced. The following rules have been overridden by <policy-ns/policy-name>: a, b[; the following defaults have been added by <policy-ns/policy-name>: x, y]"
False "Overridden" "Policy has been overridden by <policy-ns/policy-name>"
False "Unknown" "Policy has encountered some issues"

cc @didierofrivia

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be a bit biased by the Status RFC, but I can see it's a bit simpler having just "Enforced" and "PartiallyEnforced" as you mention in this comment and effectively elaborate in the message.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also leaning towards using the current "Enforced" and "PartitallyEnforced" statuses with detailed messages to keep it a bit simpler

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for the feedback!

… the message details instead of having dedicated conditions for it
@guicassolato guicassolato added the status/FCP Final Comment Period label Mar 27, 2024
Copy link
Contributor

@KevFan KevFan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 👍

@alexsnaps
Copy link
Member

Looks like this maybe a policy attachment recurring thing… This is huge.
Key part to me tho is the tiered approach with which I think I almost completely agree with. Certainly the first one as it effectively "merely has us play by the current state of the policy attachment GEP", adding the defaults & overrides stanza and the labels & status…

Then tho… I'm not sure I want to commit to the plan from there on. I certainly have more questions (adding them to the related parts), but also considering the spec could still change, I'd propose to split this in 3 RFCs, one for each tier of the implementation. Where on the first I'd imagine everyone being ok with…

| As a Platform Engineer, when configuring a Gateway, I want to set policy rules (parts of a policy) for all routes linked to my Gateway, that cannot be individually replaced by more specific ones(*), but only expanded with additional more specific rules(\*). | OR | **gateway-override-policy-rule** |
| As an Application Developer, when managing an application, I want to set a policy for my application, that fully replaces any default policy that may exist for the application at the level of the Gateway, without having to know about the existence of the default policy. | D | **route-replace-policy** |
| As an Application Developer, when managing an application, I want to expand a default set of policy rules set for my application at the level of the gateway, without having to refer to those existing rules by name. | D/O | **route-add-policy-rule** |
| As an Application Developer, when managing an application, I want to unset an individual default rule set for my application at the level of the gateway. | U | **route-unset-policy-rule** |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually challenge that somehow would either want to do that, but most importantly should be able to do that. I guess "unset" something is the equivalent of either setting it's value to null or removing the entry from a Map like type (name magic field in a list's item that is effectively a unique identifier), is that right? Looking at it that way it seems more like a data modeling concern to me that is being addressed by adding a concept to the higher level API. Can we see if we can avoid introducing that feature by "just" modelling things accordingly on a per use-case basis? I think this is meant for Tier 3 in the implementation tho…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to reason for this use case in more details in the subsection entitled "Unsetting inherited defaults."

Using gateways and routes as examples, the summary is that a default set at the level of a gateway is inherited by all routes attached to the gateway. If there is enough reason that justifies keeping such default (e.g. too many routes to otherwise manage individually) and yet for a single route one of those defaults should not apply, there must be a way to express this exception.

One way of doing that could be by setting the rule to null at the lower level, except that I don't believe this design would scale well. For specific cases, some other meaning may have been assigned to null already. This is true at least for the authentication rules in an AuthPolicy, where null means anonymous access. Moreover, using null would also be challenging if we ever want to extend the merge strategy to other levels of the policy spec, requiring many fields to be retyped as optional.

Without explicitly supporting unsetting defaults at the lower levels, the only alternatives I can see are:

  1. Using conditions at the upper level to incorporate the exceptions (e.g. "for all routes except route X".) → I don't like this option first because it goes against the pattern of inheritance, allowing details from subclasses to leak upwards into the superclasses. Besides, with enough exceptions to cover, it can get very messy.
  2. Dropping the defaults altogether in favour of only having specific policies set at the lower levels → I see tons of duplication, plus it defeats the purpose of attaching policies at multiple levels of the hierarchy for setting defaults in the first place.

IMO, the unset field is the lesser of all evils.

  • It allows covering the exception cases that will arise due to defaults sometimes overshooting
  • It's constrict within the inheritance pattern
  • It does not require significant changes in the API or semantics of existing fields
  • It scales well for other possible levels of merging between specs
  • It can be opt-out and/or put under RBAC

Comment on lines +487 to +488
- D/O `when` conditions (and support for "constraints")
- Merge strategy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't where to best put this comment other than here, but are the two "features" usable in combination?
Would the strategy be conditional on "when"? Also, can I have multiple "when" clauses (if only thru multiple policies), mapping to a "switch case" of some kind? Or would we prohibit this somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entire block (of defaults or overrides) would be conditional on when. I.e. if the condition evaluates to true, then apply the rules specified in the block according to the strategy specified (atomic or merge.)

TBH, I hadn't thought about multiple cases of conditions (if..else or switch-like) before. But I can see how the current proposal can be limiting for constraints. For example, how to set a constraint for a rate limit between 50 and 100? You'd need 2 blocks of overrides, though only 1 is allowed per policy; therefore you'd need 2 policies targeting the same object. Any other idea?

Copy link
Member

@alexsnaps alexsnaps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving this, so we can move forward with it.
I don't know how changes to phase 2 and/or 3 would be reflected moving forward... but I also understand how this effort needs to move forward now. Maybe this RFC was too big... maybe we need a better way to deal with things in flux than an "unmerged" PR... maybe we need revisions... push these to the website... I don't know. And this isn't the place to resolve that. But this shows the documented RFC process might be unsuited for this team and we should review it.

@guicassolato guicassolato merged commit d179699 into main Apr 10, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RFC Request For Comments status/FCP Final Comment Period
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

7 participants