Defaults & Overrides (RFC 0009) #58

guicassolato · 2024-02-15T19:39:51Z

RFC: Defaults & Overrides (Epic issue: Kuadrant/kuadrant-operator#431)

Proposes the extension of the Kuadrant Policy APIs so we support use cases of Defaults & Overrides (D/O) for Inherited Policies, including the following base use cases:

Defaults: policies set at a lower level in the hierarchy supersede ones set (as "defaults") at a higher level, or "more specific beats less specific"
Overrides: policies set at a higher level in the hierarchy supersede ones set at the lower levels, or "less specific beats more specific"

As well as the following derivative cases:

Merged defaults: default policy rules that are merged into the more specific policies (as opposed to an atomic less specific set of rules that is activated only when another more specific one is absent)
Merged overrides: override policy rules that are merged into the more specific policies (as opposed to an atomic less specific set of rules that is activated fully replacing another more specific one that is present)
Constraints: specialization of an override that rather than declaring concrete values, specify constraints for values – typically numeric values and regular patterns (e.g. limited sets) – declared at the lower levels, that is used to "clip" the requested specific values within the boundaries dictated by the constraint, in an override fashion – e.g.: min value, max value, in operator.
Deactivation: specialization that completes a merge default use case by allowing lower level policies to disable ("deactivate") individual defaults set a higher level (as opposed to superseding those defaults with actual more specific policy rules with meaning)

Out of scope:

Requirements: high level policies that declare requirements to be fulfilled by more specific (lower level) policies without specifying concrete default or override values or constraints. E.g.: "an authentication policy must be enforced, but none is provided by default."

Affected APIs:

AuthPolicy
RateLimitPolicy

Non-affected APIs, while these are considered Direct Policies, i.e. with no hierarchical effect:

DNSPolicy
TLSPolicy

thomasmaas · 2024-02-16T09:21:10Z

Couple of questions:

GEP Examples use singular default and override. The examples in this pr use plural. On purpose?
GEP Examples seem to always use override and default keywords while in the examples in this PR all policies targeting a http route don't use any keyword (and are thus direct policies?). On purpose?

guicassolato · 2024-02-16T10:08:28Z

Couple of questions:

1. [GEP Examples](https://gateway-api.sigs.k8s.io/geps/gep-713/#inherited-policy-attachment-its-all-about-the-defaults-and-overrides) use singular `default` and `override`. The examples in this pr use plural. On purpose?

2. GEP Examples seem to always use `override` and `default` keywords while in the examples in this PR all policies targeting a http route don't use any keyword (and are thus direct policies?). On purpose?

No, it was not intentional, but because I've actually seen the GEP using both ways. Maybe not in the YAML examples, but throughout the text. For example, where it says:

Settings in overrides stanzas will win over the same setting in a defaults stanza.

Either way, thanks for pointing out. Happy yo change to singular form.

guicassolato · 2024-02-16T10:11:04Z

Even in the examples actually. Try searching for "overrides:" @thomasmaas. You'll find plural form, for example, in this section of the GEP: https://gateway-api.sigs.k8s.io/geps/gep-713/#inherited-policy-attachment

maleck13 · 2024-02-16T14:29:54Z

rfcs/0009-defaults-and-overrides-assets/example-f1.png

This one I found odd. I wonder is the remove really needed? Finding it hard to reconcile

that said I am also struggling to put my finger on why this in particular jumped out at me. I think it is the "look back" that gives me pause.

maleck13 · 2024-02-16T14:50:00Z

rfcs/0009-defaults-and-overrides.md

+| As a Platform Engineer, when configuring a Gateway, I want to set policy rules (parts of a policy) for all routes linked to my Gateway, that cannot be individually replaced by more specific ones(*), but only expanded with additional more specific rules(\*).                                   | OR    | **gateway-override-policy-rule** |
+| As an Application Developer, when managing an application, I want to set a policy for my application, that fully replaces any default policy that may exist for the application at the level of the Gateway, without having to know about the existence of the default policy.                      | D     | **route-replace-policy**         |
+| As an Application Developer, when managing an application, I want to expand a default set of policy rules set for my application at the level of the gateway, without having to refer to those existing rules by name.                                                                              | D/O   | **route-add-policy-rule**        |
+| As an Application Developer, when managing an application, I want to deactivate an individual default rule set for my application at the level of the gateway.                                                                                                                                      | RR    | **route-disable-policy-rule**    |


to be clear I am not against this, I am just struggling a little to see the flow for the user. I imagine it will have to be three steps

create a policy without a remove in place

see my effective policy and figure out I need to remove something

add a remove clause to my policy and re-deploy

create a policy without a remove in place

see my effective policy and figure out I need to remove something

add a remove clause to my policy and re-deploy

In my mind defaults are meant to have a base line in place. If a platform engineer adds a default, the message to the app developer is; do something here but I've got your back if you don't. If the app developer can just come in and deactivate it, I guess the case can be made that the app developer did consider but rejected. But is that in the spirit of the spec?

This use case completes the gateway-default-policy-rule one.

Say a PE sets at Gateway level default rules A, B and C. Strategy is merge. Then an App Dev sets at Route level: A' and D, where A' replaces A. The effective policy will be: A', B, C, D.

Now, imagine the App Dev is OK with B, but does not want/need C. We need to support App Dev specifying A', D, -C; otherwise there is no way to have the effective policy without C.

could it be expressed via the property IE "C": {}

Just to provide actual examples of how kinky this can get...

In an AuthPolicy, there is already a way to "nullify" (skip) authentication. That way is by declaring an authentication rule with the anonymous method. In fact, by default, an AuthPolicy missing the authentication block or with the block set to an empty object is interpreted as a shortcut to anonymous access. E.g.:

authentication: "my-authn-rule": anonymous: {}

or

authentication: {}

Obviously, one cannot nullify a limit definition in a RateLimitPolicy in the same way. Current validation won't allow it.

Even in the AuthPolicy, there's no exact equivalent for rules in the other blocks (authorization, metadata, etc.) One can nullify the entire block of rules (by setting it to an empty object) because they are not required block, but the individual rules in the block, have each one its own way to effectively make the rule a no-op and sometimes no way at all do it.

An authorization rule can be "nullified" by modifying the method to Rego:

authorization: "my-authz-rule": rego: allow = true

A metadata rule cannot be nullified today.

This use case completes the gateway-default-policy-rule one.

Say a PE sets at Gateway level default rules A, B and C. Strategy is merge. Then an App Dev sets at Route level: A' and D, where A' replaces A. The effective policy will be: A', B, C, D.

Now, imagine the App Dev is OK with B, but does not want/need C. We need to support App Dev specifying A', D, -C; otherwise there is no way to have the effective policy without C.

Deactivation/Nullification sounds like a edge/special case that is an artefact of the merge strategy and also the granualarity problem. If the Gateway used the atomic strategy, this probably wouldn't be needed I'm guessing but the Route level would never end up with A', D (from route) and B (from gateway assumming they want this and not C) from inheritance 🤔

I feel like the uncertainty from this is while this is a valid use case, there is a bit of contridiction with the concept of defaults (i.e. this field should not be empty / ignored / have a value) and inheritance, only to later allow nullification of inherited fields/values 🤷 This might bring to question whether we view defaults as just "recommendation for this field to have this value" set by the P.E. and overrides as "this field must be this value". Nullification feels like an override (to ignore field) in this case 🤔 Anyway, just a few thoughts

This might bring to question whether we view defaults as just "recommendation for this field to have this value" set by the P.E. and overrides as "this field must be this value".

I see as more than recommendation because a default also sets the value when a more specific one is not defined. So, I little bit more than "can you please accept this value?" Yet, not to be confused with setting a requirement though, which is another thing. But overall, yes, a default implies a choice (which an override doesn't.)

One who sets a default gives the option for that value to be modified. What are the limits of that modification? Could somebody modify the value to the point that it becomes a no-op? There's nothing in the semantics of a default that says otherwise IMO.

Therefore, I see no contradiction with what a default means, because I don't think it means the field should not be empty. It only means that a value was provided and that it can be changed; until it's changed, the value is effective.

I'm not denying the use case for setting a value that cannot be removed. I'm just saying that IMO that is not what a default is.

A value that cannot be removed can actually be one of 2 different cases:

the case where the value not only cannot be removed, but cannot be changed either – this is basically the definition of an override;

the case of a field that must be set, though without specifying a concrete default value with it – I've called this use case "requirement" before, which was removed from the scope of D/O.

I can see PEs defining requirements AND defaults. Nullifying/deactivating a rule may not violate a default, but still violate a requirement.

I have no doubt deactivation is a valid use case – as well as requirements may be.

Thinking one may really need to deactivate a default inherited from a higher level is a fair assumption. By definition, what's set at the higher level is more generic. These rules (say, at the Gateway) will be inherited by all special cases beneath (say, all routes.) This doesn't leave much room for exception.

But exceptions do exist. Without a way to remove the inherited default, the owner of the lower policy would have to request to the owner of the higher policy for a special condition to be added at the higher policy, so the default wouldn't take effect only in that particular case.

I see this as leaking details that belong to the specialisation level upwards into the generalisation layer. This should not happen.

Another point in favour of supporting deactivation is that configuring a system in a way that no user (or only specific users) can use this feature is easy. The way I imagine it, it won't require any additional code. Whereas the contrary – i.e. not supporting deactivations in our API and yet finding a way for users to be able unset a default – is not easy and perhaps not even possible.

Deactivation/Nullification sounds like a edge/special case that is an artefact of the merge strategy and also the granualarity problem

Correct. For the atomic strategy, no rule from a higher policy would be merged if a lower policy is defined.

As for the granularity, if we end up supporting levels of granularity lower that the level of named policy rules, then users might ask for deactivation/nullification of those yet lower values as well. Luckily, the design proposed for deactivation already accounts for that.

I still have a couple TODOs in the RFC regarding deactivation, including bringing these POVs and concerns to the text. I also would like to rename to unset. WDYT?

Very good points Gui, I agree with them. This sounds good to me. I'm happy to have this in and renaming it to unset 👍

Added a subsection explaining better the concept of "unsetting defaults" and implementation pushed to Tier 3.

youngnick · 2024-02-19T03:20:30Z

Reading over this, and one relevant change that I'm working on in splitting out Direct Attached Policy and Inherited Policy into their own GEPs is this:

I'm changing the definition of Direct and Inherited Policies a little bit.

Direct Policies now target only one object and only affect that object. Before there was some leeway where you could have specified a policy without defaults or overrides at, for example, the Gateway level, and had that affect Routes, but it still basically counted as a Direct Policy.

Now, Inherited Policies are any policy that has effects outside of the resource that it's attached to, regardless of whether there are defaults or overrides stanzas present.

I don't think that anything I'm seeing in this doc contradicts that, but just so you know.

I also really like the use of the word "constraints" for the class of settings that don't directly default or override, but place bounds around possible values. If you don't mind, I'm going to steal that and put it in the Inherited Policy GEP once it's split out.

youngnick · 2024-02-19T03:23:48Z

rfcs/0009-defaults-and-overrides.md

+kind: AuthPolicy
+metadata:
+  name: gw-policy
+spec:
+  targetRef:
+    kind: Gateway
+  defaults:
+    rules:
+      authentication:
+        "a": {…}
+      authorization:
+        "b": {…}
+    strategy: atomic


I'd recommend moving to using lists of named subobjects instead of maps for the rules sections:

kind: AuthPolicy metadata: name: gw-policy spec: targetRef: kind: Gateway defaults: rules: authentication: - name: "a" {...} authorization: - name: "b" {...} strategy: atomic

This is straight from the Kubernetes API conventions, https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#lists-of-named-subo[…]preferred-over-maps, and as it says "maintains the invariant that all JSON/YAML keys are fields in API objects"

kubernetes/kubernetes#2004 goes into some more detail of why this is probably better.

We actually walked the opposite direction, replacing lists with maps not too long ago. It was our (possibly naive?) attempt to prepare for supporting merges between policies that would acknowledge each of those named policy rules individually.

We knew that, with lists, a merge would have to pick either one entire list of rules or the other, but never mix and combine two lists. Something about that was recommended in GEP-713, I recon.

To support our use cases, we found no way but to treat those user-defined values (names of rules within a policy) as keys, not as value. Using maps felt then the lesser of two evils perhaps.

Any suggestion that could help us solve the issue? Could we still merge lists picking elements of both sides?

Thanks for the references!

The trick is to treat the list as what the apiserver calls a listMapType: although the objects are stored as a list, all code that handles them treats them as a map with the name as the key. (This is actually how Conditions and Listeners work already).

Then the apiserver can use the Strategic merge type and handle a lot of this for you.

In the CRD definitions, this is done with:

// +listType=map // +listMapKey=name

on the containing field (the []Type field).

Doing things this way also gives a big advantage in that the list is ordered, so you can assume that the listed order is important if you want. A map is explicitly randomly ordered, so you'll need to do something else to sort the keys.

To be clear, this is a case of all the solutions having bad points. But personally, I think that the listType=map has the least onerous tradeoffs.

Added as a future possibility as it's not a blocker.

youngnick · 2024-02-19T04:29:49Z

rfcs/0009-defaults-and-overrides.md

+
+All the examples in the RFC are based on API DESIGN 1.
+
+## API DESIGN 1 - `strategy` field - _RECOMMENDED_


This is an interesting design - I'm curious to see how it works in practice.

It feels to me like there are a lot of moving parts here, and it may be difficult for users, particularly new users, to understand how these interactions work.

In Gateway API itself, I've recommended we stick with something more approaching the merge strategy, and try not to use structs that can't be compared on a per-field basis, to try to keep the interaction complexity as low as possible. I explicitly did not include a switch to change the mode (as the strategy field does) because I'm worried about combinatorial interaction complexity.

To paraphrase Spider-man, with great complexity comes great potential for confusion.

I suspect that it will be easy to build a system of Policies that require a UI or similar to understand the resultant policy, which doesn't seem ideal. But conversely, calculating the resultant policy and putting it on a status somewhere creates a large fanout update problem.

So yes, I'm a little worried about how complex this will end up, but not enough to tell you to stop. :)

Hi @youngnick. This is good feedback. Thank you so much!

Indeed, managing the complexity implied by these proposed designs poses a challenge. Even with this option, which I consider to be the safest among the 5.

I suppose communication between users is key, and it has to happen one way or another. One who declares an intent will want to know if that request got affected by something in the middle. This works both ways, for one who sets a policy, say, at the Gateway level, as well as for one who does it at the level of the route.

I guess our bet here is that we will be able to handle that communication inside the cluster, via status stanzas of the resources, provided that the proposed API design is still fairly conservative in the sense that the level of atomicity of the declarations within a policy has been kept relatively high – i.e. either the entire policy or the level of what we call "named policy rules," which is just one level down. (We have no plans to support merges lower than that for now.)

With the named policy rules, we can report statuses such as, for example:

Enforced Reason Message

true

true PartiallyEnforced The following policy rules were overridden by gw-ns/gw-policy: a, b

true EnforcedWithAdditions The following default policy rules were added by gw-ns/gw-policy: c, d

true PartiallyEnforcedWithAdditions The following policy rules were overridden by gw-ns/gw-policy: a, b; the following default policy rules were added by gw-ns/gw-policy: c, d

false Overridden Policy fully overridden by gw-ns/gw-policy

We could consider something even more conservative and, say, support completely atomic defaults or overrides declarations only (i.e. no merges at all!) In this case, however, one who wants to "cherry pick" individual policy rules to override, at the lower level or the higher, but not necessarily an entire policy, would have to learn about those rules (offline?) and, I guess, duplicate in their policies the rules to keep? This seems to work OK-ish for replacing higher defaults at a lower level, but can you imagine the contrary, i.e. overriding, at the higher level, only parts of lower level policies? Keeping consistency between those duplicated rules would be already a problem, but, on top of that, I see also potential leakage of lower (more specific) definitions upwards (into the less specific level,) while users try to workaround the API to achieve that combinatorial interaction that we otherwise refused to incorporate by design.

IOW, people will do merges one way or another. I personally think it's better if we try to control them (to an extent,) and provide users with some framework to follow. (A decision I could easily regret – acknowledging your shared Spider-man wisdom 🙂)

I promise to give this a few more rounds before committing to anything. Your concerns are very well justified.

IOW, people will do merges one way or another. I personally think it's better if we try to control them (to an extent,) and provide users with some framework to follow.

I agree, this is a good, pragmatic point.

I think that as long as the design thinks carefully through all the discoverability concerns (as in https://gateway-api.sigs.k8s.io/geps/gep-713/#user-discoverability ), then it's worth trying things and seeing what works.

alexsnaps · 2024-02-20T14:54:11Z

With the proposal of adding support for CEL expressions in when clauses, and if we really open policies to CEL in general, I'm wondering if we couldn't address much of this proposal (strategy, constraints, ...) with supporting any node in the policy tree to be either T or fn(T): T, where fn would be a CEL expression(s).

In order to achieve that we'd need a way to "escape" into such a CEL expression... I'm unsure if this has been done before or if there are well established patterns to do so, but I'll just use the cel: prefix to a string to identify such "escaping" in the quick examples below. An atomic merge strategy would be the default... in order to merge, e.g. add a key/value to a map, one would escape to cel and "just" add the entry to the map: map_typed_key: cel:value["a"] = my_added_value_for_a... We'd define the binding's name for the node upfront (in this example value), but could be anything. And this example my_added_value_for_a would need to be properly constructed to be a valid entry value for this
map. We could force a return to the function, or "just" reuse the binding passed in (in this case, value) to reassign to the policy's tree node.

Now that entry could be conditionally added, e.g. only if it is missing. A CEL expression could also validate that all entries in a map e.g. for rate limits are within certain bounds and clip/clamp it accordingly... effectively acting as constraints on the policy.

While I think this simplifies, I also fear it might open the door to too many complex cases, given the power CEL would add at any given node of the tree that expresses a Policy. Arguably it would only be evaluated once, when a policy is created/modified. Also, we could absolutely restrict what a CEL expression can and cannot do initially and only slowly expand the usage further, with new releases as the need/usecases emerge.

tl;dr I'm torn... this feels like a simple enough idea to cover all of the usecases one can think of, but also feels like this is too much power. But I couldn't talk myself out of it for the last week thinking about it. Anything I miss that makes this a real bad idea?

addendum: I think this would also "solve" (push on the user) a way to merge lists...

guicassolato · 2024-02-21T12:51:45Z

I'm wondering if we couldn't address much of this proposal (strategy, constraints, ...) with supporting any node in the policy tree to be either T or fn(T): T, where fn would be a CEL expression(s).

Thanks, @alexsnaps. Now that I understand the idea better, I'll try to capture it among the options by updating DESIGN OPTION 3.

The pros you've highlighted very well. I have nothing to add.

I agree with you and also believe we can workaround the con of possibly causing "unmergeable objects" to merge – not only for lists but for maps too! – by pushing on the users. They would be the ones defining the final output of the functions.

Imagining the maps of policy rules re-implemented as lists, an "atomic" set of default authentication rules could look like this:

spec:
  defaults:
    rules:
      authentication:
      - name: x
        value: 1
      - name: y
        value: 2

This would be a case of a value declared as T, not as fn(T): T yet.

I still have one problem with it though. What is the key and what is the value? What if we the policy declared an authorization block as well? Would authentication and authorization be 2 keys or would rules be the one and only key (and authentication and authorization part of the value?)

Maybe assuming the values are always the scalars and the lists, and that the strategy is always a merge...

An equivalent example to the above, but with dynamic value, i.e. using fn(T): T instead of T:

spec:
  defaults:
    rules:
      authentication: |
        cel:[Authentication{name: "x", value: 1},Authentication{name: "y", value: 2}]

Of course, this second example shows the need for having a lib that defines a protobuf message Authentication.

A default that adds an authentication rule if it's not defined in the target would probably have to be declared as an override, I reckon:

spec:
  overrides:
    rules:
      authentication: |
        cel:self.exists_one(a, a.name == 'x') ? self : self + [Authentication{name: "x", value: 3}]

I still have my doubts about the potential "implementation nightmare." We would have to redefine tons of API types, cannot simply reuse current ones (from the APIs without D/O), plus protobuf messages for all of those, etc.

To reduce the burden, though at the expense of loosing things up a bit, we could define CEL function extensions that deal with the dynamic types. Found a few of those in https://tekton.dev/docs/triggers/cel_expressions/ (e.g. parseJSON(), parseYAML.)

My last thought here regards the cognitive complexity for one who reads out a policy to mentally work out all the function outputs. I think this is a point you've raised offline as well, @alexsnaps, but please correct me if I'm wrong.

youngnick · 2024-02-26T02:54:35Z

I am strongly -1 on any Kubernetes object using an inline function as a value. Sure, this is super flexible, but, as mentioned, both the cognitive overhead for human creators and readers of the policy, and implementation overhead for whoever's implementing the policy, seem terrifyingly complex to me.

To put this another way, I don't think that adding the ability to have what may end up as Turing-complete fields (if they aren't already, I haven't looked at the CEL spec for a while) is going to go well.

alexsnaps · 2024-02-26T23:32:05Z

My last thought here regards the cognitive complexity for one who reads out a policy to mentally work out all the function outputs. I think this is a point you've raised offline as well, @alexsnaps, but please correct me if I'm wrong.

I am strongly -1 on any Kubernetes object using an inline function as a value. Sure, this is super flexible, but, as mentioned, both the cognitive overhead for human creators and readers of the policy, and implementation overhead for whoever's implementing the policy, seem terrifyingly complex to me.

Yes, that was my point @guicassolato . I think the cognitive overhead might be "a little much" indeed, to say the least. If we have a tool (e.g. gwctl) that could give a user the effective policy on any object would almost be a necessity, and yet probably not even be enough to then have some reason about that how that actually came together by "just reading" the policies themselves with CEL interleaved in them.

From an implementer point of view, i.e. literally mine, I actually think this is a very elegant and actually pretty simple (not easy necessarily tho) to go about it. But don't get me wrong, I'm NOT using this as an argument to actually make it happen. In my experience, this thinking has led to more damage than good in my perspective. The one interesting aspect tho, is how much would it take away if we do not do it? And further more is it reversible (in Kuadrant obviously, not talking about any GEP here)?

To put this another way, I don't think that adding the ability to have what may end up as Turing-complete fields (if they aren't already, I haven't looked at the CEL spec for a while) is going to go well.

That, we could control. Just as validation limits the complexity of CEL expressions. Not that I believe (again), that this justifies the feature... but leaves it up for consideration.

tl;dr while I think there is much that scares me (as pointed out by all) and makes me feel very uncomfortable still, it seems to me like this is actually the "simplest" proposal (vs. complex, i.e. has the less things complected). And I still can't make up my mind for or against it... I'll be rereading the latest GEPs proposal in the meanwhile... 😞

guicassolato · 2024-02-28T12:32:30Z

I was about to replace DESIGN OPTION 3 (when conditions based on CEL expressions) with the alternative described by @alexsnaps, based on full power of CEL expressions not only to establish conditions but to determine the value to set as a default or as an override as well (mentioned at yesterday’s community call.) But, instead, I will add that as yet another alternative design, thus adding to 6 different options.

Because I’ve changed this a couple times already, including reordering the options and changing their assigned numbers, I will now abandon the numbers altogether and refer by the options unique name.

guicassolato · 2024-03-06T10:27:51Z

maleck13 · 2024-03-08T14:19:09Z

@guicassolato I think proceeding with the strategy based approach initially makes sense. Do you need anything further here? Not sure I should approve at this point or wait for an update?

guicassolato · 2024-03-08T14:35:17Z

@guicassolato I think proceeding with the strategy based approach initially makes sense. Do you need anything further here? Not sure I should approve at this point or wait for an update?

@maleck13, I'm still pending on a few changes here. Just pushed one 🙂

Early next week, I'll finish those and mark the PR for review. No need to approve in the meantime, though your comments are still welcome ofc.

…xplanation for the feature

Prior art: Only Kuadrant itself and orientative examples provided by Gateway API (spec and gwctl) Unresolved questions: - Policy spec resembling more the target spec Future possibilities: - N:1 policy-target relationship - Route rule `name` and `targetRef.sectionName` - Use listMapType instead of maps of policy rules

KevFan · 2024-03-12T12:59:36Z

rfcs/0009-defaults-and-overrides.md

+- the bare policy rules block without further qualification as a default or override set of rules – e.g. the `rules` field in a Kuadrant AuthPolicy, the `limits` field in a RateLimitPolicy.
+
+Typically, one will specify either `defaults` and/or `overrides`, or just the bare set of policy rules block without further qualification as neither defaults nor overrides.
+
+In case two or more of these fields are specified, they are processed in the following order:
+1. `defaults`
+2. bare set of policy rules without qualification (treated indistinctively as another block of defaults)
+3. `overrides`
+
+Supporting specifying the bare set of policy rules at the first level of the spec, as well and simultaneously along with `defaults` and `overrides` blocks, is a strategy that aims to provide:
+1. more natural usability, especially for those who write policies attached to the lowest level of the hierarchy supported; as well as
+2. backward compatibility for policies that did not support explicit D/O and later on have moved to doing so.


Thinking about this and while playing around with the potential implementation, if we are keeping the bare set of policy rules, it might make sense to have validation to prevent the usage of both the bare set of policy rules and defaults at the same time (i.e., mutually exclusive) 🤔

Otherwise, I feel like this adds another layer of merging that the implementation would need to decide upon depending on the strategy used for the effective policy 🤔

I'm somewhat undecided on whether it should be just bare rules or defaults, or keeping both. I feel default seems unnecessary if it acts as an intrinsic default anyway. Or are we using the keyword default as an indicator that the policy is an Inherited policy? One argument from the other side could be that having both fields can be seen as confusing also, a user can interpret that the bare rules will be unaffected from Inheritrance 🤷 Anyway, just some thoughts 🤔

It might make sense to have validation to prevent the usage of both the bare set of policy rules and defaults at the same time (i.e., mutually exclusive)

I like that. It simplifies processing the policy – one less "layer of merging" as you said – and it gives users some guidance for them to also write policies that are easier to read and to reason about.

One who first created a policy declaring a bare set of rules (i.e. implicit defaults) and later wants to add an overrides block to it can easily add the defaults keyword on top of the existing rules, fix the indentation, and add the overrides part. Not a big deal. I personally prefer that than having complicated policies that mix two ways to express the same thing.

Or are we using the keyword default as an indicator that the policy is an Inherited policy?

No, we are not. Our inherited policies are inherited policies regardless of whether the rules are explicitly grouped within D/O blocks or not. In fact, we'll label the policy CRDs as inherited so there's no doubt about it.

Updated. Please, let me know if it looks better now. Thanks!

Make it so, in a policy, it’s either the D/O blocks or the bare set of policy rules, but not both at the same time

guicassolato · 2024-03-13T09:39:32Z

rfcs/0009-defaults-and-overrides.md

+| Enforced | True   | "Enforced"                       | "Policy has been successfuly enforced"                                                                                                                                                                                  |
+|          | True   | "EnforcedWithAdditions"          | "Policy has been successfuly enforced. The following defaults have been added by &lt;policy-ns/policy-name&gt;: x, y"                                                                                                   |
+|          | True   | "PartiallyEnforced"              | "Policy has been successfuly enforced. The following rules have been overridden by &lt;policy-ns/policy-name&gt;: a, b"                                                                                                 |
+|          | True   | "PartiallyEnforcedWithAdditions" | "Policy has been successfuly enforced. The following rules have been overridden by &lt;policy-ns/policy-name&gt;: a, b; the following defaults have been added by &lt;policy-ns/policy-name&gt;: x, y"                  |


I thought about adding these 2 extra reasons to cover the inheritance ("addition"), at the lower level, of policy rules defined at the upper levels.

An alternative, to not deviate from RFC 0004, could be working with only "Enforced" and "PartiallyEnforced" and add the details of the additions to the message. I.e.:

Type Status Reason Message

Enforced True "Enforced" "Policy has been successfuly enforced[. The following defaults have been added by <policy-ns/policy-name>: x, y]"

True "PartiallyEnforced" "Policy has been successfuly enforced. The following rules have been overridden by <policy-ns/policy-name>: a, b[; the following defaults have been added by <policy-ns/policy-name>: x, y]"

False "Overridden" "Policy has been overridden by <policy-ns/policy-name>"

False "Unknown" "Policy has encountered some issues"

cc @didierofrivia

I might be a bit biased by the Status RFC, but I can see it's a bit simpler having just "Enforced" and "PartiallyEnforced" as you mention in this comment and effectively elaborate in the message.

I'm also leaning towards using the current "Enforced" and "PartitallyEnforced" statuses with detailed messages to keep it a bit simpler

Done. Thanks for the feedback!

rfcs/0009-defaults-and-overrides.md

… the message details instead of having dedicated conditions for it

KevFan

Looks good to me 👍

alexsnaps · 2024-04-02T13:24:38Z

Looks like this maybe a policy attachment recurring thing… This is huge.
Key part to me tho is the tiered approach with which I think I almost completely agree with. Certainly the first one as it effectively "merely has us play by the current state of the policy attachment GEP", adding the defaults & overrides stanza and the labels & status…

Then tho… I'm not sure I want to commit to the plan from there on. I certainly have more questions (adding them to the related parts), but also considering the spec could still change, I'd propose to split this in 3 RFCs, one for each tier of the implementation. Where on the first I'd imagine everyone being ok with…

alexsnaps · 2024-04-02T13:29:51Z

rfcs/0009-defaults-and-overrides.md

+| As a Platform Engineer, when configuring a Gateway, I want to set policy rules (parts of a policy) for all routes linked to my Gateway, that cannot be individually replaced by more specific ones(*), but only expanded with additional more specific rules(\*).                                   | OR    | **gateway-override-policy-rule** |
+| As an Application Developer, when managing an application, I want to set a policy for my application, that fully replaces any default policy that may exist for the application at the level of the Gateway, without having to know about the existence of the default policy.                      | D     | **route-replace-policy**         |
+| As an Application Developer, when managing an application, I want to expand a default set of policy rules set for my application at the level of the gateway, without having to refer to those existing rules by name.                                                                              | D/O   | **route-add-policy-rule**        |
+| As an Application Developer, when managing an application, I want to unset an individual default rule set for my application at the level of the gateway.                                                                                                                                           | U     | **route-unset-policy-rule**      |


I actually challenge that somehow would either want to do that, but most importantly should be able to do that. I guess "unset" something is the equivalent of either setting it's value to null or removing the entry from a Map like type (name magic field in a list's item that is effectively a unique identifier), is that right? Looking at it that way it seems more like a data modeling concern to me that is being addressed by adding a concept to the higher level API. Can we see if we can avoid introducing that feature by "just" modelling things accordingly on a per use-case basis? I think this is meant for Tier 3 in the implementation tho…

I tried to reason for this use case in more details in the subsection entitled "Unsetting inherited defaults."

Using gateways and routes as examples, the summary is that a default set at the level of a gateway is inherited by all routes attached to the gateway. If there is enough reason that justifies keeping such default (e.g. too many routes to otherwise manage individually) and yet for a single route one of those defaults should not apply, there must be a way to express this exception.

One way of doing that could be by setting the rule to null at the lower level, except that I don't believe this design would scale well. For specific cases, some other meaning may have been assigned to null already. This is true at least for the authentication rules in an AuthPolicy, where null means anonymous access. Moreover, using null would also be challenging if we ever want to extend the merge strategy to other levels of the policy spec, requiring many fields to be retyped as optional.

Without explicitly supporting unsetting defaults at the lower levels, the only alternatives I can see are:

Using conditions at the upper level to incorporate the exceptions (e.g. "for all routes except route X".) → I don't like this option first because it goes against the pattern of inheritance, allowing details from subclasses to leak upwards into the superclasses. Besides, with enough exceptions to cover, it can get very messy.

Dropping the defaults altogether in favour of only having specific policies set at the lower levels → I see tons of duplication, plus it defeats the purpose of attaching policies at multiple levels of the hierarchy for setting defaults in the first place.

IMO, the unset field is the lesser of all evils.

It allows covering the exception cases that will arise due to defaults sometimes overshooting

It's constrict within the inheritance pattern

It does not require significant changes in the API or semantics of existing fields

It scales well for other possible levels of merging between specs

It can be opt-out and/or put under RBAC

alexsnaps · 2024-04-02T13:32:05Z

rfcs/0009-defaults-and-overrides.md

+- D/O `when` conditions (and support for "constraints")
+- Merge strategy


I don't where to best put this comment other than here, but are the two "features" usable in combination?
Would the strategy be conditional on "when"? Also, can I have multiple "when" clauses (if only thru multiple policies), mapping to a "switch case" of some kind? Or would we prohibit this somehow?

The entire block (of defaults or overrides) would be conditional on when. I.e. if the condition evaluates to true, then apply the rules specified in the block according to the strategy specified (atomic or merge.)

TBH, I hadn't thought about multiple cases of conditions (if..else or switch-like) before. But I can see how the current proposal can be limiting for constraints. For example, how to set a constraint for a rate limit between 50 and 100? You'd need 2 blocks of overrides, though only 1 is allowed per policy; therefore you'd need 2 policies targeting the same object. Any other idea?

alexsnaps

I'm approving this, so we can move forward with it.
I don't know how changes to phase 2 and/or 3 would be reflected moving forward... but I also understand how this effort needs to move forward now. Maybe this RFC was too big... maybe we need a better way to deal with things in flux than an "unmerged" PR... maybe we need revisions... push these to the website... I don't know. And this isn't the place to resolve that. But this shows the documented RFC process might be unsuited for this team and we should review it.

guicassolato force-pushed the rfc/do branch from b062143 to c21a42f Compare February 16, 2024 10:09

maleck13 reviewed Feb 16, 2024

View reviewed changes

guicassolato force-pushed the rfc/do branch 4 times, most recently from 5b8f38e to c46d4a2 Compare February 16, 2024 19:33

youngnick reviewed Feb 19, 2024

View reviewed changes

alexsnaps added kind/epic RFC Request For Comments labels Feb 19, 2024

guicassolato self-assigned this Feb 19, 2024

guicassolato force-pushed the rfc/do branch 2 times, most recently from 30be65b to 5fc23e1 Compare February 26, 2024 14:39

wip: rfc: Defaults & Overrides

9174999

guicassolato force-pushed the rfc/do branch from 5fc23e1 to 9174999 Compare February 26, 2024 14:39

guicassolato mentioned this pull request Feb 26, 2024

Defaults & Overrides Kuadrant/kuadrant-operator#431

Open

42 tasks

guicassolato removed the kind/epic label Feb 26, 2024

This was referenced Feb 27, 2024

Single merged policy #4

Closed

*No* merging of policies #10

Closed

fix typos and minor enhacements to definitions

f21ab49

Reference-level explanation - applying policies to a Gateway

91dc03e

Added support for deactivations in the implementation diagrams

d916a5b

guicassolato force-pushed the rfc/do branch 3 times, most recently from b15bbcf to d3433ce Compare March 8, 2024 15:14

Implementation tiers

bb3cf55

guicassolato force-pushed the rfc/do branch from d3433ce to bb3cf55 Compare March 8, 2024 15:14

guicassolato added 3 commits March 11, 2024 16:43

Renames 'deactivations' as 'unsetting defaults' and provides better e…

37389f4

…xplanation for the feature

D/O conditions & constraints

c398089

guicassolato marked this pull request as ready for review March 11, 2024 18:14

KevFan reviewed Mar 12, 2024

View reviewed changes

guicassolato added 4 commits March 12, 2024 15:24

Mutually exclusive D/O blocks or bare set of policy rules

39344f9

Make it so, in a policy, it’s either the D/O blocks or the bare set of policy rules, but not both at the same time

Policy status, policy discoverability and effective policy reporting

d9b61b9

fix: typo

dd4cac3

Merging policies with references to external objects

93c33fd

guicassolato commented Mar 13, 2024

View reviewed changes

martinhesko mentioned this pull request Mar 20, 2024

Defaults & Overrides Kuadrant/testsuite#369

Closed

2 tasks

KevFan reviewed Mar 20, 2024

View reviewed changes

rfcs/0009-defaults-and-overrides.md Outdated Show resolved Hide resolved

guicassolato added 2 commits March 26, 2024 16:51

Simplify set of status conditions - addition of defaults signalled in…

e34b431

… the message details instead of having dedicated conditions for it

Fix typos

06687e8

guicassolato added the status/FCP Final Comment Period label Mar 27, 2024

KevFan approved these changes Mar 28, 2024

View reviewed changes

alexsnaps reviewed Apr 2, 2024

View reviewed changes

alexsnaps approved these changes Apr 8, 2024

View reviewed changes

guicassolato merged commit d179699 into main Apr 10, 2024
1 check passed


		All the examples in the RFC are based on API DESIGN 1.

		## API DESIGN 1 - `strategy` field - _RECOMMENDED_

`Enforced`	`Reason`	`Message`
`true`
`true`	`PartiallyEnforced`	The following policy rules were overridden by gw-ns/gw-policy: a, b
`true`	`EnforcedWithAdditions`	The following default policy rules were added by gw-ns/gw-policy: c, d
`true`	`PartiallyEnforcedWithAdditions`	The following policy rules were overridden by gw-ns/gw-policy: a, b; the following default policy rules were added by gw-ns/gw-policy: c, d
`false`	`Overridden`	Policy fully overridden by gw-ns/gw-policy

Type	Status	Reason	Message
Enforced	True	"Enforced"	"Policy has been successfuly enforced[. The following defaults have been added by <policy-ns/policy-name>: x, y]"
	True	"PartiallyEnforced"	"Policy has been successfuly enforced. The following rules have been overridden by <policy-ns/policy-name>: a, b[; the following defaults have been added by <policy-ns/policy-name>: x, y]"
	False	"Overridden"	"Policy has been overridden by <policy-ns/policy-name>"
	False	"Unknown"	"Policy has encountered some issues"

		- D/O `when` conditions (and support for "constraints")
		- Merge strategy

Defaults & Overrides (RFC 0009) #58

Defaults & Overrides (RFC 0009) #58

Conversation

guicassolato commented Feb 15, 2024

thomasmaas commented Feb 16, 2024

guicassolato commented Feb 16, 2024

guicassolato commented Feb 16, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

maleck13 Feb 16, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

guicassolato Mar 1, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

youngnick commented Feb 19, 2024

Choose a reason for hiding this comment

guicassolato Feb 19, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

guicassolato Feb 19, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexsnaps commented Feb 20, 2024 • edited Loading

guicassolato commented Feb 21, 2024

youngnick commented Feb 26, 2024

alexsnaps commented Feb 26, 2024

guicassolato commented Feb 28, 2024

guicassolato commented Mar 6, 2024 • edited Loading

maleck13 commented Mar 8, 2024

guicassolato commented Mar 8, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

KevFan left a comment

Choose a reason for hiding this comment

alexsnaps commented Apr 2, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexsnaps left a comment

Choose a reason for hiding this comment

guicassolato commented Feb 16, 2024 •

edited

Loading

maleck13 Feb 16, 2024 •

edited

Loading

guicassolato Mar 1, 2024 •

edited

Loading

guicassolato Feb 19, 2024 •

edited

Loading

guicassolato Feb 19, 2024 •

edited

Loading

alexsnaps commented Feb 20, 2024 •

edited

Loading

guicassolato commented Mar 6, 2024 •

edited

Loading