Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rate limiting wasmplugin controller: using GatewayAPI topology (DAG) #447

Merged
merged 9 commits into from
Apr 5, 2024

Conversation

eguzki
Copy link
Contributor

@eguzki eguzki commented Mar 4, 2024

What

Dedicated WASMPlugin controller for the rate limiting policy. The controller implementation is based on a new approach using GatewayAPI Topology data structure Directed Acyclic Graph to represent the GatewayAPI resources.

Supersedes #317

Controller main object type

The type of the main object being reconciled is the Gateway. There is one-to-one mapping of gateway and wasmplugin. Another approach would be having the WASMPlugin as the main object type. In that case, it would be needed one way to get the gateway name from {name, namespace} of the wasmplugin. The event mappers would be more complex. Worth exploring anyway.

Watched resources and event mappers

The controller watches all resources that affect the configuration (spec) of the rate limit Wasmplugin. I.e. HTTPRoutes, Gateways and RateLimitPolicies. On events of those resources, the event mapper walks the Gateway API topology, via targetRef and parentRefs up to the peak of the topology, as far as kuadrant is concerned today, i.e. the Gateway. In other words, if you change a rate limit policy, it will generate events on all gateways affected. If you change an HTTPRroute, it will generate events on all gateways involved. Interesting to note that if you change the parentRef of a HTTProute, it will generate events on the gateways that were targeted and also on the new gateways currently being targeted. That allows doing cleaning for those gateways where a RLP is no longer affecting.

The GatewayAPI Topology

On gateway reconciliation event, the controller builds what it is named "The GatewayAPI Topology". It reads ALL HTTPRoutes targeting the given gateway as well as ALL rate limit policies in the cluster. With all that info, the controller creates a new instance of the Gateway API topology as well as a set of indexes (another "view" of the topology) that can be queried to provide information to build the desired spec of the rate limiting wasmplugin. Interesting to note here that a field indexer is being added to the manager to index all httproute parenting a gateway. Basically you can use this manager index (not a gateway api topology index), to query "give me all the HTTPRoutes parenting Gateway A".

The current implementation of the "GatewayAPI Topology" does not use a k8s client to query Gateway API and kuadrant policies. Instead, fetching resources must be done externally and then the GatewayAPI topology receives a list of resources. It could be done otherwise and have generic function with a k8s client to build topology for a given policy of one kind (for example RLP or KAP). The main reason for the current implementation is to decouple GatewayAPI topology and the cluster. Meaning that I can build GatewayAPI topology from a set of resources filtered with some criteria. The use case here is that I want to build a GatewayAPI topology, not for the entire cluster, but only focused on one gateway and all policies and routes involved.

The implementation assumes "cheap" operations of reading ALL the ratelimitpolicies and all routes parenting a given gateway. That assumption comes from the controller runtime k8s client that holds a cache. At least for list and get operations. That cache is being kept up to date in the background when there are updates/deletes in the objects triggered by kuadrant controllers or any other third party controllers or any other k8s API server user. Check for this comment with a verification of this "cheap" operation assumption. Indeed, the resource lists are cached across reconciliation events and controllers.

The topology implementation is based on a pure Directed Acyclic Graph data structure implementation that is totally decoupled from the GatewayAPI relations. On top of that, there is an implementation of the GatewayAPI topology which builds a graph from Gateways and HTTPRoutes. In the future, it can be extended for additional GatewayAPI API types. On top of the GatewayAPI topology, a number of indexes are created which are tailored for the needs of the rate limiting wasmplugin controller. Thus:

RL WASMPlugin controller 
       |
       |
       V
Custom topology indexes (views)
       |
       |
       V
GatewayAPI topology (focused on one gateway)
       |
       |
       V
     DAG

Additionally:

  • Added github.com/kuadrant/kuadrant-operator/pkg/library/gatewayapi package with all tools and helpers related to GatewayAPI resources and not directly related to any kuadrant controller logic. This package would be a good candidate to be shared and useful for any third party controller dealing with GatewayAPI resources. This package includes the GatewayAPI topology implementation.
  • moved few methods from common to github.com/kuadrant/kuadrant-operator/pkg/library/gatewayapi package that IMO are generic enough and decoupled from kuadrant controller's implementation needs. This change caused (small) changes on lots of files 😞
  • moved base reconcilers to pkg/library/reconcilers package

TODO: Integration tests:

  • Bring some tests from ratelimitpolicy_controller_test.go to rate_limiting_wasmplugin_controller_test.go
  • No routes targeting the gateway
  • No routes selected from the HTTPRoute
  • HTTPRoute moves from one gateway to another.
  • RLP changes target from one route to another
  • RLP A targeting Gateway, 1 free route. New RLP B targeting existing Route. The Wasmplugin should change. The RLP A should not add configuration to the WASMPlugin
  • New httproute without RLP: wasmplugin should change when there is RLP targeting the gateway

↔️ Next 🚋 station: Limitador controller using the GatewayAPI Topology

Copy link

codecov bot commented Mar 4, 2024

Codecov Report

Merging #447 (a46f05f) into main (ece13e8) will increase coverage by 0.58%.
Report is 14 commits behind head on main.
The diff coverage is 85.86%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #447      +/-   ##
==========================================
+ Coverage   80.20%   80.79%   +0.58%     
==========================================
  Files          64       70       +6     
  Lines        4492     4956     +464     
==========================================
+ Hits         3603     4004     +401     
- Misses        600      642      +42     
- Partials      289      310      +21     
Flag Coverage Δ
integration 71.67% <68.08%> (+0.38%) ⬆️
unit 33.39% <54.88%> (+3.36%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
api/v1beta1 (u) 71.42% <ø> (ø)
api/v1beta2 (u) 87.85% <ø> (-3.58%) ⬇️
pkg/common (u) 83.01% <ø> (-5.81%) ⬇️
pkg/istio (u) 75.14% <100.00%> (+1.23%) ⬆️
pkg/log (u) 94.73% <ø> (ø)
pkg/reconcilers (u) ∅ <ø> (∅)
pkg/rlptools (u) 79.77% <100.00%> (+0.31%) ⬆️
controllers (i) 76.90% <80.99%> (+0.10%) ⬆️
Files Coverage Δ
api/v1beta2/ratelimitpolicy_types.go 80.00% <ø> (-9.10%) ⬇️
controllers/authpolicy_authconfig.go 85.06% <100.00%> (ø)
controllers/authpolicy_controller.go 80.70% <100.00%> (ø)
...ontrollers/authpolicy_istio_authorizationpolicy.go 87.31% <100.00%> (ø)
controllers/dnshealthcheckprobe_eventmapper.go 68.18% <100.00%> (ø)
controllers/dnspolicy_controller.go 80.00% <100.00%> (-3.48%) ⬇️
controllers/gateway_kuadrant_controller.go 81.13% <ø> (ø)
controllers/httprouteparentrefs_eventmapper.go 53.84% <100.00%> (ø)
controllers/kuadrant_controller.go 52.27% <ø> (-1.30%) ⬇️
...ollers/limitador_cluster_envoyfilter_controller.go 87.01% <100.00%> (+5.19%) ⬆️
... and 20 more

... and 5 files with indirect coverage changes

@KevFan KevFan mentioned this pull request Mar 4, 2024
42 tasks
@eguzki eguzki self-assigned this Mar 4, 2024
@eguzki eguzki added this to the v0.7.0 milestone Mar 4, 2024
Comment on lines +223 to +225
policies := utils.Map(rlpList.Items, func(p kuadrantv1beta2.RateLimitPolicy) kuadrantgatewayapi.Policy { return &p })

t, err := kuadrantgatewayapi.NewTopology(
kuadrantgatewayapi.WithGateways([]*gatewayapiv1.Gateway{gw}),
kuadrantgatewayapi.WithRoutes(utils.Map(routeList.Items, ptr.To)),
kuadrantgatewayapi.WithPolicies(policies),
Copy link
Contributor

@KevFan KevFan Mar 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
policies := utils.Map(rlpList.Items, func(p kuadrantv1beta2.RateLimitPolicy) kuadrantgatewayapi.Policy { return &p })
t, err := kuadrantgatewayapi.NewTopology(
kuadrantgatewayapi.WithGateways([]*gatewayapiv1.Gateway{gw}),
kuadrantgatewayapi.WithRoutes(utils.Map(routeList.Items, ptr.To)),
kuadrantgatewayapi.WithPolicies(policies),
policies := utils.Map(rlpList.Items, func(p kuadrantv1beta2.RateLimitPolicy) kuadrantgatewayapi.Policy { return &p })
routes := utils.Map(routeList.Items, func(r gatewayapiv1.HTTPRoute) *gatewayapiv1.HTTPRoute {return &r})
t, err := kuadrantgatewayapi.NewTopology(
kuadrantgatewayapi.WithGateways([]*gatewayapiv1.Gateway{gw}),
kuadrantgatewayapi.WithRoutes(routes),
kuadrantgatewayapi.WithPolicies(policies),

nit: Goland IDE is highlighting this for me but it compiles and runs so not sure on this one, it might be just me .. 🤷
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make run-lint (golangci-lint) does not complain. It compiles.

you sure it is valid lint issue? "cannot use generic function without instantiation" does not sound correct for this example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, unsure of this one either. Might be just Goland IDE specific issue for me. Since it compiles, I think it can be ignored

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy to change it to routes := utils.Map(routeList.Items, func(r gatewayapiv1.HTTPRoute) *gatewayapiv1.HTTPRoute {return &r}) if you find this annoying. Let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should fix it:

Suggested change
policies := utils.Map(rlpList.Items, func(p kuadrantv1beta2.RateLimitPolicy) kuadrantgatewayapi.Policy { return &p })
t, err := kuadrantgatewayapi.NewTopology(
kuadrantgatewayapi.WithGateways([]*gatewayapiv1.Gateway{gw}),
kuadrantgatewayapi.WithRoutes(utils.Map(routeList.Items, ptr.To)),
kuadrantgatewayapi.WithPolicies(policies),
policies := utils.Map(rlpList.Items, func(p kuadrantv1beta2.RateLimitPolicy) kuadrantgatewayapi.Policy { return &p })
t, err := kuadrantgatewayapi.NewTopology(
kuadrantgatewayapi.WithGateways([]*gatewayapiv1.Gateway{gw}),
kuadrantgatewayapi.WithRoutes(utils.Map(routeList.Items, ptr.To[gatewayapiv1.HTTPRoute])),
kuadrantgatewayapi.WithPolicies(policies),

Copy link
Contributor

@KevFan KevFan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me anyway! Awesome work! 👍

Comment on lines +38 to +39
parents map[NodeID]*internalNode
children map[NodeID]*internalNode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use orderedmap here, so we have consistent responses from funcs like Parents(), Children()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy to add them, but it is not needed right now, is it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a basic data structure that can be used to build indexes and elaborate logic on top. Because we already had problems with that in the past, I'm personally more inclined every day to ensuring consistent ordering on those lists. But up to you.

Comment on lines +103 to +104
d.nodes[n.id] = n
d.populateIndexes(node)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a mutex and wrap this as a critical section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am aware that the dag package is not thread safe. But, as it is used now, it does not need to be thread-safe. Locks can be added when (if) a single instance is accessed by multiple threads.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a minor effort worth adding IMO.

Comment on lines +141 to +142
parentInternalNode.children[childInternalNode.id] = childInternalNode
childInternalNode.parents[parentInternalNode.id] = parentInternalNode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nodes []*node
}

// build a mutable simple graph representation out of DAG only for validating purposes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered using a library such as graph? It has methods for DAGs as well as many others that I imagine could become useful in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that graph and also dag. They have algorithms to find paths and subgraphs, to walk the graphs and other graph topology generic functionality. We do not need that, yet, and IMO the implementation is simple enough to do it ourselves. Additionally, we want functionality like "give me all the HTTPRoutes" and I could not find that. I implemented that using indexes on top of the graph.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preventing cycles is one of the built-in features provided by both of those libs, plus they are both thread-safe by design.

Today we may not need all they offer, agreed. For example, we probably don't care about shortest paths between nodes yet. Still, as long as a lib is efficient enough for what we do need, saves us the effort to have to maintain those functionalities ourselves (simple as they might be), and opens up for more applications in the future, I personally think it's worth considering it.

As for "give me all the HTTPRoutes", isn't that the kind of thing we can get from traversing the graph? If you prefer keeping indexed copies of type-specific nodes – say, for efficiency –, I suppose we could also wrap the DAG within another struct that stores additionally all the indexes we need, maybe using the hashes/ids of the vertices to map to value?

Either way, I leave this as a suggestion for the future. In theory, if we want to, i guess replacing this underlying implementation should be possible to do transparently.

return GatewayNode{
Gateway: gNode.Gateway,
attachedPolicies: gNode.attachedPolicies,
routes: routes,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Routes of the gateways added lazily only when reading the gateways of a topology. Maybe add a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I do not follow. When calling Gateways(), the returned object has all the routes for each gateway in the topology. It is not loaded lazily. There is no callback from the user perspective.

@eguzki
Copy link
Contributor Author

eguzki commented Mar 26, 2024

@guicassolato ready for a new review

Copy link
Contributor

@guicassolato guicassolato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried this PR with the verification steps from #218. Except for the order of the rules in the WasmPlugin resource which seemed completely random (possibly a preexisting issue), it all works fine.

The kuadrant-operator-controller-manager keeps crashing but I don't think it's necessarily related to this PR. Rather, due to some DNSHealthCheckProbe issue spotted in the logs. The same does NOT happen in main though, so maybe rebasing will fix it?

Other than that, I've left a few comments from before – some reinforced in my latest review. The most important one regards thread-safety of the new data structures.

Copy link
Contributor

@guicassolato guicassolato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for rebasing, @eguzki!

I don't want to block this any further, so approving it and leaving a summary of things I think we could improve in the future:

  1. thread-safety of the DAG data structure
  2. consistent order for the lists
  3. rely on a lib to implement and validate the graph

@eguzki
Copy link
Contributor Author

eguzki commented Apr 3, 2024

Fair points, before merging, I will:

#447 (comment)

Leave a comment to highlight that the dag package is not thread safe

#447 (comment)

Check this out as it should not matter the order of the policies being returned by the DAG. The controller sorts that list of policies afterwards based on policy creation timestamp

EDIT: found issue with the rules order as reported. Fixed in a46f05f

#447 (comment)

Create an issue for that
EDIT: Done #518

@eguzki
Copy link
Contributor Author

eguzki commented Apr 3, 2024

Comments addressed @guicassolato
Asking for one more review :please:

@@ -30,7 +31,16 @@ func WasmRules(rlp *kuadrantv1beta2.RateLimitPolicy, route *gatewayapiv1.HTTPRou
return rules
}

for limitName := range rlp.Spec.Limits {
// Sort RLP limits for consistent comparison with existing wasmplugin objects
limitNames := make([]string, 0, len(rlp.Spec.Limits))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mitigates but does not solve 100%, I'm afraid. Ceteris paribus 🤓, a consistent wasmplugin config also depends on the order of the policies which are indexed for a gateway using a regular Golang map.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order of the policies is already being sorted here, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as I mentioned already

The controller sorts that list of policies afterwards based on policy creation timestamp

@eguzki eguzki merged commit d54678b into main Apr 5, 2024
21 checks passed
@eguzki eguzki deleted the wasmplugin-controller branch April 5, 2024 18:25
@eguzki eguzki mentioned this pull request Apr 10, 2024
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants