-
Notifications
You must be signed in to change notification settings - Fork 697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Elastic Agent: Set status.ObservedGeneration from metadata.Generation #5383
Conversation
818892f
to
cbf499e
Compare
Add tests for agent observedGeneration. modify agent reconcile behavior to ensure status updates for certain situations. adjust common comparison to allow additional options Update some documentation. Make disabling of k8s client's errors simpler. Make the status update more readable. Configure initial state for Elastic Agent properly from current state. Add missing comments for some exported data. Make errors consistent with capitalization, and namespace+name. Rename e2e test name to be more descriptive. Update pod labels in e2e test to properly trigger generation update. Remove failing client tests. used keyed structs in tests.
removed erroneous cmp.AllowUnexperted()
… test steps. remove WithDefaultESValidation from agent generation e2e test.
Remove unneeded struct in tests. Move Generation e2e tests to it's own package, and use it in all Agent mutation tests.
… observedGeneration.
b595041
to
2bc4fd9
Compare
pkg/controller/agent/controller.go
Outdated
@@ -153,25 +153,44 @@ func (r *ReconcileAgent) Reconcile(ctx context.Context, request reconcile.Reques | |||
func (r *ReconcileAgent) doReconcile(ctx context.Context, agent agentv1alpha1.Agent) *reconciler.Results { | |||
defer tracing.Span(&ctx)() | |||
results := reconciler.NewResult(ctx) | |||
status := newStatus(agent) | |||
if !association.AreConfiguredIfSet(agent.GetAssociations(), r.recorder) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think observedGeneration
should be updated even if associations are not correctly set. One reason is that misconfigured associations are not immediately detected. Here is an example:
- Deploy
config/recipes/elastic-agent/fleet-kubernetes-integration.yaml
observerdGeneration
is eventually updated:
k get agent.agent.k8s.elastic.co/fleet-server -o go-template='{{ printf "\nGeneration: "}}{{.metadata.generation}}{{printf "\n\nStatus:"}}{{.status}}'
Generation: 2
Status:map[availableNodes:1 elasticsearchAssociationsStatus:map[default/elasticsearch:Established] expectedNodes:1 health:green kibanaAssociationStatus:Established observedGeneration:2 version:8.0.0]
⬆️ generation
and observedGeneration
are both set to 2
- Break one of the association, for example
Kibana
,observedGeneration
is still updated:
Generation: 3
Status:map[availableNodes:1 elasticsearchAssociationsStatus:map[default/elasticsearch:Established] expectedNodes:1 health:green kibanaAssociationStatus:Pending observedGeneration:3 version:8.0.0
⬆️ generation
and observedGeneration
are both set to 3
- Update the
Agent
resource while keeping theKibana
association broken,observedGeneration
is no more updated, which seems a bit confusing to me and does not match the description "If the generation observed in status diverges from the generation in metadata, the Elastic Agent controller has not yet processed the changes contained in the Elastic Agent specification."
k get agent.agent.k8s.elastic.co/fleet-server -o go-template='{{ printf "\nGeneration: "}}{{.metadata.generation}}{{printf "\n\nStatus:"}}{{.status}}'
Generation: 4
Status:map[availableNodes:1 elasticsearchAssociationsStatus:map[default/elasticsearch:Established] expectedNodes:1 health:green kibanaAssociationStatus:Pending observedGeneration:3 version:8.0.0]
⬆️ generation
has been updated to 4, but observedGeneration
is still 3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic has been updated, and I'll get these steps tested.
Updating E2E tests to have similar design as Kibana, and ES E2E tests for generation bits.
…392-agent-observedGeneration
…unction properly.
Last update resolves this issue with associations @barkbay initial
broken kibana assocciation
spec updated, observedGeneration updated:
|
pkg/controller/agent/controller.go
Outdated
if agent == nil { | ||
return reconcile.Result{}, err | ||
} | ||
return updateStatus(ctx, *agent, r.Client, newStatus(*agent)).WithError(err).Aggregate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which case are we covering here? If we cannot fetch the Agent resource we cannot update. The case where we can fetch the Agent resource but the JSON in the association config is invalid is the only one I can think of. We do not handle this case for the Kibana resource as far as I can tell. I think it would be good to be somewhat consistent here.
So assuming I am not missing an obvious case (totally possible), I am not fully decided which way we should go here. Either we consider this case too much of an edge case to warrant updating the status in two places (which I would be OK with at this point) or we try to bring all the status update functionality into the same place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just removed updating the status on association failures, as it seems to be causing more issues than it's worth.
Don't worry about status update when FetchWithASsociations fails. Just return errors from updateStatus
pkg/controller/agent/reconcile.go
Outdated
@@ -27,7 +27,7 @@ import ( | |||
"github.com/elastic/cloud-on-k8s/pkg/utils/pointer" | |||
) | |||
|
|||
func reconcilePodVehicle(params Params, podTemplate corev1.PodTemplateSpec) *reconciler.Results { | |||
func reconcilePodVehicle(params Params, podTemplate corev1.PodTemplateSpec, status *agentv1alpha1.AgentStatus) *reconciler.Results { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's a bit messy here is that on the one hand we have a more functional approach where results are returned and not pointers manipulated:
reconcilePodVehicle(...) *reconcilerResults
and now we are introducing status here that is following another idiom where a pointer to a struct is passed and the underlying struct is manipulated: sort of a hidden return value if you will.
reconcilePodVehicle(&status)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point here. I have refactored this to return both a status and a results pointer throughout the reconciliation. Lmk if that is what you were intending.
Update status logic to update health properly. Add unit test for updating of health in agent status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost LGTM!
"name", associatedKey.Name) | ||
return results.WithResult(reconcile.Result{Requeue: true}).Aggregate() | ||
} else if err != nil { | ||
log.Error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ this is still open
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
elastic#5383) * Add agent observed generation. Add tests for agent observedGeneration. modify agent reconcile behavior to ensure status updates for certain situations. adjust common comparison to allow additional options Update some documentation. Make disabling of k8s client's errors simpler. Make the status update more readable. Configure initial state for Elastic Agent properly from current state. Add missing comments for some exported data. Make errors consistent with capitalization, and namespace+name. Rename e2e test name to be more descriptive. Update pod labels in e2e test to properly trigger generation update. Remove failing client tests. used keyed structs in tests. * Run generation and update yaml for observedGeneration description change. * Properly order imports * capitalize Agent * remove erroneous debugging line * renamed agent e2e test file. removed erroneous cmp.AllowUnexperted() * Ignore generation when checking if agent status is updated during k8s test steps. remove WithDefaultESValidation from agent generation e2e test. * remove unneeded nil in return from rebase. * Add missing header to e2e test. * Remove unused args struct * move imports around properly * Defer agent validation to reduce repetitive code. Remove unneeded struct in tests. Move Generation e2e tests to it's own package, and use it in all Agent mutation tests. * Updating metav1.Object -> client.Object, and using type switch to get observedGeneration. * Allow status to be updated even if associations are broken. * Allow error to be returned from doReconcile during defer. Updating E2E tests to have similar design as Kibana, and ES E2E tests for generation bits. * ensure withmutatedfrom is used in e2e agent tests. * Ensure we're not returning nil results * Fixing comparison issues from rebase/merge * Add nolint for the naked return, which is required for the defer to function properly. * Handle updating status properly when associations are broken. * Debugging * Debugging, and moving the defer * Go back to previous way of handling fetch association errors * remove extraneous logging statement * Move status update to calling function. Don't worry about status update when FetchWithASsociations fails. Just return errors from updateStatus * Remove unused context * Update the func's comments to be consistent * remove unneeded return * move common.LowestVersionFromPods closer to where pods are retrieved. Update status logic to update health properly. Add unit test for updating of health in agent status. * changes to return status instead of modifying a pointer. * rename variable * Updating how status is handled for consistency
This change is Elastic Agent specific at the moment
related to #3392
related to #5331
This change allows status.ObservedGeneration to be kept in sync with metadata.Generation according to the rules described in the above issue. This change adds a set of unit tests which show more details into when observedGeneration will, and will not be updated according to any errors that may occur during reconciliation. An end-to-end test is also added to show how the change is intended to function in a true cluster.