Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elastic Agent: Set status.ObservedGeneration from metadata.Generation #5383

Merged
merged 35 commits into from
Apr 5, 2022

Conversation

naemono
Copy link
Contributor

@naemono naemono commented Feb 15, 2022

This change is Elastic Agent specific at the moment

related to #3392
related to #5331

This change allows status.ObservedGeneration to be kept in sync with metadata.Generation according to the rules described in the above issue. This change adds a set of unit tests which show more details into when observedGeneration will, and will not be updated according to any errors that may occur during reconciliation. An end-to-end test is also added to show how the change is intended to function in a true cluster.

@botelastic botelastic bot added the triage label Feb 15, 2022
@naemono naemono changed the title Add agent observed generation. Elastic Agent: Set status.ObservedGeneration from metadata.Generation Feb 15, 2022
@barkbay barkbay added the >enhancement Enhancement of existing functionality label Feb 16, 2022
@botelastic botelastic bot removed the triage label Feb 16, 2022
@naemono naemono force-pushed the 3392-agent-observedGeneration branch from 818892f to cbf499e Compare February 16, 2022 15:27
@naemono naemono marked this pull request as ready for review February 16, 2022 20:11
pkg/controller/agent/controller.go Outdated Show resolved Hide resolved
pkg/controller/agent/controller.go Outdated Show resolved Hide resolved
pkg/controller/agent/controller.go Outdated Show resolved Hide resolved
test/e2e/agent/observed_generation_test.go Outdated Show resolved Hide resolved
pkg/controller/agent/controller_test.go Outdated Show resolved Hide resolved
Add tests for agent observedGeneration.
modify agent reconcile behavior to ensure status updates for certain situations.
adjust common comparison to allow additional options
Update some documentation.
Make disabling of k8s client's errors simpler.
Make the status update more readable.
Configure initial state for Elastic Agent properly from current state.
Add missing comments for some exported data.
Make errors consistent with capitalization, and namespace+name.
Rename e2e test name to be more descriptive.
Update pod labels in e2e test to properly trigger generation update.
Remove failing client tests.
used keyed structs in tests.
removed erroneous cmp.AllowUnexperted()
… test steps.

remove WithDefaultESValidation from agent generation e2e test.
Remove unneeded struct in tests.
Move Generation e2e tests to it's own package, and use it in all Agent mutation tests.
@naemono naemono force-pushed the 3392-agent-observedGeneration branch from b595041 to 2bc4fd9 Compare February 22, 2022 14:04
@@ -153,25 +153,44 @@ func (r *ReconcileAgent) Reconcile(ctx context.Context, request reconcile.Reques
func (r *ReconcileAgent) doReconcile(ctx context.Context, agent agentv1alpha1.Agent) *reconciler.Results {
defer tracing.Span(&ctx)()
results := reconciler.NewResult(ctx)
status := newStatus(agent)
if !association.AreConfiguredIfSet(agent.GetAssociations(), r.recorder) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think observedGeneration should be updated even if associations are not correctly set. One reason is that misconfigured associations are not immediately detected. Here is an example:

  1. Deploy config/recipes/elastic-agent/fleet-kubernetes-integration.yaml
  2. observerdGeneration is eventually updated:
 k get agent.agent.k8s.elastic.co/fleet-server -o go-template='{{ printf "\nGeneration: "}}{{.metadata.generation}}{{printf "\n\nStatus:"}}{{.status}}'

Generation: 2

Status:map[availableNodes:1 elasticsearchAssociationsStatus:map[default/elasticsearch:Established] expectedNodes:1 health:green kibanaAssociationStatus:Established observedGeneration:2 version:8.0.0]

⬆️ generation and observedGeneration are both set to 2

  1. Break one of the association, for example Kibana, observedGeneration is still updated:

Generation: 3

Status:map[availableNodes:1 elasticsearchAssociationsStatus:map[default/elasticsearch:Established] expectedNodes:1 health:green kibanaAssociationStatus:Pending observedGeneration:3 version:8.0.0

⬆️ generation and observedGeneration are both set to 3

  1. Update the Agent resource while keeping the Kibana association broken, observedGeneration is no more updated, which seems a bit confusing to me and does not match the description "If the generation observed in status diverges from the generation in metadata, the Elastic Agent controller has not yet processed the changes contained in the Elastic Agent specification."
k get agent.agent.k8s.elastic.co/fleet-server -o go-template='{{ printf "\nGeneration: "}}{{.metadata.generation}}{{printf "\n\nStatus:"}}{{.status}}'

Generation: 4

Status:map[availableNodes:1 elasticsearchAssociationsStatus:map[default/elasticsearch:Established] expectedNodes:1 health:green kibanaAssociationStatus:Pending observedGeneration:3 version:8.0.0]

⬆️ generation has been updated to 4, but observedGeneration is still 3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic has been updated, and I'll get these steps tested.

pkg/controller/agent/reconcile.go Outdated Show resolved Hide resolved
pkg/controller/agent/reconcile.go Outdated Show resolved Hide resolved
@naemono
Copy link
Contributor Author

naemono commented Mar 10, 2022

Last update resolves this issue with associations @barkbay

initial

❯ kc get agent.agent.k8s.elastic.co/fleet-server -o go-template='{{ printf "\nGeneration: "}}{{.metadata.generation}}{{printf "\n\nStatus:"}}{{.status}}'

Generation: 2

Status:map[availableNodes:1 elasticsearchAssociationsStatus:map[default/elasticsearch:Established] expectedNodes:1 health:green kibanaAssociationStatus:Established observedGeneration:2 version:8.0.0]

broken kibana assocciation

❯ kc get agent.agent.k8s.elastic.co/fleet-server -o go-template='{{ printf "\nGeneration: "}}{{.metadata.generation}}{{printf "\n\nStatus:"}}{{.status}}'

Generation: 3

Status:map[availableNodes:1 elasticsearchAssociationsStatus:map[default/elasticsearch:Established] expectedNodes:1 health:green kibanaAssociationStatus:Pending observedGeneration:3 version:8.0.0]

spec updated, observedGeneration updated:

❯ kc get agent.agent.k8s.elastic.co/fleet-server -o go-template='{{ printf "\nGeneration: "}}{{.metadata.generation}}{{printf "\n\nStatus:"}}{{.status}}'

Generation: 4

Status:map[availableNodes:1 elasticsearchAssociationsStatus:map[default/elasticsearch:Established] expectedNodes:1 health:green kibanaAssociationStatus:Pending observedGeneration:4 version:8.0.0]

pkg/controller/agent/reconcile.go Outdated Show resolved Hide resolved
pkg/controller/agent/reconcile.go Outdated Show resolved Hide resolved
if agent == nil {
return reconcile.Result{}, err
}
return updateStatus(ctx, *agent, r.Client, newStatus(*agent)).WithError(err).Aggregate()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which case are we covering here? If we cannot fetch the Agent resource we cannot update. The case where we can fetch the Agent resource but the JSON in the association config is invalid is the only one I can think of. We do not handle this case for the Kibana resource as far as I can tell. I think it would be good to be somewhat consistent here.

So assuming I am not missing an obvious case (totally possible), I am not fully decided which way we should go here. Either we consider this case too much of an edge case to warrant updating the status in two places (which I would be OK with at this point) or we try to bring all the status update functionality into the same place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just removed updating the status on association failures, as it seems to be causing more issues than it's worth.

pkg/controller/agent/controller.go Outdated Show resolved Hide resolved
pkg/controller/agent/controller.go Outdated Show resolved Hide resolved
pkg/controller/agent/controller_test.go Show resolved Hide resolved
pkg/controller/agent/controller.go Outdated Show resolved Hide resolved
pkg/controller/agent/controller.go Outdated Show resolved Hide resolved
pkg/controller/agent/controller.go Outdated Show resolved Hide resolved
pkg/controller/agent/reconcile.go Outdated Show resolved Hide resolved
Don't worry about status update when FetchWithASsociations fails.
Just return errors from updateStatus
@naemono naemono added the >feature Adds or discusses adding a feature to the product label Mar 30, 2022
pkg/controller/agent/reconcile.go Outdated Show resolved Hide resolved
pkg/controller/agent/reconcile.go Outdated Show resolved Hide resolved
pkg/controller/agent/reconcile.go Outdated Show resolved Hide resolved
pkg/controller/agent/reconcile.go Outdated Show resolved Hide resolved
pkg/controller/agent/controller.go Outdated Show resolved Hide resolved
@@ -27,7 +27,7 @@ import (
"github.com/elastic/cloud-on-k8s/pkg/utils/pointer"
)

func reconcilePodVehicle(params Params, podTemplate corev1.PodTemplateSpec) *reconciler.Results {
func reconcilePodVehicle(params Params, podTemplate corev1.PodTemplateSpec, status *agentv1alpha1.AgentStatus) *reconciler.Results {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's a bit messy here is that on the one hand we have a more functional approach where results are returned and not pointers manipulated:

reconcilePodVehicle(...) *reconcilerResults

and now we are introducing status here that is following another idiom where a pointer to a struct is passed and the underlying struct is manipulated: sort of a hidden return value if you will.

reconcilePodVehicle(&status)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point here. I have refactored this to return both a status and a results pointer throughout the reconciliation. Lmk if that is what you were intending.

Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost LGTM!

pkg/controller/agent/controller.go Outdated Show resolved Hide resolved
"name", associatedKey.Name)
return results.WithResult(reconcile.Result{Requeue: true}).Aggregate()
} else if err != nil {
log.Error(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ this is still open

pkg/controller/agent/reconcile.go Outdated Show resolved Hide resolved
pkg/controller/agent/controller.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@naemono naemono merged commit c4af92c into elastic:main Apr 5, 2022
@naemono naemono deleted the 3392-agent-observedGeneration branch April 5, 2022 12:51
fantapsody pushed a commit to fantapsody/cloud-on-k8s that referenced this pull request Feb 7, 2023
elastic#5383)

* Add agent observed generation.
Add tests for agent observedGeneration.
modify agent reconcile behavior to ensure status updates for certain situations.
adjust common comparison to allow additional options
Update some documentation.
Make disabling of k8s client's errors simpler.
Make the status update more readable.
Configure initial state for Elastic Agent properly from current state.
Add missing comments for some exported data.
Make errors consistent with capitalization, and namespace+name.
Rename e2e test name to be more descriptive.
Update pod labels in e2e test to properly trigger generation update.
Remove failing client tests.
used keyed structs in tests.

* Run generation and update yaml for observedGeneration description change.

* Properly order imports

* capitalize Agent

* remove erroneous debugging line

* renamed agent e2e test file.
removed erroneous cmp.AllowUnexperted()

* Ignore generation when checking if agent status is updated during k8s test steps.
remove WithDefaultESValidation from agent generation e2e test.

* remove unneeded nil in return from rebase.

* Add missing header to e2e test.

* Remove unused args struct

* move imports around properly

* Defer agent validation to reduce repetitive code.
Remove unneeded struct in tests.
Move Generation e2e tests to it's own package, and use it in all Agent mutation tests.

* Updating metav1.Object -> client.Object, and using type switch to get observedGeneration.

* Allow status to be updated even if associations are broken.

* Allow error to be returned from doReconcile during defer.
Updating E2E tests to have similar design as Kibana, and ES E2E tests for generation bits.

* ensure withmutatedfrom is used in e2e agent tests.

* Ensure we're not returning nil results

* Fixing comparison issues from rebase/merge

* Add nolint for the naked return, which is required for the defer to function properly.

* Handle updating status properly when associations are broken.

* Debugging

* Debugging, and moving the defer

* Go back to previous way of handling fetch association errors

* remove extraneous logging statement

* Move status update to calling function.
Don't worry about status update when FetchWithASsociations fails.
Just return errors from updateStatus

* Remove unused context

* Update the func's comments to be consistent

* remove unneeded return

* move common.LowestVersionFromPods closer to where pods are retrieved.
Update status logic to update health properly.
Add unit test for updating of health in agent status.

* changes to return status instead of modifying a pointer.

* rename variable

* Updating how status is handled for consistency
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement Enhancement of existing functionality >feature Adds or discusses adding a feature to the product
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants