Skip to content
This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Return sync errors in ListServices #1410

Merged
merged 3 commits into from
Oct 26, 2018

Conversation

rndstr
Copy link
Contributor

@rndstr rndstr commented Sep 28, 2018

When applying the Kubernetes YAML fails we now return this error in
ListServices as part of the rollout messages.

Closes #1010

daemon/daemon.go Outdated
JobStatusCache *job.StatusCache
EventWriter event.EventWriter
Logger log.Logger
ResourceSyncErrors map[flux.ResourceID]error

This comment was marked as abuse.

daemon/daemon.go Outdated
@@ -136,6 +138,10 @@ func (d *Daemon) ListServicesWithOptions(ctx context.Context, opts v11.ListServi
case service.IsSystem:
readOnly = v6.ReadOnlySystem
}
if syncerr, ok := d.ResourceSyncErrors[service.ID]; ok {
service.Rollout.Messages = append(service.Rollout.Messages, syncerr.Error())

This comment was marked as abuse.

daemon/loop.go Outdated
@@ -192,6 +192,11 @@ func (d *Daemon) doSync(logger log.Logger) (retErr error) {
}

var syncErrors []event.ResourceError
// Since Sync() applies *all* resources we clear all errors here.
// If there is a recurring issue it will show up every time when
// this code is run.

This comment was marked as abuse.

@@ -17,7 +17,7 @@ type Cluster interface {
SomeControllers([]flux.ResourceID) ([]Controller, error)
Ping() error
Export() ([]byte, error)
Sync(SyncDef) error
Sync(SyncDef, bool) error

This comment was marked as abuse.

@squaremo
Copy link
Member

squaremo commented Oct 9, 2018

What's happening with this one? Looks like there's only a couple of things holding it back:

  • the tests don't compile, because there's a fake Applier with an out of date method signature
  • it needs some thought over how sync errors are interpreted with respect to rollout progress

@squaremo
Copy link
Member

That latest commit looks good 🍊

Regarding the sync error field -- we could think of (or even implement) the sync status as a Condition, in which case we'd probably treat it like we would Available or whatever. And then it'd end up in the messages, as it does in the above. I reckon it would still need some interpretation, at least because of the scenario I mentioned above. wdyt @rndstr ?

@squaremo
Copy link
Member

Regarding the sync error field -- we could think of (or even implement) the sync status as a Condition, in which case we'd probably treat it like we would Available or whatever. And then it'd end up in the messages, as it does in the above. I reckon it would still need some interpretation, at least because of the scenario I mentioned above.

Thinking this through a little more, if the sync status is a condition, then it should be considered as part of the rollout status calculation. That would need control flow to change, since that calculation is all done purely from the cluster resource, at present.
A simple way to do it would be to replace the RolloutStatus field with an interface (or just a procedure) that can be invoked to get the rollout status given the sync status.

@squaremo
Copy link
Member

Alternative: remove the bit that adds the sync error to the messages for now, since that's the controversial bit, and the rest is useful without it.

When applying the Kubernetes YAML fails we now record this and make it
available within the daemon.
`fluxsync.Sync()` applies everything as a multidoc by default. If any
apply fails, the next `Sync()` will apply the failed ones one by one
while the rest is still applied as a multidoc.
@rndstr
Copy link
Contributor Author

rndstr commented Oct 25, 2018

I finally came to the realization that the sync errors should not be part of the rollout status. The rollout basically provides the current state of a workload. The sync error is from whenever the yaml file last changed (and has been sync'ed past) and has no relation to the current rollout status.
Maybe that's what you already meant @squaremo?

Creating a release does commit & sync, while sync will fail during an user triggered release and provide semi-instant feedback (through polling jobs).

Providing the information in ListServices would not necessarily be useful for determining the rollout status but just the general state of the YAML.

I removed the part where it copies it into the rollout status for now and will add this directly as sub field for the Controller as SyncError.

@rndstr rndstr force-pushed the issue/1010-sync-errors-in-listservices branch from c818454 to 936d121 Compare October 26, 2018 00:25
The cluster delivers information about controllers and also performs the
sync action. It is standing to reason that it should also fill in the
blanks for sync errors.
@squaremo
Copy link
Member

I finally came to the realization that the sync errors should not be part of the rollout status. The rollout basically provides the current state of a workload. The sync error is from whenever the yaml file last changed (and has been sync'ed past) and has no relation to the current rollout status.
Maybe that's what you already meant @squaremo?

Not quite, but this is at least as good a line of reasoning :-)

I removed the part where it copies it into the rollout status for now and will add this directly as sub field for the Controller as SyncError.

I like the moving it fully into the cluster implementation -- nice work there.

@rndstr rndstr merged commit 706adb1 into master Oct 26, 2018
@rndstr rndstr deleted the issue/1010-sync-errors-in-listservices branch October 26, 2018 16:25
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants