
when backing up PVCs, also back up claimed PVs #65

Merged
merged 1 commit into master from cluster-resource-selection on Oct 11, 2017

Conversation

@skriss skriss (Member) commented Aug 31, 2017

Signed-off-by: Steve Kriss steve@heptio.com

Addresses #15, but doesn't fully fix it.
Fixes #105

@skriss skriss added this to the v0.4.0 milestone Sep 1, 2017
@skriss skriss requested a review from ncdc September 5, 2017 17:04
return err
}

for _, itm := range items {
Contributor

ctx.additionalItems = append(ctx.additionalItems, items...)


// backedUpItems keeps track of items that have been backed up already. The string should be derived
// by calling String() on an itemKey instance representing the item.
backedUpItems sets.String
Contributor

I would probably do this as map[itemKey]struct{} - no need to stringify. And/or we could also create type itemKeySet map[itemKey]struct{} and add methods to it to make it more set like.

Member Author

Heh, I had it that way originally (map[itemKey]struct{}) and switched to this to get the sets methods (mainly for clarity). Not having generics makes me sad at times like these.
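
For reference, a minimal sketch of the itemKeySet idea floated above (hypothetical; the field layout of itemKey is assumed, and the PR ultimately kept sets.String):

```go
// Hypothetical sketch of a typed set keyed by itemKey directly, avoiding the
// need to stringify keys for use with sets.String.
type itemKey struct { // assumed shape; the real struct lives in the backup package
	resource  string
	namespace string
	name      string
}

type itemKeySet map[itemKey]struct{}

// Insert adds a key to the set.
func (s itemKeySet) Insert(k itemKey) { s[k] = struct{}{} }

// Has reports whether a key is already in the set.
func (s itemKeySet) Has(k itemKey) bool {
	_, ok := s[k]
	return ok
}
```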


// resolve additionalItems to GroupResources
for _, itm := range ctx.additionalItems {
groupResource, err := resolveGroupResource(kb.discoveryHelper.Mapper(), itm.resource)
Contributor

If we modify resolveGroupResource to be resolveGroupVersionResource and return a schema.GroupVersionResource, we'll have the version we need, and you won't need the new clientForGroupResource function below. Let me know if you need further clarification.

Member Author

We still need the APIResource obj to be able to get a client; how would you do that differently?
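
A rough sketch of the resolveGroupVersionResource shape being suggested (assumed signature; the PR later moves this resolution into the discovery helper, which also returns the APIResource needed to build a client):

```go
import (
	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// resolveGroupVersionResource resolves a bare resource name (e.g. "persistentvolumes")
// to a fully-qualified GroupVersionResource via the RESTMapper.
func resolveGroupVersionResource(mapper meta.RESTMapper, resource string) (schema.GroupVersionResource, error) {
	gvr, err := mapper.ResourceFor(schema.GroupVersionResource{Resource: resource})
	if err != nil {
		return schema.GroupVersionResource{}, err
	}
	return gvr, nil
}
```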

backedUpItems sets.String

// additionalItems is a list of items returned by Actions that should be additionally backed up.
additionalItems []itemKey
Contributor

Recording here for posterity - might want to consider refactoring the backup code to use a queue + workers, so backing up anything, whether as part of the normal process or "additional" one, would go through a single queue.
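
Purely as an illustration of that idea (hypothetical; nothing like this exists in the PR), a queue-plus-workers loop might look like:

```go
import (
	"log"
	"sync"
)

// runBackupWorkers funnels every item, whether from the normal resource walk or
// returned by an action as an "additional" item, through a single channel consumed
// by a fixed pool of workers. backupOne stands in for the real per-item backup logic.
func runBackupWorkers(items <-chan itemKey, workers int, backupOne func(itemKey) error) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for key := range items {
				if err := backupOne(key); err != nil {
					// a real implementation would aggregate these errors
					log.Printf("error backing up %v: %v", key, err)
				}
			}
		}()
	}
	wg.Wait()
}
```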

@ncdc ncdc modified the milestones: v0.4.0, v0.5.0 Sep 7, 2017
// additionalItems is a list of items returned by Actions that should be additionally backed up.
additionalItems []itemKey

// resources is a map of GroupResource->APIResource that we've seen so far.
Contributor

What would you think about adding extra functionality to our discovery helper so that we could more easily get a client from a GVR or GVK? In other words, when we refresh discovery, build up maps that let us find an APIResource given either a GroupResource or a GroupKind? That would make it much easier to get a client, and we wouldn't have to rely on either iterating through or keeping track of what we've seen.

Member Author

Yeah, that's probably a better place to put the logic. I'll make the switch.

@@ -400,5 +500,12 @@ func (*realItemBackupper) backupItem(ctx *backupContext, item map[string]interfa
return err
}

itm := itemKey{
Contributor

Can we call itm "item" instead?

// Execute finds the name of the PersistentVolume referenced by the provided
// PersistentVolumeClaim and returns it as an additional item to back up.
func (a *getPVAction) Execute(pvc map[string]interface{}, backup *api.Backup) ([]itemKey, error) {
volumeName, err := collections.GetString(pvc, "spec.volumeName")
Contributor

Feel like adding any logging in here?

return err
}

Contributor

Maybe log what the additional items are?

return kuberrs.NewAggregate(errs)
}

func (kb *kubernetesBackupper) backupAdditionalItems(ctx *backupContext) error {
Contributor

Log that we're starting to back up additional items?

@@ -78,6 +78,8 @@ type Dynamic interface {
List(metav1.ListOptions) (runtime.Object, error)
// Watch watches for changes to objects of a given resource.
Watch(metav1.ListOptions) (watch.Interface, error)

Get(name string, opts metav1.GetOptions) (*unstructured.Unstructured, error)
Contributor

Godoc
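
For example, the missing comment could read something like (wording is a suggestion, not the PR's actual text):

```go
// Get fetches the object with the given name from the Kubernetes API server
// and returns it in unstructured form.
Get(name string, opts metav1.GetOptions) (*unstructured.Unstructured, error)
```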

@skriss skriss force-pushed the cluster-resource-selection branch 3 times, most recently from 8214a21 to ba81127 Compare September 11, 2017 21:28
@skriss skriss (Member Author) commented Sep 11, 2017

@ncdc PTAL at the revisions to this. I moved resource resolution into the discovery helper and built up a map for APIResource fetching there during Refresh().

@@ -50,9 +51,10 @@ type helper struct {
discoveryClient discovery.DiscoveryInterface

// lock guards mapper and resources
Contributor

update comment to include resourcesMap

return schema.GroupVersionResource{}, metav1.APIResource{}, err
}

apiResource, found := h.resourcesMap[gvr]
Contributor

need to acquire a read lock
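
i.e. something along these lines (sketch, assuming lock is or becomes a sync.RWMutex):

```go
// take a read lock so Refresh() can't swap resourcesMap out from under us
h.lock.RLock()
apiResource, found := h.resourcesMap[gvr]
h.lock.RUnlock()
```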

@@ -67,7 +67,17 @@ func (ac ActionContext) log(msg string, args ...interface{}) {
type Action interface {
// Execute is invoked on an item being backed up. If an error is returned, the Backup is marked as
// failed.
Execute(ctx ActionContext, item map[string]interface{}, backup *api.Backup) error
Execute(ctx ActionContext, item map[string]interface{}, backup *api.Backup) ([]itemKey, error)
Contributor

How about instead of returning the keys of additional items to back up, we just let the action call back to the itemBackupper to back up the additional items? I.e., each action implementation holds a reference to an itemBackupper. We'd need to make sure that the itemBackupper could look up the action instead of having that be part of kubernetesBackupper#backupResource. This way, each item can immediately back up transitive dependencies and add them to the backupContext's set of already backed up items.

Member Author

I like it in theory; a bunch of stuff will have to move around, though, because we'll want to check the resource includes/excludes, the label selector, and the already-backed-up items before backing up the PV, and all of that is currently done outside the backupItem method.

@skriss skriss (Member Author) commented Oct 4, 2017

Got this approach working; there's some fine-tuning to do on the design, but generally it looks good.

@skriss skriss force-pushed the cluster-resource-selection branch 4 times, most recently from 7bffb35 to 356b098 Compare October 5, 2017 18:00
@skriss skriss (Member Author) commented Oct 5, 2017

@ncdc the code here still needs some fine-tuning, but I would appreciate a review on the design/refactoring that allows actions to trigger backups of additional items directly (just the last commit).

Contributor

@ncdc ncdc left a comment

Quick initial scan. Would like to review more thoroughly later.

@@ -69,7 +69,17 @@ func (ac ActionContext) log(msg string, args ...interface{}) {
type Action interface {
// Execute is invoked on an item being backed up. If an error is returned, the Backup is marked as
// failed.
Execute(ctx ActionContext, item map[string]interface{}, backup *api.Backup) error
Execute(ctx *backupContext, item map[string]interface{}, backup *api.Backup, backupper itemBackupper) error
Contributor

backupContext already has the Backup in it, so no need to pass the Backup in separately.

Contributor

I wonder if we should put the itemBackupper in the backupContext?

Member Author

@skriss skriss Oct 5, 2017

Good point on 1; maybe on 2.

Does the move to eliminate ActionContext and just use backupContext within the actions seem reasonable? I was going to have to add many of the fields from the latter to the former otherwise, so this seemed like a better choice.

Contributor

Yeah, I'm fine with getting rid of ActionContext and replacing it with backupContext.

delete(item, "status")

metadata, err := collections.GetMap(item, "metadata")
func (b *realItemBackupper) backupItem(ctx *backupContext, item map[string]interface{}, groupResource schema.GroupResource) error {
Contributor

In my #112 I call this ib instead of b

@@ -421,5 +461,7 @@ func (*realItemBackupper) backupItem(ctx *backupContext, item map[string]interfa
return errors.WithStack(err)
}

ctx.backedUpItems[key] = struct{}{}
Contributor

Move this up to just below where you check if we've already backed it up, so we record it asap

Member Author

I left it down here because I didn't want to record it as backed up unless it actually succeeded. WDYT?

Contributor

My gut says that trying a second time if the initial backup failed would likely encounter the same error?
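
The suggested placement, roughly (sketch; the surrounding backupItem code is abbreviated and the skip-check wording is assumed):

```go
// skip if this item has already been backed up
if _, exists := ctx.backedUpItems[key]; exists {
	return nil
}
// record it immediately; a retry after a failure would likely hit the same error anyway
ctx.backedUpItems[key] = struct{}{}
```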

t.Errorf("headers: expected %d, got %d", e, a)
}
e, a := 1, len(w.headers)
require.Equal(t, e, a, "headers: expected %d, got %d", e, a)
Contributor

testify will print out the expected and actual values, so this could be:

require.Equal(t, 1, len(w.headers), "headers")

@@ -65,6 +65,7 @@ func (a *volumeSnapshotAction) Execute(ctx ActionContext, volume map[string]inte
// non-nil error means it's a supported PV source but volume ID can't be found
if err != nil {
return errors.Wrapf(err, "error getting volume ID for backup %q, PersistentVolume %q", backupName, name)

Contributor

delete

// back it up
ctx.log("backing up PV %s\n", volumeName)
if err := backupper.backupItem(ctx, pv.UnstructuredContent(), gr); err != nil {
ctx.log("error backing up PV: %s\n", err)
Contributor

Shouldn't we return the error?

Member Author

Yes, a bunch of the logging/error handling in here was messed up, so I fixed it.
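
i.e. the PV backup inside the action ends up propagating the failure rather than only logging it, roughly (sketch based on the excerpt above; the exact error wrapping is assumed):

```go
// back it up, returning the error to the caller instead of just logging it
ctx.log("backing up PV %s", volumeName)
if err := backupper.backupItem(ctx, pv.UnstructuredContent(), gr); err != nil {
	return errors.Wrapf(err, "error backing up PersistentVolume %q", volumeName)
}
```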

@@ -106,6 +121,16 @@ func (h *helper) Refresh() error {

sortResources(h.resources)

h.resourcesMap = make(map[schema.GroupVersionResource]metav1.APIResource)
for _, resourceGroup := range h.resources {
gv, _ := schema.ParseGroupVersion(resourceGroup.GroupVersion)
Contributor

Might as well return if we get an error
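
With the error returned, the Refresh() loop from the hunk above would look roughly like (sketch; the inner population loop and the wrapping style are assumptions):

```go
h.resourcesMap = make(map[schema.GroupVersionResource]metav1.APIResource)
for _, resourceGroup := range h.resources {
	gv, err := schema.ParseGroupVersion(resourceGroup.GroupVersion)
	if err != nil {
		return errors.Wrapf(err, "error parsing GroupVersion %q", resourceGroup.GroupVersion)
	}
	for _, resource := range resourceGroup.APIResources {
		h.resourcesMap[gv.WithResource(resource.Name)] = resource
	}
}
```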

ret = append(ret, gr)
set.Insert(gr.String())
}

// go through everything we got from discovery and add anything not in "set" to byName
var byName []schema.GroupResource
for _, resourceGroup := range helper.Resources() {
for _, resourceGroup := range resources {
Contributor

Was there a reason for this change?

Member Author

Good question; not really, it may have been a remnant of some earlier rebasing. Fixed it.

@skriss skriss force-pushed the cluster-resource-selection branch 2 times, most recently from 4a8bf90 to dff665c Compare October 10, 2017 23:02
@skriss skriss (Member Author) commented Oct 10, 2017

Two notes:

  • if both PVs and PVCs are being backed up, we need to force PVCs to go first so that the relevant PVs are backed up as part of that. I'll probably just hardcode this logic for now.
  • we need something like Improved backup resource selection #15 (comment) to really make this use case work as expected. This PR helps with the backend but doesn't address the UX.

ctx.infof("Processing resource %s/%s", group.GroupVersion, resource.Name)
if err := kb.backupResource(ctx, group, resource); err != nil {
errs = append(errs, err)
}
}

for _, resource := range group.APIResources {
// do PVs last because if we're also backing up PVCs, we want to backup
Contributor

This works for now, but do you think longer-term we should have prioritizedResources for backups too?

Contributor

I'm actually going to want to make sure we have this order: pods, pvcs, pvs (anything else can come whenever). Maybe I'll do prioritized resources as part of my hooks PR. We don't necessarily have to expose it initially, but I think something a bit more flexible than a bool or two is going to be needed.

Member Author

Yeah, thought the same thing - if you're okay with it, I'll leave this as-is and either you or I can do priorities in a separate PR

Contributor

Fine as-is
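
A sketch of what a prioritizedResources-style ordering for backups could look like (hypothetical; the PR keeps the hardcoded PV special case and defers priorities to a follow-up):

```go
import "sort"

// sortResourcesByPriority returns resources ordered so that anything named in
// priorities (e.g. pods, persistentvolumeclaims, persistentvolumes) comes first,
// in that order, with everything else keeping its original relative order.
func sortResourcesByPriority(resources []string, priorities []string) []string {
	priorityIndex := make(map[string]int, len(priorities))
	for i, p := range priorities {
		priorityIndex[p] = i
	}

	var prioritized, remaining []string
	for _, r := range resources {
		if _, ok := priorityIndex[r]; ok {
			prioritized = append(prioritized, r)
		} else {
			remaining = append(remaining, r)
		}
	}
	sort.Slice(prioritized, func(i, j int) bool {
		return priorityIndex[prioritized[i]] < priorityIndex[prioritized[j]]
	})
	return append(prioritized, remaining...)
}
```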

// do PVs last because if we're also backing up PVCs, we want to backup
// PVs within the scope of the PVCs (within the PVC action) to allow
// for hooks to run
if strings.ToLower(resource.Name) == "persistentvolumes" {
Contributor

Just to be safe, let's make sure that group.GroupVersion is v1
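
i.e. the guard could become something like (sketch):

```go
// only special-case the core (v1) persistentvolumes resource
if group.GroupVersion == "v1" && strings.ToLower(resource.Name) == "persistentvolumes" {
	// ... defer PVs until after PVCs have been processed ...
}
```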

@skriss skriss force-pushed the cluster-resource-selection branch 2 times, most recently from e061a6c to 9b3a6ba Compare October 11, 2017 16:17
@skriss skriss (Member Author) commented Oct 11, 2017

retest this please

Signed-off-by: Steve Kriss <steve@heptio.com>
@skriss skriss changed the title from "[WIP] when backing up PVCs, also back up claimed PVs" to "when backing up PVCs, also back up claimed PVs" on Oct 11, 2017
@ncdc ncdc merged commit 2a975a2 into vmware-tanzu:master Oct 11, 2017
@skriss skriss deleted the cluster-resource-selection branch October 23, 2017 17:58
dymurray pushed a commit to dymurray/velero that referenced this pull request Aug 25, 2020
bump restic timeouts and add diagnose tools
Successfully merging this pull request may close these issues.

Modify backup/Action interface