Improved backup resource selection #15

Closed
kstewart opened this issue Aug 9, 2017 · 22 comments

@kstewart

kstewart commented Aug 9, 2017

If you do ark backup create foo --include-namespaces andy, that will back up everything in the 'andy' namespace AND all cluster-scoped resources (the following list is from my local v1.7.3, limited to resources that support create + list):

  • apiservices
  • certificatesigningrequests
  • clusterrolebindings
  • clusterroles
  • customresourcedefinitions
  • namespaces
  • nodes
  • persistentvolumes
  • podsecuritypolicies
  • storageclasses
  • thirdpartyresources

If you want to back up a subset of resources, you have to specify some combination of --include-resources, --exclude-resources, and/or --selector.

If we decide that one of Ark's core use cases is to allow one or more namespaces to be backed up, we should find a way to limit the resources backed up to just those used by the namespace(s) in question.

@ncdc
Contributor

ncdc commented Aug 10, 2017

Did some brainstorming with @skriss:

If you're trying to back up a specific namespace, you probably don't want to include all cluster-scoped resources. A better approach would probably be to specify the namespace and use a label selector, so that cluster-scoped resources are only included if they match the selector.

Where this doesn't work is with dynamically provisioned PVs, because they don't inherit the labels from the associated PVC. To get around this, I think we should modify the Action() signature to return an optional slice of additional items to back up. In the case of a pod-like resource that references a PVC, we would return that we want to back up the associated PV. We'd need to keep a universal list of all resource/namespace/name items already backed up to ensure we don't back up the same PV multiple times.
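
For concreteness, a minimal sketch of what that signature change might look like; the type and method names are hypothetical, not the actual Ark API:

```go
package backup

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// ResourceIdentifier names one additional item an action wants included in
// the backup, e.g. the PV bound to a PVC referenced by a pod.
type ResourceIdentifier struct {
	GroupResource schema.GroupResource
	Namespace     string // empty for cluster-scoped resources such as PVs
	Name          string
}

// Action runs against each item as it is backed up. Besides the (possibly
// mutated) item, it may return extra items to back up; the backup would
// de-duplicate these against a global resource/namespace/name set so the
// same PV is never backed up twice.
type Action interface {
	Execute(item runtime.Unstructured) (runtime.Unstructured, []ResourceIdentifier, error)
}
```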

@skriss
Member

skriss commented Aug 31, 2017

@ncdc WDYT about making "cluster" an includeable/excludeable scope, like namespaces? We could change --include-namespaces and --exclude-namespaces to --include/exclude-scopes, and cluster would be a special value. This simplifies excluding cluster-scoped resources (currently you have to either exclude all the resource types or have a label selector that doesn't match them).

This would not replace the functionality of backing up the PVs claimed by PVCs -- it just seems like a useful feature when thinking through all of the various use cases around resource selection.

@ncdc
Contributor

ncdc commented Sep 5, 2017

@skriss I've considered finding an easy way to include/exclude cluster-scoped resources. I can't say I love using "scope" as part of the flag name or concept exposed to end users.

It would be nice to get some additional feedback on how and what people are backing up and restoring. cc @jbeda @rosskukulinski @djschny

@rosskukulinski
Contributor

@ncdc I don't have a ton of signal for you at the moment. Right now, I'm thinking of the backup/restore scenario as two personas:

  1. Cluster Admin who's responsible for the life of the entire cluster. Being able to rehydrate a new cluster from a backup with everything included is the goal. (e.g. Nordstrom, Etsy)

  2. Team Admin who's responsible for one or more namespaces. Being able to back up/restore their namespace(s) for DR or for CI/CD is the goal. Cloning a namespace within the same cluster (CI/CD) only requires PV/PVC from cluster scope. Cloning namespaces between clusters likely requires more: PV/PVC, RBAC objects, maybe StorageClass (storage arguably should be provided by the cluster admin).

Haven't thought about this in depth, but I'm liking the namespace + label selector combo. It does require some planning around label definitions, but it seems to be the most "kubernetes-native" way to do this.

@dgoodwin
Contributor

We have a use case where we are handling automated backups for specific namespaces (actually archivals and transfers), but we do not have control over what is running in those namespaces, so there is no label we could reliably expect to help us filter out all the cluster-scoped resources. The same situation could come up for other users where backups are handled by a separate group from the teams using the namespaces to run apps.

It would be useful for us to be able to specify just a namespace and omit the cluster-scoped objects, which would at the very least reduce the storage footprint. (Not yet sure what impact having those in there will have; still digging in.)

@ncdc
Contributor

ncdc commented Sep 28, 2017

Some ideas:

  • split included/excludedResources into included/excludedNamespacedResources and included/excludedClusterResources (kinda icky)
  • add a bool to indicate whether you want cluster-scoped resources (also kinda icky, but I like it better than option 1)

@dgoodwin
Contributor

A bool flag sounds great to me. I'll hold off a little bit and see if I can send a PR for that if no other ideas pop up.

@dgoodwin
Contributor

dgoodwin commented Oct 2, 2017

Another interesting side effect of the cluster-scoped objects: if you back up just a namespace with something like ark backup create jenkins-4 --include-namespaces jenkins (note: no label selector) and you have more than one project using a persistent volume, you will get a snapshot created for every PVC in the cluster.

Could we generalize this so that if the backup specifies namespaces, we only snapshot PVCs from those namespaces?
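
A minimal sketch of that generalization, assuming the backup tracks its included namespaces as a set; the helper name and the "*" convention for "all namespaces" are hypothetical, not Ark's actual implementation:

```go
package backup

import corev1 "k8s.io/api/core/v1"

// shouldSnapshotPV reports whether a PV's volume should be snapshotted,
// given the namespaces included in the backup ("*" meaning all namespaces).
// Only PVs whose bound PVC lives in an included namespace are snapshotted.
func shouldSnapshotPV(pv *corev1.PersistentVolume, includedNamespaces map[string]bool) bool {
	if includedNamespaces["*"] {
		return true
	}
	claim := pv.Spec.ClaimRef // nil if the PV is not bound to a PVC
	if claim == nil {
		return false
	}
	return includedNamespaces[claim.Namespace]
}
```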

@ncdc
Contributor

ncdc commented Oct 2, 2017

@dgoodwin yes, we are planning on that. WIP in #65

@skriss
Member

skriss commented Oct 10, 2017

@ncdc @dgoodwin WDYT about a three-way flag for cluster-scoped resources (this is in line with our --snapshot-volumes flag):

--include-cluster-resources=true: include them in the backup (assuming they're included per included/excluded resources)
--include-cluster-resources=false: don't include them in the backup (this would take precedence over included/excluded resources)
--include-cluster-resources=auto (default): if all namespaces are included per includes/excludes, DO include cluster resources; if not all namespaces are included, DON'T include cluster resources

I think this gives the expected behavior by default, with the ability to override it if necessary.

I'm happy to implement.

@dgoodwin
Contributor

Sounds OK to me, especially if it matches established precedent.

My situation actually boiled down to needing to exclude all but a few cluster-level resources, but I think the above would be generally useful just the same.

@ncdc
Contributor

ncdc commented Oct 11, 2017

SGTM

@skriss
Member

skriss commented Oct 11, 2017

@ncdc this will be a breaking change when specifying included namespaces, but I think it's a reasonable break given what it is and where we are - do you agree?

@ncdc
Contributor

ncdc commented Oct 11, 2017

I think it's reasonable. I also think auto should really be "do our best to back up every cluster-scoped resource that is relevant to the set of items being backed up". In the short term, that means traversing from pod to PVC to PV to get the relevant PVs. Longer term, I could envision us having some sort of model where we do something like this:

  1. Set resource priorities to ...,serviceaccounts,widgets,...
  2. Back up a serviceaccount, ns=foo, name=bar
    1. A custom action adds that to a list
  3. Run a custom filter against cluster-scoped resource "widgets"
    1. If widget X has a reference to any serviceaccount from the list above, include it in the backup

This would need to be extensible, pluggable, and configurable, but I think it would be pretty powerful (and it would fit the OpenShift OAuth token use case).
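
Sketching that model in Go for illustration only; every name here is hypothetical, and "widgets" is just the placeholder resource from the example above:

```go
package backup

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/types"
)

// backedUpItems records what has already been written to the backup, keyed
// by resource (e.g. "serviceaccounts") and then namespace/name. Custom
// actions add entries as each item is backed up.
type backedUpItems map[string]map[types.NamespacedName]bool

// ClusterResourceFilter runs against each cluster-scoped item (e.g. a
// "widget") after the namespaced resources it depends on, and decides
// whether to include it -- for example, include a widget only if it
// references a serviceaccount already in the backup.
type ClusterResourceFilter interface {
	Include(item runtime.Unstructured, seen backedUpItems) (bool, error)
}
```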

@skriss
Member

skriss commented Oct 11, 2017

Do you agree that if the flag is set to false, then PVs should not be captured as part of their PVCs?

@ncdc
Contributor

ncdc commented Oct 11, 2017

Ugh, this is where it gets messy. Maybe it would be easier if we only had 2 modes - you get all cluster-scoped resources based on --include-resources and --exclude-resources, or you only get those associated with what is being backed up? That way, at least you don't get surprised when you say "false" and expect to still have your PVs? Or is that equally confusing?

@skriss
Member

skriss commented Oct 11, 2017

I actually think that "false -> no PVs" makes sense and isn't messy:

If you're backing up one or more namespaces:

  • --include-cluster-resources=true: same as current behavior (cluster-scoped resources will be included according to included/excluded resources, label selector)
  • --include-cluster-resources=false: NO cluster-scoped resources will be included in the backup
  • --include-cluster-resources=auto: only cluster-scoped resources related to the namespaces being backed up are included

If you're backing up all namespaces:

  • --include-cluster-resources=true: same as current behavior (cluster-scoped resources will be included according to included/excluded resources, label selector)
  • --include-cluster-resources=false: NO cluster-scoped resources will be included in the backup
  • --include-cluster-resources=auto: same as current behavior (cluster-scoped resources will be included according to included/excluded resources, label selector)
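
A minimal sketch of that decision table, assuming the flag is surfaced as a *bool where nil means auto; names are illustrative, not the actual Ark implementation:

```go
package backup

// backupAllClusterResources reports whether cluster-scoped resources should
// be enumerated wholesale (still subject to included/excluded resources and
// the label selector).
func backupAllClusterResources(includeClusterResources *bool, allNamespacesIncluded bool) bool {
	if includeClusterResources != nil {
		// Explicit true: include per included/excluded resources + selector.
		// Explicit false: include no cluster-scoped resources at all.
		return *includeClusterResources
	}
	// auto: enumerate cluster-scoped resources only when every namespace is
	// included; otherwise only related items (e.g. PVs via their PVCs) get
	// pulled in through other mechanisms.
	return allNamespacesIncluded
}
```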

@ncdc
Contributor

ncdc commented Oct 11, 2017

ok, now that you spelled it out, sgtm

@skriss
Member

skriss commented Oct 11, 2017

then over time, per your previous comment, we can improve the behavior of "auto" to be more intelligent about cluster-scoped resource selection

@ncdc
Contributor

ncdc commented Oct 17, 2017

@skriss do you think we've addressed this and can close it?

@skriss
Member

skriss commented Oct 20, 2017

@ncdc yeah I think so. we may make further improvements to the "auto" behavior down the line but we can track separate issues if/when we do that.

@skriss
Member

skriss commented Oct 20, 2017

closing.

skriss closed this as completed Oct 20, 2017
dymurray added a commit to dymurray/velero that referenced this issue Apr 16, 2019
…ter-rebase

Fixes logger configuration for restore plugins