
Adding in support for backup download #50

Merged
merged 1 commit into from
Sep 12, 2017
Conversation

jrnt30
Contributor

@jrnt30 jrnt30 commented Aug 16, 2017

Fixes #16

NOTE: The original work has been rebased off of #40, which needs to be completed first. If it's preferable, I can close this PR and create a new one to keep the conversation more relevant.

  • Adding in support for a new download subcommand of backup
  • Adjusted signing to allow for multiple types
  • Adding in git sha version during build for more granular version debugging

@heptibot

Can one of the admins verify this patch?

@ncdc
Contributor

ncdc commented Aug 16, 2017

Hi @jrnt30, thanks for submitting! One of the (unwritten; need to write them down!) design principles we have for Ark is to limit the need for cloud provider credentials solely to the Ark pod. We want to keep them away from users. If/when we add multi-tenancy/multi-user support to Ark, most users won't have access to the cloud provider credentials.

I've been working on a similar PR - downloading a backup's log file from object storage - and I initially approached it by trying to add a web server to the pod. The ark CLI would talk to the web server (via ingress or some other means), the server would download the log file from object storage and stream it back to the CLI. While it was easy to do an insecure web server with no TLS, authentication, or authorization, we need those things, which brings me to...

I had a lengthy chat with @jbeda, @mattmoyer, and @timothysc yesterday about how to allow ark CLI users to download a backup, its logs, etc. We discussed various options:

  1. Store the resource as a CRD (e.g. BackupLog in my case)
  2. Web server as I mentioned above
    1. Delegate authn/authz to kube apiserver
    2. Aggregate web server with kube apiserver
  3. Signed URLs from object storage (e.g. as described in the provider docs for GCP, Azure, and S3)

We've learned the hard way from past experiences that 1 is not a viable option. There's a strong likelihood that you'll exceed the typical data capacity in etcd at some point this way. Also, we probably shouldn't be sending potentially large amounts of data through the kube apiserver.

Option 2 is getting closer, but the amount of work you have to do to operate a custom server (TLS certs and so on) is high. Also, I think the cost of maintaining this is higher than we'd want to deal with right now - the k8s.io/apiserver framework is still under heavy development.

This brings us to option 3, which is what we think we need to do for the time being. We can start by creating pre-signed URLs that expire when the backup itself expires. We can store these URLs in the backup.status struct. We'll need 1 for the backup tarball and 1 for the log file.

If you're ok with this approach, you could make the changes for the backup tarball here, and I can do the same thing for the backup log. WDYT?

@ncdc
Contributor

ncdc commented Aug 16, 2017

We would need to agree on how we generate pre-signed URLs. I suggest we add this to ObjectStorageAdapter:

CreateSignedURL(bucket, key string, lifetime time.Duration) (string, error)

@jrnt30
Contributor Author

jrnt30 commented Aug 16, 2017

This is an interesting problem, and I appreciate the context and explanation. I hadn't considered the multi-tenant use case; your point makes a lot of sense and would be great to address in a transparent fashion, as you noted.

For now, the pre-signed URL seems like a viable solution, although I have only used AWS "for real" and would need to take a look at the other providers (and Minio, etc.) to see whether that is something that is universally available. I will think through some of the implications of pre-generating the pre-signed URLs for those files. It certainly makes the integration between the Ark client and Ark server easier, as the download command would just need to be adjusted a little bit.

I need to look deeper into the lifecycle of the CRDs to understand how that plays in with the server. My initial thought would be that adding in support to generate the signed URL on demand with a lower TTL instead of pre-generating them would be my preference, but I don't fully understand what the implications of doing that would be yet.

Option 2 - The transparent (to the user) proxy via Ark would be handy, although for large backups that may be a non-trivial amount of traffic being generated through Ark simply to provide that layer of abstraction. One approach could be to have the Ark API do the auth and then redirect the user to a provider pre-signed URL. If there are cloud providers that don't allow this, then we could actually have Ark serve as the "real" proxy. As you continue to work on that, I'd be interested in learning a bit more about how that interaction works.

@ncdc
Contributor

ncdc commented Aug 16, 2017

My initial thought would be that adding in support to generate the signed URL on demand with a lower TTL instead of pre-generating them would be my preference, but I don't fully understand what the implications of doing that would be yet.

We discussed this briefly yesterday. If we don't want to have a pre-signed URL that lives as long as the backup does, we have some options:

  1. Create the pre-signed URL with a short lifespan, and have one of the controllers refresh it periodically
  2. A new BackupDownloadRequest CRD
    1. ark POSTs it to kube apiserver and watches it for changes
    2. A new controller sees the request, generates a pre-signed URL, and updates the BackupDownloadRequest with the URL
    3. ark's watch sees the changed BackupDownloadRequest and it uses its URL to download the file
    4. The controller periodically GCs expired requests

@ncdc
Contributor

ncdc commented Aug 16, 2017

Option 2 - The transparent (to the user) proxy via Ark would be handy, although for large backups that may be a non-trivial amount of traffic being generated through Ark simply to provide that layer of abstraction. One approach could be to have the Ark API do the auth and then redirect the user to a provider pre-signed URL.

You sort of get this OOTB with storing the pre-signed URL in either the Backup or my proposed BackupDownloadRequest. If you have access to get ark.heptio.com/v1 Backups, you're authorized to download the backup.

If there are cloud providers that don't allow this, then we could actually have Ark serve as the "real" proxy. As you continue to work on that, I'd be interested in learning a bit more about how that interaction works.

I don't see us doing this any time soon. API aggregation has a high enough barrier to entry that I think it would turn people away right now (from an installation perspective).

@ncdc
Contributor

ncdc commented Aug 16, 2017

FYI I'm about 2/3 of the way through implementing CreateSignedURL for the 3 cloud providers.

@ncdc
Contributor

ncdc commented Aug 17, 2017

FYI not feeling well today. Will try to get something posted tomorrow or later tonight if I'm better.

@jrnt30
Contributor Author

jrnt30 commented Aug 17, 2017

hope you get to feeling better!

@jrnt30 jrnt30 changed the title Adding in support for backup download WIP: Adding in support for backup download Aug 21, 2017
- Adding in support for a new `download` subcommand of backup
- Adjusted signing to allow for multiple types
- Adding in git sha version during build for more granular version debugging

Signed-off-by: Justin Nauman <justin.r.nauman@gmail.com>
@jrnt30 jrnt30 changed the title WIP: Adding in support for backup download Adding in support for backup download Sep 12, 2017
@@ -194,8 +194,15 @@ func (br *backupService) DeleteBackupDir(bucket, backupName string) error {
 	return errors.NewAggregate(errs)
 }

-func (br *backupService) CreateBackupLogSignedURL(bucket, backupName string, ttl time.Duration) (string, error) {
-	return br.objectStorage.CreateSignedURL(bucket, getBackupLogKey(backupName), ttl)
+func (br *backupService) CreateBackupSignedURL(backupType api.DownloadTargetKind, bucket, backupName string, ttl time.Duration) (string, error) {
Contributor

I'm working on adding restore log support, and I'm planning on having the restore log file name have the format restore-<restore name>-logs.gz e.g. restore-mybackup-20170912150214-logs.gz. I suppose after we merge this PR, we could rename this to just CreateSignedURL. WDYT?

@@ -19,14 +19,19 @@ VERSION ?= v0.3.3
GOTARGET = github.com/heptio/$(PROJECT)
OUTPUT_DIR = $(ROOT_DIR)/_output
BIN_DIR = $(OUTPUT_DIR)/bin
GIT_SHA=$(shell git rev-parse --short HEAD)
Contributor

I wasn't expecting to see the git bits in here, but if you want to kill 2 birds with 1 stone, we can do that...
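For reference, the usual way a `GIT_SHA` variable like this ends up in the binary is via the Go linker's `-X` flag; a minimal sketch, assuming a `version` package with an overridable variable (the package path and variable name here are illustrative, not necessarily what this PR uses):

```shell
# Assuming the project defines something like:
#   package version
#   var GitSHA = "unknown"
GIT_SHA=$(git rev-parse --short HEAD)

# -X overwrites the package-level string variable at link time, so
# `ark version` can report the exact commit the binary was built from.
go build -ldflags "-X github.com/heptio/ark/pkg/version.GitSHA=${GIT_SHA}" ./cmd/ark
```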

@@ -53,7 +53,7 @@ type BackupService interface {

// CreateBackupLogSignedURL creates a pre-signed URL that can be used to download a backup's log
Member

nit: adjust function name & description in godoc

Contributor

I'll fix it in a follow-up

@skriss
Member

skriss commented Sep 12, 2017

@jrnt30 @ncdc just the one nit on godoc, which @ncdc can tweak in his restore logs PR. LGTM!

@ncdc ncdc merged commit 5899bea into vmware-tanzu:master Sep 12, 2017
@ncdc ncdc modified the milestone: v0.4.0 Sep 12, 2017
Successfully merging this pull request may close these issues.

Add "download backup" command to CLI