Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store PodVolumeBackups in obj storage & use as source of truth #1577

Merged
merged 14 commits into from
Jul 24, 2019

Conversation

carlisia
Copy link
Contributor

@carlisia carlisia commented Jun 12, 2019

  • Deprecate the annotation on the pod within the backup tarball (eventually remove)

  • Serialize the PodVolumeBackups as <backup-name>-podvolumebackups.json.gz that stores a serialized list of PodVolumeBackups and store it in object storage

  • Sync to the cluster through the backup sync controller

  • Modify the code that relies on original annotation to map pods and volumes to restic snapshots/Modify restore to use this source, but maintain existing code so we can restore “old” backups

  • Implementation needs to be backwards compatible with 1.0 ability for restoring backups

Fixes #1169.

Signed-off-by: Carlisia carlisiac@vmware.com

@carlisia carlisia added Restic Relates to the restic integration Restic - GA needed for restic integration to be considered GA labels Jun 12, 2019
@carlisia carlisia changed the title Deprecate pod annotation system Store PodVolumeBackups in obj storage & use as source of truth Jun 12, 2019
Copy link
Member

@skriss skriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some specific comments, but to summarize - users will still use the backup.velero.io/backup-volumes annotation to indicate which pod volumes they want to back up using restic. That's not changing. What is changing is that we will no longer use the snapshot.velero.io/<volume-name> annotation to store the restic snapshot ID when a backup is taken; the <backup-name>-podvolumebackups.json.gz file will be used instead.

pkg/restic/common.go Outdated Show resolved Hide resolved
site/docs/master/restic.md Outdated Show resolved Hide resolved
site/docs/master/restic.md Outdated Show resolved Hide resolved
site/docs/master/restic.md Outdated Show resolved Hide resolved
site/docs/master/restic.md Outdated Show resolved Hide resolved
site/docs/master/restic.md Outdated Show resolved Hide resolved
site/docs/master/restic.md Outdated Show resolved Hide resolved
pkg/restic/common.go Outdated Show resolved Hide resolved
pkg/restic/common.go Outdated Show resolved Hide resolved
@skriss
Copy link
Member

skriss commented Jun 13, 2019

👍 everything so far looks fine now.

I would definitely refer to the code for the <backup-name>-volumesnapshots.json.gz file as you go through the rest of this - while this won't be identical, it will be similar and tracing how that file gets created/saved/restored from should give you a pretty good idea of what code needs to change.

A few relevant references for the current volumesnapshots code:
https://github.com/heptio/velero/blob/master/pkg/backup/item_backupper.go#L448-L459
https://github.com/heptio/velero/blob/master/pkg/controller/backup_controller.go#L568-L588
https://github.com/heptio/velero/blob/master/pkg/persistence/object_store.go#L193
https://github.com/heptio/velero/blob/master/pkg/controller/restore_controller.go#L435
https://github.com/heptio/velero/blob/master/pkg/restore/restore.go#L832-L869

@carlisia
Copy link
Contributor Author

I super appreciate the links, thank you! 🙏

@carlisia
Copy link
Contributor Author

image

The content of the json file looks exactly like the podvolumebackups CR.

@carlisia
Copy link
Contributor Author

@skriss, @nrb question: do we want to add any podvolumebackup info to the velero backup describe output? If so, what? I'm thinking useful options could be number of PodVolumeBackup items, as in:

Pod Volume Backups: 1

@skriss
Copy link
Member

skriss commented Jun 18, 2019

Per slack we're already showing it, just labeled as "restic backups"

Copy link
Member

@skriss skriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more comments - overall looks like it's headed in the right direction!

pkg/controller/backup_controller.go Outdated Show resolved Hide resolved
pkg/backup/request.go Outdated Show resolved Hide resolved
pkg/controller/backup_controller.go Outdated Show resolved Hide resolved
pkg/controller/backup_controller.go Outdated Show resolved Hide resolved
pkg/persistence/object_store.go Outdated Show resolved Hide resolved
pkg/persistence/object_store_layout.go Outdated Show resolved Hide resolved
Carlisia added 5 commits July 12, 2019 10:48
Deprecating the annotation on the pod within the backup tarball since we
are going to stop using this annotation to identify which volumes are
restic backups.

Signed-off-by: Carlisia <carlisiac@vmware.com>
Signed-off-by: Carlisia <carlisiac@vmware.com>
Signed-off-by: Carlisia <carlisiac@vmware.com>
Signed-off-by: Carlisia <carlisiac@vmware.com>
Signed-off-by: Carlisia <carlisiac@vmware.com>
Signed-off-by: Carlisia <carlisiac@vmware.com>
@carlisia
Copy link
Contributor Author

@skriss question wrt Sync to the cluster through the backup sync controller: I'm not clear on what is new that needs to be done with the sync that is already not being done.

@skriss
Copy link
Member

skriss commented Jul 15, 2019

question wrt Sync to the cluster through the backup sync controller: I'm not clear on what is new that needs to be done with the sync that is already not being done.

The backup sync controller will need to be updated so when it's syncing a backup from object storage into the cluster, it also reads the contents of the PodVolumeBackups file and ensures that the custom resources contained in that file exist in the cluster - if they don't exist, they'll need to be created.

site/docs/master/restic.md Outdated Show resolved Hide resolved
Carlisia added 3 commits July 16, 2019 14:18
Signed-off-by: Carlisia <carlisiac@vmware.com>
Signed-off-by: Carlisia <carlisiac@vmware.com>
Signed-off-by: Carlisia <carlisiac@vmware.com>
Copy link
Member

@skriss skriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added another round of comments.

I'm thinking we may want to implement the changes to pkg/restore in a separate PR to keep this one from getting too big. This one should be fine on its own.

pkg/backup/builderBackup.go Outdated Show resolved Hide resolved
pkg/backup/builderPodVolumes.go Outdated Show resolved Hide resolved
pkg/backup/item_backupper.go Outdated Show resolved Hide resolved
pkg/backup/item_backupper_test.go Outdated Show resolved Hide resolved
pkg/backup/request.go Outdated Show resolved Hide resolved
pkg/persistence/object_store.go Outdated Show resolved Hide resolved
pkg/persistence/object_store.go Outdated Show resolved Hide resolved
pkg/persistence/object_store.go Outdated Show resolved Hide resolved
pkg/restic/backupper.go Outdated Show resolved Hide resolved
pkg/restic/common.go Show resolved Hide resolved
@skriss
Copy link
Member

skriss commented Jul 17, 2019

Let me know if you want to talk through/pair on any of the comments!

Signed-off-by: Carlisia <carlisiac@vmware.com>
@carlisia
Copy link
Contributor Author

I tried to address the comments, PTAL.

I'm stuck on how to resolve this mess, any hint: https://github.com/heptio/velero/pull/1577/files#diff-d0a3ff018929aec3abbaa4c8a2c299ccR575

I'm suspecting the answer will be initialize the entire struct with mocked objects for each element?

Copy link
Member

@skriss skriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code's looking good! I had a few comments but they should be pretty minor.

I haven't reviewed the tests again yet - doing that next.

pkg/controller/backup_sync_controller.go Show resolved Hide resolved
pkg/persistence/object_store.go Show resolved Hide resolved
pkg/restic/backupper.go Outdated Show resolved Hide resolved
pkg/restic/common.go Outdated Show resolved Hide resolved
// // VolumeSnapshots: mock.Anything,
// }
// TODO: [carlisia] resolve this mock
backupStore.On("PutBackup", mock.Anything).Return(nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you want to do here is define a function that can be used as an arg to mock.MatchedBy -- so something like the following:

func hasNameAndCompletionTimestamp(info persistence.BackupInfo) bool {
    return info.Name == test.backup.Name &&
               strings.Contains(info.Metadata.String(), `"completionTimestamp": "2006-01-02T22:04:05Z"`)    
}

And then this line becomes:

backupStore.On("PutBackup", mock.MatchedBy(hasNameAndCompletionTimestamp)).Return(nil)

pkg/controller/backup_sync_controller.go Outdated Show resolved Hide resolved
pkg/persistence/object_store.go Show resolved Hide resolved
Signed-off-by: Carlisia <carlisiac@vmware.com>
Signed-off-by: Carlisia <carlisiac@vmware.com>
Copy link
Member

@skriss skriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple more comments, all test-related. If needed we can merge as-is and then follow up.

pkg/backup/item_backupper_test.go Outdated Show resolved Hide resolved
pkg/backup/item_backupper_test.go Outdated Show resolved Hide resolved

// Ensure we have a CompletionTimestamp when uploading.
// Failures will display the bytes in buf.
completionTimestampIsPresent := func(buf *bytes.Buffer) bool {
return strings.Contains(buf.String(), `"completionTimestamp": "2006-01-02T22:04:05Z"`)
completionTimestampIsPresent := func(info persistence.BackupInfo) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rename to hasNameAndCompletionTimestamp, and also update the comment just above.

pkg/controller/backup_controller_test.go Show resolved Hide resolved
pkg/controller/backup_controller_test.go Outdated Show resolved Hide resolved
@carlisia
Copy link
Contributor Author

Thank you, making those changes!

Signed-off-by: Carlisia <carlisiac@vmware.com>
@carlisia
Copy link
Contributor Author

I'm trying to make sense of this output that I'm getting intermittently: https://gist.github.com/carlisia/8eb210744758a70e41af47216a4107ae

It happens when I run TestBackupItemNoSkips on the item_backupper_test.

@skriss
Copy link
Member

skriss commented Jul 24, 2019

I see:

expected: "resources/pods/cluster/.json"
actual  : "resources/pods/namespaces/foo/bar.json"

Which looks related to #1577 (comment) - you need to fully revert changes that test case.

I also see a typo error related to the backup controller tests -- "mock.Anythin" (should be "mock.Anything")

@carlisia
Copy link
Contributor Author

Ah yes, I had reverted that line but not expectedTrackedPVCs: sets.NewString(key("foo", "bar"), key("foo", "baz")),. Added it, pushing it in a sec.

If you search my diff, there's no resources/pods/cluster/.json anywhere. Same when I do a global search in my branch. This is puzzling.

There is no mock.Anythin in my diff. I did have this typo yesterday, but corrected it. It's as if an old test result is being outputted.

Signed-off-by: Carlisia <carlisiac@vmware.com>
@skriss
Copy link
Member

skriss commented Jul 24, 2019

There is no mock.Anythin in my diff. I did have this typo yesterday, but corrected it. It's as if an old test result is being outputted.

Often this implies the tests are failing to compile - I'll try your pushed code locally

@carlisia
Copy link
Contributor Author

Still getting the failure intermittently (as I expected since I didn't change anything relevant).

I think it's about this:

mock: Unexpected Method Call
-----------------------------

PutBackup(persistence.BackupInfo)
		0: persistence.BackupInfo{Name:"backup-1", Metadata:(*bytes.Buffer)(0xc000600000), Contents:(*os.File)(0xc0000105a0), Log:(*os.File)(0xc0000d4610), PodVolumeBackups:(*bytes.Buffer)(0xc000374f90), VolumeSnapshots:(*bytes.Buffer)(0xc000374e10)}

The closest call I have is: 

PutBackup(persistence.BackupInfo)
		0: persistence.BackupInfo{Name:"backup-1", Metadata:io.Reader(nil), Contents:io.Reader(nil), Log:io.Reader(nil), PodVolumeBackups:io.Reader(nil), VolumeSnapshots:io.Reader(nil)}

Which I think it's complaining that the BackupInfo is not being built as expected when I'm calling that PutBackup mock.

@skriss
Copy link
Member

skriss commented Jul 24, 2019

Everything runs fine for me - tried running things 100x to look for flakes. It's also passing CI

@nrb
Copy link
Contributor

nrb commented Jul 24, 2019

What commands are you both running for these tests?

@carlisia
Copy link
Contributor Author

Yeah, good to go!

I think vscode was caching something. Closed it and opened it and I don't get it anymore.

I ran it a few times on my terminal and ran fine.

Copy link
Member

@skriss skriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! @nrb @prydonius PTAL. We'll be following up with #1691 for the restore side of things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Restic - GA needed for restic integration to be considered GA Restic Relates to the restic integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

store PodVolumeBackups in obj storage & sync into cluster
4 participants