
[RFC] Implement consistent backup/forget policy #953

Closed
middelink opened this issue May 8, 2017 · 1 comment
Labels
type: feature enhancement improving existing features


@middelink
Member

This is a request for comments on improving (streamlining, making more consistent) the current forget expire policy.

Background

Currently restic's ExpirePolicy allows one to expire snapshots by Tag, Last, Hourly, Daily, Weekly, Monthly and Yearly policies. For each of these elements, one can specify how many snapshots to keep.

Issue

The items in the ExpirePolicy are applied serially: first the Tags are processed, then the first (most recent) "Last" snapshots are retained, then the "Hourly" ones, and so on. This means that, given a policy of, say, { Last: 1, Daily: 7, Weekly: 5, Monthly: 99 } and a year's worth of daily backups, one would expect the Monthly snapshots to be the last backup of each month, i.e. 2016-01-31, 2016-02-29, 2016-03-31, etc. However, due to the way the ExpirePolicy is executed, the result is:
[...<2016-12-25 (M)> <2016-11-27 (M)> <2016-10-30 (M)> <2016-09-25 (M)> <2016-08-28 (M)> <2016-07-31 (M)> <2016-06-26 (M)> <2016-05-29 (M)> <2016-04-24 (M)> <2016-03-31 (M)> <2016-02-29 (M)> <2016-01-31 (M)>]
Note, for example, April 24, which is definitely not the last day of the month.

This happens because the policies are applied one after another: we keep the most recent snapshot (due to Last), then 7 daily ones, then 5 weekly ones. The weekly pass already causes snapshots to be removed, so by the time the 5 weeks have passed, the Monthly checker can only work with the snapshots that remain in each month (in the output above, the Sundays that were kept as weeklies).

Idea

Instead of applying the policies sequentially, apply them in parallel: go over the (sorted) snapshots one by one and mark each snapshot with every policy it matches. In my example, the most recent snapshot matches Last, Daily, Weekly and Monthly at once, and each of those policies counts it down against its own budget. In the end this results in a much more intuitive list:
[... <2016-12-31 (M)> <2016-11-30 (M)> <2016-10-31 (M)> <2016-09-30 (M)> <2016-08-31 (M)> <2016-07-31 (M)> <2016-06-30 (M)> <2016-05-31 (M)> <2016-04-30 (M)> <2016-03-31 (M)> <2016-02-29 (M)> <2016-01-31 (M)> ]

Also note that the property of counting snapshots down, instead of looking at their actual dates, is retained. So if we have 6 backups made on consecutive Sundays and use --keep-daily 5, only the oldest backup is removed, not 5 of them.

Rationale

Consistency is everything in backups. If one makes regular daily backups, one expects to keep the snapshot from the last day of the week, month and year, not some arbitrary snapshot that fell out of a complicated formula.

Demo program

Attached is the source of my demo program. It starts by creating 400 daily snapshots, the most recent one a year old, and applies the policy once (the use case of running forget once in a blue moon). Then, for a full year, it adds a single snapshot per day and applies the policy each time (the use case for folks with more than one machine to back up...). For demo purposes, the program assigns a reason to each snapshot, so we can see why it was retained.
policy.zip

old policy:

[<2017-05-08 (L)> <2017-05-07 (L)> <2017-05-06 (L)> <2017-05-05 (D)> <2017-05-04 (D)> <2017-05-03 (D)> <2017-05-02 (W)> <2017-04-30 (W)> <2017-04-23 (W)> <2017-04-16 (W)> <2017-04-09 (W)> <2017-04-02 (M)> <2017-03-26 (M)> <2017-02-26 (M)> <2017-01-29 (M)> <2016-12-25 (M)> <2016-11-27 (M)> <2016-10-30 (M)> <2016-09-25 (M)> <2016-08-28 (M)> <2016-07-31 (M)> <2016-06-26 (M)> <2016-05-29 (M)> <2016-04-24 (M)> <2016-03-31 (M)> <2016-02-29 (M)> <2016-01-31 (M)> <2015-12-31 (M)> <2015-11-30 (M)> <2015-10-31 (M)> <2015-09-30 (M)> <2015-08-31 (M)> <2015-07-31 (M)> <2015-06-30 (M)> <2015-05-31 (M)> <2015-04-30 (M)>]

new policy:

[<2017-05-08 (LDWM)> <2017-05-07 (LDW)> <2017-05-06 (LD)> <2017-04-30 (WM)> <2017-04-23 (W)> <2017-04-16 (W)> <2017-03-31 (M)> <2017-02-28 (M)> <2017-01-31 (M)> <2016-12-31 (M)> <2016-11-30 (M)> <2016-10-31 (M)> <2016-09-30 (M)> <2016-08-31 (M)> <2016-07-31 (M)> <2016-06-30 (M)> <2016-05-31 (M)> <2016-04-30 (M)> <2016-03-31 (M)> <2016-02-29 (M)> <2016-01-31 (M)> <2015-12-31 (M)> <2015-11-30 (M)> <2015-10-31 (M)> <2015-09-30 (M)> <2015-08-31 (M)> <2015-07-31 (M)> <2015-06-30 (M)> <2015-05-31 (M)> <2015-04-30 (M)>]
@fd0
Member

fd0 commented May 10, 2017

Huh, nice idea! Please create a PR for it, and we need to update some documentation and maybe write a new blog post for the next release, since this changes behavior :)

btw: This is an awesome proposal!

@fd0 fd0 added the type: feature enhancement improving existing features label May 10, 2017
@fd0 fd0 closed this as completed in #957 May 15, 2017