feat(gc): improve periodic GC logic #73
Conversation
Stebalien
commented
Oct 11, 2019
- Don't timeout a full user-requested GC.
- Always make sure that closing the datastore can interrupt a GC.
- Instead of timing out periodic GC, keep going with a delay between iterations.
cc @aarshkshah1992. Could you review this?
```diff
-	MaxGcDuration: 1 * time.Minute,
-	GcInterval:    45 * time.Minute,
+	GcInterval:    15 * time.Minute,
+	GcSleep:       10 * time.Second,
```
I reduced these as probabilistically sampling a log every 10 seconds shouldn't be that expensive.
Rationale behind this algorithm:
- If we assume that deletes are randomly distributed, one value log being ready for garbage collection should correlate with other value logs being ready.
- After we do a full pass through all value logs, we shouldn't need to GC for a while.
That's why I have the short sleep/long sleep system.
@Stebalien In your first point, do you mean to say that if a sample of a value log file "hits" the discard ratio, the probability that a sample in the next log file will do so too goes up?
Could you please explain this in a bit more detail?
Yes. But actually, I'm not so sure about my assumption.
- Assumption: Deletes are randomly distributed between all the value logs.
- Conclusion: At any given point in time, all value logs should have approximately the same number of discarded items. Therefore, if any one value log is ready to be garbage collected, others are also likely to be ready for garbage collection.
That's not quite correct. Values in a given value log are temporally correlated so deletes aren't likely to be completely random. However, the fact that enough time has passed for one value log to collect garbage is still a good indication that another value log may also have collected enough garbage for compaction.
@Stebalien LGTM. Just one question to improve my own understanding of how badger GC works.