Delta housekeeping initial version #101

Open: wants to merge 26 commits into master


lorenzorubi-db

Utility on top of discoverx to run Delta Housekeeping across multiple tables.

Analysis that provides stats on Delta tables and recommendations for improvements, including:

  • stats: size of tables and number of files, timestamps of the latest OPTIMIZE and VACUUM operations, and OPTIMIZE stats
  • recommendations on tables that need to be OPTIMIZEd or VACUUMed
  • whether tables are OPTIMIZEd/VACUUMed often enough
  • tables that have small files, and tables for which ZORDER is not effective
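The "often enough" and "small files" checks listed above could be expressed as simple threshold rules over per-table stats. This is a minimal sketch for illustration only: the `TableStats` record, its field names, the `check` helper, and the thresholds (7 days, 32 MiB average file size) are all hypothetical assumptions, not the logic or values used in this PR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


# Hypothetical per-table stats record; field names are illustrative only.
@dataclass
class TableStats:
    table_name: str
    size_bytes: int
    num_files: int
    last_optimize: Optional[datetime]  # None if OPTIMIZE never ran
    last_vacuum: Optional[datetime]    # None if VACUUM never ran


# Hypothetical thresholds, chosen for illustration only.
MAX_AGE_OPTIMIZE = timedelta(days=7)
MAX_AGE_VACUUM = timedelta(days=7)
SMALL_FILE_BYTES = 32 * 1024 * 1024  # average file size below 32 MiB counts as "small"


def overdue(last_run: Optional[datetime], max_age: timedelta, now: datetime) -> bool:
    """True if the operation never ran, or last ran longer ago than max_age."""
    return last_run is None or now - last_run > max_age


def has_small_files(stats: TableStats) -> bool:
    """True if the table's average file size is below SMALL_FILE_BYTES."""
    if stats.num_files == 0:
        return False
    return stats.size_bytes / stats.num_files < SMALL_FILE_BYTES


def check(stats: TableStats, now: datetime) -> dict:
    """Return one boolean per hypothetical check for a single table."""
    return {
        "table_name": stats.table_name,
        "optimize_overdue": overdue(stats.last_optimize, MAX_AGE_OPTIMIZE, now),
        "vacuum_overdue": overdue(stats.last_vacuum, MAX_AGE_VACUUM, now),
        "small_files": has_small_files(stats),
    }


if __name__ == "__main__":
    now = datetime(2024, 1, 28)
    t = TableStats("sales", size_bytes=10 * 1024**3, num_files=1000,
                   last_optimize=datetime(2024, 1, 1), last_vacuum=None)
    print(check(t, now))
```

Treating a missing timestamp as "overdue" is one possible design choice; a real implementation might instead report it as a separate recommendation.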

@lorenzorubi-db (Author)

@edurdevic same as PR #95, opened with my user.
The latest commit takes care of your final comments.
thanks!

@lorenzorubi-db (Author)

lorenzorubi-db commented Jan 28, 2024

Hi @edurdevic,
I still need to review further (and document better), but I'd like you to take a look so that we agree on the approach.
In the end the refactoring was much bigger than I expected. Anyway, `apply` now returns a single DataFrame with 3 boolean columns:

  • `rec_optimize`: rows that need action with OPTIMIZE
  • `rec_vacuum`: analogous, for VACUUM
  • `rec_misc`: other recommendations

plus 3 string columns with the reason for each recommendation.
thanks!
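Downstream code could then pick the tables to act on by filtering on the boolean columns. This sketch models the result as plain Python dicts rather than a Spark DataFrame so it runs standalone. Only `rec_optimize`, `rec_vacuum` and `rec_misc` are named in the comment above; the `table_name` and `*_reason` column names, the `tables_flagged` helper, and the sample rows are hypothetical.

```python
from typing import Dict, List, Tuple

# Each dict stands in for one row of the DataFrame returned by apply.
# Only rec_optimize / rec_vacuum / rec_misc are named in the PR; the
# table_name and *_reason column names and the sample values are hypothetical.
rows: List[Dict[str, object]] = [
    {"table_name": "sales",
     "rec_optimize": True, "rec_optimize_reason": "example: never optimized",
     "rec_vacuum": True, "rec_vacuum_reason": "example: never vacuumed",
     "rec_misc": False, "rec_misc_reason": ""},
    {"table_name": "customers",
     "rec_optimize": False, "rec_optimize_reason": "",
     "rec_vacuum": False, "rec_vacuum_reason": "",
     "rec_misc": True, "rec_misc_reason": "example: ZORDER not effective"},
]


def tables_flagged(rows: List[Dict[str, object]], flag: str) -> List[Tuple[str, str]]:
    """Return (table_name, reason) for every row where the boolean column `flag` is True."""
    return [(str(r["table_name"]), str(r[f"{flag}_reason"])) for r in rows if r[flag]]


# Print one line per recommendation, grouped by recommendation type.
for flag in ("rec_optimize", "rec_vacuum", "rec_misc"):
    for table, reason in tables_flagged(rows, flag):
        print(f"{flag}: {table} ({reason})")
```

On an actual PySpark DataFrame the equivalent selection would be a filter on the boolean column, e.g. `df.filter(F.col("rec_optimize"))` with `from pyspark.sql import functions as F`.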

@lorenzorubi-db (Author)

@edurdevic ready for review, thanks

@lorenzorubi-db (Author)

@edurdevic please take another look, thanks

@edurdevic (Contributor) left a comment

LGTM

Review comment on discoverx/delta_housekeeping.py (outdated, resolved)

2 participants