[Backport] Kill tasks honor the buffer period of unused segments (#15710) #15811

Merged

Commits on Jan 31, 2024

  1. Kill tasks honor the buffer period of unused segments (apache#15710)

    * Kill tasks should honor the buffer period of unused segments.
    
    - The coordinator duty KillUnusedSegments computes one umbrella interval
    per datasource and uses it as the kill interval. An umbrella interval can contain
    multiple unused segments with different used_status_last_updated timestamps.
    
    - A kill task instantiated with this umbrella interval currently kills all
    unused segments in it, regardless of their last updated timestamps. For example,
    given one unused segment that is 30 days old and another that is 1 hour old, a kill
    task issued after the 30-day mark kills both instead of retaining the 1-hour-old one.
    Kill tasks and RetrieveUnusedSegmentsAction need to honor the bufferPeriod so that
    unused segments in the kill interval are not killed prematurely.
    
    * Clarify default behavior in docs.
    
    * test comments
    
    * fix canDutyRun()
    
    * small updates.
    
    * checkstyle
    
    * forbidden api fix
    
    * doc fix, unused import, codeql scan error, and cleanup logs.
    
    * Address review comments
    
    * Rename maxUsedFlagLastUpdatedTime to maxUsedStatusLastUpdatedTime
    
    This is consistent with the column name `used_status_last_updated`.
    
    * Apply suggestions from code review
    
    Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
    
    * Make period Duration type
    
    * Remove older variants of runKillTask() in OverlordClient interface
    
    * Test can now run without waiting for canDutyRun().
    
    * Remove previous variants of retrieveUnusedSegments from internal metadata storage coordinator interface.
    
    Removes the following interface methods in favor of a new method added:
    - retrieveUnusedSegmentsForInterval(String, Interval)
    - retrieveUnusedSegmentsForInterval(String, Interval, Integer)
    
    * Chain stream operations
    
    * cleanup
    
    * Pass in the lastUpdatedTime to markUnused test function and remove sleep.
    
    ---------
    
    Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
    2 people authored and LakshSingla committed Jan 31, 2024
    c3b325b
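
For illustration, the buffer-period rule described in the commit message can be sketched as below. This is a minimal sketch under stated assumptions, not the code from this PR: the class `BufferPeriodSketch`, the type `UnusedSegment`, and the method `eligibleForKill` are hypothetical names, and treating the cutoff as inclusive is an assumption. Only the names `bufferPeriod`, `maxUsedStatusLastUpdatedTime`, and `used_status_last_updated` come from the commit message.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

public class BufferPeriodSketch {
    // Hypothetical stand-in for an unused segment; it carries only the
    // used_status_last_updated timestamp that the buffer check needs.
    static final class UnusedSegment {
        final String id;
        final Instant usedStatusLastUpdated;

        UnusedSegment(String id, Instant usedStatusLastUpdated) {
            this.id = id;
            this.usedStatusLastUpdated = usedStatusLastUpdated;
        }
    }

    // Returns the ids of unused segments whose status was last updated at or
    // before (now - bufferPeriod). The inclusive boundary is an assumption.
    static List<String> eligibleForKill(List<UnusedSegment> segments, Instant now, Duration bufferPeriod) {
        Instant maxUsedStatusLastUpdatedTime = now.minus(bufferPeriod);
        return segments.stream()
            .filter(s -> !s.usedStatusLastUpdated.isAfter(maxUsedStatusLastUpdatedTime))
            .map(s -> s.id)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-31T00:00:00Z");
        List<UnusedSegment> segments = List.of(
            new UnusedSegment("30-days-old", now.minus(Duration.ofDays(30))),
            new UnusedSegment("1-hour-old", now.minus(Duration.ofHours(1))));

        // With a 30-day buffer, only the older segment passes the filter.
        System.out.println(eligibleForKill(segments, now, Duration.ofDays(30)));
    }
}
```

Both segments in this example fall inside the same umbrella interval, but only the one marked unused 30 days ago passes the filter; the 1-hour-old segment is retained until its own buffer period elapses.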