
Add a configurable bufferPeriod between when a segment is marked unused and deleted by KillUnusedSegments duty #12599

Merged · 50 commits merged into apache:master on Aug 18, 2023

Conversation

@capistrant (Contributor) commented Jun 2, 2022

This PR addresses part of the proposal in #12526

Description

Made an incompatible change to the druid_segments table

Added a new column, used_flag_last_updated VARCHAR(255) (originally proposed as last_used), to the druid_segments table. When the druid_segments table is created from scratch, this column is not nullable. However, to more easily facilitate an upgrade to a version of Druid with this code, the migration steps that ALTER druid_segments allow the column to hold null values. Perhaps we can document an optional post-upgrade step to ALTER the table to disallow nulls once the column is fully populated. The coordinator will incrementally update these nulls to valid date strings for all already-unused segments post-upgrade, which automatically brings the metastore into compliance with the new version of Druid.

This column holds a UTC date string corresponding to the last time the used column was modified.
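A minimal sketch of the migration DDL, assuming MySQL-flavored syntax and the default table name (the real statement is generated per metadata-store dialect):

```sql
-- Upgrade path: the column is added as nullable so pre-existing rows stay
-- valid; fresh installs create it as NOT NULL instead.
ALTER TABLE druid_segments ADD used_flag_last_updated VARCHAR(255);
```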

Added a configuration to the druid.coordinator.kill.* family: druid.coordinator.kill.bufferPeriod

druid.coordinator.kill.bufferPeriod is a Duration that defines how long a segment must have been unused before KillUnusedSegments may kill it. For example, with the default of PT24H: if I mark segment X unused at 2022-06-01T00:05:00.000Z, then by rule this segment cannot be killed until at or after 2022-06-02T00:05:00.000Z.
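As a rough sketch (illustrative SQL, not the PR's actual query), the cutoff check amounts to:

```sql
-- A segment qualifies for killing only once its unused timestamp is at least
-- bufferPeriod old. With now = 2022-06-02T00:05:00.000Z and bufferPeriod = PT24H,
-- the cutoff is 2022-06-01T00:05:00.000Z. The timestamps are fixed-width UTC
-- date strings, so lexicographic comparison matches chronological order.
SELECT id
FROM druid_segments
WHERE used = false
  AND used_flag_last_updated <= '2022-06-01T00:05:00.000Z';
```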

The most prominent use of this new configuration is to prevent accidental deletion of data from Druid. It can be thought of like the trash folder in HDFS: marking segments unused is the rm. Without trash, the data is just gone; with trash, we can recover it. The buffer period is the amount of time before pending trash is swept away for good.

Added a single-threaded Executor on the coordinator for updating null values of the new column

Segments that are already unused at the time of the upgrade will have a null value for used_flag_last_updated, and in this state they will never be killed. The Coordinator will start a background thread to update all of these null values for segments that are already marked unused. (Segments with null values that are still used don't need a valid date string for this column; they will get one if/when they are marked unused.)
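One pass of that backfill could look roughly like this (a sketch only: MySQL-style UPDATE ... LIMIT, a hypothetical batch size of 100, and a hard-coded timestamp standing in for the current time; the actual coordinator logic may batch differently):

```sql
-- Stamp one batch of legacy unused rows with the current UTC time; the
-- background thread repeats this until no unused rows with a NULL
-- used_flag_last_updated remain.
UPDATE druid_segments
SET used_flag_last_updated = '2023-08-18T00:00:00.000Z'
WHERE used = false
  AND used_flag_last_updated IS NULL
LIMIT 100;
```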

Alternative Design

Alternatively, I could have embedded used_flag_last_updated within the payload column. The end result would be functionally the same, but I have listed some pros and cons below.

Pros

  • No table schema change needed to upgrade Druid, making life easier for an operator.

Cons

  • If you were adding this feature in the initial pre-release development of Druid, you'd probably steer away from accumulating too much in the payload column and would instead create native columns in the database table.
    • Embedding it in the payload column is almost hiding it, or at least making it harder for others to discover and understand.
  • You need to pull the payload back in the query for unused segments, increasing DB I/O, network traffic, and Druid memory and compute demands.
  • You need to process the payloads in Druid code instead of simply filtering on the column value, as you could if the column were native to the metastore table.

Upgrade Plan

Requiring a database table alteration is a breaking change for any cluster whose coordinator doesn't have DDL permission on startup. If there are no such privileges, the operator of the cluster will need to handle adding the column in order to upgrade to 0.24+. I have done my best to limit the complexity of upgrading and to provide as much assistance as possible to the Druid operators out there.

No work necessary case

There is a case where no extra work is necessary for the Druid operator to upgrade. If the operator has kept the default of true for druid.metadata.storage.connector.createTables, and their metastore user has DDL privileges, the table will automatically be altered at startup if the used_flag_last_updated column is missing from the druid_segments table.

The coordinator and overlord will start up just fine, and all future changes to the used column (plus new segment creation) will populate used_flag_last_updated. For existing segments where used = true, the value of used_flag_last_updated will be null. The segment-killing mechanism will simply ignore rows with a null used_flag_last_updated; the Coordinator will eventually populate the column with a real date string, thus starting the bufferPeriod clock.
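Extending the earlier sketch, the kill-candidate selection under this scheme would roughly be:

```sql
-- NULL timestamps are excluded outright, so legacy unused segments stay safe
-- until the coordinator backfills them and their bufferPeriod clock starts.
SELECT id
FROM druid_segments
WHERE used = false
  AND used_flag_last_updated IS NOT NULL
  AND used_flag_last_updated <= '2022-06-01T00:05:00.000Z';  -- now minus bufferPeriod
```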

Some work necessary case

If the operator has either:

  • set druid.metadata.storage.connector.createTables to false
  • not given their metastore user DDL privileges

They will need to execute an ALTER statement themselves before the upgrade. To make things easy for them, I have created a new Druid CLI tool called UpdateTables that can perform this step by executing the same alter-table code path as the no-work-necessary case described above. This tool, as well as the actual ALTER command (in case the operator wants to run it themselves), is documented in a new upgrade-prep.md resource.

Optional post upgrade action

If the operator is interested in populating all of their legacy unused segments with a used_flag_last_updated date string following the upgrade, I have included another action in the new UpdateTables CLI tool that will do so. Again, I have documented the tool and provided the actual UPDATE statement that they could run manually. This step is completely optional and only needed if the operator wants their legacy unused segments to come under the purview of the new bufferPeriod config post-upgrade.

This is no longer required now that the coordinator will handle populating used_flag_last_updated for segments that are already unused at the time of the upgrade.


Key changed/added classes in this PR
  • MetadataStorageConnector - added 2 methods to the interface
  • IndexerSQLMetadataStorageCoordinator - create new segments with the used_flag_last_updated column populated
  • SQLMetadataConnector - code for altering and validating the druid_segments table
  • SQLMetadataSegmentPublisher - publish segments with the used_flag_last_updated column populated
  • SegmentsMetadataManager - updated the signature of an interface method
  • SqlSegmentsMetadataManager - manage and use the used_flag_last_updated column properly
  • SqlSegmentsMetadataQuery - manage used_flag_last_updated properly
  • DruidCoordinatorConfig - new config
  • Main - registers the new CLI tool
  • UpdateTables - new CLI tool

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@gianm (Contributor) commented Jun 14, 2022

#12404 (currently-open PR) also contemplates doing schema changes in the metadata store. @capistrant I see you've commented in there too. Will the feature in this PR be able to use that other PR's mechanism? (Once it's merged?)

@gianm (Contributor) commented Jun 14, 2022

Alternatively, should #12404 use the schema migration mechanism from this patch? I haven't reviewed both closely. I'm not sure which mechanism is preferable. Curious what @AmatyaAvadhanula and @imply-cheddar and @kfaraz think too.

@gianm (Contributor) commented Jun 14, 2022

As to the main change: what do you think of calling the new column last_updated? IMO, that's clearer, since last_used sounds like it might be the last time the segment was loaded or queried or something like that.

Generally, this change would be very welcome. Measuring time-to-kill from the last update time makes more sense than measuring it from the data interval.

@capistrant (Contributor, Author) commented Jun 15, 2022

> As to the main change: what do you think of calling the new column last_updated? IMO, that's clearer, since last_used sounds like it might be the last time the segment was loaded or queried or something like that.
>
> Generally, this change would be very welcome. Measuring time-to-kill from the last update time makes more sense than measuring it from the data interval.

I'm certainly open to different naming. last_updated spooks me because there could be a future where we change a row for some reason other than flipping the used flag. Perhaps used_last_updated or [marked_]unused_since (although the latter might require us to re-think what the timestamp means when used == true).

@capistrant (Contributor, Author) commented:

> #12404 (currently-open PR) also contemplates doing schema changes in the metadata store. @capistrant I see you've commented in there too. Will the feature in this PR be able to use that other PR's mechanism? (Once it's merged?)

My plan is to use as much of this PR's process as is feasible. Our approaches are nearly identical for the actual alteration of the table on startup of the controlling service (overlord for that one, coordinator for mine). The difference really emerges in the migration process. The linked PR is more complicated than mine because it is replacing existing always-on functionality, which requires them to be deliberate about how they transition from querying SQL the old way to the new way. They background a thread on startup that incrementally brings the table up to spec by populating rows. I could certainly implement a similar approach instead of having the operator optionally make their existing unused segments compatible with the buffer period via a Druid CLI tool or an ad-hoc SQL statement.

Long story short, I don't think there is anything #12404 would or should take from my implementation. However, there is certainly shared code that I can take from them once they are merged. I will contemplate the idea of a background thread that brings the table up to spec; it would be pretty straightforward: if there are no unused rows with a null last_used, exit immediately; otherwise, use #12404's incremental batching process to update them with the current date until there are no null values in the column. That would eliminate overhead and confusion for our operators.

@capistrant (Contributor, Author) commented:

TODO reminder for myself: evaluate the feasibility of an integration test for the UpdateTables CLI tool

@abhishekagarwal87 (Contributor) commented:

@capistrant - are you still working on this PR? Asking since this is a really useful feature. And after this, we can enable the auto-killing of unused segments.

@capistrant (Contributor, Author) commented:

> @capistrant - are you still working on this PR? Asking since this is a really useful feature. And after this, we can enable the auto-killing of unused segments.

@abhishekagarwal87 yes, I am still very much interested in getting this merged upstream. I have continued to try to keep it up to date with master. I see that I have some recent conflicts to resolve; I will try to fit that in this week.

@abhishekagarwal87 (Contributor) commented:

@capistrant - Just checking to see if you are still planning to take this through. If not, are you open to someone else picking it up?

@jon-wei (Contributor) commented Jul 25, 2023

@capistrant - Checking again to see if you're interested in picking this up again; if not, would you mind if I picked it up?

@jon-wei (Contributor) commented Aug 14, 2023

@capistrant Thanks for the update, taking a look at this today

@jon-wei (Contributor) left a review comment.

@maytasm mentioned this pull request Aug 16, 2023
@maytasm (Contributor) left a review comment:

LGTM

@maytasm (Contributor) commented Aug 17, 2023

@jon-wei @capistrant Is this PR good to merge?

@jon-wei, I tried clicking on https://github.com/apache/druid/pull/12599/files#r940734962 in your previous comment, but I'm not seeing any comment when I open the link.

@jon-wei merged commit 9c124f2 into apache:master Aug 18, 2023
74 checks passed
@kfaraz mentioned this pull request Aug 18, 2023
kfaraz added a commit that referenced this pull request Aug 18, 2023
Follow-up changes to #12599

Changes:
- Rename column `used_flag_last_updated` to `used_status_last_updated`
- Remove new CLI tool `UpdateTables`.
  - We already have a `CreateTables` tool with similar functionality, which should be able to handle update cases too.
  - Any user running the cluster for the first time should either just have `connector.createTables` enabled or run `CreateTables`, which should create tables at the latest version.
  - For instance, the `UpdateTables` tool would be inadequate when a new metadata table has been added to Druid, and users would have to run `CreateTables` anyway.
- Remove `upgrade-prep.md` and include that info in `metadata-init.md`.
- Fix log messages to adhere to Druid style
- Use lambdas
@LakshSingla added this to the 28.0 milestone Oct 12, 2023
@maytasm (Contributor) commented Jan 23, 2024

Should the used_flag_last_updated column be part of the index on the segments table?
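Presumably something along these lines is what's being asked about (hypothetical; the index name and column order are illustrative):

```sql
-- Hypothetical composite index so that kill-candidate scans can filter on
-- both the used flag and the unused timestamp without a full table scan.
CREATE INDEX idx_segments_used_flag_last_updated
    ON druid_segments (used, used_flag_last_updated);
```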
