Compaction with MSQ #14810

AmatyaAvadhanula · 2023-08-14T19:43:23Z

Adds support for compaction using MSQ.

Description

This PR adds logic to convert the parallel index specs generated by compact tasks into MSQ REPLACE statements.
It then uses the newly added RouterClient to submit these statements as MSQ controller tasks to /druid/v2/sql/task.
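As a rough illustration of the conversion described above, the sketch below builds a REPLACE statement for a single interval and wraps it in the JSON body that the SQL task endpoint accepts. It is not the code from this PR: the helper names, the `wikipedia` datasource, the interval, and the choice of `SELECT *` are all assumptions made for the example, and the real conversion would also have to carry over the dimensions, metrics, and partitioning from the index spec.

```python
import json

# Endpoint named in the PR description; the Router host and port are left out on purpose.
SQL_TASK_PATH = "/druid/v2/sql/task"


def build_replace_statement(datasource, start, end, granularity="DAY"):
    """Build a REPLACE statement that rewrites one interval of a datasource.

    Hypothetical helper: the statements generated by the PR are not shown in the
    description, so this only illustrates the general shape of such a query.
    """
    time_filter = f"__time >= TIMESTAMP '{start}' AND __time < TIMESTAMP '{end}'"
    return (
        f'REPLACE INTO "{datasource}" '
        f"OVERWRITE WHERE {time_filter} "
        f'SELECT * FROM "{datasource}" '
        f"WHERE {time_filter} "
        f"PARTITIONED BY {granularity}"
    )


def build_sql_task_payload(statement, context=None):
    """Serialize the request body to POST to the SQL task endpoint."""
    return json.dumps({"query": statement, "context": context or {}})


statement = build_replace_statement(
    "wikipedia", "2023-08-01 00:00:00", "2023-08-02 00:00:00"
)
payload = build_sql_task_payload(statement, {"maxNumTasks": 2})
print(SQL_TASK_PATH)
print(payload)
```

The same time filter appears in both the OVERWRITE clause and the SELECT so that only the compacted interval is read and replaced.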

Release note

Add the following context to the compact task or auto-compaction config to enable compaction using MSQ.
"context": {
  "useMSQCompaction": true
}
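For context, here is a minimal sketch of where that flag would sit in a manually submitted compact task. Only the `useMSQCompaction` key comes from this PR; the `wikipedia` datasource, the interval, the existing `priority` entry, and the `with_msq_compaction` helper are assumptions made for illustration.

```python
import json


def with_msq_compaction(task_spec):
    """Return a copy of a compact task spec with the MSQ flag set in its context.

    Hypothetical helper: it leaves the input spec untouched and preserves any
    context entries that are already present.
    """
    updated = dict(task_spec)
    context = dict(updated.get("context", {}))
    context["useMSQCompaction"] = True
    updated["context"] = context
    return updated


# Example compact task; datasource and interval are made up for this sketch.
compact_task = {
    "type": "compact",
    "dataSource": "wikipedia",
    "ioConfig": {
        "type": "compact",
        "inputSpec": {"type": "interval", "interval": "2023-08-01/2023-08-02"},
    },
    "context": {"priority": 25},
}

msq_task = with_msq_compaction(compact_task)
print(json.dumps(msq_task, indent=2))
```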

gianm · 2023-08-16T15:31:20Z

Architecturally it seems weirdly complicated for the compact task to start a new top-level controller task. It will also probably lead to issues with locking.

One of these seems like a better approach to me. What do you think of these approaches?

1. Have the compact task itself be an MSQ controller task (similar to how it can itself be an index_parallel controller task). It would need to generate a native query rather than a SQL query for this to work, because it doesn't have direct access to the SQL planner.
2. Forget the compact task, focus on auto-compaction. Have the Coordinator submit SQL statements rather than compact tasks.

gianm · 2023-08-16T17:00:47Z

Adding a bit to the prior comment: personally I think the second approach, where we don't use compact tasks at all, is best. IMO, the ideal way to do it is something like this:

First, incorporate enough metadata into the metadata store that compaction doesn't need to fetch segments from deep storage to figure out what to do. There are a couple of approaches we could take here, including a catalog-based approach (where metadata is explicitly specified) or a metadata-stashing approach (where we save segment row signatures in the metadata store when segments are published). This latter idea would be useful for a bunch of other reasons.

Second, introduce a VACUUM [table] or COMPACT [table] command in SQL. It should take an optional interval and it should leverage the metadata from the first step in order to determine what to do.

Third, have auto-compaction at the Coordinator issue one of these SQL commands.

cryptoe · 2023-12-17T02:05:39Z

@AmatyaAvadhanula, I am closing this PR since the approach mentioned above would require patches unrelated to this PR.
Please feel free to reopen the PR in case the work is revived.

AmatyaAvadhanula added 2 commits August 15, 2023 01:03

Compaction with MSQ

f6c0017

Submit MSQ tasks from compact task using the new RouterClient

fd10401

AmatyaAvadhanula changed the title ~~[WIP] Compaction with MSQ~~ Compaction with MSQ Aug 15, 2023

AmatyaAvadhanula added 2 commits August 15, 2023 18:28

Resolve merge conflicts

7753bb0

Add task context flag

97dcb69

cryptoe closed this Dec 17, 2023
