Batch ingestion replace #12137
Conversation
Perhaps I'm missing something obvious, but why can't this be done with minimal code changes via reusing the …
@loquisgon but how does this solution avoid the same issue? Tombstones will be announced immediately, before the replacement segments are loaded, and so you still have the scenario where old data is unavailable but new data has not yet been loaded. To my understanding, you are attempting to retrofit atomic table replacement onto atomic partition replacement either way and so the same data availability issues exist here. To help me understand, what happens in your code if the historicals are under load and so take minutes to load the new physical segments? How are the tombstone segments held back in this case? More broadly, it would be a useful feature for Druid to have the ability to atomically swap a table, not just a partition (which is just a restatement of what you're trying to accomplish here I think). A common way this is done in other systems is to fully load the new version of a table and only once that's done update metadata pointers from the old to the new. Was a similar design considered here? Why or why not? If you've already considered and rejected this approach, can we add a paragraph to the design alternatives discussing why?
@JulianJaffePinterest This feature is about replacing time chunks that happen to fall in a given interval with data provided in that interval, re-ingesting with …
This pull request introduces 1 alert when merging 9f2db07 into a813816 (view on LGTM.com).
@loquisgon doesn't the same race condition exist though? Since tombstones will be loaded instantly but actual data segments will need to be read from deep storage, memory mapped, etc., if there's load on the historicals, data will be dropped from a client perspective (e.g. replaced with a tombstone) before fresh data is available. If we take your logic to determine which data segments to tombstone and instead use it to identify which segments to mark as unused (leaving the segments that will be overshadowed marked as used), what race condition does it introduce compared to your PR?
@JulianJaffePinterest Yeah, dropping the segments and replacing them with tombstones can be very similar in some cases. What I have been trying to say is that replace with tombstones will follow the standard way of segment replacement as explained in Druid's design documentation. However, let me share the following scenario where replace with tombstones produces the intended semantics as explained in the PR's design discussion and drop does not.

Start with: …

Replace with: …

However, with this new replace functionality (as stated in the design of this PR and the code), the replace would generate 12 new segments: nine of them tombstones and three segments with data (one row each). All these segments would still partially overshadow the existing DAY segment, but the net effect would be that all data in the 12 hours in the replace interval would be replaced by just the three new rows in the input (all other hours in the replace interval would not report any data since they are covered by the tombstones). I hope this example helps to clarify the semantic difference.
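The arithmetic in this scenario can be sketched with a toy calculation (plain Java, not Druid code; intervals are simplified to integer hours, and which three hours carry data is a made-up assumption for the sketch):

```java
import java.util.List;

// Toy model of the scenario above: a 12-hour replace interval with HOUR
// granularity where the input only produces rows in 3 of the hours.
// Chunks with input rows become real segments; every other chunk in the
// replace interval becomes a tombstone.
public class ReplaceExample
{
  /** Returns {number of data segments, number of tombstones}. */
  static int[] countSegments(int startHour, int endHour, List<Integer> hoursWithRows)
  {
    int data = 0;
    int tombstones = 0;
    for (int hour = startHour; hour < endHour; hour++) {
      if (hoursWithRows.contains(hour)) {
        data++;          // real segment (one row each in the example)
      } else {
        tombstones++;    // "replace with nothing"
      }
    }
    return new int[]{data, tombstones};
  }

  public static void main(String[] args)
  {
    // Hours 2, 5, and 9 are hypothetical placeholders.
    int[] counts = countSegments(0, 12, List.of(2, 5, 9));
    System.out.println(counts[0] + " data segments, " + counts[1] + " tombstones");
    // prints: 3 data segments, 9 tombstones
  }
}
```

Together the 12 new segments fully cover (and so overshadow) the replace interval, which is the semantic difference from merely dropping segments.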
Very interesting! Some comments on the design first. Afterwards, a musing 🙂
The musing I promised: IMO this is going to be a very useful feature beyond the …
This is confusing, and it's fixed by creating tombstones in (1) rather than marking the segments unused. That way, in step (4), the tombstones would block the unwanted segments from getting reloaded. So, this patch updates …
```java
Map<Interval, String> tombstonesAndVersions = new HashMap<>();
for (Interval interval : tombstoneIntervals) {
  NonnullPair<Interval, String> intervalAndVersion =
      findIntervalAndVersion(
```
Generally speaking, at this point, the time interval should've already been locked, but the method that you are calling here has a bunch of logic that is trying to deal with a lock not yet existing. Do you have a test case that exercises the code to somehow cause an interval to be hit here that isn't already locked? I'm curious about the sequence of events that can lead to that, as there is likely a bug somewhere else in the stack. This code should be able to just use the locks that already exist to get a version (that version should also generally be the same for all of the locks); it should most likely throw an error if it's gotten to this point and not locked the interval yet.
@imply-cheddar At the time that I wrote that code I could not find an easy way to use the allocators (`SegmentAllocator` implementations), since they are typically created and used in the sub-tasks, but the current code in this PR creates them after the sub-tasks have completed (in the parallel case). Since then, though, I have thought of a potential way to integrate them into the sub-tasks: have each sub-task create the tombstones after it has finished pushing all segments, create the tombstones there using its allocator, propagate the tombstones back to the supervisor task in the task report, and then have the supervisor filter out unneeded tombstones. I will work next on this new path, with the allocators, since the allocator should already be prepared to deal with issues like the one you comment on above. So please hold on for a little while while I try to implement this new path and provide an update to this PR.
@imply-cheddar I have updated the way allocation is done. I decided to use the "natural" path for the allocators. That is, I integrated the creation of tombstones into each of the paths for the different partition schemes: dynamic (linear), hash, and range. The idea is to create the tombstones at the "end" of processing of each one of those paths. At the "end" we know the actual pushed segments and the input intervals (these are now required for `dropExisting=true`), so we can compute and allocate the tombstones. I chose the right place, where the allocator is available, to create the tombstones. In general, each sub-task creates its own tombstones. This means that some tombstones will not be kept, because other sub-tasks (of the same parallel task) could have created a real segment in that interval. Thus, at the "end", when all segments are combined from the sub-tasks, there is a process that removes those "redundant" tombstones. I added unit tests to the corresponding parallel task test and tested this manually as well on a local server.
Rolled back the code where tombstones were created in the sub-tasks, because it generated incorrect tombstones. I thought they could be cleaned up/removed at the supervisor task level, but attempting that sometimes caused missing core partitions. Now the allocation is similar to before, straightforward: just use the versions from the locks. For now, the code does not support segment locking; if segment locking is requested, the code forces time chunk locking. Segment locking support is left as future work.
@gianm Thanks for your comments! Rather than a lengthy response now, let me briefly say that I will keep your points in mind in this review cycle. I promise to update the PR introductory text with a section on current limitations and future work, where I will address your points above.
I had a build conflict and need to clean the build before proceeding...
Build should be fine now...
openjdk15 test failures are due to a Travis CI issue, so it is ok to merge.
* Tombstone support for replace functionality
* A used segment interval is the interval of a current used segment that overlaps any of the input intervals for the spec
* Update compaction test to match replace behavior
* Adapt ITAutoCompactionTest to work with tombstones rather than dropping segments. Add support for tombstones in the broker.
* Style plus simple queriableindex test
* Add segment cache loader tombstone test
* Add more tests
* Add a method to the LogicalSegment to test whether it has any data
* Test filter with some empty logical segments
* Refactor more compaction/dropexisting tests
* Code coverage
* Support for all empty segments
* Skip tombstones when looking up broker's timeline. Discard changes made to tool chest to avoid empty segments since there will no longer be empty segments after lookup because we are skipping over them.
* Fix null ptr when segment does not have a queriable index
* Add support for empty replace interval (all input data has been filtered out)
* Fixed coverage & style
* Find tombstone versions from lock versions
* Test failures & style
* Interner was making this fail since the two segments were considered equal due to their ids being equal
* Cleanup tombstone version code
* Force timeChunkLock whenever replace (i.e. dropExisting=true) is being used
* Reject replace spec when input intervals are empty
* Documentation
* Style and unit test
* Restore test code deleted by mistake
* Allocate forces TIME_CHUNK locking and uses lock versions. TombstoneShardSpec added.
* Unused imports. Dead code. Test coverage.
* Coverage.
* Prevent killer from throwing an exception for tombstones. This is the killer used in the peon for killing segments.
* Fix OmniKiller + more test coverage.
* Tombstones are now marked using a shard spec
* Drop a segment factory.json in the segment cache for tombstones
* Style
* Style + coverage
* style
* Add TombstoneLoadSpec.class to mapper in test
* Update core/src/main/java/org/apache/druid/segment/loading/TombstoneLoadSpec.java Typo Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
* Update docs/configuration/index.md Missing Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
* Typo
* Integrated replace with an existing test since the replace part was redundant and, more importantly, the test file was very close to or exceeding the 10 min default "no output" CI Travis threshold.
* Range does not work with multi-dim

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
Druid batch ingestion replace high level design
This PR adds tombstones to Druid in order to support complete replace semantics. It supports tombstones for primary partitioning and all forms of secondary partitioning (dynamic/linear, hash and range).
Current known limitations (as of March 1, 2022)
The intention is to merge with the current limitations but to remove them in future releases.
The broker is filtering out tombstones from its timeline.
This is a limitation that can be fixed in this PR or left as future work. The consequence is that metadata queries will not consider tombstones. In the extreme case where all of a datasource's visible segments are tombstones (i.e. the datasource is practically empty), queries will fail with an exception. Future work can address this by enabling metadata queries for tombstones or by some other means.
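The broker-side filtering described above can be pictured with a toy timeline lookup (a sketch only; the real logic lives in `VersionedIntervalTimeline`, and this simplified model ignores versions and partial overshadowing):

```java
import java.util.ArrayList;
import java.util.List;

// Toy broker timeline: one visible entry per time chunk. The lookup skips
// tombstone entries, so a chunk whose visible entry is a tombstone does not
// appear in the query plan. If every visible entry is a tombstone, the lookup
// result is empty (the "practically empty datasource" case described above).
public class TimelineToy
{
  record Entry(String chunk, boolean tombstone) {}

  static List<String> lookup(List<Entry> visibleEntries)
  {
    List<String> result = new ArrayList<>();
    for (Entry e : visibleEntries) {
      if (!e.tombstone()) {
        result.add(e.chunk()); // only real segments are handed to the query
      }
    }
    return result;
  }
}
```

In this toy model, `lookup` over entries for chunks `d1` (data), `d2` (tombstone), and `d3` (data) returns only `d1` and `d3`.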
Metadata queries will not reflect metadata in tombstones
This is a consequence of the fact that brokers are filtering out tombstones. This will be fixed in a future patch.
`SEGMENT` locking is not supported

Replace will force `SEGMENT` locking to use `TIME_CHUNK` locking (a log message will be emitted in the task log). Future work can address this.

Cleanup of tombstones
If all the segments are completely overshadowed by tombstones and the overshadowed segments are all marked as not used, then the tombstones can be removed. From a cleanup perspective tombstones are the same as regular segments, so the general cleanup process should suffice for tombstones as well (i.e. the `KillUnusedSegments` coordinator duty will clean them up).

Tombstone remains after append
If data is appended to an interval with a tombstone, the tombstone remains and new data segments are appended. No query issues have been observed.
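This append behavior can be sketched with a toy model of a single time chunk (a sketch, not Druid code; in Druid the appended data becomes additional partitions in the chunk, and queries simply skip the tombstone):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one time chunk after an append over a tombstone: the chunk
// holds several partitions, one of which is the tombstone. Queries skip the
// tombstone partition and return only the rows of the appended data segments.
public class AppendToy
{
  record Partition(boolean tombstone, List<String> rows) {}

  static List<String> query(List<Partition> chunkPartitions)
  {
    List<String> rows = new ArrayList<>();
    for (Partition p : chunkPartitions) {
      if (!p.tombstone()) {
        rows.addAll(p.rows()); // a tombstone contributes an empty result set
      }
    }
    return rows;
  }
}
```

Before the append the chunk holds only the tombstone and a query returns no rows; after appending a data partition, the query returns exactly the appended rows while the tombstone still sits in the chunk.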
Upgrade/Downgrade process
Upgrade
The `dropExisting` flag is being reused for tombstones, since tombstones subsume the previous semantics but without the race conditions inherent in the original implementation. This should be fine. If a task is executing with that flag set while an upgrade is going on, then since the tombstones (or dropped segments) are all created in the main task (the single task in sequential ingestion, the supervisor in parallel ingestion), the outcome depends on which code the main task was running. If the main task was still in the "old" code (because of a rolling upgrade), then existing segments will be dropped, which is fine since that is what the user wanted and the drop APIs are still in the code. If it is new code, then tombstones will be created, which also meets expectations since tombstones subsume dropping segments.

Downgrade
If you upgrade to a version with tombstone support, use the tombstone feature (i.e. tombstones are created), and then downgrade, your system should continue to work, but exceptions may be thrown for tombstones that the previous version does not recognize. This should not stop Druid from working, but it will pollute the logs with exceptions. The only way to stop those exceptions would be to manually delete the tombstones from the segments metadata table for the corresponding datasources.
Problem statement
Assume there is an existing data source that has data across its time chunks. Assume that you want to replace data for this data source from time `t1` to time `t2`, where `t2` > `t1`. Assume that some of the replacement time chunks generated by the input data set and its ingestion spec will not have any data. Also assume that the original data set contained data for some of those replacement time chunks. Then the original intervals for those replacement time chunks will not be "replaced"; they will keep their data in the original data source. Thus the end result is that today it is not possible to support the intended semantics of a "replace" functionality. This design describes the changes needed in Druid to support replace semantics for data.

The following example may help clarify the problem.
Example of today's functionality
Assume we have the following data set (with `DAY` segment granularity):

Now assume we re-ingest the same data, with the same segment granularity as above, but with a filter removing all rows that have "China" as their country value. If we do this, we may expect all rows with "China" to be removed from our data set. However, this is what we get:
Only the first row with country "China" was removed! This is consistent with today's design, but it may be surprising, especially for new Druid users. What is going on is that the filter removes (i.e. "throws away", in Druid ingestion terminology) all rows with country value "China", so there is no ingested data for the existing segment for DAY `4/2/20` (i.e. a DAY "time chunk"), and therefore the original segment for that time chunk remains in the data source.

What we need is a concept of "replace with nothing". In many databases this is called a "tombstone". We need a way to tell Druid in batch ingestion: "please replace all time chunks in this interval even if there is no data for them." This is what this feature is all about.
Assuming that we have changed Druid to have the replace functionality using tombstones, and assuming also that the existing flag `dropExisting` has been redefined to support it, all we have to do with the above re-ingest spec is to set the parameter `dropExisting` to `true` and expect to get the new data:

Details & design idea
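For illustration only, the ioConfig of such a replace re-ingestion might look like the sketch below. This is not taken from the PR; the input source, paths, and format are hypothetical placeholders, and only `dropExisting` is the flag under discussion:

```json
"ioConfig": {
  "type": "index_parallel",
  "inputSource": {
    "type": "local",
    "baseDir": "/tmp/replace-input",
    "filter": "*.json"
  },
  "inputFormat": { "type": "json" },
  "appendToExisting": false,
  "dropExisting": true
}
```

With `dropExisting` set to `true`, time chunks in the replace intervals that receive no input rows are covered by tombstones instead of silently keeping their old segments.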
A replace batch ingestion spec (or simply replace spec) is a batch ingestion spec with the following characteristics:
- Input data (the usual stuff)
- Batch ingestion intervals in the ioConfig section of the spec (the usual stuff) (we call these the "replace intervals")
- A flag in the ioConfig (reusing `dropExisting`) that indicates that this is a replace of time chunks (this is new behavior for an existing flag that supported similar, but not quite the same, behavior with a different implementation)

The idea is that after all sub-tasks for a particular ingestion complete and it is ready to publish the segments, then, before publishing, the task identifies real segments in the input intervals as well as "empty" intervals that should contain tombstones. The tombstones are represented as `DataSegment`s, like real segments, but their metadata clearly identifies them as tombstones. After this point, the tombstone segments are added to the real segments and published together by the Overlord. Then the coordinator takes over and assigns them to historicals, which "load" tombstones (being aware that tombstones are special and are not actually stored in the file system) and announce them to the broker. The broker then incorporates tombstones into its timeline and either filters them out when queries arrive, or does not filter them and sends the queries to the historicals, which know that queries in tombstone intervals should return no data.

Thus the following are our design goals:
Tombstones are represented as regular `DataSegment`s (but with no backing file in deep storage or in local segment caches) and should be transparent to most of the existing code.

Components and their corresponding changes
Middlemanager/peon (create, push tombstones, publish to overlord)
- Tombstones are created as `DataSegment`s with a `shardSpec` of a new type, `TombstoneShardSpec`. We call these `DataSegment`s tombstones. The `DataSegment`s for tombstones are created using versions from the `TIME_CHUNK` locks, thus `SEGMENT` locking is not supported. To minimize coordinator issues we gave tombstones a size of 1.
- Tombstones are pushed and published along with the real `DataSegment`s as usual.
- The tombstone intervals are computed from the input intervals and the `GranularitySpec`. If `dropExisting` is set to true in an ingestion spec with no input intervals, the ingestion spec is immediately rejected with an error.
- Replace forces `SEGMENT` locking to use `TIME_CHUNK` locking (a log message will be emitted in the task log).

Coordinator
Assignment: the coordinator, by means of the load rules, detects that new segments exist for the data source. The new segments may contain tombstones; the coordinator does not care about this and uses its usual assignment strategy to assign the data source's segments (some of them tombstones) to historicals.

The only potential change in the coordinator may be an assignment strategy for tombstones so that they do not all end up in the same historical, but this may be left for future work.
Historical query path
The historical "loads" a tombstone by creating a `QueriableIndex` marked as a tombstone (using a new `QueriableIndex` method: `boolean isFromTombstone()`) and announces the tombstone. Queries over tombstones are served by a `NoOpQueryRunner` (an existing query runner that always returns an empty result set, exactly what is needed for tombstones). Thus the part of the query hitting a tombstone queryable index will return an empty sequence, since that is how the `NoOpQueryRunner` works, and that is the expected result set from a tombstone, since tombstones do not have any rows.

Broker
The broker skips tombstones through a change to the `VersionedIntervalTimeline` that avoids them in its `lookup` interval method.

Compaction:
Cleaning tombstone datasegments
Since tombstones are `DataSegment`s in the segment metadata Druid table, they are candidates for overshadowing, being marked `unused`, etc., like any other `DataSegment`. Initially we will leave it to Druid to clean them up the same as non-tombstone segments. If needed, in the future we may add more aggressive cleanup mechanisms for tombstones.

Design alternatives
Probably the first idea that comes to mind instead of tombstones is simply dropping existing segments when a replace is desired. This was actually already tried and implemented (PR). However, as you can see in the discussion, that functionality is currently marked as "experimental" due to a race condition between dropping and loading segments that may make data temporarily unavailable. Besides, dropping segments by itself is not enough to support the full replace functionality. For instance, when the replacement data uses finer granularity it may not always be possible to drop the existing, coarser segment (the replace PR has an example in one of the comments). So dropping segments covers only a subset of the replace use cases, and therefore the new replace functionality discussed in this document subsumes the current `dropExisting` functionality while eliminating the race condition.

We also considered variations of assigning, announcing, and processing queries for datasources with tombstones. The tombstone creation was pretty stable, and we feel strongly that using `DataSegment` to represent tombstones together with the normal publishing process works well. For assignment and announcement we considered having the coordinator announce the tombstones and the broker incorporate them into its timeline. We discarded this option, though, reasoning that we should stick with the current design as much as possible.

Implementation status
As of Feb 15, 2022

Functionality is restricted to `TIME_CHUNK` locking and tombstone versions are obtained from the locks' versions. A new `shardSpec`, `TombstoneShardSpec`, is introduced.

As of Feb 12, 2022

Segment locking is now allowed for all forms of secondary partitioning. Tombstone allocation is done using the corresponding allocators normally used in the particular code flow where tombstones are created.

As of Feb 1, 2022

All functionality as designed has been implemented, with the following potential exceptions. The PR build is green. All existing tests for the similar `dropExisting` flag have been kept and refactored to match `replace` semantics with tombstones (`dropExisting` should continue working as-is from a functional perspective even though it is now implemented with tombstones). New unit tests have been added for the new components introduced.

This functionality will not support segment locking. If segment locking is requested at the same time that replace is requested, time chunk locking will be forced.
As of January 31, 2022

All functionality as designed has been implemented, with the following potential exceptions. The PR build is green. All existing tests for the similar `dropExisting` flag have been kept and refactored to match `replace` semantics with tombstones (`dropExisting` should continue working as-is from a functional perspective even though it is now implemented with tombstones). New unit tests have been added for the new components introduced.

The most visible potential gap at this point is the lack of complete support for `SEGMENT` locking. We could prevent replace in the case of `SEGMENT` locking and leave this support for a future release, or further work on the PR to fully support segment locking.