Add dropOutOfOrderRecord config to drop out-of-order events #11811
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##            master   #11811      +/-  ##
============================================
- Coverage    63.16%   50.27%   -12.89%
+ Complexity    1141        6     -1135
============================================
  Files         2343     2267       -76
  Lines       126420   122690     -3730
  Branches     19442    18886      -556
============================================
- Hits         79849    61686    -18163
- Misses       40900    56878    +15978
+ Partials      5671     4126     -1545
```
... and 497 files with indirect coverage changes.
Files with review comments:
- ...al/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java (resolved)
- .../src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java (outdated, resolved)
- .../java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManager.java (outdated, resolved)
```java
canTakeMore = numDocsIndexed++ < _capacity;
_partitionUpsertMetadataManager.addRecord(this, recordInfo);
// if record doesn't need to be dropped, then persist in segment and update metadata hashmap
if (!_partitionUpsertMetadataManager.shouldDropRecord(recordInfo)) {
```
This check is already performed in `doAddRecord()`. Suggest modifying `addRecord()` to return a `boolean` which indicates whether the record should be kept. Then we may modify this part into:

```diff
- if (!_partitionUpsertMetadataManager.shouldDropRecord(recordInfo)) {
+ if (_partitionUpsertMetadataManager.addRecord(this, recordInfo)) {
```
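A minimal, self-contained sketch of the suggested contract, where `addRecord()` returns whether the record was kept. The class and field names below are hypothetical stand-ins, not the actual Pinot implementation; it only models the out-of-order check against a comparison value.

```java
import java.util.HashMap;
import java.util.Map;

class UpsertMetadataSketch {
    // Latest comparison value (e.g. event time) seen per primary key.
    private final Map<String, Long> latestComparisonValue = new HashMap<>();
    private final boolean dropOutOfOrderRecord;

    UpsertMetadataSketch(boolean dropOutOfOrderRecord) {
        this.dropOutOfOrderRecord = dropOutOfOrderRecord;
    }

    /** Returns true if the record should be kept (indexed into the segment). */
    boolean addRecord(String primaryKey, long comparisonValue) {
        Long current = latestComparisonValue.get(primaryKey);
        if (current != null && comparisonValue < current) {
            // Out-of-order event: metadata is never updated; the record is
            // indexed only when the drop config is off.
            return !dropOutOfOrderRecord;
        }
        latestComparisonValue.put(primaryKey, comparisonValue);
        return true;
    }
}
```

With this shape the caller collapses to `if (manager.addRecord(key, value)) { addNewRow(...); }`, so the drop decision is made exactly once.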
Hey @Jackie-Jiang, at present `_partitionUpsertMetadataManager.addRecord` is called after `addNewRow`. Do you foresee any issues in changing the order here?
One scenario I can think of where this might cause an issue: we add a record to the metadata hashmap, but `addNewRow` somehow fails due to an indexing issue. At present, the exception it throws also skips the `_partitionUpsertMetadataManager.addRecord` call, which makes sense.
I checked and there should be no issue changing the order because the row is not added to the metadata manager if it is out of order
> I checked and there should be no issue changing the order because the row is not added to the metadata manager if it is out of order

What about a row which is not out-of-order? It gets added to the metadata manager, but the `addNewRow` call fails due to some issue. Will that not cause issues when further updates are received for that record?
If it throws an exception, in both cases `_numDocsIndexed` won't be updated, and we will end up re-indexing the same docId. That needs to be solved separately, but very good point!
Umm, I think there can be a scenario something like:

1. Key: A, current segment: S1, valid doc id: 2, and say the `addNewRow` call fails for this record.
   MetadataMap: A -> RecordLocation (S1, 2)
2. Key: B, current segment: S1, valid doc id: 2 (B also gets doc id 2 because `_numDocsIndexed` didn't update).
   MetadataMap: A -> RecordLocation (S1, 2), B -> RecordLocation (S1, 2)
3. Now say we receive a record for key A again after some time, while processing segment S2. In that scenario, we will invoke a removeDocId call for (S1, 2) here (line 264 in 51335a2):
   ```java
   removeDocId(currentSegment, currentDocId);
   ```
   which will invalidate the doc id for B, and B will stop being returned in queries.

Did I miss anything? 😅
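The collision described above can be reproduced with a toy model of the metadata map. All names here are illustrative, not Pinot's actual types; the point is only that a non-advancing counter aliases two keys onto one doc id.

```java
import java.util.HashMap;
import java.util.Map;

class DocIdCollisionDemo {
    static Map<String, String> run() {
        Map<String, String> metadataMap = new HashMap<>();
        int numDocsIndexed = 2;

        // Key A: metadata updated first, then addNewRow() throws,
        // so numDocsIndexed is NOT incremented.
        metadataMap.put("A", "S1:" + numDocsIndexed);

        // Key B: reuses the same doc id because the counter never advanced.
        metadataMap.put("B", "S1:" + numDocsIndexed);

        // Both keys now reference (S1, 2); invalidating A's doc id later
        // would silently invalidate B's as well.
        return metadataMap;
    }
}
```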
The analysis is correct. What I meant is that currently it is anyway not handled correctly, and we should solve this problem in a separate PR. Assuming `addNewRow()` failed, the dictionary was already updated, and the first several columns might also have been added already. Currently it works because that call never fails. The proper fix should be to ensure that even if it throws an exception, we continue with the other columns (e.g. put a default value for the failed column), and increase the doc id anyway.
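A hedged sketch of that fix: keep indexing the remaining columns when one column writer throws, and advance the doc id regardless. The writer interface below is a hypothetical stand-in for Pinot's per-column indexers, not the real API.

```java
import java.util.List;
import java.util.function.IntConsumer;

class ResilientRowIndexer {
    // Indexes one row across all column writers. A failure in one column no
    // longer aborts the row: we catch it (a real fix would write that
    // column's default value instead) and still advance the doc id.
    static int addNewRow(List<IntConsumer> columnWriters, int docId) {
        for (IntConsumer writer : columnWriters) {
            try {
                writer.accept(docId);
            } catch (RuntimeException e) {
                // Fall back: e.g. substitute the column's default value here.
            }
        }
        return docId + 1;  // advance even if some columns failed
    }
}
```

Because the doc id always advances, a later record can never be assigned a doc id that an earlier (failed) row's metadata entry still points at.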
Sounds good to me! Updated the patch based on this.
Hey @Jackie-Jiang, updated the patch based on the suggestions. Can you review once?
labels:
feature
upsert
release-notes
Related to issue: #11796

This patch adds a new upsert config, `dropOutOfOrderRecord`, which when set to `true` does not persist out-of-order events in the segment. This saves disk usage and also avoids confusion when using `skipUpsert`.
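For reference, the flag would sit inside the table config's `upsertConfig` block; a minimal sketch (the `dropOutOfOrderRecord` field comes from this patch, the surrounding fields are illustrative and should be checked against the Pinot docs):

```json
{
  "upsertConfig": {
    "mode": "FULL",
    "comparisonColumn": "event_time",
    "dropOutOfOrderRecord": true
  }
}
```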