-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cover the race condition for upsert compaction #12346
Conversation
90c5517
to
35a1ed1
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #12346 +/- ##
============================================
- Coverage 61.69% 61.69% -0.01%
Complexity 207 207
============================================
Files 2422 2426 +4
Lines 132217 132667 +450
Branches 20408 20499 +91
============================================
+ Hits 81573 81848 +275
- Misses 44656 44820 +164
- Partials 5988 5999 +11
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
35a1ed1
to
d89e8bc
Compare
The current code can face the race condition when the following condition is met: 1. Segment upload is complete for segment replacement and segment zk metadata is changed to point the new segment. 2. Server is still loading the segment (e.g. if offheap upsert is enabled, segment reload can take a while) 3. If minion picks up the compaction task at the above step, minion will download the new segment (replaced) while it will fetch the validDocId bitmap for old segment. To avoid this scenario, we added the extra crc check on the minion side. Minion will check if the crc from downloaded segment and crc from server api for valid doc id bitmap are the same. Changes from the PR: 1. Add the new API for fetching validDocId bitmap. the response is changed to JSON to include extra information like crc. 2. Wired the new API & added tests 3. Added the crc check on the minion task executor 4. Added the crc check on the task generator
d89e8bc
to
5c5d922
Compare
...t-common/src/main/java/org/apache/pinot/common/restlet/resources/ValidDocIdMetadataInfo.java
Outdated
Show resolved
Hide resolved
...ommon/src/main/java/org/apache/pinot/common/restlet/resources/ValidDocIdsBitmapResponse.java
Outdated
Show resolved
Hide resolved
Response.Status.BAD_REQUEST); | ||
} | ||
|
||
// Adopt the same logic as the query execution to get the valid doc ids. 'FilterPlanNode.run()' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not introduced in this PR, but this logic is incorrect. We cannot read the valid doc ids from memory because it might be modified during ingestion, and if the server re-consume the data (e.g. restart), some valid docs might be missing. We should use the valid doc ids from the snapshot. Same for the valid doc ids metadata api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added validDocIdType
. Can you review the code?
@@ -160,6 +170,20 @@ public List<PinotTaskConfig> generateTasks(List<TableConfig> tableConfigs) { | |||
return pinotTaskConfigs; | |||
} | |||
|
|||
private String getDefaultValidDocId(UpsertConfig upsertConfig) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jackie-Jiang I found that the snapshot is only available if enableSnapshot=true.
Should we perform upsert at the best effort or do we want to enforce enableSnapshot=true
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
- throw error if enableSnapshot is not set to true.
- throw error if queryableDocId type is set while upsert delete column is not enabled.
981a4fd
to
2ec8c54
Compare
2ec8c54
to
84fa0c9
Compare
@@ -52,6 +54,7 @@ public class UpsertCompactionTaskGenerator extends BaseTaskGenerator { | |||
private static final String DEFAULT_BUFFER_PERIOD = "7d"; | |||
private static final double DEFAULT_INVALID_RECORDS_THRESHOLD_PERCENT = 0.0; | |||
private static final long DEFAULT_INVALID_RECORDS_THRESHOLD_COUNT = 0; | |||
private static final String DEFAULT_VALID_DOC_ID_TYPE = "validDocIdsSnapshot"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can the default be just validDocIds
to keep it backward-compatible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tibrewalpratik17 We found that using in-memory based validDocIds is a bit dangerous as it will not give us the consistency (e.g. fetching validDocIds bitmap while the server is restarting & updating validDocIds)
- by default, we would want to enforce the customer to use
segment compaction + snapshot enabled in the upsert config
- We would still want to allow the previous approach (im-memory bitmap based). In this case, we will ask the customers to pick the appropriate validDocIdType.
How do you think?
cc: @Jackie-Jiang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, can we tag this PR as backward-incompatible as existing compaction tasks will break without snapshot?
I agree of this inconsistent scenario:
in-memory based validDocIds is a bit dangerous as it will not give us the consistency (e.g. fetching validDocIds bitmap while the server is restarting & updating validDocIds)
...ava/org/apache/pinot/plugin/minion/tasks/upsertcompaction/UpsertCompactionTaskGenerator.java
Show resolved
Hide resolved
...ava/org/apache/pinot/plugin/minion/tasks/upsertcompaction/UpsertCompactionTaskGenerator.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please update the PR description with what is backward incompatible?
...t-common/src/main/java/org/apache/pinot/common/restlet/resources/ValidDocIdMetadataInfo.java
Outdated
Show resolved
Hide resolved
...ommon/src/main/java/org/apache/pinot/common/restlet/resources/ValidDocIdsBitmapResponse.java
Outdated
Show resolved
Hide resolved
fde4c34
to
b8f847a
Compare
@Jackie-Jiang @tibrewalpratik17 Please take a final look on this PR. I think that this is now close to be merged. |
b8f847a
to
83eba44
Compare
pinot-common/src/main/java/org/apache/pinot/common/restlet/resources/ValidDocIdsType.java
Outdated
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/restlet/resources/ValidDocIdsType.java
Outdated
Show resolved
Hide resolved
...t-common/src/main/java/org/apache/pinot/common/restlet/resources/ValidDocIdMetadataInfo.java
Outdated
Show resolved
Hide resolved
...ommon/src/main/java/org/apache/pinot/common/restlet/resources/ValidDocIdsBitmapResponse.java
Outdated
Show resolved
Hide resolved
beae4f3
to
cac50bd
Compare
* Cover the race condition for upsert compaction The current code can face the race condition when the following condition is met: 1. Segment upload is complete for segment replacement and segment zk metadata is changed to point the new segment. 2. Server is still loading the segment (e.g. if offheap upsert is enabled, segment reload can take a while) 3. If minion picks up the compaction task at the above step, minion will download the new segment (replaced) while it will fetch the validDocId bitmap for old segment. To avoid this scenario, we added the extra crc check on the minion side. Minion will check if the crc from downloaded segment and crc from server api for valid doc id bitmap are the same. Changes from the PR: 1. Add the new API for fetching validDocId bitmap. the response is changed to JSON to include extra information like crc. 2. Wired the new API & added tests 3. Added the crc check on the minion task executor 4. Added the crc check on the task generator * Add the 'validDocIdType' across all APIs * Changed 'validDocIdType' to 'validDocIdsType' * Add validDocIdsType enum * Addressing the comments to use enum everywhere
The current code can face the race condition when the following condition is met:
To avoid this scenario, we added the extra crc check on the minion side. Minion will check if the crc from downloaded segment and crc from server api for valid doc id bitmap are the same.
Changes from the PR:
Release Note on the backward incompatibility
This PR is introducing backward incompatibility for UpsertCompactionTask. Previously, we allowed to configure the compaction task without the snapshot enabled. We found that using in-memory based validDocIds is a bit dangerous as it will not give us the consistency (e.g. fetching validDocIds bitmap while the server is restarting & updating validDocIds).
We now enforce the
enableSnapshot=true
for UpsertCompactionTask if the advanced customer wants to run the compaction with the in-memory validDocId bitmap.Also, we allow to configure
invalidDocIdsType
to UpsertCompactionTask for advanced user.snapshot
: Default validDocIds type. This indicates that the validDocIds bitmap is loaded from the snapshot from the Pinot segment. UpsertConfig'senableSnapshot
must be enabled for this type.onHeap
: the validDocIds bitmap will be fetched from the server.onHeapWithDelete
: the validDocIds bitmap will be fetched from the server. This will also take account into the deleted documents. UpsertConfig'sdeleteRecordColumn
must be provided for this type.