-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a check to enable size based threshold for realtime tables #12016
Add a check to enable size based threshold for realtime tables #12016
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #12016 +/- ##
=============================================
- Coverage 61.69% 27.57% -34.12%
Complexity 207 207
=============================================
Files 2385 2385
Lines 129352 129357 +5
Branches 20029 20032 +3
=============================================
- Hits 79799 35669 -44130
- Misses 43731 90998 +47267
+ Partials 5822 2690 -3132
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
streamConfigMap.put(StreamConfigProperties.SEGMENT_FLUSH_THRESHOLD_ROWS, "1000000"); | ||
streamConfigMap.put(StreamConfigProperties.SEGMENT_FLUSH_THRESHOLD_SEGMENT_SIZE, "100M"); | ||
try { | ||
new StreamConfig(tableName, streamConfigMap); | ||
fail("Invalid config: flush threshold rows must be 0, when flush threshold size is set."); | ||
} catch (Exception e) { | ||
// Expected | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not sure this is what the document intended to mean
what i read was:
You can pick the appropriate value for segment size and number of hours in the table config, and set the number of rows to zero. Note that you don't have to pick values exactly as given in each of these combinations (they are calculated guesses anyway).
it wouldn't throw exception if num rows is set to non-zero.
my concern is that this will cause issue during cluster upgrade. if any user have both configuration set table will be unable to load in an error state .
@@ -157,7 +157,8 @@ public static void validate(TableConfig tableConfig, @Nullable Schema schema, @N | |||
// Validate that StreamConfig can be created | |||
streamConfig = new StreamConfig(tableConfig.getTableName(), streamConfigMap); | |||
} catch (Exception e) { | |||
throw new IllegalStateException("Could not create StreamConfig using the streamConfig map", e); | |||
throw new IllegalStateException("Could not create StreamConfig using the streamConfig map: " + e.getMessage(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we already included the exception, do not add message again
@@ -172,7 +172,7 @@ public StreamConfig(String tableNameWithType, Map<String, String> streamConfigMa | |||
|
|||
_flushThresholdRows = extractFlushThresholdRows(streamConfigMap); | |||
_flushThresholdTimeMillis = extractFlushThresholdTimeMillis(streamConfigMap); | |||
_flushThresholdSegmentSizeBytes = extractFlushThresholdSegmentSize(streamConfigMap); | |||
_flushThresholdSegmentSizeBytes = extractFlushThresholdSegmentSize(streamConfigMap, _flushThresholdRows); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not add check within the constructor because it will break existing table configs. For new checks we want to enforce, add it into TableConfigUtils.validate()
…ment.flush.threshold.rows must be set to 0 for size base flush threshold updater to be active.
…reation only. Existing tables will continue to work.
9046668
to
725d400
Compare
As per the Pinot document, to enable the segment size based threshold rows must be set to 0. Adding a check during table creation, if realtime.segment.flush.threshold.segment.size is set, realtime.segment.flush.threshold.rows must be set to 0.
Added a test, and also ran below curl command to ensure if the table config has invalid combination, then the API fails:
% curl -X POST http://localhost:9000/tables -H 'accept: application/json' -H 'Content-Type: application/json' -d @/Users/soumitra/pinot-tutorial/transcript/transcript-table-realtime.json
{"code":400,"error":"Invalid config: realtime.segment.flush.threshold.rows=50000, it must be set to 0 for size based segment threshold to work."}
backward-incompat