Add test suite for real time compatibility testing (#6787) #6916
Conversation
For realtime tables, the best way to query may be to find out the highest gen number, and then issue queries from 1 to that number. This will probably need code changes, which you can choose to do now or in the next PR.
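The suggestion above could be sketched as follows. The table name, the generation column (`generationNumber`), and the query shape are illustrative assumptions, not the tester's actual code:

```python
# Hypothetical sketch: query a realtime table once per generation,
# from generation 1 up to the highest generation number seen so far.
def generation_queries(table: str, max_gen: int) -> list[str]:
    # One COUNT query per generation; the column name is an assumption.
    return [
        f"SELECT COUNT(*) FROM {table} WHERE generationNumber = {gen}"
        for gen in range(1, max_gen + 1)
    ]

queries = generation_queries("PinotRealtimeFeatureTest1", 3)
```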
@@ -28,7 +28,19 @@ operations:
        tableConfigFileName: feature-test-1.json
        recordReaderConfigFileName: data/recordReaderConfig.json
        segmentName: FeatureTest1_Segment6
      - type: streamOp
        description: publish rows to PinotRealtimeFeatureTest1Event
Please correct the description.
this was updated!
@@ -20,6 +20,12 @@
    # Operations to be done.
    description: Operations to be run before Controller upgrade
    operations:
      - type: tableOp
You need to create the table after creating the Kafka topic.
updated!
      - type: streamOp
        description: create Kafka topic PinotRealtimeFeatureTest2Event
        op: CREATE
        streamConfigFileName: feature-test-2-realtime-stream-config.json
Make sure we have 1 partition in the stream.
feature-test-2-realtime-stream-config.json has 1 partition in the stream.
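For reference, a single-partition stream config of the kind discussed above might look like the sketch below. The key names (`topicName`, `numPartitions`) are assumptions about the tester's config format, not taken from the actual file:

```json
{
  "topicName": "PinotRealtimeFeatureTest2Event",
  "numPartitions": 1
}
```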
        description: publish rows to PinotRealtimeFeatureTest2Event
        op: PRODUCE
        streamConfigFileName: feature-test-2-realtime-stream-config.json
        numRows: 9
Suggest publishing a higher number of rows.
The number of rows is consistent with the offline data set (10 rows). A higher number of rows would need more dummy data in the CSV, and the query results would have to be recalculated. What do you suggest?
We don't need more data in the CSV; we loop around until we reach the number of rows, so that is OK.
Yes, we do need to modify the query results.
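The "loop around" behavior described above can be sketched like this; the function and variable names are illustrative, not the tester's code:

```python
# Sketch: to publish more rows than the CSV contains, cycle over the
# CSV rows until numRows events have been produced.
from itertools import cycle, islice

def rows_to_publish(csv_rows: list[str], num_rows: int) -> list[str]:
    # cycle() repeats the rows indefinitely; islice() stops at num_rows.
    return list(islice(cycle(csv_rows), num_rows))

# 7 events drawn from a 3-row CSV: r1 r2 r3 r1 r2 r3 r1
published = rows_to_publish(["r1", "r2", "r3"], 7)
```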
done!
Didn't we fix that in verify queries in the last PR?
Codecov Report
@@ Coverage Diff @@
## master #6916 +/- ##
============================================
+ Coverage 73.39% 73.51% +0.12%
Complexity 12 12
============================================
Files 1432 1432
Lines 70592 70592
Branches 10205 10205
============================================
+ Hits 51808 51898 +90
+ Misses 15348 15255 -93
- Partials 3436 3439 +3
LGTM
@@ -41,7 +41,7 @@
    "sortedColumn": [],
    "streamConfigs": {
      "streamType": "kafka",
-     "stream.kafka.topic.name": "PinotRealtimeFeatureTest1Event",
+     "stream.kafka.topic.name": "PinotRealtimeFeatureTest2Event",
Please add the segment completion criteria here.
"realtime.segment.flush.threshold.size": "63"
"realtime.segment.flush.threshold.time": "1h"
Since we are publishing 66 rows, it is best to complete a segment a bit below 66, so that we always complete at least one segment each generation, and also have some leftover rows that we need to re-consume in the next generation.
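Concretely, the segment completion settings from the comment above would sit in the `streamConfigs` block of the table config, alongside the existing keys. A partial sketch (other keys from the actual config are omitted):

```json
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.topic.name": "PinotRealtimeFeatureTest2Event",
  "realtime.segment.flush.threshold.size": "63",
  "realtime.segment.flush.threshold.time": "1h"
}
```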
updated!
@GSharayu thanks. Merged.
The compatibility tester currently requires that the yaml files used for testing (and the file paths within the yaml files) reside in the resource directory path. We should change this so that the files can be given as absolute paths. This will allow Pinot installations to specify their own test suite and run compat testing before upgrading to new Pinot versions.
The tester script should take a directory name (absolute path) as the argument where all the yaml files are located. The yaml files should be named pre-controller-upgrade.yaml, pre-broker-upgrade.yaml, etc. (6 of them). The shell script should be modified to pick up these yaml files from the specified directory.
In addition, the directory should contain a subdir called config under which all the config files should reside (e.g. the table config files, schema files, kafka producer config files, etc.), possibly under subdirectories if the user chooses to organize them that way. These config files are the ones referenced inside the yaml files.
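The proposed layout could be sketched as below. The file names under config/ are illustrative, and only two of the six upgrade-phase yaml names are given in the text above, so the rest are elided:

```
test-suite-dir/                    # absolute path passed to the tester script
├── pre-controller-upgrade.yaml
├── pre-broker-upgrade.yaml
├── ...                            # remaining upgrade-phase yaml files (6 total)
└── config/                        # config files referenced inside the yamls
    ├── feature-test-1.json        # e.g. table configs, schema files,
    └── ...                        # kafka producer configs, etc.
```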
This PR fixes support for the realtime compatibility testing change. For realtime tables, the generation number should ideally be taken from the data, not from the argument. This is okay for now, as we will be sending in new events every generation.