Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test suite for real time compatibility testing(#6787) #6916

Merged
merged 1 commit into from
May 14, 2021

Conversation

GSharayu
Copy link
Contributor

@GSharayu GSharayu commented May 13, 2021

Compatibility tester needs that the yaml files used for testing (and the file paths within yaml files) reside in the resource directory path. We should change these so that the files can be absolute path. This will allow Pinot installations to specify their own test suite and run compat testing before upgrade to new pinot versions.

The tester script should take a directory name (absolute path) as the argument where all the yaml files are located. The yaml files should be named as pre-controller-upgrade.yaml, pre-broker-upgrade.yaml, etc (6 of them) . The shell script should be modified to pick up these yaml files from the directory specified.

In addition, the directory should contain a subdir called config under which all the config file names should reside (e.g. the table config files, schema files, kafka producer config files. etc.), possibly under subdirectories if the user chooses to put them that way. These config files are the ones referenced inside the yaml files.

The PR fixes support for real time compatibility testing change. In realtime tables, the generation number should be taken from the data, not from the argument ideally. This is okay as we will be sending in new events every generation

@GSharayu GSharayu changed the title Add correct yaml files for real time data(#6787) Add correct yaml files for real time compatibility testing(#6787) May 13, 2021
@GSharayu GSharayu changed the title Add correct yaml files for real time compatibility testing(#6787) Add test suite for real time compatibility testing(#6787) May 13, 2021
Copy link
Contributor

@mcvsubbu mcvsubbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For realtime tables, the best way to query may be to find out the highest gen number, and then issue queries from 1 to that number. This will probably need code changes, which you can chose to do now or in the next PR.

@@ -28,7 +28,19 @@ operations:
tableConfigFileName: feature-test-1.json
recordReaderConfigFileName: data/recordReaderConfig.json
segmentName: FeatureTest1_Segment6
- type: streamOp
description: publish rows to PinotRealtimeFeatureTest1Event
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls correct the description

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was updated!

@@ -20,6 +20,12 @@
# Operations to be done.
description: Operations to be run before Controller upgrade
operations:
- type: tableOp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to create the table after creating the kafka topic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated!

- type: streamOp
description: create Kafka topic PinotRealtimeFeatureTest2Event
op: CREATE
streamConfigFileName: feature-test-2-realtime-stream-config.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure we have 1 partition in the stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feature-test-2-realtime-stream-config.json has 1 partition in the stream.

description: publish rows to PinotRealtimeFeatureTest2Event
op: PRODUCE
streamConfigFileName: feature-test-2-realtime-stream-config.json
numRows: 9
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest publishing higher number of rows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the number of rows are consistent with offline data set(10 rows). Higher number of rows will need addition of more dummy data in csv and recalculate query results. What do you suggest?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we dont need more data in csv. we loop around until we reach the number of rows, so it is ok.
yes, we do need to modify the query results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

@GSharayu GSharayu force-pushed the pinot_6787_3 branch 3 times, most recently from 356dd51 to e3308b5 Compare May 14, 2021 00:06
@GSharayu
Copy link
Contributor Author

For realtime tables, the best way to query may be to find out the highest gen number, and then issue queries from 1 to that number. This will probably need code changes, which you can chose to do now or in the next PR.

didnt we fix that in verify queries in last PR?

@codecov-commenter
Copy link

codecov-commenter commented May 14, 2021

Codecov Report

Merging #6916 (5f064ef) into master (e1d6ca4) will increase coverage by 0.12%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #6916      +/-   ##
============================================
+ Coverage     73.39%   73.51%   +0.12%     
  Complexity       12       12              
============================================
  Files          1432     1432              
  Lines         70592    70592              
  Branches      10205    10205              
============================================
+ Hits          51808    51898      +90     
+ Misses        15348    15255      -93     
- Partials       3436     3439       +3     
Flag Coverage Δ Complexity Δ
integration 43.12% <ø> (+0.17%) 7.00 <ø> (ø)
unittests 65.43% <ø> (-0.01%) 12.00 <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ Complexity Δ
...e/impl/dictionary/LongOnHeapMutableDictionary.java 63.85% <0.00%> (-6.03%) 0.00% <0.00%> (ø%)
.../impl/dictionary/FloatOnHeapMutableDictionary.java 69.87% <0.00%> (-6.03%) 0.00% <0.00%> (ø%)
.../pinot/core/query/scheduler/PriorityScheduler.java 80.82% <0.00%> (-2.74%) 0.00% <0.00%> (ø%)
...not/broker/broker/helix/ClusterChangeMediator.java 74.72% <0.00%> (-2.20%) 0.00% <0.00%> (ø%)
...lix/core/realtime/PinotRealtimeSegmentManager.java 81.02% <0.00%> (-1.03%) 0.00% <0.00%> (ø%)
.../controller/helix/core/SegmentDeletionManager.java 77.04% <0.00%> (-0.82%) 0.00% <0.00%> (ø%)
...nMaxValueBasedSelectionOrderByCombineOperator.java 70.89% <0.00%> (+0.74%) 0.00% <0.00%> (ø%)
...a/org/apache/pinot/core/common/DataBlockCache.java 96.96% <0.00%> (+0.75%) 0.00% <0.00%> (ø%)
...ot/common/protocols/SegmentCompletionProtocol.java 95.26% <0.00%> (+0.94%) 0.00% <0.00%> (ø%)
.../helix/core/realtime/SegmentCompletionManager.java 72.46% <0.00%> (+1.01%) 0.00% <0.00%> (ø%)
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e1d6ca4...5f064ef. Read the comment docs.

@mqliang
Copy link
Contributor

mqliang commented May 14, 2021

LGTM

@@ -41,7 +41,7 @@
"sortedColumn": [],
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.topic.name": "PinotRealtimeFeatureTest1Event",
"stream.kafka.topic.name": "PinotRealtimeFeatureTest2Event",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the segment completion criteria here.
"realtime.segment.flush.threshold.size": "63" "realtime.segment.flush.threshold.time": "1h"
Since we are publishing 66 rows, it is best to complete a segment a bit below 66, so that we always complete at least one segment each generation, and also have some left over rows that we need to re-consume in the next generation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated!

@mcvsubbu mcvsubbu merged commit e379292 into apache:master May 14, 2021
@mcvsubbu
Copy link
Contributor

@GSharayu thanks. Merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants