-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add integration test for rebalance in upsert tables #11568
Conversation
Codecov Report
@@ Coverage Diff @@
## master #11568 +/- ##
============================================
- Coverage 63.10% 63.08% -0.02%
Complexity 1118 1118
============================================
Files 2342 2342
Lines 125916 125917 +1
Branches 19370 19370
============================================
- Hits 79454 79440 -14
- Misses 40813 40825 +12
- Partials 5649 5652 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 11 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
088a2b2
to
eba73c6
Compare
...ts/src/test/java/org/apache/pinot/integration/tests/UpsertTableRebalanceIntegrationTest.java
Outdated
Show resolved
Hide resolved
@Override | ||
protected void waitForAllDocsLoaded(long timeoutMs) | ||
throws Exception { | ||
Thread.sleep(5000L); //TODO: Wait for some time for rebalance to kickoff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to avoid sleeping a fixed amount of time. For fast machine, it can waste time; for slow machine, it can cause flakiness. Any specific reason why waiting for condition itself is not enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, waiting on condition itself triggers an early termination since the rebalance simply hasn't started and all segments are still available.
is there an jira that prompted this item? did this not work in some env / scenario? |
Yes, there are cases when ideal state diverges from instance partition map and it created prod issues. @klsince is working on the fix for the issue. |
@klsince Can you help review this test? |
return 600; | ||
} | ||
|
||
protected long getCurrentCountStarResultWithoutNulls(String tableName, Schema schema) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious why need to check on NULL rows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aah because we have observed an issue in partial upsert tables in the past where some columns always contain null even though the correct data is coming in the stream.
// Check the number of replicas after rebalancing | ||
finalReplicas = _resourceManager.getServerInstancesForTable(getTableName(), TableType.REALTIME).size(); | ||
|
||
// Check that a replica has been added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you meant a server has been added? as from the comment below, rf=1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
waitForAllDocsLoaded(600_000L); | ||
|
||
// number of instances assigned can't be more than number of partitions for rf = 1 | ||
verifySegmentAssignment(rebalanceResult.getSegmentAssignment(), 5, getNumKafkaPartitions()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iiuc, I assume the number of kafka partitions is 3? then, it may be interesting to test by adding one more server (4 servers in total) to check that one server is left idle w/o taking any table partition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No the default is 2 partitons and that scenario is already covered.
|
||
import static org.testng.Assert.assertEquals; | ||
|
||
public class PartialUpsertTableRebalanceIntegrationTest extends BaseClusterIntegrationTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like the test cases are not specific to partial upsert? or I must have missed some details 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it covers both upsert and partial upsert. I have added the table config to be partial upsert cause we observed weird scenarios there sometimes where some columns are always null.
638a567
to
905af70
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
just two non-blocking comments
...test/java/org/apache/pinot/integration/tests/PartialUpsertTableRebalanceIntegrationTest.java
Outdated
Show resolved
Hide resolved
...test/java/org/apache/pinot/integration/tests/PartialUpsertTableRebalanceIntegrationTest.java
Outdated
Show resolved
Hide resolved
adf9e77
to
e0c8532
Compare
Mostly tests that partitions move as a whole unit to new servers