Tenant rebalance and status tracking APIs #11128
Conversation
Codecov Report
@@ Coverage Diff @@
## master #11128 +/- ##
==========================================
- Coverage 0.11% 0.11% -0.01%
==========================================
Files 2229 2239 +10
Lines 119951 120412 +461
Branches 18171 18215 +44
==========================================
Hits 137 137
- Misses 119794 120255 +461
Partials 20 20
Flags with carried forward coverage won't be shown.
... and 47 files with indirect coverage changes
try {
  Configuration config = extractRebalanceConfig(context);
  config.setProperty(RebalanceConfigConstants.DRY_RUN, true);
  rebalanceResult.put(table, _pinotHelixResourceManager.rebalanceTable(table, config, false));
Why are we rebalancing twice in this function?
I am trying to keep the behaviour equivalent to the table rebalance endpoint. The first rebalance runs in DRY_RUN mode, so the table is not actually rebalanced, but we get back the expected state the user can expect once the async operations are completed. The subsequent rebalance calls do the actual work.
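For clarity, a condensed sketch of the two-phase flow described in this reply, stitched together from the surrounding diff (names mirror the diff; this is not the exact PR code):

  Configuration config = extractRebalanceConfig(context);
  // Phase 1: the dry run only computes and returns the expected target state;
  // no segments actually move.
  config.setProperty(RebalanceConfigConstants.DRY_RUN, true);
  RebalanceResult expected = _pinotHelixResourceManager.rebalanceTable(table, config, false);
  rebalanceResult.put(table, expected);
  // Phase 2: flip the flag and run the real rebalance, reusing the job id from
  // the dry run so the async work can be tracked via the status API.
  config.setProperty(RebalanceConfigConstants.DRY_RUN, false);
  config.setProperty(RebalanceConfigConstants.JOB_ID, expected.getJobId());
  rebalanceTable(table, config, observer);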
    config.setProperty(RebalanceConfigConstants.JOB_ID, rebalanceResult.get(table).getJobId());
    rebalanceTable(table, config, observer);
  }
} catch (NoSuchElementException ignore) {
We should break out of the while loop explicitly, not by throwing an exception.
Will change it
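One way to make the exit explicit, as suggested above: assuming parallelQueue is a java.util.Queue, its poll() returns null when the queue is empty, so the loop can terminate without exception-driven control flow (a sketch; helper names follow the surrounding diff):

  String table;
  while ((table = parallelQueue.poll()) != null) {
    Configuration config = extractRebalanceConfig(context);
    config.setProperty(RebalanceConfigConstants.JOB_ID, rebalanceResult.get(table).getJobId());
    rebalanceTable(table, config, observer);
  }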
_executorService.submit(() -> {
  while (!parallelQueue.isEmpty()) {
    try {
      Thread.sleep(5000);
This will hog a thread from the shared executorService. We should yield the thread to other tasks while waiting on a condition.
Hey @saurabhd336, thanks for pointing this out. I have made changes to avoid the thread hogging: now the last active one of the parallel threads picks up the job to rebalance the sequential tables.
Also adding @swaminathanmanish for review.
}
// Last parallel thread to finish the table rebalance job will pick up the
// sequential table rebalance execution
if (activeThreads.decrementAndGet() == 0) {
clever! 👍
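For readers skimming the thread, a self-contained sketch of the hand-off pattern being praised here, assuming a counter seeded with the degree of parallelism and two concurrent queues of table names (illustrative names; rebalanceTable stands in for the actual per-table rebalance call):

  import java.util.Queue;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.atomic.AtomicInteger;

  void runTenantRebalance(ExecutorService executor, int parallelism,
      Queue<String> parallelQueue, Queue<String> sequentialQueue) {
    AtomicInteger activeThreads = new AtomicInteger(parallelism);
    for (int i = 0; i < parallelism; i++) {
      executor.submit(() -> {
        String table;
        // Each worker drains the parallel queue; no worker sleeps or polls.
        while ((table = parallelQueue.poll()) != null) {
          rebalanceTable(table);
        }
        // The last worker to finish (counter hits zero) inherits the
        // sequential tables, so no extra thread is held while waiting.
        if (activeThreads.decrementAndGet() == 0) {
          String seqTable;
          while ((seqTable = sequentialQueue.poll()) != null) {
            rebalanceTable(seqTable);
          }
        }
      });
    }
  }

  void rebalanceTable(String table) { /* actual per-table rebalance call */ }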
@@ -103,6 +112,9 @@ public class PinotTenantRestletResource {
   @Inject
   ControllerMetrics _controllerMetrics;

+  @Inject
+  ExecutorService _executorService;
I'm not sure if using the same shared executorService (which is also used to handle multi-get API call responses) to run multiple blocking operations like table rebalance in parallel is a good idea, although I do see the same executorService being used to run rebalance for a single table.
Thoughts?
Makes sense. I, too, referred to the single-table rebalance implementation, but I agree with your point that it will affect the shared executorService in cases where the tenant has a lot of tables and the user runs the rebalance with high parallelism.
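A sketch of the mitigation being agreed on here (illustrative, not the PR's final code): give tenant rebalance its own bounded pool instead of sharing the controller's executor, so a high degree of parallelism cannot starve other API handlers. degreeOfParallelism stands for the user-supplied knob, and the cap is a hypothetical constant:

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;

  // Hypothetical cap; a real limit would come from controller config.
  static final int MAX_TENANT_REBALANCE_THREADS = 10;

  ExecutorService rebalanceExecutor =
      Executors.newFixedThreadPool(Math.min(degreeOfParallelism, MAX_TENANT_REBALANCE_THREADS));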
(force-pushed from e2782c9 to 35a4a51)
LGTM otherwise
public class RebalanceContext {
  @JsonProperty("dryRun")
  @ApiModelProperty(example = "false")
  private Boolean _dryRun = false;
Should the default be dryRun=true to get that protection? What's the current behavior for the rebalance API?
On the table rebalance API it's false by default. I am keeping all the configs and their defaults in line with the table rebalance API so that users get consistent behaviour across similar APIs. But I agree with your point, as the tenant rebalance operation will have a bigger impact radius. For now I am inclined to keep consistency with the table rebalance API, but let me know if you think otherwise and we can make dryRun=true by default.
@JsonProperty("dryRun") | ||
@ApiModelProperty(example = "false") | ||
private Boolean _dryRun = false; | ||
@JsonProperty("reassignInstances") |
Are the default values for the configs the same as in the existing API? If so, let's add a TODO comment to see if we can make the config easier.
Ideally, these configurations should be picked up intelligently as much as possible. For instance, reassignInstances should be triggered when we detect the need (i.e. there is an extra server in the tenant that is not part of the instance assignment, or there is a server in the instance assignment that is no longer part of the tenant).
Yes, I am referring to the existing table rebalance API configs and their default values.
Great point on auto-feeding the configs wherever possible! I can take a follow-up item to enhance the configs, so as to keep the scope of the current PR limited. Will add a TODO.
These config enhancements should be reflected in table rebalance as well as tenant rebalance at the same time so that we maintain consistency.
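A sketch of the detection idea described in this exchange (helper names are hypothetical): compare the tenant's tagged servers against the table's current instance assignment and only set reassignInstances when they diverge:

  Set<String> taggedServers = getServersTaggedForTenant(tenantName);       // hypothetical helper
  Set<String> assignedServers = getServersInInstanceAssignment(tableName); // hypothetical helper
  // An extra tagged server, or an assigned server no longer in the tenant,
  // both signal that instance reassignment is needed.
  boolean reassignInstances = !taggedServers.equals(assignedServers);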
(force-pushed from d34a09c to b362946)
Please add the Authorize annotation (introduced in #11016) to the newly added APIs.
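A sketch of the requested change, assuming the annotation shape introduced in #11016 (a target type plus an action string); the action name and handler signature below are illustrative, not the real constants:

  @POST
  @Path("/tenants/{tenantName}/rebalance")
  @Authorize(targetType = TargetType.CLUSTER, action = "RebalanceTenantTables") // hypothetical action
  @Produces(MediaType.APPLICATION_JSON)
  public TenantRebalanceResult rebalance(@PathParam("tenantName") String tenantName,
      RebalanceContext context) {
    // ... existing handler body ...
  }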
Tenant Rebalance API
This API addresses the task of rebalancing all the tables of a given tenant. Usually, when we tag new servers to a tenant, it gets tedious to rebalance each table under that tenant to utilise the newly added servers, and this becomes impractical when the tenant has a large number of tables. The proposed utility allows users to rebalance a tenant and track the progress with minimal operational overhead.
More details are captured in this doc: Tenant Rebalance
Path: /tenants/{tenantName}/rebalance
Method: POST
Sample request body:
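(The sample below is illustrative: dryRun and reassignInstances appear in this PR's diff, while degreeOfParallelism is a plausible name for the parallelism knob discussed in the review thread; see the linked doc for the authoritative schema.)

  {
    "dryRun": false,
    "reassignInstances": true,
    "degreeOfParallelism": 2
  }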
Sample response:
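(Illustrative shape only, inferred from the conversation above: the response carries a job id for status tracking plus the per-table dry-run results; field names are hypothetical.)

  {
    "jobId": "<job-id>",
    "rebalanceTableResults": {
      "myTable_OFFLINE": { "status": "IN_PROGRESS", "description": "..." }
    }
  }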
Tenant Rebalance Status tracking API
Path: /tenants/rebalanceStatus/{jobId}
Method: GET
Sample response body:
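(Illustrative shape only: per the conversation, the job id maps to per-table progress, so a status payload would plausibly report overall progress plus a per-table status map; all field names here are hypothetical.)

  {
    "totalTables": 4,
    "remainingTables": 1,
    "tableStatusMap": {
      "myTable_OFFLINE": "DONE"
    },
    "timeElapsedSinceStartInSeconds": 42
  }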