-
Notifications
You must be signed in to change notification settings - Fork 24.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for CPU ranges in desired nodes #86434
Conversation
4a031f9
to
b2b4f26
Compare
Pinging @elastic/es-distributed (Team:Distributed) |
Hi @fcofdez, I've created a changelog YAML for you. |
Pinging @elastic/clients-team (Team:Clients) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, two questions:
- Are the
memory
andstorage
fields going to also have this change eventually? - Can the
processors.min
be set withoutprocessors.max
or is the value always required?
Thanks for the review @sethmlarson
It might change as more internal users adopt this API.
When |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did an initial read and left a few comments.
@@ -77,7 +77,7 @@ PUT /_internal/desired_nodes/Ywkh3INLQcuPT49f6kcppA/100 | |||
"node.attr.data" : "hot", | |||
"node.attr.logical_availability_zone" : "zone-0" | |||
}, | |||
"processors" : 8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we would not want to keep the processors: 8
case in the documentation (perhaps using 8.0
instead). I think the best world is if we have just the one number rather than multiple and having that case be simpler for those orchestrations that can supply it seems desirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us also mention that processors
can be specified as a range, that makes it easier to see which releases the range syntax is supported in (and documents it).
@@ -774,3 +767,135 @@ teardown: | |||
- { settings: { "node.external_id": "instance-000187"}, processors: 64, memory: "1b", storage: "1b", node_version: null } | |||
- match: { status: 400 } | |||
- match: { error.type: x_content_parse_exception } | |||
--- | |||
"Test update desired nodes using legacy processors field": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not call this legacy, the min/max is not really the desired way to use this.
"Test update desired nodes using legacy processors field": | ||
- skip: | ||
reason: "contains is a newly added assertion" | ||
features: contains |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we are not using contains
here (as well as the rest of the new tests)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy-paste... 🤦
} | ||
|
||
public int roundedMin() { | ||
return Math.round(min); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should round this down and round max up? That seems more intuitive, since then the actual quota of processors is still within the range.
@@ -86,21 +91,23 @@ private static Version parseVersion(String version) { | |||
} | |||
|
|||
private final Settings settings; | |||
private final int processors; | |||
private final Processors processors; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this will work in a mixed cluster, since we are changing the type of a field stored in cluster state (regardless of whether the new API has been used) and an old version cannot read it. I think we need a rolling upgrade test. Also, I wonder if we should simply make it two fields instead like processors
and processors_range
- to simplify parsing both for ourselves and clients? We could still make the two fields mutually exclusive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the assumption that we would be able to break this since we don't have known users of this API, but that might be a risky assumption. I'm wondering if we should prevent upgrades during mixed-version clusters (at least when there are 8.2 and 8.3 master nodes) as we should be able to accept a processor range but the older nodes won't have that notion and we would have to send them the lower bound, maybe it's better to reject the updates in that phase to simplify things? wdty @henningandersen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but that might be a risky assumption
Yes, it can be a bit hard to control that it is never called on an 8.2 cluster. And if so, it will essentially break the cluster during upgrade, which sounds bad.
we would have to send them the lower bound, maybe it's better to reject the updates in that phase to simplify things?
It sounds reasonable to me to reject a range specification during an upgrade to the release allowing the range specification, since otherwise we would be throwing away data.
I think we need to still accept a single processors specification in a mixed cluster though, i.e., as long as the client conforms to the old version API, everything should still work.
|
||
static { | ||
PROCESSORS_PARSER.declareFloat(ConstructingObjectParser.constructorArg(), MIN_FIELD); | ||
PROCESSORS_PARSER.declareFloat(ConstructingObjectParser.constructorArg(), MAX_FIELD); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be optional? I think k8s might only have requests set, leading to no max cpu. In that case we would still like to know the minimum value.
…search into desired-nodes-processors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a few tests using a dev version of the ECK operator (from this PR) and it works great 👍
Some comments/questions:
-
Setting a value of
0
for themax
CPU capacity does not seem to clear a previous max value. It also does not raise an error as I think it is the case in the current implementation. I'm wondering if setting a value of0
should raise an error as it is the case now? (and also maybe explain it in the documentation?) -
I also want to double check the units: if a memory capacity of
2333Mi
is set in the K8S resource, it is translated by the ECK operator as2446327808b
:
In K8S:
resources:
requests:
memory: 2333Mi
cpu: 2222m
limits:
memory: 2333Mi
cpu: 3141m
This the content of the API request by the ECK operator:
PUT /_internal/desired_nodes/history/version
"processors_range": {
"min": 2.222,
"max": 3.141
},
"memory": "2446327808b"
It is however then stored in Elasticsearch as 2.2gb
in the desired node state:
GET _internal/desired_nodes/_latest
"processors_range" : {
"min" : 2.222,
"max" : 3.141
},
"memory" : "2.2gb"
I just want to be sure it is the expected result given that it should be either 2.2783203125 GiB
or 2.446327808 GB
. I think this is because "2.2gb"
actually means "2.2GiB"
?
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks mostly good. My only real concerns is with how ProcessorsRange.max()
returns the min
value if not specified, that looks wrong to me.
@@ -77,7 +77,7 @@ PUT /_internal/desired_nodes/Ywkh3INLQcuPT49f6kcppA/100 | |||
"node.attr.data" : "hot", | |||
"node.attr.logical_availability_zone" : "zone-0" | |||
}, | |||
"processors" : 8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us also mention that processors
can be specified as a range, that makes it easier to see which releases the range syntax is supported in (and documents it).
public class DesiredNodesUpgradeIT extends AbstractRollingTestCase { | ||
public void testUpgradeDesiredNodes() throws Exception { | ||
// Desired nodes was introduced in 8.2 | ||
if (UPGRADE_FROM_VERSION.before(Version.V_8_2_0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we introduced it in 8.1, so we should test with upgrade from version = 8.1+?
|
||
@Override | ||
public Float max() { | ||
return max == null ? min : max; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have thought we would make this return null if max
is null? Or infinity.
IllegalArgumentException.class, | ||
() -> new DesiredNode( | ||
settings, | ||
new DesiredNode.ProcessorsRange(randomFloat() + 1, randomInvalidFloatProcessor()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new DesiredNode.ProcessorsRange(randomFloat() + 1, randomInvalidFloatProcessor()), | |
new DesiredNode.ProcessorsRange(randomFloat() + 0.1, randomInvalidFloatProcessor()), |
IllegalArgumentException.class, | ||
() -> new DesiredNode( | ||
settings, | ||
new DesiredNode.ProcessorsRange(3.0f, 2.0f), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would pass even if comparing ints. Perhaps we can pick the lower bound randomly between 0.1-10 and the upper bound between 0.01 and lower bound-Math.ulp(lower bound)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL about ulp
. Thanks for the tip!
} | ||
|
||
private Float randomInvalidFloatProcessor() { | ||
return randomFrom(-1.0f, Float.NaN, Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also include 0.0f?
case 1 -> randomDesiredNode(version, randomIntBetween(1, 256), settingsProvider); | ||
case 2 -> randomDesiredNode(version, randomIntBetween(1, 256) + randomFloat(), settingsProvider); | ||
default -> throw new IllegalStateException(); | ||
}; | ||
} | ||
|
||
public static DesiredNode randomDesiredNode(Version version, int processors, Consumer<Settings.Builder> settingsProvider) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simply remove this overload?
Thanks for testing the PR @barkbay!
This is strange, it should fail (I even added a couple of tests that cover that case). When you mention that it does not clear the previous max value, do you mean that the Desired nodes remain in the previous version?
Yes, it's |
I'm not able to reproduce, it might be a problem on my side.
No, just want to be sure I was understanding the units correctly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks Francisco.
server/src/main/java/org/elasticsearch/cluster/metadata/DesiredNode.java
Outdated
Show resolved
Hide resolved
…dNode.java Co-authored-by: Henning Andersen <33268011+henningandersen@users.noreply.github.com>
@elasticmachine upgrade branch |
Thanks all for the reviews! |
This commit adds support for CPU ranges in the desired nodes API.
This aligns better with environments where administrators/orchestrators
can define lower and upper bounds for the amount of CPUs that the
desired node would get once deployed.
This allows to provide information about the expected CPU and possible
allowed overcommit that the desired node will run on.
This was the previous expected body for the desired nodes API (we still support it):
Now it's possible to define
processors
orprocessors_range
as in:Note that
max
inprocessors_range
is optional.This commit also moves from representing CPUs as integers to
accept floating point numbers.
Note: I disabled the bwc yamlRestTests for versions < 8.3 since we introduced
a few "breaking changes" but since this is an internal API it should be fine.