[ML] Use perAllocation and perDeployment memory usage in the model assignment planner #98874
Hi @valeriy42, I've created a changelog YAML for you.
Pinging @elastic/ml-core (Team:ML)
LGTM
@valeriy42 this is failing CI because of this:
You can fix that by using |
@elasticmachine update branch
Good work on this! LGTM
This PR adds the ability to estimate the per-deployment and per-allocation memory usage of NLP transformer models. It uses torch.profiler to log the peak memory usage during inference. This information is then used in Elasticsearch to provision models with sufficient memory (elastic/elasticsearch#98874).
…signment planner (elastic#98874) Building upon elastic#98139, this PR extends the model assignment planning algorithms and the linear solver to use the extended memory fields. It also adds unit tests to verify the new behavior. I needed to adjust the old unit tests since we use the estimateMemoryUsage routine, which would compute 2*memoryBytes + 240 MB as the memory requirement. Previously, in the unit tests, we were simply using memoryBytes field value.
…in the model assignment planner" (#101853) The original PR #98874 missed the memory overhead adjustment from #86416. As it caused some BWC test failures on the CI, I reverted it in #101834. This PR reintegrates the functionality and extends the BWC integration test with the memory constant depending on the version of the old cluster.
…model assignment planner" This reverts commit 31ca2f7. The functionality of elastic#98874 is being removed from 8.12 because it means that models which were working successfully on 2GB nodes in 8.11 will no longer fit on 2GB nodes. This will be frustrating for trial users. Before 8.13 we need to do a more thorough assessment of which models will and won't fit on 2GB nodes as a result of better memory estimation. It may be possible to tweak the memory usage estimation so that we require more memory than 8.11 but not so much more that our recommended trial models no longer fit onto 2GB nodes.
Building upon #98139, this PR extends the model assignment planning algorithms and the linear solver to use the extended memory fields. It also adds unit tests to verify the new behavior.
I needed to adjust the old unit tests since we use the estimateMemoryUsage routine, which would compute 2*memoryBytes + 240 MB as the memory requirement. Previously, in the unit tests, we were simply using the memoryBytes field value.
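
For readers following the unit test change, here is a minimal sketch of the arithmetic quoted above, not the actual Elasticsearch implementation. The class name `MemoryEstimateSketch`, the helper `fits`, and the reading of "240 MB" as binary megabytes are assumptions made for illustration only; the only thing taken from the description is the formula 2*memoryBytes + 240 MB.

```java
// Hypothetical sketch; names and the MB interpretation are assumptions,
// not the real Elasticsearch code.
final class MemoryEstimateSketch {

    // Assumption: "240 MB" means 240 * 1024 * 1024 bytes.
    static final long OVERHEAD_BYTES = 240L * 1024 * 1024;

    // The estimate described in the PR: 2 * memoryBytes + 240 MB.
    static long estimateMemoryUsage(long memoryBytes) {
        return 2 * memoryBytes + OVERHEAD_BYTES;
    }

    // Hypothetical helper: does a model fit into the given free memory
    // when the estimate, rather than the raw memoryBytes, is used?
    static boolean fits(long memoryBytes, long availableBytes) {
        return estimateMemoryUsage(memoryBytes) <= availableBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long modelBytes = 100 * mb;

        // A test that compared against the raw field would expect 100 MB;
        // with the estimate the requirement becomes 2 * 100 + 240 MB.
        System.out.println("raw MB: " + modelBytes / mb);
        System.out.println("estimated MB: " + estimateMemoryUsage(modelBytes) / mb);

        // A node with 300 MB free holds the raw size but not the estimate.
        System.out.println("fits in 300 MB: " + fits(modelBytes, 300 * mb));
    }
}
```

This illustrates why tests that previously asserted against memoryBytes directly had to be adjusted: the requirement seen by the planner is more than double the raw field value.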