[ML][Inference] Adding model memory estimations #48323
Conversation
Pinging @elastic/ml-core (:ml)
I think we will be creating a problem for ourselves by using the term "model memory" for this, especially if a future extension of this work ever uses a field of that name. If there is any other functionality where Java heap size usage can be restricted by a setting then it may be best to reuse that terminology. Or if we can somehow frame this functionality as some sort of "circuit breaker", that may be good, as that's the terminology used by aggregations (although for a slightly different use case). Failing that, we should think of a new term that isn't "model memory".
Force-pushed from 3662d19 to 2e91eed (…ture/ml-inference-model-cache-limits)
I had a look at the cache implementation and hopefully added some useful feedback.
* How long should a model stay in the cache since its last access
*
* Since the model loading service is used related to ingest scenarios it makes sense for unused models to be evicted.
* If a processor references a model, but does not use that model for an extended period of time, we will evict it.
Nit: It's not clear to me how model usage is tracked (promotion of cache entries by usage). Does the ingest processor get the model from the cache for every document, or for every x documents? It would be good to document the proper interaction with this service.
The ingest processor gets the model as many times as it needs. When a model is referenced by any processor, it is loaded into the cache. I can try to clarify the comment.
@@ -77,13 +125,15 @@ public void getModel(String modelId, ActionListener<Model> modelActionListener)
*/
Returns false if the model is not loaded or actively being loaded

Nit: This is somewhat unclear. From the code I get that it's not loaded if not referenced, which means not used, e.g. in a pipeline. But this code is also called from getModel(...), which is confusing: I want to explicitly get a model but might not get it in the end?
Not really: if getModel(...) sees that somebody else kicked off a load (cluster state change?) but it is not loaded yet, this method will add the listener to the listener queue.
If the caller sees that the model is not in the middle of loading AND is not loaded, it will make a direct call to eager load it and NOT attempt to cache it.
I will try to clarify the comment.
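The load-or-queue behavior described in this exchange can be sketched with plain JDK types (class and method names here are illustrative; the real service uses ActionListener and an internal cache, not these stand-ins):

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical sketch: a caller that asks for a model while a load is already
// in flight is queued and notified once the load completes, instead of
// starting a second load.
class ModelLoaderSketch {
    private final Map<String, String> cache = new HashMap<>();   // modelId -> loaded model
    private final Map<String, Queue<Consumer<String>>> loadingListeners = new HashMap<>();

    synchronized void getModel(String modelId, Consumer<String> listener) {
        String cached = cache.get(modelId);
        if (cached != null) {
            listener.accept(cached);        // fast path: already loaded
            return;
        }
        Queue<Consumer<String>> queue = loadingListeners.get(modelId);
        if (queue != null) {
            queue.add(listener);            // somebody else kicked off a load: just wait
            return;
        }
        Queue<Consumer<String>> fresh = new ArrayDeque<>();
        fresh.add(listener);
        loadingListeners.put(modelId, fresh);
        // ... kick off the asynchronous load of modelId here ...
    }

    // Invoked when the asynchronous load finishes.
    synchronized void handleLoadSuccess(String modelId, String model) {
        Queue<Consumer<String>> queue = loadingListeners.remove(modelId);
        if (queue == null) {
            return;                         // nobody is waiting any more: do not cache
        }
        cache.put(modelId, model);
        for (Consumer<String> l : queue) {  // notified exactly once, because the
            l.accept(model);                // map entry was removed above
        }
    }
}
```

If a processor referencing the model is removed before the load completes, the listener queue is gone and handleLoadSuccess drops the model instead of caching it, matching the behavior described above.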
}
localModelCache.put(modelId, loadedModel);
I think the implementation has a small race: if more than one thread calls this with the same modelId/Config, the model gets loaded twice; it gets overwritten in the cache, but it will also be twice in memory, as both loadedModel instances are returned (line 180). If line 173 kicks in, you also load the model superfluously.
I suggest removing line 168 and using computeIfAbsent on line 176 instead of put. That ensures that the model is loaded only if necessary and exactly once.
If line 173 kicks in you also load the model superfluously.

That is true, but the loading already kicked off. This check is here for the scenario when a processor references a model_id, the loading starts, but the processor is removed before it can be fully loaded. So, we should not cache it, as nothing cares about it.
If this method gets called more than once for the same modelId and loadedModel, it will not get added to the cache twice. Since loadingListeners is a map by modelId, even if more than one model load event has occurred (which I am not convinced is possible), the listeners are only notified once, since the map entry is removed in line 170. The second thread to enter the synchronized block would see that loadingListeners does not have a key entry for modelId and will exit early without putting the model in cache.
With "twice in memory" I did not mean the cache: that would be impossible. I mean the instances in memory. Say A and B call this, and both load it. Then A enters the synchronized block first, puts it into the cache and "returns" the model. Whoever that is uses instanceA. Then B enters the block, overwrites the cache entry and "returns" the model instanceB.
In the end you have instanceB in the cache and both instanceA and instanceB in memory.
This whole thing is a bit contrived and probably cannot happen due to the call chain before it.
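The computeIfAbsent fix suggested in this thread can be demonstrated in isolation (a minimal sketch with a ConcurrentHashMap standing in for the real cache; the counter is only there to make the single-load guarantee observable):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the suggested fix: with put(), two racing callers can both load
// the model and hand out two distinct instances; computeIfAbsent() runs the
// loader at most once per key, so every caller shares one cached instance.
class CacheRaceSketch {
    static final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();
    static final AtomicInteger loads = new AtomicInteger();

    static Object expensiveLoad(String modelId) {
        loads.incrementAndGet();   // count how many times we really load
        return new Object();       // stand-in for a parsed model definition
    }

    static Object getOrLoad(String modelId) {
        // put()-based variant would be:
        //   Object m = expensiveLoad(modelId); cache.put(modelId, m); return m;
        // which loads once per caller under a race.
        return cache.computeIfAbsent(modelId, CacheRaceSketch::expensiveLoad);
    }
}
```

ConcurrentHashMap.computeIfAbsent applies the mapping function atomically, which is what closes both the double-load and the two-instances-in-memory windows described above.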
loadedModels.keySet().retainAll(allReferencedModelKeys);
ClusterState state = event.state();
IngestMetadata currentIngestMetadata = state.metaData().custom(IngestMetadata.TYPE);
Set<String> allReferencedModelKeys = getReferencedModelKeys(currentIngestMetadata);
What if the cluster state update does not change anything w.r.t. models, which is probably mostly the case? I'm missing a shortcut optimization.
Nice catch, should exit early if allReferencedModelKeys.equals(referencedModels).
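The early-exit idea amounts to comparing the newly computed set of referenced model ids against the previously seen one before doing any cache work (a sketch with illustrative names, not the real cluster state listener API):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch of the shortcut: most cluster state updates do not touch ingest
// pipelines, so if the set of referenced model ids is unchanged there is
// nothing to load or evict and we can return immediately.
class ReferencedModelsSketch {
    private Set<String> referencedModels = new HashSet<>();
    int rebuilds = 0;   // counts how often we actually did cache work

    void clusterChanged(String... referencedNow) {
        Set<String> allReferencedModelKeys = new HashSet<>(Arrays.asList(referencedNow));
        if (allReferencedModelKeys.equals(referencedModels)) {
            return;     // shortcut: nothing changed w.r.t. models
        }
        rebuilds++;
        // ... evict models no longer referenced, start loading new ones ...
        referencedModels = allReferencedModelKeys;
    }
}
```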
for (ActionListener<Model> listener : modelAndListeners.v2()) {
    listener.onFailure(new ElasticsearchException(msg));
}
}
Nit: it would aid reading to mark the end of a synchronized block with a comment, e.g. } // synchronized(...)
LGTM
* [ML][Inference] adds lazy model loader and inference (#47410)

This adds a couple of things:

- A model loader service that is accessible via transport calls. This service will load in models and cache them. They will stay loaded until a processor no longer references them
- A Model class and its first sub-class LocalModel. Used to cache model information and run inference.
- Transport action and handler for requests to infer against a local model

Related Feature PRs:

* [ML][Inference] Adjust inference configuration option API (#47812)
* [ML][Inference] adds logistic_regression output aggregator (#48075)
* [ML][Inference] Adding read/del trained models (#47882)
* [ML][Inference] Adding inference ingest processor (#47859)
* [ML][Inference] fixing classification inference for ensemble (#48463)
* [ML][Inference] Adding model memory estimations (#48323)
* [ML][Inference] adding more options to inference processor (#48545)
* [ML][Inference] handle string values better in feature extraction (#48584)
* [ML][Inference] Adding _stats endpoint for inference (#48492)
* [ML][Inference] add inference processors and trained models to usage (#47869)
* [ML][Inference] add new flag for optionally including model definition (#48718)
* [ML][Inference] adding license checks (#49056)
* [ML][Inference] Adding memory and compute estimates to inference (#48955)
* [ML] ML Model Inference Ingest Processor (#49052): squash of the feature PRs listed above, plus fixing the version of indexed docs for model inference
This adds model memory estimations and a setting to limit the amount of memory used by the cached models.
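The combination of memory estimation and a cache limit can be illustrated as a byte-weighted LRU cache: each entry is weighed by its estimated model memory, and least-recently-used models are evicted once the configured byte budget is exceeded. This is a minimal sketch under assumed names, not the actual Elasticsearch implementation or its setting:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a cache bounded by total estimated model memory rather than by
// entry count. The budget would come from the new setting this PR adds.
class ModelMemoryCacheSketch {
    private final long maxBytes;
    private long usedBytes = 0;
    private final LinkedHashMap<String, Long> models =   // modelId -> estimated bytes
        new LinkedHashMap<>(16, 0.75f, true);            // accessOrder=true gives LRU order

    ModelMemoryCacheSketch(long maxBytes) { this.maxBytes = maxBytes; }

    void put(String modelId, long estimatedBytes) {
        Long old = models.remove(modelId);
        if (old != null) usedBytes -= old;
        models.put(modelId, estimatedBytes);
        usedBytes += estimatedBytes;
        // evict least-recently-used models until we fit the budget again
        var it = models.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            Map.Entry<String, Long> eldest = it.next();
            if (eldest.getKey().equals(modelId)) continue;   // keep the new entry
            usedBytes -= eldest.getValue();
            it.remove();
        }
    }

    boolean contains(String modelId) { return models.containsKey(modelId); }
    long usedBytes() { return usedBytes; }
}
```

Weighing entries by estimated bytes is what makes the memory estimation part of this PR necessary: without a per-model size estimate, a count-based cache cannot bound heap usage.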