[ML] adds new params to the start trained model deployment docs (elastic#80016)
stu-elastic · Oct 28, 2021 · f9bf4e5
1 parent 375fc77

Showing 2 changed files with 23 additions and 1 deletion.
diff --git a/docs/reference/ml/df-analytics/apis/start-trained-model-deployment.asciidoc b/docs/reference/ml/df-analytics/apis/start-trained-model-deployment.asciidoc
@@ -48,11 +48,31 @@ any node. The value `started` indicates the model has started on at least one
 node. The value `fully_allocated` indicates the deployment has started on all
 valid nodes.
 
+`model_threads`::
+(Optional, integer)
+Sets the number of threads used to send inference requests to the model.
+Increasing this value generally increases throughput.
+Defaults to 1.
+
+`inference_threads`::
+(Optional, integer)
+Sets the number of threads used by the inference process. Increasing this value
+generally increases inference speed. Because the inference process is
+compute-bound, a value greater than the number of CPU cores available on the
+machine does not further increase inference speed.
+Defaults to 1.
+
+`queue_capacity`::
+(Optional, integer)
+Controls how many inference requests are allowed in the queue at a time. Once the
+number of requests exceeds this value, new requests are rejected with a 429 error.
+Defaults to 1024.
+
 [[start-trained-model-deployment-example]]
 == {api-examples-title}
 
 The following example starts a new deployment for a
-`elastic__d`istilbert-base-uncased-finetuned-conll03-english` trained model: 
+`elastic__distilbert-base-uncased-finetuned-conll03-english` trained model:
 
 [source,console]
 --------------------------------------------------

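The three new settings are documented next to `wait_for`, so a client supplies them when it starts the deployment. The following Java sketch builds the request path under that assumption: it treats `model_threads`, `inference_threads`, and `queue_capacity` as query parameters of `POST _ml/trained_models/<model_id>/deployment/_start`. The `StartDeploymentPath` class, its `build` method, and the positive-value check are hypothetical illustrations, not part of any Elasticsearch client.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the path for a start trained model deployment
// request, assuming the three settings are sent as query parameters.
final class StartDeploymentPath {

    private StartDeploymentPath() {}

    static String build(String modelId, Integer modelThreads, Integer inferenceThreads, Integer queueCapacity) {
        // Only parameters the caller set are included; unset (null) ones are
        // omitted so the server-side defaults (1, 1 and 1024) apply.
        Map<String, Integer> params = new LinkedHashMap<>();
        putIfSet(params, "model_threads", modelThreads);
        putIfSet(params, "inference_threads", inferenceThreads);
        putIfSet(params, "queue_capacity", queueCapacity);

        String base = "/_ml/trained_models/"
            + URLEncoder.encode(modelId, StandardCharsets.UTF_8)
            + "/deployment/_start";
        if (params.isEmpty()) {
            return base;
        }
        StringJoiner query = new StringJoiner("&", "?", "");
        params.forEach((name, value) -> query.add(name + "=" + value));
        return base + query;
    }

    // Assumption: a thread count or queue capacity below 1 is rejected client-side.
    private static void putIfSet(Map<String, Integer> params, String name, Integer value) {
        if (value == null) {
            return;
        }
        if (value < 1) {
            throw new IllegalArgumentException("[" + name + "] must be a positive integer, got " + value);
        }
        params.put(name, value);
    }

    public static void main(String[] args) {
        System.out.println(build("elastic__distilbert-base-uncased-finetuned-conll03-english", 2, 4, null));
    }
}
```

Leaving a parameter as `null` keeps it off the query string, which is simpler than duplicating the documented defaults on the client.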
diff --git a/...c/main/java/org/elasticsearch/xpack/core/ml/action/StartTrainedModelDeploymentAction.java b/...c/main/java/org/elasticsearch/xpack/core/ml/action/StartTrainedModelDeploymentAction.java
@@ -274,7 +274,9 @@ public static TaskParams fromXContent(XContentParser parser) {
 
         private final String modelId;
         private final long modelBytes;
+        // How many threads are used by the model during inference. Used to increase inference speed.
         private final int inferenceThreads;
+        // How many threads are used when forwarding the request to the model. Used to increase throughput.
         private final int modelThreads;
         private final int queueCapacity;
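The `TaskParams` fields above carry the three settings documented in the first file. The following hypothetical Java sketch (the class, nested class, and method names are invented for illustration and are not the Elasticsearch implementation) applies the documented defaults of 1, 1, and 1024, computes how many inference threads can actually help given the available cores, and models `queue_capacity` as a bounded queue whose rejected offers correspond to the documented 429 response.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch, not the Elasticsearch implementation: mirrors the three
// deployment settings and the behaviour the docs describe for each.
final class DeploymentSettingsSketch {

    static final int DEFAULT_MODEL_THREADS = 1;
    static final int DEFAULT_INFERENCE_THREADS = 1;
    static final int DEFAULT_QUEUE_CAPACITY = 1024;
    static final int TOO_MANY_REQUESTS = 429;

    final int modelThreads;      // threads forwarding requests to the model (throughput)
    final int inferenceThreads;  // threads used by the inference process (speed)
    final int queueCapacity;     // maximum number of queued inference requests

    // A null argument means "not set" and falls back to the documented default.
    DeploymentSettingsSketch(Integer modelThreads, Integer inferenceThreads, Integer queueCapacity) {
        this.modelThreads = modelThreads == null ? DEFAULT_MODEL_THREADS : modelThreads;
        this.inferenceThreads = inferenceThreads == null ? DEFAULT_INFERENCE_THREADS : inferenceThreads;
        this.queueCapacity = queueCapacity == null ? DEFAULT_QUEUE_CAPACITY : queueCapacity;
    }

    // Inference is compute-bound, so threads beyond the available cores add no speed.
    int effectiveInferenceThreads(int availableCores) {
        return Math.min(inferenceThreads, availableCores);
    }

    // Bounded queue: holds up to `capacity` pending requests; further offers fail.
    static final class InferenceQueue {
        private final BlockingQueue<String> pending;

        InferenceQueue(int capacity) {
            this.pending = new ArrayBlockingQueue<>(capacity);
        }

        // Returns false when the queue is full; a server would answer that request with 429.
        boolean offer(String request) {
            return pending.offer(request);
        }
    }

    public static void main(String[] args) {
        DeploymentSettingsSketch settings = new DeploymentSettingsSketch(null, 64, 2);
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("inference threads that can help: " + settings.effectiveInferenceThreads(cores));

        InferenceQueue queue = new InferenceQueue(settings.queueCapacity);
        for (int i = 0; i < 3; i++) {
            String request = "request-" + i;
            boolean accepted = queue.offer(request);
            System.out.println(request + (accepted ? " queued" : " rejected with HTTP " + TOO_MANY_REQUESTS));
        }
    }
}
```

With a capacity of 2 and no consumer draining the queue, the first two requests are queued and the third is rejected, matching the rule that requests beyond `queue_capacity` receive a 429.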