[ML] adds new params to the start trained model deployment docs (elas…
benwtrent committed Oct 28, 2021
1 parent 375fc77 commit f9bf4e5
Showing 2 changed files with 23 additions and 1 deletion.
@@ -48,11 +48,31 @@ any node. The value `started` indicates the model has started on at least one
node. The value `fully_allocated` indicates the deployment has started on all
valid nodes.

`model_threads`::
(Optional, integer)
Indicates how many threads are used when sending inference requests to
the model. Increasing this value generally increases the throughput. Defaults to
1.

`inference_threads`::
(Optional, integer)
Sets the number of threads used by the inference process. Increasing this value
generally increases the inference speed. The inference process is compute-bound, so
setting this value higher than the number of available CPU cores on the machine does
not increase the inference speed further.
Defaults to 1.

`queue_capacity`::
(Optional, integer)
Controls how many inference requests are allowed in the queue at a time. Once the
number of requests exceeds this value, new requests are rejected with a 429 error.
Defaults to 1024.
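
The behavior described for `inference_threads` and `queue_capacity` can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the Elasticsearch implementation: the class `DeploymentParamsSketch`, the helper `usefulInferenceThreads`, and the `BoundedInferenceQueue` type are hypothetical names introduced here. The sketch assumes a bounded queue whose rejected offers would surface to the caller as a 429 error.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration only; none of these names come from Elasticsearch.
class DeploymentParamsSketch {

    // Status a caller would see when the queue is full, per the parameter description.
    static final int TOO_MANY_REQUESTS = 429;

    // Inference is compute-bound, so threads beyond the available cores add no speed.
    // This helper caps a requested thread count at the core count (minimum 1).
    static int usefulInferenceThreads(int requested, int availableCores) {
        return Math.max(1, Math.min(requested, availableCores));
    }

    // A bounded queue that refuses new requests once it holds queueCapacity items.
    static final class BoundedInferenceQueue {
        private final BlockingQueue<String> pending;

        BoundedInferenceQueue(int queueCapacity) {
            this.pending = new ArrayBlockingQueue<>(queueCapacity);
        }

        // Returns true if the request was queued; false means the caller
        // should be answered with TOO_MANY_REQUESTS.
        boolean tryEnqueue(String request) {
            return pending.offer(request);
        }

        // Simulates a worker taking the next request; returns null if none is waiting.
        String poll() {
            return pending.poll();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("useful inference threads: " + usefulInferenceThreads(16, cores));

        BoundedInferenceQueue queue = new BoundedInferenceQueue(2);
        for (String request : new String[] {"a", "b", "c"}) {
            boolean accepted = queue.tryEnqueue(request);
            System.out.println(request + " -> " + (accepted ? "queued" : "rejected with " + TOO_MANY_REQUESTS));
        }
    }
}
```

With a capacity of 2, the third request in `main` is refused until a worker drains the queue with `poll()`. A real deployment would choose `queue_capacity` based on how long callers can tolerate waiting.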

[[start-trained-model-deployment-example]]
== {api-examples-title}

The following example starts a new deployment for an
`elastic__distilbert-base-uncased-finetuned-conll03-english` trained model:

[source,console]
--------------------------------------------------
@@ -274,7 +274,9 @@ public static TaskParams fromXContent(XContentParser parser) {

private final String modelId;
private final long modelBytes;
// How many threads are used by the model during inference. Used to increase inference speed.
private final int inferenceThreads;
// How many threads are used when forwarding the request to the model. Used to increase throughput.
private final int modelThreads;
private final int queueCapacity;
