[ML][Transforms] add wait_for_checkpoint flag to stop #47935

benwtrent · 2019-10-11T17:05:40Z

This adds the new flag wait_for_checkpoint to _stop.

This is a new version of the much older (now closed) PR: #45469

All of the state is persisted to the index and relies on optimistic concurrency control.
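Optimistic concurrency control on a persisted state document can be illustrated with a minimal in-memory sketch. It assumes a single document guarded by a (seqNo, primaryTerm) pair, loosely modelled on Elasticsearch's `if_seq_no`/`if_primary_term` conditional writes. `VersionedStateStore` and its methods are hypothetical and are not the Transform code.

```java
import java.util.Objects;

/**
 * Hypothetical single-document store guarded by a (seqNo, primaryTerm) pair.
 * A write succeeds only if the caller saw the latest version.
 */
final class VersionedStateStore {

    /** Version stamp returned by every successful write. */
    static final class Version {
        final long seqNo;
        final long primaryTerm;

        Version(long seqNo, long primaryTerm) {
            this.seqNo = seqNo;
            this.primaryTerm = primaryTerm;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Version)) {
                return false;
            }
            Version other = (Version) o;
            return seqNo == other.seqNo && primaryTerm == other.primaryTerm;
        }

        @Override
        public int hashCode() {
            return Objects.hash(seqNo, primaryTerm);
        }
    }

    private String document = "{}";
    private Version current = new Version(0, 1);

    synchronized Version current() {
        return current;
    }

    synchronized String read() {
        return document;
    }

    /** Conditional write: rejects the update if someone else wrote in between. */
    synchronized Version write(String newDocument, Version expected) {
        if (!current.equals(expected)) {
            throw new IllegalStateException("version conflict: expected seqNo " + expected.seqNo
                + " but current is " + current.seqNo);
        }
        document = newDocument;
        current = new Version(current.seqNo + 1, current.primaryTerm);
        return current;
    }
}
```

If two writers both read version 0, the first write succeeds and the second is rejected, so the second writer must re-read the state and retry instead of silently overwriting it.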

I thought a bit more about setting the default value to true, and that choice continues to bug me a little. It switches expectations around relative to the current behavior, so for BWC and consistent behavior, I think the default value should be false.

closes #45293

elasticmachine · 2019-10-11T17:05:42Z

Pinging @elastic/ml-core (:ml/Transform)

benwtrent · 2019-10-11T17:33:02Z

run elasticsearch-ci/2

droberts195 · 2019-10-15T10:21:01Z

...core/src/main/java/org/elasticsearch/xpack/core/transform/transforms/TransformTaskState.java

@@ -14,6 +14,7 @@
 import java.util.Locale;

 public enum TransformTaskState implements Writeable {
+    // TODO 8.x add a `STOPPING` state and BWC handling in ::fromString


We could do this earlier than 8.x given the feature is beta. It's too late for 7.5 but it would probably be worth adding STOPPING to 7.6 before making the feature GA.

+1. Those TODOs are in danger of being forgotten. If it is possible to do it now, let's do it.

However, I am not fully convinced. STOPPING is usually used for technical reasons, e.g. when we need an intermediate state between OPEN/STARTED/RUNNING and STOPPED because some resources take time to stop. That isn't the case here; this is a feature.

In other words, I find the current solution more appropriate.

Q: Do we return something in _stats after a call to _stop?wait_for_checkpoint=true?

TransformStats.State returns a user-readable state, and I think it still returns INDEXING instead of STOPPING.

@hendrikmuhs the state object will have a new field called should_stop_at_checkpoint

TransformStats.State is what we return to the user; it's a combined value derived from the indexer and task states. TransformState is only written to and read from the index; it has not been exposed to the user since the stats endpoint was rewritten. We could also expose a new field, but I don't think it is needed.

@hendrikmuhs

So you want to adjust TransformStats.State to say STOPPING when should_stop_at_checkpoint == true? Or STOPPING_AT_CHECKPOINT? I will need to think on this...

I would start with STOPPING, note that we can revisit that decision. If we expose it as STOPPING_AT_CHECKPOINT it's harder to go back. I doubt that STOPPING_AT_CHECKPOINT is useful from a user perspective. For similar reasons we made the decision to hide indexer and task state and provide a simplified state value.
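A minimal sketch of folding should_stop_at_checkpoint into the simplified user-facing state, following the suggestion to report plain STOPPING rather than STOPPING_AT_CHECKPOINT. The three enums are simplified, hypothetical stand-ins (the real task and indexer state enums may have more values), and `StateMapper.derive` is not the actual `TransformStats.State` code.

```java
/** Simplified, hypothetical persistent-task state. */
enum TaskState { STARTED, STOPPED, FAILED }

/** Simplified, hypothetical indexer state. */
enum IndexerState { STARTED, INDEXING, STOPPING, STOPPED }

/** The single state value shown to users by _stats. */
enum UserFacingState { STARTED, INDEXING, STOPPING, STOPPED, FAILED }

final class StateMapper {
    private StateMapper() {}

    static UserFacingState derive(TaskState task, IndexerState indexer, boolean shouldStopAtCheckpoint) {
        if (task == TaskState.FAILED) {
            return UserFacingState.FAILED; // a failure always wins
        }
        if (task == TaskState.STOPPED) {
            return UserFacingState.STOPPED;
        }
        if (shouldStopAtCheckpoint) {
            // still running, but _stop?wait_for_checkpoint=true is pending
            return UserFacingState.STOPPING;
        }
        switch (indexer) {
            case STARTED:
                return UserFacingState.STARTED;
            case INDEXING:
                return UserFacingState.INDEXING;
            case STOPPING:
                return UserFacingState.STOPPING;
            case STOPPED:
                return UserFacingState.STOPPED;
            default:
                throw new IllegalArgumentException("unknown indexer state " + indexer);
        }
    }
}
```

Keeping the detail inside a plain STOPPING value leaves room to refine it later, whereas exposing STOPPING_AT_CHECKPOINT would be harder to take back.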

hendrikmuhs

Looks good, can't wait to finally have this.

I added some thoughts I think we need to agree on to go on with this.

hendrikmuhs · 2019-10-16T13:26:02Z

...orm/src/main/java/org/elasticsearch/xpack/transform/rest/action/RestStopTransformAction.java

@@ -28,13 +28,15 @@ protected RestChannelConsumer prepareRequest(RestRequest restRequest, NodeClient
        boolean waitForCompletion = restRequest.paramAsBoolean(TransformField.WAIT_FOR_COMPLETION.getPreferredName(), false);
        boolean force = restRequest.paramAsBoolean(TransformField.FORCE.getPreferredName(), false);
        boolean allowNoMatch = restRequest.paramAsBoolean(TransformField.ALLOW_NO_MATCH.getPreferredName(), false);
+        boolean waitForCheckpoint = restRequest.paramAsBoolean(TransformField.WAIT_FOR_CHECKPOINT.getPreferredName(), false);


I think we should disallow the combination force and waitForCheckpoint

force pretty much negates the need for wait_for_checkpoint. Calling force when the transform is NOT failed should essentially be a no-op as far as force is concerned. I think the reasons for this are laid out in the source issue.

Yes, fully agree. My argument is that something like _stop?force=true&wait_for_checkpoint=true makes no sense. You only need force if the transform is failed, but if it's failed you cannot wait for a checkpoint, and calling force makes no sense if the transform isn't in a failed state. -> The parameters are mutually exclusive.

Now we can either simply ignore that or disallow the combination. The problem is that if we disallow it, we should also disallow using force when the transform isn't failed. I am not sure we have a precedent here; the only case I can think of is force, e.g. in ML jobs. I think we are lenient there, too?
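The combination was eventually disallowed ("Disallow force and wait_for_checkpoint" in the commit list). A minimal sketch of parsing the three flags with a default of false and rejecting `force` together with `wait_for_checkpoint`. `StopParams`, `parse`, and the error messages are hypothetical; the real REST handler reads each flag with `restRequest.paramAsBoolean(..., false)` as shown in the diff.

```java
import java.util.Map;

/** Hypothetical value object for the boolean flags of a _stop request. */
final class StopParams {
    final boolean force;
    final boolean waitForCheckpoint;
    final boolean waitForCompletion;

    private StopParams(boolean force, boolean waitForCheckpoint, boolean waitForCompletion) {
        this.force = force;
        this.waitForCheckpoint = waitForCheckpoint;
        this.waitForCompletion = waitForCompletion;
    }

    /** Parses query parameters; every flag defaults to false when absent. */
    static StopParams parse(Map<String, String> query) {
        boolean force = flag(query, "force");
        boolean waitForCheckpoint = flag(query, "wait_for_checkpoint");
        boolean waitForCompletion = flag(query, "wait_for_completion");
        // force is only meaningful for a failed transform, which can never reach a checkpoint
        if (force && waitForCheckpoint) {
            throw new IllegalArgumentException("[force] and [wait_for_checkpoint] cannot both be true");
        }
        return new StopParams(force, waitForCheckpoint, waitForCompletion);
    }

    private static boolean flag(Map<String, String> query, String name) {
        String raw = query.get(name);
        if (raw == null) {
            return false; // absent: keep the pre-existing behaviour
        }
        if (raw.equals("true")) {
            return true;
        }
        if (raw.equals("false")) {
            return false;
        }
        throw new IllegalArgumentException("[" + name + "] must be true or false, got [" + raw + "]");
    }
}
```

Defaulting to false keeps an unmodified `_stop` call behaving exactly as before, which is the BWC argument made in the PR description.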

hendrikmuhs · 2019-10-16T13:30:19Z

...form/src/main/java/org/elasticsearch/xpack/transform/persistence/TransformInternalIndex.java

@@ -54,6 +54,7 @@
     *                  progress::docs_processed, progress::docs_indexed,
     *                  stats::exponential_avg_checkpoint_duration_ms, stats::exponential_avg_documents_indexed,
     *                  stats::exponential_avg_documents_processed
+     * version 3 (7.6): state::should_stop_at_checkpoint


This also needs changes in TransformInternalIndexConstants. To keep the numbering consistent despite the big rename, I bumped it there to "3" (003) for 7.5, but I haven't updated the history entry here, sorry.

So, here it should be version 4 and TransformInternalIndexConstants.INDEX_VERSION should become 004

hendrikmuhs · 2019-10-16T13:46:07Z

...t/rest-high-level/src/main/java/org/elasticsearch/client/transform/StopTransformRequest.java

    }

-    public StopTransformRequest(String id, Boolean waitForCompletion, TimeValue timeout) {
+    public StopTransformRequest(String id, Boolean waitForCompletion, TimeValue timeout, Boolean waitForCheckpoint) {


If I understand correctly, I see one potential problem:

1. Call _stop?wait_for_completion=true&wait_for_checkpoint=true. This makes the call block.
2. Call _stop?wait_for_checkpoint=false. That's fine; you should be able to switch wait_for_checkpoint by calling _stop again. However, call 1 will return after this call, and there is no indication that we did not stop at a checkpoint. I think we should add a field to the response object noting whether the transform stopped at a checkpoint or not.

(In that respect, I also wonder whether it should be possible to revert the wait_for_checkpoint decision with a call to _start.)

This would also be in line with a general improvement of Response objects (probably better placed in a separate PR).

@hendrikmuhs I agree on indicating if the stop was indeed at a checkpoint or not. That information would be available via a _stats call, but I suppose we can add something here.

As for negating with a call to _start, I do not want to complicate these interactions any further. I am against it. If somebody wants to start it again, they should call _stop?wait_for_checkpoint=false and then _start again.

I will have to think about how to alert the user whether the _stop indeed caused the transform to stop at a checkpoint.

When wait_for_completion is spinning, it checks on every cluster state update. Since we don't store this state information in the cluster state, there is nothing indicating whether the value changed.

@hendrikmuhs after digging around, I don't think this is possible. There are no hooks into how the task is cleared and we consider it "stopped" when it is deleted, consequently losing insight into how it was stopped.
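The scenario discussed above can be modelled with a small, single-threaded sketch: a first `_stop?wait_for_checkpoint=true` only sets a pending flag, a checkpoint completion then stops the transform, and a later `_stop?wait_for_checkpoint=false` overrides the pending flag and stops immediately. `TransformTaskModel` is hypothetical. Its `stoppedAtCheckpoint()` is the kind of information the reviewer wanted in the response; in the real system that information is lost once the persistent task is removed, and this model keeps it only because the object itself survives.

```java
/** Hypothetical, single-threaded model of how a running transform reacts to _stop. */
final class TransformTaskModel {
    enum Phase { INDEXING, STOPPED }

    private Phase phase = Phase.INDEXING;
    private boolean shouldStopAtCheckpoint = false;
    private boolean stoppedAtCheckpoint = false;

    /** Models _stop; a later call overrides whatever an earlier call asked for. */
    void stop(boolean waitForCheckpoint) {
        if (phase == Phase.STOPPED) {
            return; // stopping twice is a no-op
        }
        shouldStopAtCheckpoint = waitForCheckpoint;
        if (!waitForCheckpoint) {
            phase = Phase.STOPPED; // stop right away, not at a checkpoint
            stoppedAtCheckpoint = false;
        }
    }

    /** Called by the indexer whenever a checkpoint completes. */
    void onCheckpointCompleted() {
        if (phase == Phase.INDEXING && shouldStopAtCheckpoint) {
            phase = Phase.STOPPED;
            stoppedAtCheckpoint = true;
        }
    }

    Phase phase() {
        return phase;
    }

    boolean stoppedAtCheckpoint() {
        return stoppedAtCheckpoint;
    }
}
```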

hendrikmuhs · 2019-10-16T14:14:06Z

...sform/src/main/java/org/elasticsearch/xpack/transform/transforms/ClientTransformIndexer.java

+        ));
+    }
+
+    protected void doSaveState(TransformState state, ActionListener<Void> listener) {


Can we name this differently and make it private, to avoid confusing it with the real doSaveState?

You don't think that not being an Override and having different parameters is enough?

I will look into making it private

hendrikmuhs · 2019-10-16T15:07:29Z

...orm/src/main/java/org/elasticsearch/xpack/transform/action/TransportStopTransformAction.java

+                        transformTask.stop(request.isForce(), request.isWaitForCheckpoint());
+                        listener.onResponse(new Response(true));
+                    } catch (ElasticsearchException ex) {
+                        listener.onFailure(ex);


nit: as this often causes trouble, an explicit return would be good here, although it is logically superfluous.
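The point of the nit is that a callback-based handler must notify its listener exactly once, and an explicit `return` after each notification keeps that true even if code is later added below the `try`/`catch`. A minimal sketch, assuming a hypothetical `Listener` interface (a stand-in, not Elasticsearch's `ActionListener`) and a `Runnable` stop action. It also calls `onResponse` outside the `try`, so an exception thrown by the listener itself is not reported as a second, failure notification; that part is an addition, not something raised in the review.

```java
/** Minimal stand-in for a response/failure callback; not Elasticsearch's ActionListener. */
interface Listener<T> {
    void onResponse(T response);

    void onFailure(Exception e);
}

final class StopHandler {
    private StopHandler() {}

    /** Hypothetical handler: every branch notifies the listener once and then returns. */
    static void handle(Runnable stopAction, boolean alreadyStopped, Listener<Boolean> listener) {
        if (alreadyStopped) {
            listener.onResponse(true);
            return; // without this, execution would fall through and notify again
        }
        try {
            stopAction.run();
        } catch (RuntimeException ex) {
            listener.onFailure(ex);
            return; // explicit, so later edits below cannot cause a second notification
        }
        listener.onResponse(true);
    }
}
```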

benwtrent · 2019-10-16T18:35:14Z

@elasticmachine update branch

benwtrent · 2019-10-16T19:17:52Z

run elasticsearch-ci/bwc
run elasticsearch-ci/default-distro

droberts195

LGTM

hendrikmuhs

LGTM, one nit: I think we decided to go without STOPPING, so can we remove the TODOs?

hendrikmuhs · 2019-10-28T13:11:13Z

...gin/core/src/main/java/org/elasticsearch/xpack/core/transform/transforms/TransformState.java

@@ -43,6 +43,9 @@
    @Nullable
    private NodeAttributes node;

+    // TODO: 8.x this needs to be deprecated and we move towards a STOPPING TASK_STATE
+    private final boolean shouldStopAtNextCheckpoint;


the todo can be removed?

Adds `wait_for_checkpoint` for `_stop` API.

[ML][Transforms] add wait_for_checkpoint flag to stop

2cbf1f7

benwtrent added >enhancement v8.0.0 :ml/Transform Transform v7.5.0 labels Oct 11, 2019

droberts195 reviewed Oct 15, 2019

Merge branch 'master' into feature/ml-transform-wait_for_checkpoint-flag

136be34

hendrikmuhs reviewed Oct 16, 2019

benwtrent added v7.6.0 and removed v7.5.0 labels Oct 16, 2019

addressing PR comments

346e8ab

Merge branch 'master' into feature/ml-transform-wait_for_checkpoint-flag

43aec52

benwtrent added 2 commits October 16, 2019 15:49

bumping index version

bf9598d

Merge branch 'feature/ml-transform-wait_for_checkpoint-flag' of github.com:benwtrent/elasticsearch into feature/ml-transform-wait_for_checkpoint-flag

a8921be

benwtrent requested review from hendrikmuhs and droberts195 October 24, 2019 20:21

benwtrent added 2 commits October 25, 2019 09:02

Merge remote-tracking branch 'upstream/master' into feature/ml-transform-wait_for_checkpoint-flag

1453d04

Adjusting to show STOPPING with reason. Disallow force and wait_for_checkpoint

4aeed43

droberts195 approved these changes Oct 25, 2019

hendrikmuhs approved these changes Oct 28, 2019

removing unnecessary todos

ec734e0

benwtrent merged commit 451a5c0 into elastic:master Oct 28, 2019

benwtrent deleted the feature/ml-transform-wait_for_checkpoint-flag branch October 28, 2019 15:21

benwtrent mentioned this pull request Oct 28, 2019

[7.x] [ML][Transforms] add wait_for_checkpoint flag to stop (#47935) #48591

Merged

benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Oct 28, 2019

[ML][Transforms] add wait_for_checkpoint flag to stop (elastic#47935)

9428e62

Adds `wait_for_checkpoint` for `_stop` API.

benwtrent added a commit that referenced this pull request Oct 28, 2019

[ML][Transforms] add wait_for_checkpoint flag to stop (#47935) (#48591)

6ea59dd

Adds `wait_for_checkpoint` for `_stop` API.

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
