[ML][Data Frame] adds new pipeline field to dest config #43124

benwtrent · 2019-06-11T21:03:14Z

This PR adds a new pipeline field to the dest config that references an existing ingest pipeline by its id. If set, the value of DestConfig#pipeline is set on each index request, so every document being indexed goes through the defined pipeline.
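For illustration only, here is a minimal sketch of that behaviour. Dest and IndexRequestStub are hypothetical stand-ins written for this example, not the real Elasticsearch classes; the only thing taken from the description is that the pipeline id, when present, is copied onto every index request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; these are NOT the real Elasticsearch classes.
class DestPipelineSketch {

    /** Stand-in for the dest config: index is required, pipeline is optional. */
    static final class Dest {
        final String index;
        final String pipeline; // null means "no ingest pipeline"

        Dest(String index, String pipeline) {
            this.index = index;
            this.pipeline = pipeline;
        }
    }

    /** Stand-in for a single index request. */
    static final class IndexRequestStub {
        final String index;
        final Map<String, Object> source;
        String pipeline; // stays null unless the dest config names a pipeline

        IndexRequestStub(String index, Map<String, Object> source) {
            this.index = index;
            this.source = source;
        }
    }

    /** Builds one request per document and copies the pipeline id onto each one when it is set. */
    static List<IndexRequestStub> toRequests(Dest dest, List<Map<String, Object>> docs) {
        List<IndexRequestStub> requests = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            IndexRequestStub request = new IndexRequestStub(dest.index, doc);
            if (dest.pipeline != null) {
                request.pipeline = dest.pipeline;
            }
            requests.add(request);
        }
        return requests;
    }
}
```

With no pipeline configured, the requests are built exactly as before, which is why the field can stay optional.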

closes #43061

elasticmachine · 2019-06-11T21:03:16Z

Pinging @elastic/ml-core

benwtrent

This change may require adding BWC tests for data frames. If desired, I will address that in a separate PR.

benwtrent · 2019-06-11T21:03:38Z

.../plugin/core/src/main/java/org/elasticsearch/xpack/core/dataframe/transforms/DestConfig.java

    }

    public DestConfig(final StreamInput in) throws IOException {
        index = in.readString();
+        if (in.getVersion().onOrAfter(Version.CURRENT)) {


This will be adjusted after backport

benwtrent · 2019-06-11T21:03:48Z

.../plugin/core/src/main/java/org/elasticsearch/xpack/core/dataframe/transforms/DestConfig.java

    public boolean isValid() {
        return index.isEmpty() == false;
    }

    @Override
    public void writeTo(StreamOutput out) throws IOException {
        out.writeString(index);
+        if (out.getVersion().onOrAfter(Version.CURRENT)) {


This will be adjusted after backport
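To make the version gate in the two snippets above concrete, here is a rough, self-contained sketch of the pattern. It uses plain java.io streams rather than StreamInput/StreamOutput, and the numeric version constants and the boolean presence flag are assumptions made for this example. The 7.3.0 cut-off is taken from the backport commit message; everything else is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of version-gated wire serialization using plain java.io streams.
class VersionGatedDestSketch {
    // Hypothetical numeric version ids, invented for this example.
    static final int V_7_2_0 = 70200;
    static final int V_7_3_0 = 70300;

    /** Writes index always; writes the optional pipeline only if the peer understands it. */
    static byte[] write(String index, String pipeline, int peerVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(index);
            if (peerVersion >= V_7_3_0) {
                out.writeBoolean(pipeline != null); // presence flag for the optional field
                if (pipeline != null) {
                    out.writeUTF(pipeline);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Mirrors write(): reads the pipeline only when the sender's version would have written it. */
    static String[] read(byte[] data, int peerVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String index = in.readUTF();
            String pipeline = null;
            if (peerVersion >= V_7_3_0 && in.readBoolean()) {
                pipeline = in.readUTF();
            }
            return new String[] { index, pipeline };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The read and write sides have to use the same version check, otherwise an older node would see extra bytes or a newer node would read past the end of the message.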

benwtrent · 2019-06-12T11:54:04Z

I have been thinking more about how to handle pipelines in _preview. It may be beneficial to _simulate the pipeline given the created docs.

droberts195

What's there so far looks good, but, as you said in a comment, we need to think about how to include this in the preview too. Otherwise the preview becomes actively misleading.

droberts195 · 2019-06-12T13:08:02Z

.../rest-high-level/src/main/java/org/elasticsearch/client/dataframe/transforms/DestConfig.java

+         * @return The {@link Builder} with index set
+         */
+        public Builder setIndex(String index) {
+            this.index = index;


Maybe Objects.requireNonNull(index) here so the error that the constructor will throw is traced to the root cause.
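A small sketch of what that suggestion could look like. The Dest and Builder classes here are simplified stand-ins for the client classes, and the assumption that the constructor also rejects a null index comes from the comment above rather than from the diff shown.

```java
import java.util.Objects;

// Simplified stand-ins for the client DestConfig and its Builder.
class DestBuilderSketch {

    static final class Dest {
        final String index;
        final String pipeline;

        Dest(String index, String pipeline) {
            // Assumed constructor check: fails late, far away from the faulty setter call.
            this.index = Objects.requireNonNull(index, "index");
            this.pipeline = pipeline;
        }
    }

    static final class Builder {
        private String index;
        private String pipeline;

        Builder setIndex(String index) {
            // Fail fast so the stack trace points at the caller that passed null.
            this.index = Objects.requireNonNull(index, "index");
            return this;
        }

        Builder setPipeline(String pipeline) {
            this.pipeline = pipeline; // optional, so null is allowed
            return this;
        }

        Dest build() {
            return new Dest(index, pipeline);
        }
    }
}
```

With the check in the setter, the NullPointerException is raised at the setIndex(null) call site rather than later in build().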

benwtrent · 2019-06-12T13:31:20Z

@droberts195 thinking more about _preview. If we do a pipeline simulation, I think it also makes sense to include doc metadata (_id, etc.) in the preview. Right now it is just the _source but if we allow pipelines, it makes sense for each preview entry to also include the pipeline metadata.

Consequently, even without a pipeline the preview results should probably look like {_id: id, _source: <the current preview objects>}

benwtrent · 2019-06-12T19:42:35Z

The current simulation returns docs formatted like:

"preview" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "adzAekwbm_Y7QlKddxbhMG4AAAAAAAAA",
        "_source" : {
          "static_field" : 42,
          "machine" : {
            "os" : "ios"
          },
          "max(time)" : "2019-08-01T21:45:26.749Z"
        },
        "_ingest" : {
          "timestamp" : "2019-06-12T19:28:22.342755Z"
        }
      }
    },
...
]

It seems to me that we want the two responses (with and without a pipeline) to be as similar as possible. So, we can do one of the following:

1️⃣ Nest the normal preview doc objects into a doc field.
2️⃣ Bring the doc field value supplied by the _simulate response up a level.
3️⃣ Only display the _source from the _simulate response and don't change the format of the preview response without a pipeline at all.

I am not sure which would be better. When it comes to consuming the API, any of them would work.

benwtrent · 2019-06-14T13:03:08Z

Ultimately, I have opted for option 3️⃣ mentioned above. It keeps the preview response consistent with what is currently done. Additionally, we don't want to "advertise" the _id: for data frame transforms to stay consistent, the _id field must be left untouched by the user.
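As a rough sketch of option 3️⃣, assuming the _simulate response has already been parsed into nested maps shaped like the JSON shown earlier (a list of entries, each with a doc object holding _source): the class and method names are made up for this example and are not the ones used in the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: keeps only the _source of each simulated doc.
class PreviewSourceSketch {

    /** Returns the _source map of every entry that has one, dropping _id, _index, _ingest, etc. */
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> extractSources(List<Map<String, Object>> simulatedDocs) {
        List<Map<String, Object>> preview = new ArrayList<>();
        for (Map<String, Object> entry : simulatedDocs) {
            Object doc = entry.get("doc");
            if (doc instanceof Map) {
                Object source = ((Map<String, Object>) doc).get("_source");
                if (source instanceof Map) {
                    preview.add((Map<String, Object>) source);
                }
            }
        }
        return preview;
    }
}
```

The result has the same shape as a preview without a pipeline, so consumers of the API do not need to distinguish the two cases.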

droberts195

Looks good! I agree that making the output of preview match the final document structure as closely as possible is the best option.

My main comment is about the authorization aspect of this.

droberts195 · 2019-06-14T16:01:35Z

.../java/org/elasticsearch/xpack/dataframe/action/TransportPreviewDataFrameTransformAction.java

+                                            String id = (String) doc.get(DataFrameField.DOCUMENT_ID_FIELD);
+                                            doc.keySet().removeIf(k -> k.startsWith("_"));
+                                            src.put("_source", doc);
+                                            src.put("_id", id);


Maybe also pass in _index (obtained from config.getDestination().getIndex()) so that if the pipeline accesses _index the simulation works like the real thing would.
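A sketch of how the document handed to the simulation might be assembled with _index included. This is not the code from the PR: the method name is invented, the value "_id" for DOCUMENT_ID_FIELD is an assumption (the thread does not show the constant's value), and the destination index is passed in as a plain string instead of being read from config.getDestination().getIndex().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that builds one document for the pipeline simulation.
class SimulateDocSketch {
    // Assumption: the real DataFrameField.DOCUMENT_ID_FIELD value is not shown in the thread.
    static final String DOCUMENT_ID_FIELD = "_id";

    static Map<String, Object> toSimulateDoc(Map<String, Object> doc, String destIndex) {
        // Work on a copy so the caller's map is left untouched.
        Map<String, Object> source = new HashMap<>(doc);
        String id = (String) source.get(DOCUMENT_ID_FIELD);
        // Strip metadata-style keys so only real fields end up in _source.
        source.keySet().removeIf(k -> k.startsWith("_"));

        Map<String, Object> simulateDoc = new HashMap<>();
        simulateDoc.put("_source", source);
        simulateDoc.put("_id", id);
        // Lets a pipeline that reads _index see the real destination during simulation.
        simulateDoc.put("_index", destIndex);
        return simulateDoc;
    }
}
```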

droberts195 · 2019-06-14T16:05:33Z

.../java/org/elasticsearch/xpack/dataframe/action/TransportPreviewDataFrameTransformAction.java

+                                        ClientHelper.executeWithHeadersAsync(threadPool.getThreadContext().getHeaders(),
+                                            ClientHelper.DATA_FRAME_ORIGIN,
+                                            client,
+                                            SimulatePipelineAction.INSTANCE,


The action name for this is cluster:admin/ingest/pipeline/simulate, so previewing a data frame transform with a pipeline now requires that privilege. We should document this and also test how easy it is to understand the resulting error message if you have all the other privileges required to use the data frame transform except this one.

It's not ideal that using the data frame transform for real does not require any special privilege over and above being able to index data, but simulating does. The cluster privilege that includes cluster:admin/ingest/pipeline/simulate is manage_ingest_pipelines, which is the same one that lets you delete ingest pipelines! As a followup maybe we should enquire about the possibility of adding a simulate_ingest_pipelines cluster privilege that just lets you simulate, and not CRUD.

We also need to pass this information on to the UI team when the ability to add a pipeline gets added to the UI, as the UI might want to proactively prevent people who cannot simulate a pipeline from creating a data frame transform that uses one.

@droberts195 what do you think of not using the user headers for _simulate? I am not sure of the reason for requiring the special permission when the user can already index documents that refer to the pipeline itself.

Yes, that's a good idea. If we run this action as the internal _xpack user then it bypasses the need for the end user to have pipeline admin privileges. We're still requiring that they have permission to preview the data frame transform. If we consider simulating the pipeline an internal implementation detail of previewing the data frame transform then it's reasonable to run it as _xpack.

droberts195

LGTM

* [ML][Data Frame] adds new pipeline field to dest config
* Adding pipeline support to _preview
* removing unused import
* moving towards extracting _source from pipeline simulation
* fixing permission requirement, adding _index entry to doc

#43388)
* [ML][Data Frame] adds new pipeline field to dest config (#43124)
* [ML][Data Frame] adds new pipeline field to dest config
* Adding pipeline support to _preview
* removing unused import
* moving towards extracting _source from pipeline simulation
* fixing permission requirement, adding _index entry to doc
* adjusting for java 8 compatibility
* adjusting bwc serialization version to 7.3.0

[ML][Data Frame] adds new pipeline field to dest config

f9fe735

benwtrent added >enhancement v8.0.0 :ml/Transform Transform v7.3.0 labels Jun 11, 2019

benwtrent commented Jun 11, 2019

droberts195 reviewed Jun 12, 2019

Merge remote-tracking branch 'upstream/master' into feature/ml-df-add-pipeline-support

bdb7231

benwtrent added 4 commits June 12, 2019 15:08

Adding pipeline support to _preview

7dccdca

removing unused import

8c1d974

moving towards extracting _source from pipeline simulation

104d796

Merge branch 'master' into feature/ml-df-add-pipeline-support

1a6850d

benwtrent requested a review from droberts195 June 14, 2019 13:04

droberts195 reviewed Jun 14, 2019

fixing permission requirement, adding _index entry to doc

97d1283

droberts195 approved these changes Jun 14, 2019

Merge branch 'master' into feature/ml-df-add-pipeline-support

0614fe0

benwtrent merged commit 9f29749 into elastic:master Jun 19, 2019

benwtrent deleted the feature/ml-df-add-pipeline-support branch June 19, 2019 17:58

benwtrent mentioned this pull request Jun 19, 2019

[7.x] [ML][Data Frame] adds new pipeline field to dest config (#43124) #43388

Merged

Mpdreamz mentioned this pull request Aug 7, 2019

[meta] 7.3 Release elastic/elasticsearch-net#4001

Closed

16 tasks

jakelandis removed the v8.0.0 label Jul 26, 2021

jakelandis added the v8.0.0-alpha1 label Jul 26, 2021
