[ML] update truncation default & adding field output when input is truncated #79942

benwtrent · 2021-10-27T17:42:15Z

This commit makes the following two changes (along with some refactoring):

- NLP results now indicate whether the input was truncated
- The default truncation is now `none` instead of `first`


elasticmachine · 2021-10-27T17:42:19Z

Pinging @elastic/ml-core (Team:ML)

benwtrent · 2021-10-27T18:33:06Z

@elasticmachine update branch


davidkyle

LGTM

davidkyle · 2021-10-28T12:19:32Z

...c/main/java/org/elasticsearch/xpack/core/ml/inference/results/PyTorchPassThroughResults.java

        PyTorchPassThroughResults that = (PyTorchPassThroughResults) o;
        return Arrays.deepEquals(inference, that.inference) && Objects.equals(resultsField, that.resultsField);
    }

    @Override
    public int hashCode() {
-        return Objects.hash(Arrays.deepHashCode(inference), resultsField);
+        int result = Objects.hash(super.hashCode(), resultsField);


Suggested change

-        int result = Objects.hash(super.hashCode(), resultsField);
+        return Objects.hash(super.hashCode(), Arrays.deepHashCode(inference), resultsField);

davidkyle · 2021-10-28T12:21:17Z

...re/src/main/java/org/elasticsearch/xpack/core/ml/inference/results/TextEmbeddingResults.java

    }

    @Override
    public int hashCode() {
-        return Objects.hash(Arrays.hashCode(inference), resultsField);
+        int result = Objects.hash(super.hashCode(), resultsField);


Same comment; why not `return Objects.hash(super.hashCode(), resultsField, Arrays.hashCode(inference))`?

These were auto-generated by IntelliJ :D. So I can update them.
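For reference, a minimal sketch of the single-expression `hashCode()` suggested above. `BaseResults` and `EmbeddingResults` are hypothetical stand-ins, not the actual Elasticsearch classes; the only assumption is a superclass that carries the truncation flag and a subclass that holds an array field. The array goes through `Arrays.hashCode` (or `Arrays.deepHashCode` for nested arrays) so that the hash depends on its contents rather than its identity.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical superclass: holds only the truncation flag.
class BaseResults {
    final boolean isTruncated;

    BaseResults(boolean isTruncated) {
        this.isTruncated = isTruncated;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return isTruncated == ((BaseResults) o).isTruncated;
    }

    @Override
    public int hashCode() {
        return Objects.hash(isTruncated);
    }
}

// Hypothetical subclass with an array field, similar in shape to the classes under review.
class EmbeddingResults extends BaseResults {
    final String resultsField;
    final double[] inference;

    EmbeddingResults(String resultsField, double[] inference, boolean isTruncated) {
        super(isTruncated);
        this.resultsField = resultsField;
        this.inference = inference;
    }

    @Override
    public boolean equals(Object o) {
        // super.equals already checks null and the runtime class, so the cast is safe.
        if (super.equals(o) == false) return false;
        EmbeddingResults that = (EmbeddingResults) o;
        return Objects.equals(resultsField, that.resultsField)
            && Arrays.equals(inference, that.inference);
    }

    @Override
    public int hashCode() {
        // One expression: superclass state, the scalar field, and a content-based array hash.
        return Objects.hash(super.hashCode(), resultsField, Arrays.hashCode(inference));
    }
}
```

Two instances with equal contents then produce the same hash, and flipping only the truncation flag makes them unequal.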

davidkyle · 2021-10-28T12:32:42Z

...ugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/nlp/tokenizers/BertTokenizer.java

        if (numTokens > maxSequenceLength) {
+            isTruncated = true;


I would be tempted to put this in the switch case for FIRST & SECOND, because NONE does not truncate; it throws instead. Same result, just a matter of preference.
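A rough sketch of that alternative, to make the preference concrete. This is not the `BertTokenizer` code: `TruncationSketch`, `Result`, `limit`, and the exception message are hypothetical names made up for illustration, and the truncation itself is simplified to cutting the token list at `maxSequenceLength`. The point is only that the truncated flag is set inside the `FIRST`/`SECOND` branch, while `NONE` throws and never sets it.

```java
import java.util.List;

// Hypothetical illustration; names and behavior are assumptions, not the real tokenizer.
class TruncationSketch {
    enum Truncate { FIRST, SECOND, NONE }

    // Tokens plus a flag saying whether anything was cut off.
    static final class Result {
        final List<String> tokens;
        final boolean truncated;

        Result(List<String> tokens, boolean truncated) {
            this.tokens = tokens;
            this.truncated = truncated;
        }
    }

    static Result limit(List<String> tokens, int maxSequenceLength, Truncate truncate) {
        if (tokens.size() <= maxSequenceLength) {
            return new Result(tokens, false);
        }
        switch (truncate) {
            case FIRST:
            case SECOND:
                // Only the truncating modes ever report truncated = true.
                return new Result(tokens.subList(0, maxSequenceLength), true);
            case NONE:
                // NONE never truncates; over-long input is rejected instead.
                throw new IllegalArgumentException(
                    "input has " + tokens.size() + " tokens, max is " + maxSequenceLength
                );
            default:
                throw new IllegalStateException("unknown truncation mode " + truncate);
        }
    }
}
```

Either placement gives the same observable result; this version just keeps the flag next to the code that actually truncates.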


elasticsearchmachine · 2021-10-28T14:42:04Z

💚 Backport successful

Status	Branch	Result
✅	8.0

[ML] update truncation default & adding field output when input is truncated

7eb8493

benwtrent added >non-issue :ml Machine learning v8.0.0 v8.1.0 labels Oct 27, 2021

elasticmachine added the Team:ML Meta label for the ML team label Oct 27, 2021

elasticmachine and others added 4 commits October 27, 2021 14:33

Merge branch 'master' into feature/ml-update-nlp-truncation

41d256f

Merge remote-tracking branch 'upstream/master' into feature/ml-update-nlp-truncation

cb6c401

updating docs

f1c13a8

Merge branch 'feature/ml-update-nlp-truncation' of github.com:benwtrent/elasticsearch into feature/ml-update-nlp-truncation

051ecea

davidkyle approved these changes Oct 28, 2021

benwtrent added 2 commits October 28, 2021 09:24

Merge remote-tracking branch 'upstream/master' into feature/ml-update-nlp-truncation

7095428

addressing PR comments

38bd4c0

benwtrent added auto-backport-and-merge Automatically create backport pull requests and merge when ready auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) labels Oct 28, 2021

elasticsearchmachine merged commit 375fc77 into elastic:master Oct 28, 2021

benwtrent deleted the feature/ml-update-nlp-truncation branch October 28, 2021 14:41

benwtrent mentioned this pull request Oct 28, 2021

[8.0] [ML] update truncation default & adding field output when input is truncated (#79942) #80022

Merged

jakelandis added v8.0.0-beta1 and removed v8.0.0 labels Nov 8, 2021
