Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow removing some metadata fields by remove ingest processor #10895

Merged
merged 17 commits into from
Nov 22, 2023

Conversation

gaobinlong
Copy link
Collaborator

@gaobinlong gaobinlong commented Oct 24, 2023

Description

When removing metadata fields such as _index, _version, _id can cause unexpected result: OpenSearch process crash or null_pointer_exception, so it's better we do some check in remove ingest processor to prevent those things from happening.

In current implementation, the enriched document IngestDocument contains these metadata fields: _index, _version, _id, _version_type, _routing:

this.sourceAndMetadata.put(Metadata.INDEX.getFieldName(), index);

, in most cases, these metadata fields are used internally, but actually they can be removed by remove ingest processor. Removing these metadata fields are meaningless because these fields have corresponding query parameters in index API, remove ingest processor should not disrupt that API, if users want to remove those fields they should remove the query parameters of index API, remove ingest processor should only focus on the source fields of the incoming documents, this makes the responsibility of index API and remove ingest processor clear.

We have two options to prevent removing metadata fields in remove ingest processor, one option is that throwing an illegal_argument_exception with an error message like cannot remove metadata fields..., another option is to ignore the metadata fields when removing the fields from the source document. But in the long run, we may support exclude parameter for the remove processor, so if all the metadata fields don't math the exclude pattern, we shouldn't remove the metadata fields, we should ignore them, in order to make this behavior consistently, I prefer the option which ignoring all the metadata fields.

Related Issues

#10732

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Oct 24, 2023

Compatibility status:

Checks if related components are compatible with change a12f312

Incompatible components

Incompatible components: [https://github.com/opensearch-project/performance-analyzer.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/sql.git]

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@reta
Copy link
Collaborator

reta commented Nov 20, 2023

@gaobinlong a few minor comments but LGTM otherwise, thank you!

Copy link
Contributor

❌ Gradle check result for 3cb50e8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Copy link
Contributor

✅ Gradle check result for a12f312: SUCCESS

@gaobinlong
Copy link
Collaborator Author

Now all gradle checks have passed yet, but the code coverage check is not, that's because of the randomization in our test code.

@reta
Copy link
Collaborator

reta commented Nov 22, 2023

@gaobinlong we would need documentation update for this change as well, could you please create the issue at https://github.com/opensearch-project/documentation-website/issues ?

@reta reta merged commit a0b016b into opensearch-project:main Nov 22, 2023
28 of 29 checks passed
@reta reta added bug Something isn't working v3.0.0 Issues and PRs related to version 3.0.0 backport 2.x Backport to 2.x branch labels Nov 22, 2023
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-10895-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a0b016bf154cf765483f38c4c7f135ae972004c2
# Push it to GitHub
git push --set-upstream origin backport/backport-10895-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-10895-to-2.x.

@reta
Copy link
Collaborator

reta commented Nov 22, 2023

@gaobinlong and we need manual backport to 2.x sadly :( please create one

gaobinlong added a commit to gaobinlong/OpenSearch that referenced this pull request Nov 23, 2023
…ensearch-project#10895)

* Ignore metadata fields when removing fields by remove ingest processor

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Modify change log

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Throw exception when removing some metadata fields

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Format the code

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Remove calling toLowerCase()

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Remove calling toLowerCase()

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Fix test failure

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Add skip config in yml test file

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Improve test coverage

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Optimize some code

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
(cherry picked from commit a0b016b)
@gaobinlong
Copy link
Collaborator Author

Thanks @reta ,I've created a document issue: 5634 and a backport PR: #11311.

reta pushed a commit that referenced this pull request Nov 23, 2023
…t processor (#11311)

* Disallow removing some metadata fields by remove ingest processor (#10895)

* Ignore metadata fields when removing fields by remove ingest processor

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Modify change log

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Throw exception when removing some metadata fields

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Format the code

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Remove calling toLowerCase()

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Remove calling toLowerCase()

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Fix test failure

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Add skip config in yml test file

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Improve test coverage

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Optimize some code

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
(cherry picked from commit a0b016b)

* Remove duplicated changelog

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
fahadshamiinsta pushed a commit to fahadshamiinsta/OpenSearch270 that referenced this pull request Dec 4, 2023
…ensearch-project#10895)

* Ignore metadata fields when removing fields by remove ingest processor

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Modify change log

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Throw exception when removing some metadata fields

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Format the code

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Remove calling toLowerCase()

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Remove calling toLowerCase()

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Fix test failure

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Add skip config in yml test file

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Improve test coverage

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Optimize some code

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
deshsidd pushed a commit to deshsidd/OpenSearch that referenced this pull request Dec 11, 2023
…ensearch-project#10895)

* Ignore metadata fields when removing fields by remove ingest processor

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Modify change log

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Throw exception when removing some metadata fields

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Format the code

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Remove calling toLowerCase()

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Remove calling toLowerCase()

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Fix test failure

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Add skip config in yml test file

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Improve test coverage

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Optimize some code

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
…ensearch-project#10895)

* Ignore metadata fields when removing fields by remove ingest processor

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Modify change log

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Throw exception when removing some metadata fields

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Format the code

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Remove calling toLowerCase()

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Remove calling toLowerCase()

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Fix test failure

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Add skip config in yml test file

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Improve test coverage

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Optimize some code

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…ensearch-project#10895)

* Ignore metadata fields when removing fields by remove ingest processor

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Modify change log

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Throw exception when removing some metadata fields

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Format the code

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Remove calling toLowerCase()

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Remove calling toLowerCase()

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Fix test failure

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Add skip config in yml test file

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Improve test coverage

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Optimize some code

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch backport-failed bug Something isn't working v3.0.0 Issues and PRs related to version 3.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants