Use Collections.synchronizedSet and Collections.synchronizedMap for r… #1970

Merged

Conversation


@cwperks cwperks commented Jul 27, 2022

…oles, securityRoles and attributes in User

Signed-off-by: Craig Perkins cwperx@amazon.com

Description

Creating a draft PR to solicit feedback on a potential fix for #1961 and #1927.

Users are reporting that an intermittent java.io.OptionalDataException occurs when the security cache expires.

The error is thrown by readObject() on a SafeObjectInputStream while deserializing the value of getThreadContext().getHeader(ConfigConstants.OPENDISTRO_SECURITY_USER_HEADER).

I got the idea for this fix from here and here. Since the error was occurring on deserialization of the User object, I wrapped the HashSet and HashMap members of User in Collections.synchronizedSet and Collections.synchronizedMap, respectively.
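The approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual diff: `SketchUser`, its methods, and the `roundTrip` helper are hypothetical names, and only the field names (roles, securityRoles, attributes) come from the PR title. The relevant JDK behavior is that the wrappers returned by Collections.synchronizedSet and Collections.synchronizedMap hold their internal lock while they are written to an ObjectOutputStream, so a write made through the wrapper cannot interleave with serialization of that collection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the plugin's User class (assumption, not the real code).
class SketchUser implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String name;
    // Previously plain HashSet/HashMap; now wrapped so that mutation and
    // serialization of each collection are guarded by the same lock.
    private final Set<String> roles = Collections.synchronizedSet(new HashSet<>());
    private final Set<String> securityRoles = Collections.synchronizedSet(new HashSet<>());
    private final Map<String, String> attributes = Collections.synchronizedMap(new HashMap<>());

    SketchUser(String name) {
        this.name = name;
    }

    void addRole(String role) {
        roles.add(role);
    }

    void addSecurityRole(String role) {
        securityRoles.add(role);
    }

    void addAttribute(String key, String value) {
        attributes.put(key, value);
    }

    String getName() {
        return name;
    }

    Set<String> getRoles() {
        return roles;
    }

    Set<String> getSecurityRoles() {
        return securityRoles;
    }

    Map<String, String> getAttributes() {
        return attributes;
    }

    // Serializes and deserializes a user in memory using plain Java serialization.
    static SketchUser roundTrip(SketchUser user) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(user);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SketchUser) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SketchUser user = new SketchUser("alice");
        user.addRole("admin");
        user.addAttribute("department", "ops");
        SketchUser copy = roundTrip(user);
        System.out.println(copy.getName() + " " + copy.getRoles() + " " + copy.getAttributes());
    }
}
```

Note that the guarantee only covers access that goes through the wrappers; code that iterates over a synchronized collection still has to synchronize on it manually, as the Collections Javadoc describes.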

Automated tests for this issue are being written.

  • Category (Enhancement, New feature, Bug fix, Test fix, Refactoring, Maintenance, Documentation)

Bug Fix

  • What is the old behavior before changes and new behavior after changes?

Old behavior: a java.io.OptionalDataException is intermittently thrown when plugins.security.cache.ttl_minutes expires.

New behavior: No exception is thrown; the bulk API no longer fails when the security cache expires.

Issues Resolved

Testing

Testing was performed by following the steps outlined here: #1961 (comment)

Before the change: the cluster threw the exception within a minute.

After the change: the cluster is stable and has been running for a long time without the runtime exception.

Check List

  • New functionality includes testing
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…oles, securityRoles and attributes in User

Signed-off-by: Craig Perkins <cwperx@amazon.com>
@DarshitChanpura
Member

Good find @cwperks !!

@codecov-commenter

codecov-commenter commented Jul 27, 2022

Codecov Report

Merging #1970 (0b2280f) into main (d96da6c) will increase coverage by 0.01%.
The diff coverage is 50.00%.

@@             Coverage Diff              @@
##               main    #1970      +/-   ##
============================================
+ Coverage     61.04%   61.05%   +0.01%     
- Complexity     3233     3234       +1     
============================================
  Files           256      256              
  Lines         18085    18085              
  Branches       3222     3222              
============================================
+ Hits          11040    11042       +2     
+ Misses         5470     5467       -3     
- Partials       1575     1576       +1     
Impacted Files Coverage Δ
...c/main/java/org/opensearch/security/user/User.java 52.63% <50.00%> (ø)
.../dlic/auth/ldap2/LDAPConnectionFactoryFactory.java 56.48% <0.00%> (-1.53%) ⬇️
...ecurity/configuration/ConfigurationRepository.java 74.31% <0.00%> (+2.18%) ⬆️


Member

@peternied peternied left a comment

These kinds of bugs are the worst; thank you for digging in on this issue.

As there was a provable (albeit complex) reproduction, shouldn't we be able to create a test case? I spent some time trying to author a test that would create the same state as the much more convoluted repro, to no avail. Any other test that involves more systems running concurrently means that CI wouldn't be as reliable if it's running at a lower clock speed than dev boxes.

What about updating Base64Helper to ensure that collection objects are synchronized? We cannot easily do this with reflection, as the classes do not implement a clear interface and the definitions themselves are private; see Collections.java#L2005.
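To illustrate the difficulty: the synchronized wrapper types are not public, so they cannot be named in an `instanceof` check. One possible workaround, shown here purely as a sketch, is to capture the runtime class of a sample wrapper and compare against it with Class.isInstance. `SyncWrapperCheck` and its methods are hypothetical names and are not part of Base64Helper or the security plugin; the approach also leans on a JDK implementation detail rather than a documented contract.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (assumption): not part of Base64Helper or the plugin.
final class SyncWrapperCheck {
    // The wrapper class cannot be referenced by name outside java.util,
    // so obtain it from a throwaway instance instead.
    private static final Class<?> SYNC_SET_CLASS =
        Collections.synchronizedSet(new HashSet<>()).getClass();

    private SyncWrapperCheck() {}

    // True if the candidate is (a subclass of) the JDK's synchronized Set wrapper.
    static boolean isSynchronizedSet(Set<?> candidate) {
        return SYNC_SET_CLASS.isInstance(candidate);
    }

    // Wraps the set only if it is not already a synchronized wrapper.
    static <T> Set<T> ensureSynchronized(Set<T> candidate) {
        return isSynchronizedSet(candidate) ? candidate : Collections.synchronizedSet(candidate);
    }
}
```

Wrapping at serialization time would not protect against code that still holds a reference to the unwrapped set, which is one reason fixing the fields in User itself is the more direct change.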

I'd recommend we take the fix as is; if other folks have ideas on how we can test it, I would be interested.

@msoler8785

@peternied I'm willing to test on my prod cluster if you can provide build artifacts.

@peternied peternied marked this pull request as ready for review July 29, 2022 17:27
@peternied peternied requested a review from a team July 29, 2022 17:27
@peternied peternied added the backport 2.x backport to 2.x branch label Jul 29, 2022
@peternied
Member

@msoler8785 Thanks for the generous offer. Unfortunately, due to version number alignment, you can only install plugins with the same version number as OpenSearch. We don't have a clean way to produce a build pointing to an older number.

The next scheduled release is OpenSearch 2.2.0, which will include this fix. According to the schedule, there should be a viable build by Aug 4th, if not sooner.

At your own risk: for a more immediate deployment, one could build the security plugin by checking out the version of this plugin that corresponds to the branch of your cluster, cherry-picking this change, running ./gradlew assemble -Dbuild.snapshot=false, and replacing the security jar on the machine under test with build/distributions/opensearch-security-X.Y.Z.0.jar.

@msoler8785

@peternied thanks to your guidance I was able to get the plugin built and installed on all my cluster nodes. Initial testing looks good and I am no longer encountering those errors. I'll review over the weekend and update if anything changes. Thanks @cwperks!

@peternied
Member

@opensearch-project/security I'd like to see this merged for 2.2.0 what do y'all think?

@DarshitChanpura
Member

@opensearch-project/security I'd like to see this merged for 2.2.0 what do y'all think?

I agree

@peternied peternied merged commit 50a94b4 into opensearch-project:main Aug 1, 2022
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 1, 2022
…oles, securityRoles and attributes in User (#1970)

Signed-off-by: Craig Perkins <cwperx@amazon.com>
(cherry picked from commit 50a94b4)
peternied pushed a commit that referenced this pull request Aug 1, 2022
…oles, securityRoles and attributes in User (#1970) (#1983)

Signed-off-by: Craig Perkins <cwperx@amazon.com>
(cherry picked from commit 50a94b4)

Co-authored-by: Craig Perkins <cwperx@amazon.com>
DarshitChanpura pushed a commit to DarshitChanpura/security that referenced this pull request Aug 1, 2022
…oles, securityRoles and attributes in User (opensearch-project#1970)

Signed-off-by: Craig Perkins <cwperx@amazon.com>
bharath-techie pushed a commit to bharath-techie/security that referenced this pull request Aug 12, 2022
…oles, securityRoles and attributes in User (opensearch-project#1970)

Signed-off-by: Craig Perkins <cwperx@amazon.com>
stephen-crawford pushed a commit to stephen-crawford/security that referenced this pull request Nov 10, 2022
…oles, securityRoles and attributes in User (opensearch-project#1970)

Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Stephen Crawford <steecraw@amazon.com>
wuychn pushed a commit to ochprince/security that referenced this pull request Mar 16, 2023
…oles, securityRoles and attributes in User (opensearch-project#1970) (opensearch-project#1983)

Signed-off-by: Craig Perkins <cwperx@amazon.com>
(cherry picked from commit 50a94b4)

Co-authored-by: Craig Perkins <cwperx@amazon.com>
stephen-crawford added a commit to stephen-crawford/security that referenced this pull request Nov 16, 2023
Signed-off-by: Stephen Crawford <steecraw@amazon.com>
willyborankin pushed a commit that referenced this pull request Nov 17, 2023
#3725)

### Description
This change backports #1970 in order to fix the OptionalDataException
issue encountered on the 1.3.x line.
The original change did not have any tests but #3637 added tests, so I
backported those as well and the changes required to support them. This
resulted in changes to the TestSecurityConfig.User class, the addition
of an IndexStateIsEqualToMatcher and User.java.

### Issues Resolved
- Resolves #3531
- Backports #1970 
- Backports #3637

### Check List
- [X] New functionality includes testing
- [ ] ~New functionality has been documented~
- [x] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

---------

Signed-off-by: Stephen Crawford <steecraw@amazon.com>
Signed-off-by: Stephen Crawford <65832608+scrawfor99@users.noreply.github.com>
Signed-off-by: Peter Nied <petern@amazon.com>
Co-authored-by: Peter Nied <petern@amazon.com>
Labels
backport 2.x backport to 2.x branch
5 participants