Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Remote Store] Inject remote store in IndexShard instead of RemoteStoreRefreshListener #3703

Merged

Conversation

sachinpkale
Copy link
Member

Signed-off-by: Sachin Kale kalsac@amazon.com

Description

  • Currently, we inject RemoteStoreRefreshListener into IndexShard where IndexService has the logic of creating an instance of RemoteStoreRefreshListener. As IndexShard already takes Store as one of the parameters, to make the behavior consistent, this change injects another instance of Store (created with RemoteDisrectory) to IndexShard.
  • As per the comment [Remote Store] Upload segments to remote store post refresh #3460 (comment), RepositoriesService is injected into RemoteDirectoryFactory and factory takes care of fetching the repository for the given name.

Issues Resolved

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 8a3c14c5fe347910bcc5938b69d01352464e7263
Log 6338

Reports 6338

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 7b3a31d510b5dd02ddfd5c6244ca3e139826e92a
Log 6343

Reports 6343

@@ -3205,8 +3211,9 @@ private EngineConfig newEngineConfig(LongSupplier globalCheckpointSupplier) {

final List<ReferenceManager.RefreshListener> internalRefreshListener = new ArrayList<>();
internalRefreshListener.add(new RefreshMetricUpdater(refreshMetric));
if (remoteStoreRefreshListener != null && shardRouting.primary()) {
internalRefreshListener.add(remoteStoreRefreshListener);
if (remoteStore != null && shardRouting.primary()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe extract it into remoteStoreEnabled()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -329,7 +331,7 @@ public IndexShard(
final RetentionLeaseSyncer retentionLeaseSyncer,
final CircuitBreakerService circuitBreakerService,
@Nullable final SegmentReplicationCheckpointPublisher checkpointPublisher,
@Nullable final RemoteStoreRefreshListener remoteStoreRefreshListener
@Nullable final Store remoteStore
Copy link
Collaborator

@Bukhtawar Bukhtawar Jun 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if the Store should be abstracted as a CompositeStore that has data backed in remote. IndexShard shouldn't worry about exposing or maintaining another Store that is remote

class CompositeStore extends Store
{ 
  private Store localStore;
  private Store remoteStore, 
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need some time to think about this abstraction. Tracking here: #3719

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 49412c44fd45b4c154f66620c5e2c7b2ee1c0c88
Log 6383

Reports 6383

Comment on lines +3214 to +3216
if (isRemoteStoreEnabled()) {
Directory remoteDirectory = ((FilterDirectory) ((FilterDirectory) remoteStore.directory()).getDelegate()).getDelegate();
internalRefreshListener.add(new RemoteStoreRefreshListener(store.directory(), remoteDirectory));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I couldn't find where are we handling the reference counting for Store. The call to directory() requires the Store to be Open(ref counter > 0) so this call would fail?

public Directory directory() {
ensureOpen();
return directory;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When store is created, default refCount is 1.

/**
* A basic RefCounted implementation that is initialized with a
* ref count of 1 and calls {@link #closeInternal()} once it reaches
* a 0 ref count
*/
public abstract class AbstractRefCounted implements RefCounted {

Copy link
Collaborator

@Bukhtawar Bukhtawar Jun 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sachinpkale so when should Store be closed for remote store as in reference count decremented or explicitly closed? I see IndexShard#closeShard closing the local store, do we need to handle close for remote store as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now, we are not closing remoteStore. Let me add remoteStore close as a part of IndexShard.close()

Copy link
Collaborator

@Bukhtawar Bukhtawar Jun 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For local store we have ShardStoreDeleter as a listener that cleans up data, for remote we are deleting it as a part of close. Also when a shard relocates to another node, the shard engine I believe closes, would that mean we are removing data from the remote? I guess that would be no, but please verify if that could happen

Copy link
Collaborator

@Bukhtawar Bukhtawar Jun 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have a feeling we might be leaking the Store on abrupt Engine failures?

/**
* Closes the engine without acquiring the write lock. This should only be
* called while the write lock is hold or in a disaster condition ie. if the engine
* is failed.
*/
@Override
protected final void closeNoLock(String reason, CountDownLatch closedLatch) {
if (isClosed.compareAndSet(false, true)) {
assert rwl.isWriteLockedByCurrentThread() || failEngineLock.isHeldByCurrentThread()
: "Either the write lock must be held or the engine must be currently be failing itself";
try {
this.versionMap.clear();
if (internalReaderManager != null) {
internalReaderManager.removeListener(versionMap);
}
try {
IOUtils.close(externalReaderManager, internalReaderManager);
} catch (Exception e) {
logger.warn("Failed to close ReaderManager", e);
}
try {
IOUtils.close(translogManager.getTranslog());
} catch (Exception e) {
logger.warn("Failed to close translog", e);
}
// no need to commit in this case!, we snapshot before we close the shard, so translog and all sync'ed
logger.trace("rollback indexWriter");
try {
indexWriter.rollback();
} catch (AlreadyClosedException ex) {
failOnTragicEvent(ex);
throw ex;
}
logger.trace("rollback indexWriter done");
} catch (Exception e) {
logger.warn("failed to rollback writer on close", e);
} finally {
try {
store.decRef();
logger.debug("engine closed [{}]", reason);
} finally {
closedLatch.countDown();
}

Can we add assertions in the tests to ensure the state of the remote store?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should not delete data on close. I think I have removed the blobcontainer.delete() as a part of some other commit which is not included in this PR. Let me add that.

I did not get the store leaking problem though. Are you talking about store or remoteStore?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a case of abrupt Engine failure should we close the remote store similar to how a local store gets closed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But internal engine is not aware of the remoteStore. Its scope is limited to IndexShard.

@sachinpkale sachinpkale force-pushed the feature/inject-remote-store branch 3 times, most recently from fa4996f to 2326fca Compare June 29, 2022 03:20
@sachinpkale
Copy link
Member Author

Build is failing with following exception:

* What went wrong:
Execution failed for task ':plugins:discovery-ec2:qa:amazon-ec2:yamlRestTestInstanceProfile'.
> `node{:plugins:discovery-ec2:qa:amazon-ec2:yamlRestTestInstanceProfile-0}` failed to wait for ports files after 135000 MILLISECONDS

Running on local worked fine.

Sachin Kale added 5 commits June 29, 2022 14:55
Signed-off-by: Sachin Kale <kalsac@amazon.com>
Signed-off-by: Sachin Kale <kalsac@amazon.com>
Signed-off-by: Sachin Kale <kalsac@amazon.com>
Signed-off-by: Sachin Kale <kalsac@amazon.com>
Signed-off-by: Sachin Kale <kalsac@amazon.com>
@Bukhtawar Bukhtawar merged commit 30209d8 into opensearch-project:main Jun 29, 2022
imRishN pushed a commit to imRishN/OpenSearch that referenced this pull request Jul 3, 2022
…reRefreshListener (opensearch-project#3703)

* Inject remote store in IndexShard instead of RemoteStoreRefreshListener

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Pass supplier of RepositoriesService to RemoteDirectoryFactory

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Create isRemoteStoreEnabled function for IndexShard

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Explicitly close remoteStore on indexShard close

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Change RemoteDirectory.close to a no-op

Signed-off-by: Sachin Kale <kalsac@amazon.com>

Co-authored-by: Sachin Kale <kalsac@amazon.com>
sachinpkale added a commit to sachinpkale/OpenSearch that referenced this pull request Sep 1, 2022
…reRefreshListener (opensearch-project#3703)

* Inject remote store in IndexShard instead of RemoteStoreRefreshListener

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Pass supplier of RepositoriesService to RemoteDirectoryFactory

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Create isRemoteStoreEnabled function for IndexShard

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Explicitly close remoteStore on indexShard close

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Change RemoteDirectory.close to a no-op

Signed-off-by: Sachin Kale <kalsac@amazon.com>

Co-authored-by: Sachin Kale <kalsac@amazon.com>
sachinpkale added a commit to sachinpkale/OpenSearch that referenced this pull request Sep 2, 2022
…reRefreshListener (opensearch-project#3703)

* Inject remote store in IndexShard instead of RemoteStoreRefreshListener

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Pass supplier of RepositoriesService to RemoteDirectoryFactory

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Create isRemoteStoreEnabled function for IndexShard

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Explicitly close remoteStore on indexShard close

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Change RemoteDirectory.close to a no-op

Signed-off-by: Sachin Kale <kalsac@amazon.com>

Co-authored-by: Sachin Kale <kalsac@amazon.com>
sachinpkale added a commit to sachinpkale/OpenSearch that referenced this pull request Sep 2, 2022
…reRefreshListener (opensearch-project#3703)

* Inject remote store in IndexShard instead of RemoteStoreRefreshListener

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Pass supplier of RepositoriesService to RemoteDirectoryFactory

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Create isRemoteStoreEnabled function for IndexShard

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Explicitly close remoteStore on indexShard close

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Change RemoteDirectory.close to a no-op

Signed-off-by: Sachin Kale <kalsac@amazon.com>

Co-authored-by: Sachin Kale <kalsac@amazon.com>
Bukhtawar pushed a commit that referenced this pull request Sep 2, 2022
…4380)

* [Remote Store] Upload segments to remote store post refresh (#3460)

* Add RemoteDirectory interface to copy segment files to/from remote store

Signed-off-by: Sachin Kale <kalsac@amazon.com>

Co-authored-by: Sachin Kale <kalsac@amazon.com>

* Add index level setting for remote store

Signed-off-by: Sachin Kale <kalsac@amazon.com>

Co-authored-by: Sachin Kale <kalsac@amazon.com>

* Add RemoteDirectoryFactory and use RemoteDirectory instance in RefreshListener

Co-authored-by: Sachin Kale <kalsac@amazon.com>
Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Upload segment to remote store post refresh

Signed-off-by: Sachin Kale <kalsac@amazon.com>

Co-authored-by: Sachin Kale <kalsac@amazon.com>
Signed-off-by: Sachin Kale <kalsac@amazon.com>

* [Remote Store] Inject remote store in IndexShard instead of RemoteStoreRefreshListener (#3703)

* Inject remote store in IndexShard instead of RemoteStoreRefreshListener

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Pass supplier of RepositoriesService to RemoteDirectoryFactory

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Create isRemoteStoreEnabled function for IndexShard

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Explicitly close remoteStore on indexShard close

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Change RemoteDirectory.close to a no-op

Signed-off-by: Sachin Kale <kalsac@amazon.com>

Co-authored-by: Sachin Kale <kalsac@amazon.com>

* [Remote Store] Add remote store restore API implementation (#3642)

* Add remote restore API implementation

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* [Remote Store] Add support to add nested settings for remote store (#4060)

* Add support to add nested settings for remote store

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* [Remote Store] Add rest endpoint for remote store restore (#3576)

* Add rest endpoint for remote store restore

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* [Remote Store] Add validator that forces segment replication type before enabling remote store (#4175)

* Add validator that forces segment replication type before enabling remote store

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* [Remote Store] Change remote_store setting validation message to make it more clear (#4199)

* Change remote_store setting validation message to make it more clear

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* [Remote Store] Add RemoteSegmentStoreDirectory to interact with remote segment store (#4020)

* Add RemoteSegmentStoreDirectory to interact with remote segment store

Signed-off-by: Sachin Kale <kalsac@amazon.com>

* Use RemoteSegmentStoreDirectory instead of RemoteDirectory (#4240)

* Use RemoteSegmentStoreDirectory instead of RemoteDirectory

Signed-off-by: Sachin Kale <kalsac@amazon.com>

Signed-off-by: Sachin Kale <kalsac@amazon.com>
Co-authored-by: Sachin Kale <kalsac@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants