
[RFC] Cross Cluster Replication as a Core Component #2872

Open
nknize opened this issue Apr 12, 2022 · 6 comments
Labels: discuss, enhancement, feature, Indexing:Replication, Meta, Roadmap:Modular Architecture

Comments

nknize (Collaborator) commented Apr 12, 2022

Is your feature request related to a problem? Please describe.

CCR is currently an external plugin, written in Kotlin, that relies on internal components of the engine that carry no backwards-compatibility guarantee. This has already caused issues carrying CCR-specific API settings in the core, and there is now a concern about CPU overhead when retrieving operations history from the Lucene index. The plugin effectively requires core implementation guarantees that were never designed for external plugins.

Describe the solution you'd like

Refactor CCR as a core module and offer it as a core feature. Not only would this provide the internal compatibility that is not currently guaranteed to external plugins, but it would also allow the feature to be re-implemented on top of the segment replication feature, replicating segments directly to the following cluster instead of slowly replaying operations from the translog.
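To make the contrast concrete, here is a rough conceptual sketch; the interfaces and names below are purely hypothetical, not actual OpenSearch APIs:

```java
// Purely hypothetical sketch -- neither interface exists in OpenSearch core.

// Today's CCR plugin: the follower replays logical operations from the
// leader's translog and pays the full indexing cost a second time.
interface OperationBasedFollower {
    void replayOperations(long fromSeqNo, int batchSize);
}

// Proposed: the follower copies already-built Lucene segment files from the
// leader, reusing the same machinery as in-cluster segment replication and
// skipping re-indexing entirely.
interface SegmentBasedFollower {
    void copySegments(java.util.List<String> segmentFileNames);
}
```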

Note: for now we should explore refactoring CCR from Kotlin to Java, as we have no idea what kind of compiler/runtime trickery Kotlin plays that might lead to crazy direct-memory OOMs like the ILM issue discovered.

Describe alternatives you've considered

None. The current implementation should be considered obsolete and refactored once segment replication becomes GA.

Additional context

Related issues:

nknize added the enhancement, discuss, feature, and untriaged labels Apr 12, 2022
krishna-ggk commented

Created a dedicated issue to discuss/track segment replication: cross-cluster-replication/issues/373.

mikemccand commented

Hi @krishna-ggk -- just continuing this discussion a bit from #2482:

  1. Ability to filter out documents to be replicated

Lucene has a powerful FilterLeafReader that works really well for this. You can far more efficiently filter on the post-indexed form, segment by segment, instead of the raw original input source docs.

True, definitely worth considering. One of the reasons for filtering at the source while replicating was to provide a security posture where filtered-out documents never leave the cluster. However, given that current CCR doesn't implement filtering yet, we could seek input from the community in the RFC to see if there is a real need for this use case.

Oh, you can choose where you do the filtering: either in the source cluster or the target cluster. So if security is important, always use FilterLeafReader at the source. But if it's less important, and performance capacity on the target cluster is ample compared to the source, and networking bandwidth is sufficient, and the filtering is not too restrictive, filter at the target.

Or maybe just always filter at the source and don't risk security issues :)

In Lucene we are also working on making IndexWriter.addIndexes API concurrent to speed up use cases like this.
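For concreteness, here's a rough sketch of that source-side filtering using the real FilterLeafReader, SlowCodecReaderWrapper, and IndexWriter.addIndexes APIs; the allowed-docs policy (computeAllowedDocs) and the reader/writer setup are hypothetical:

```java
import java.io.IOException;

import org.apache.lucene.index.CodecReader;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.FilterLeafReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SlowCodecReaderWrapper;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.FixedBitSet;

/** Hides documents that must not leave the source cluster by masking liveDocs. */
class ReplicationFilterReader extends FilterLeafReader {
    private final FixedBitSet allowed; // already ANDed with in.getLiveDocs()

    ReplicationFilterReader(LeafReader in, FixedBitSet allowed) {
        super(in);
        this.allowed = allowed;
    }

    @Override
    public Bits getLiveDocs() {
        return allowed; // addIndexes treats masked-out docs as deleted and drops them
    }

    @Override
    public int numDocs() {
        return allowed.cardinality(); // must stay consistent with getLiveDocs()
    }

    @Override
    public CacheHelper getCoreCacheHelper() {
        return in.getCoreCacheHelper();
    }

    @Override
    public CacheHelper getReaderCacheHelper() {
        return null; // the filtered view must not share the unfiltered reader's cache
    }

    /** Copies only the allowed documents of each segment into the follower-bound writer. */
    static void replicateFiltered(Directory source, IndexWriter followerWriter) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(source)) {
            for (LeafReaderContext ctx : reader.leaves()) {
                FixedBitSet allowed = computeAllowedDocs(ctx.reader()); // hypothetical policy
                CodecReader filtered = SlowCodecReaderWrapper.wrap(
                        new ReplicationFilterReader(ctx.reader(), allowed));
                followerWriter.addIndexes(filtered);
            }
        }
    }

    // Hypothetical: evaluate the replication filter per document, intersected
    // with the segment's existing live docs.
    static FixedBitSet computeAllowedDocs(LeafReader segmentReader) throws IOException {
        FixedBitSet bits = new FixedBitSet(segmentReader.maxDoc());
        // ... run the filter query / security policy and set the allowed doc ids ...
        return bits;
    }
}
```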

3. Reliance on refreshes, which may have implications for some use cases

That's a good point -- if you somehow need CCR more frequently than refreshes, then replicating segments might be an issue. But does this really happen in practice?

Since CCR targets Disaster Recovery use cases, the thought was to ensure ack'd writes are replicated as soon as possible.

OK I see.

I think this is yet another reason to add a bulk streaming indexing API to OpenSearch -- the synchronous (ack'd on every bulk request) bulk write model is not great for several reasons: it pushes concurrency tweaking (for higher throughput) out to clients, forcing them to also deal with RejectedExecutionException if they send too many concurrent requests; clients must also play with block sizes to maximize throughput; it ties up more transient heap and adds GC load, holding all these blocks in flight; and it pays some price for fsync'ing the translog after each bulk request (though Linux's EXT4 has made awesome progress recently on making fsync faster!).

With a streaming indexing API instead / in addition, users could send an endless stream of docs, and the cluster could "do the right thing", dynamically picking appropriate concurrency and block sizes internally based on available cluster resources. Finally, when the client closes the bulk stream, at that point they get a write ack. This would allow users who do not need such synchronous durability to amortize that cost better.
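Purely to make the idea concrete, here's a hypothetical client-side shape for such an API -- nothing like this exists in OpenSearch today, and client, Doc, and docs below are invented for illustration:

```java
import java.io.IOException;

// Hypothetical sketch -- no such streaming interface exists in any OpenSearch client.
interface BulkStream extends AutoCloseable {

    // Enqueue one document. The cluster batches, sizes, and parallelizes
    // internally, applying backpressure here rather than surfacing
    // RejectedExecutionException to the caller.
    void add(String index, String id, byte[] source) throws IOException;

    // Closing flushes everything still in flight and blocks until the cluster
    // acks the whole stream, amortizing the durability (fsync) cost across all docs.
    @Override
    void close() throws IOException;
}

// Hypothetical usage: the client stops hand-tuning concurrency and block sizes.
try (BulkStream stream = client.openBulkStream()) {
    for (Doc doc : docs) {
        stream.add("logs", doc.id(), doc.jsonBytes());
    }
} // a single write ack is received here, when the stream closes
```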

nknize (Collaborator, Author) commented Apr 20, 2022

Created dedicated issue to discuss/track segment replication

I don't think we should fragment the conversation. Segrep is a core component, and I think we need to look at baking CCR mechanisms w/ segrep in the core. I'm not convinced that looks like a completely separate CCR module instead of using segment-based snapshot (leader) / restore (follower). If it does, that module should live in core.

Maybe we could use this as a meta issue and relocate #3020 to core?

Since CCR targets Disaster Recovery use cases

I think this makes sense absent remote storage, but it should still be a solution baked into segrep (more likely for on-prem use cases). If a user opts into remote storage, however, we should rely on the durability of the remote storage mechanism.

forcing them to also deal with RejectedExecutionException if they send too many concurrent requests;

This is a HUGE problem today.

The Streaming Index API could not only determine the appropriate level of concurrency based on resource load but also accept a user-defined Durability Policy guiding the type of remote storage (warm, cold, etc.), which documents to replicate across clusters (if any), and whether or not to use a translog. It decouples the durability configuration as an input to the segment replication setup.
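As a purely illustrative sketch of what such a policy input could look like (all of these type names are invented):

```java
// Hypothetical sketch -- no DurabilityPolicy type exists in OpenSearch today.
enum StorageTier { HOT, WARM, COLD }

interface CrossClusterFilter {
    // Decide which documents (if any) are replicated to following clusters.
    boolean shouldReplicate(String index, byte[] source);
}

record DurabilityPolicy(
        StorageTier remoteTier,          // which remote storage tier to write to
        boolean useTranslog,             // skip the translog when the remote store is durable enough
        CrossClusterFilter crossCluster  // cross-cluster replication filter, if any
) {}
```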

nknize added the Meta label Apr 20, 2022
krishna-ggk commented Apr 21, 2022

Created dedicated issue to discuss/track segment replication

I don't think we should fragment the conversation. Segrep is a core component, and I think we need to look at baking CCR mechanisms w/ segrep in the core. I'm not convinced that looks like a completely separate CCR module instead of using segment-based snapshot (leader) / restore (follower). If it does, that module should live in core.

Maybe we could use this as a meta issue and relocate opensearch-project/cross-cluster-replication#373 to core?

Done - I transferred cross-cluster-replication/issues/373 into OpenSearch/issues/3020. All I wanted to ensure was that we discuss segrep and the architectural details of moving to core separately. Anyway, I agree with avoiding further fragmentation of the conversation 👍

krishna-ggk commented

Hi @krishna-ggk -- just continuing this discussion a bit from #2482:

True, definitely worth considering. One of the reasons for filtering at the source while replicating was to provide a security posture where filtered-out documents never leave the cluster. However, given that current CCR doesn't implement filtering yet, we could seek input from the community in the RFC to see if there is a real need for this use case.

Oh, you can choose where you do the filtering: either in the source cluster or the target cluster. So if security is important, always use FilterLeafReader at the source. But if it's less important, and performance capacity on the target cluster is ample compared to the source, and networking bandwidth is sufficient, and the filtering is not too restrictive, filter at the target.

Or maybe just always filter at the source and don't risk security issues :)

Thanks for the suggestions. Very likely we will end up filtering at the source from a security standpoint - however, we can evaluate this more deeply while implementing it.

In Lucene we are also working on making IndexWriter.addIndexes API concurrent to speed up use cases like this.

Nice!

ankitkala (Member) commented Jul 1, 2022

Closing this issue as we'll track this as part of Segment Replication for CCR: #3020
