
[RFC] Cross Cluster Replication as a Core Component #2872

Open
nknize opened this issue Apr 12, 2022 · 6 comments
Labels: discuss, enhancement, feature, Indexing:Replication, Meta, Roadmap:Modular Architecture

Comments

nknize (Collaborator) commented Apr 12, 2022

Is your feature request related to a problem? Please describe.

CCR is currently an external plugin, written in Kotlin, that relies on internal components of the engine that carry no backwards-compatibility guarantee. This has already caused issues carrying CCR-specific API settings in the core, and there is now a concern about CPU overhead when retrieving operations history from the Lucene index. The plugin effectively requires core implementation guarantees that were never designed for external plugins.

Describe the solution you'd like

Refactor CCR as a core module and offer it as a core feature. Not only would this provide the internal compatibility that is not currently guaranteed to external plugins, but it would also allow the feature to be re-implemented on top of the segment replication feature, replicating segments directly to the following cluster instead of slowly replaying operations from the translog.
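To make the contrast concrete, here is a rough conceptual sketch; the interfaces and names below are purely hypothetical, not actual OpenSearch APIs:

```java
// Purely hypothetical sketch -- neither interface exists in OpenSearch core.

// Today's CCR plugin: the follower replays logical operations from the
// leader's translog and pays the full indexing cost a second time.
interface OperationBasedFollower {
    void replayOperations(long fromSeqNo, int batchSize);
}

// Proposed: the follower copies already-built Lucene segment files from the
// leader, reusing the same machinery as in-cluster segment replication and
// skipping re-indexing entirely.
interface SegmentBasedFollower {
    void copySegments(java.util.List<String> segmentFileNames);
}
```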

Note: for now we should explore refactoring CCR from Kotlin to Java, as we have no idea what kind of compiler/runtime trickery Kotlin plays that might lead to crazy direct-memory OOMs like the ILM issue discovered.

Describe alternatives you've considered

None. The current implementation should be considered obsolete and refactored once segment replication becomes GA.

Additional context

Related issues:

nknize added the enhancement, discuss, feature, and untriaged labels Apr 12, 2022
krishna-ggk commented

Created a dedicated issue to discuss/track segment replication: cross-cluster-replication/issues/373.

mikemccand commented

Hi @krishna-ggk -- just continuing this discussion a bit from #2482:

  1. Ability to filter out documents to be replicated

Lucene has a powerful FilterLeafReader that works really well for this. You can far more efficiently filter on the post-indexed form, segment by segment, instead of the raw original input source docs.

True, definitely worth considering. One of the reasons for filtering at the source while replicating was to provide a security posture where filtered-out documents never leave the cluster. However, given that current CCR doesn't implement filtering yet, we could seek input from the community in the RFC to see if there is a real need for this use case.

Oh, you can choose where you do the filtering: either in the source cluster or the target cluster. So if security is important, always use FilterLeafReader at the source. But if it's less important, and performance capacity on the target cluster is ample compared to the source, and networking bandwidth is sufficient, and the filtering is not too restrictive, filter at the target.

Or maybe just always filter at the source and don't risk security issues :)

In Lucene we are also working on making IndexWriter.addIndexes API concurrent to speed up use cases like this.
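For concreteness, here's a rough sketch of that source-side filtering using the real FilterLeafReader, SlowCodecReaderWrapper, and IndexWriter.addIndexes APIs; the allowed-docs policy (computeAllowedDocs) and the reader/writer setup are hypothetical:

```java
import java.io.IOException;

import org.apache.lucene.index.CodecReader;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.FilterLeafReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SlowCodecReaderWrapper;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.FixedBitSet;

/** Hides documents that must not leave the source cluster by masking liveDocs. */
class ReplicationFilterReader extends FilterLeafReader {
    private final FixedBitSet allowed; // already ANDed with in.getLiveDocs()

    ReplicationFilterReader(LeafReader in, FixedBitSet allowed) {
        super(in);
        this.allowed = allowed;
    }

    @Override
    public Bits getLiveDocs() {
        return allowed; // addIndexes treats masked-out docs as deleted and drops them
    }

    @Override
    public int numDocs() {
        return allowed.cardinality(); // must stay consistent with getLiveDocs()
    }

    @Override
    public CacheHelper getCoreCacheHelper() {
        return in.getCoreCacheHelper();
    }

    @Override
    public CacheHelper getReaderCacheHelper() {
        return null; // the filtered view must not share the unfiltered reader's cache
    }

    /** Copies only the allowed documents of each segment into the follower-bound writer. */
    static void replicateFiltered(Directory source, IndexWriter followerWriter) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(source)) {
            for (LeafReaderContext ctx : reader.leaves()) {
                FixedBitSet allowed = computeAllowedDocs(ctx.reader()); // hypothetical policy
                CodecReader filtered = SlowCodecReaderWrapper.wrap(
                        new ReplicationFilterReader(ctx.reader(), allowed));
                followerWriter.addIndexes(filtered);
            }
        }
    }

    // Hypothetical: evaluate the replication filter per document, intersected
    // with the segment's existing live docs.
    static FixedBitSet computeAllowedDocs(LeafReader segmentReader) throws IOException {
        FixedBitSet bits = new FixedBitSet(segmentReader.maxDoc());
        // ... run the filter query / security policy and set the allowed doc ids ...
        return bits;
    }
}
```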

3. Reliance on refreshes, which may have implications for some use cases

That's a good point -- if you somehow need CCR more frequently than refreshes, then replicating segments might be an issue. But does this really happen in practice?

Since CCR targets Disaster Recovery use cases, the thought was to ensure ack'd writes are replicated as soon as possible.

OK I see.

I think this is yet another reason to add a bulk streaming indexing API to OpenSearch -- the synchronous (ack'd on every bulk request) bulk write model is not great for several reasons: it pushes concurrency tweaking (for higher throughput) out to clients, forcing them to also deal with RejectedExecutionException if they send too many concurrent requests; clients must also play with block sizes to maximize throughput; it ties up more transient heap and adds GC load, holding all these blocks in flight; and it pays some price for fsync'ing the translog after each bulk request (though Linux's EXT4 has made awesome progress recently on making fsync faster!).

With a streaming indexing API instead / in addition, users could send an endless stream of docs, and the cluster could "do the right thing", dynamically picking appropriate concurrency and block sizes internally based on available cluster resources. Finally, when the client closes the bulk stream, at that point they get a write ack. This would allow users who do not need such synchronous durability to amortize that cost better.
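Purely to make the idea concrete, here's a hypothetical client-side shape for such an API -- nothing like this exists in OpenSearch today, and client, Doc, and docs below are invented for illustration:

```java
import java.io.IOException;

// Hypothetical sketch -- no such streaming interface exists in any OpenSearch client.
interface BulkStream extends AutoCloseable {

    // Enqueue one document. The cluster batches, sizes, and parallelizes
    // internally, applying backpressure here rather than surfacing
    // RejectedExecutionException to the caller.
    void add(String index, String id, byte[] source) throws IOException;

    // Closing flushes everything still in flight and blocks until the cluster
    // acks the whole stream, amortizing the durability (fsync) cost across all docs.
    @Override
    void close() throws IOException;
}

// Hypothetical usage: the client stops hand-tuning concurrency and block sizes.
try (BulkStream stream = client.openBulkStream()) {
    for (Doc doc : docs) {
        stream.add("logs", doc.id(), doc.jsonBytes());
    }
} // a single write ack is received here, when the stream closes
```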

nknize (Collaborator, Author) commented Apr 20, 2022

Created dedicated issue to discuss/track segment replication

I don't think we should fragment the conversation. Segrep is a core component, and I think we need to look at baking CCR mechanisms w/ segrep in the core. I'm not convinced that looks like a completely separate CCR module instead of using segment-based snapshot (leader) / restore (follower). If it does, that module should live in core.

Maybe we could use this as a meta issue and relocate #3020 to core?

Since CCR targets Disaster Recovery use cases

I think this makes sense absent remote storage, but it should still be a solution baked into segrep (more likely for on-prem use cases). If a user opts into remote storage, however, we should rely on the durability of the remote storage mechanism.

forcing them to also deal with RejectedExecutionException if they send too many concurrent requests;

This is a HUGE problem today.

The Streaming Index API could not only determine the appropriate level of concurrency based on resource load but also accept a user-defined Durability Policy guiding the type of remote storage (warm, cold, etc.), which documents to replicate across clusters (if any), and whether or not to use a translog. It decouples the durability configuration as an input to the segment replication setup.
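As a purely illustrative sketch of what such a policy input could look like (all of these type names are invented):

```java
// Hypothetical sketch -- no DurabilityPolicy type exists in OpenSearch today.
enum StorageTier { HOT, WARM, COLD }

interface CrossClusterFilter {
    // Decide which documents (if any) are replicated to following clusters.
    boolean shouldReplicate(String index, byte[] source);
}

record DurabilityPolicy(
        StorageTier remoteTier,          // which remote storage tier to write to
        boolean useTranslog,             // skip the translog when the remote store is durable enough
        CrossClusterFilter crossCluster  // cross-cluster replication filter, if any
) {}
```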

nknize added the Meta label Apr 20, 2022
krishna-ggk commented Apr 21, 2022

Created dedicated issue to discuss/track segment replication

I don't think we should fragment the conversation. Segrep is a core component, and I think we need to look at baking CCR mechanisms w/ segrep in the core. I'm not convinced that looks like a completely separate CCR module instead of using segment-based snapshot (leader) / restore (follower). If it does, that module should live in core.

Maybe we could use this as a meta issue and relocate opensearch-project/cross-cluster-replication#373 to core?

Done - I transferred cross-cluster-replication/issues/373 into OpenSearch/issues/3020. All I wanted to ensure was that we discuss segrep and the architectural details of moving to core separately. Anyway, I agree with avoiding further fragmentation of the conversation 👍

krishna-ggk commented

Hi @krishna-ggk -- just continuing this discussion a bit from #2482:

True, definitely worth considering. One of the reasons for filtering at the source while replicating was to provide a security posture where filtered-out documents never leave the cluster. However, given that current CCR doesn't implement filtering yet, we could seek input from the community in the RFC to see if there is a real need for this use case.

Oh, you can choose where you do the filtering: either in the source cluster or the target cluster. So if security is important, always use FilterLeafReader at the source. But if it's less important, and performance capacity on the target cluster is ample compared to the source, and networking bandwidth is sufficient, and the filtering is not too restrictive, filter at the target.

Or maybe just always filter at the source and don't risk security issues :)

Thanks for the suggestions. Very likely we will end up filtering at the source from a security standpoint - however, we can evaluate this more deeply while implementing it.

In Lucene we are also working on making IndexWriter.addIndexes API concurrent to speed up use cases like this.

Nice!

ankitkala (Member) commented Jul 1, 2022

Closing this issue as we'll track this as part of Segment Replication for CCR: #3020
