Replication support #170

Closed
derlaft opened this issue Jun 22, 2022 · 10 comments
Labels
question Further information is requested

Comments

derlaft commented Jun 22, 2022

Hi,

is replication support planned?

In theory, it should be possible to replace the engines here with their Replicated* counterparts, and it should just work.

The documentation also mentions distributed(), but it's not clear how that's supposed to work.

Author

derlaft commented Jun 22, 2022

Just noticed #62 - maybe it's a duplicate

@lmangani
Collaborator

@derlaft you're correct, changing the engine should be all that's required. We know of several users doing so, and we have a couple of test setups on various cloud providers that provide this out of the box. This could easily be turned into a setup preference if users want it. Feel free to extend this thread with any suggestions.

smmstf commented Jun 23, 2022

Hello, can you please share the script used to initialize replication?
I tried to replace the engine as mentioned above, but it didn't work. I saw issue #62, but it's not clear enough.

@lmangani lmangani added the question Further information is requested label Jun 29, 2022
@akvlad
Collaborator

akvlad commented Jun 30, 2022

@derlaft I have created a replicated deployment on my local machine and described my experience in #172. I can confirm that, after some troubleshooting, replication started working.
But I didn't experiment with different replica and shard setups, so any help is very welcome. It would be interesting to know how ClickHouse distributes inserted data between different shards and how it reads the data back. I'll check that today.

We have an old implementation of ClickHouse cluster support with the Distributed table engine at https://github.com/metrico/qryn/tree/clickhouse-cluster, but its schema has been outdated for a long time.

Author

derlaft commented Jun 30, 2022

> But I didn't experiment with different replica and shard setups, so any help is very welcome. It would be interesting to know how ClickHouse distributes inserted data between different shards and how it reads the data back. I'll check that today.

If you just use Replicated without any additional configuration:

  • There will only be one shard, so all the data should be present on all the nodes
  • There will be no load balancing of reads (for that, an external load balancer is required)

From my understanding (I might be incorrect here), Distributed is the next step, potentially built on top of Replicated, which makes it possible to implement sharding and read distribution. I'm mainly interested in getting a Replicated setup working, because it provides some fault tolerance in an extremely simple manner.
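
To make the layering concrete, here is a minimal sketch of a Replicated table with a Distributed table on top of it. The cluster name `my_cluster`, the table names `samples_local` and `samples_dist`, and the columns are hypothetical placeholders for illustration, not qryn's actual schema. It also assumes the `{shard}` and `{replica}` macros are defined on every node.

```sql
-- Hypothetical local table: each node stores and replicates its own shard's data.
CREATE TABLE samples_local ON CLUSTER my_cluster
(
    fingerprint  UInt64,
    timestamp_ns Int64,
    value        Float64,
    string       String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{table}', '{replica}')
ORDER BY (fingerprint, timestamp_ns);

-- Hypothetical Distributed table: stores no data itself, it routes inserts
-- to shards by the sharding key and fans reads out to every shard.
CREATE TABLE samples_dist ON CLUSTER my_cluster AS samples_local
ENGINE = Distributed(my_cluster, currentDatabase(), samples_local, fingerprint);
```

With only the first statement and a single shard, every replica eventually holds all rows. The second statement is what adds sharding and read distribution across shards.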

Collaborator

akvlad commented Jun 30, 2022

@derlaft let's say I have 10 shards behind Kubernetes or HAProxy with round robin, and I add 1000 logs with 1000 INSERT requests.
Then I run SELECT * FROM samples_v3. Will it return all 1000 rows, or just the rows from a random shard (about 100)?

Author

derlaft commented Jun 30, 2022

@akvlad with pure Replicated there's only one shard. The data should (eventually) be the same on every node, so the query will return 1000 rows.

Collaborator

akvlad commented Jun 30, 2022

@derlaft There can be a lot of shards with pure Replicated if I add some macros to config.xml:

    <macros>
        <shard>unique_id_for_each_node_of_the_cluster_01</shard>
        <replica>example01-01-1</replica>
    </macros>
    <default_replica_path>/clickhouse/tables/{shard}/{database}/{table}</default_replica_path>
    <default_replica_name>{replica}</default_replica_name>
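
To check which macro values a node actually picked up, and which ZooKeeper path each replicated table registered under, the system tables can be queried. This is a minimal sketch, assuming a running ClickHouse server with at least one Replicated* table:

```sql
-- Macro values resolved from this node's configuration.
SELECT macro, substitution FROM system.macros;

-- ZooKeeper path and replica name for each replicated table on this node.
-- Tables that share a zookeeper_path across nodes are replicas of each other;
-- tables with different paths are independent.
SELECT database, table, zookeeper_path, replica_name
FROM system.replicas;
```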

Author

derlaft commented Jun 30, 2022

I don't think it's related to Replicated. You are basically just creating independent sets of Replicated tables using macros and calling them shards.

Replication does not depend on sharding. Each shard has its own independent replication.

https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication/

Collaborator

lmangani commented Nov 1, 2023

Feel free to reopen if still interested for 3.x

@lmangani lmangani closed this as completed Nov 1, 2023