Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doc shared CDC source transactions and refactor example section #1754

Merged
merged 6 commits into from
Jan 16, 2024

Conversation

emile-00
Copy link
Contributor

Info

For reviewers

  • Preview

    • [ Paste the preview link to the updated page(s) here. Edit this item after the preview site is ready. To find the updated pages, scroll down to locate and open the Amplify preview link and select the dev version of the documentation. ]
  • Key points

    • [ Parts that may need revision or extra consideration. ]

Before merging

  • I have checked the doc site preview, and the updated parts look good.

  • I have acquired the approval from the owner (and optionally the reviewers) of the code PR and at least one tech writer (CharlieSYH, emile-00, & hengm3467).

Copy link

This pull request is automatically being deployed by Amplify Hosting (learn more).

Access this pull request here: https://pr-1754.d2fbku9n2b6wde.amplifyapp.com

Copy link
Contributor

@ShanlanLi ShanlanLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -8,7 +8,7 @@ slug: /ingest-from-mysql-cdc
<link rel="canonical" href="https://docs.risingwave.com/docs/current/ingest-from-mysql-cdc/" />
</head>

Change Data Capture (CDC) refers to the process of identifying and capturing data changes in a database, then delivering the changes to a downstream service in real time.
Change Data Capture (CDC) refers to the process of identifying and capturing data changes in a database and then delivering the changes to a downstream service in real time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meet similar conditions like this. I guess it's Grammarly's suggestion to remove the comma and use "and". However, I personally think the comma is better. Because it makes the sentence shorter and easier to read. If use "and", the sentence is too long and a bit hard to understand, or it costs me more time to understand.

Maybe we can use both the comma and the conjunction when meeting this? like:
Change Data Capture (CDC) refers to the process of identifying and capturing data changes in a database**, and** then delivering the changes to a downstream service in real time.

This way, we meet both the requirements of grammar standards and reader-friendability.

@StrikeW
Copy link
Contributor

StrikeW commented Jan 11, 2024

Do we have a rendered link of the PR? This link https://pr-1754.d2fbku9n2b6wde.amplifyapp.com/ doesn't contain the modification of this PR.

@StrikeW
Copy link
Contributor

StrikeW commented Jan 11, 2024

Btw I found a bug not related to this PR: we should not claim transaction support for the Citus CDC connector. The Citus connector is not full fledged like mysql and postgres and it is in limited support.

@emile-00
Copy link
Contributor Author

Do we have a rendered link of the PR? This link https://pr-1754.d2fbku9n2b6wde.amplifyapp.com/ doesn't contain the modification of this PR.

Here they are: MySQL, PG

The preview page is on v1.5 and the updates are on v1.6 of the docs.

@emile-00
Copy link
Contributor Author

Btw I found a bug not related to this PR: we should not claim transaction support for the Citus CDC connector. The Citus connector is not full fledged like mysql and postgres and it is in limited support.

Got it! I can remove the option in this PR since it's a minor change.

@@ -155,7 +155,7 @@ To ensure all data changes are captured, you must create a table and specify pri
### Syntax

```sql
CREATE TABLE [ IF NOT EXISTS ] source_name (
CREATE {TABLE | SOURCE} [ IF NOT EXISTS ] source_name (
Copy link
Contributor

@StrikeW StrikeW Jan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The syntax for CREATE SOURCE is not correct. We cannot specify table schema for the cdc source.
Btw I think we should remove [ FORMAT DEBEZIUM ENCODE JSON ], it is not compatible with CREATE SOURCE. User don't need to aware the format when they use native cdc connector.

username = 'root',
password = '123456',
database.name = 'mydb',
server.id = 5888
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

server.id in shared source is required, we may make it optional in future.

@@ -8,7 +8,7 @@
<link rel="canonical" href="https://docs.risingwave.com/docs/current/ingest-from-postgres-cdc/" />
</head>

Change Data Capture (CDC) refers to the process of identifying and capturing data changes in a database, then delivering the changes to a downstream service in real time.
Change Data Capture (CDC) refers to the process of identifying and capturing data changes in a database, and then delivering the changes to a downstream service in real time.

RisingWave supports ingesting CDC data from PostgreSQL. Versions 10, 11, 12, 13, and 14 of PostgreSQL are supported.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also support version 15 of PG


## Create a table using the native CDC connector

To ensure all data changes are captured, you must create a table and specify primary keys. See the [`CREATE TABLE`](/sql/commands/sql-create-table.md) command for more details. The data format must be Debezium JSON.
To ensure all data changes are captured, you must create a table or source and specify primary keys. See the [`CREATE TABLE`](/sql/commands/sql-create-table.md) command for more details. The data format must be Debezium JSON.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to remind user for the data format, it should be hide from users.


### Syntax

```sql
CREATE TABLE [ IF NOT EXISTS ] source_name (
CREATE {TABLE | SOURCE} [ IF NOT EXISTS ] source_name (
Copy link
Contributor

@StrikeW StrikeW Jan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This syntax is not correct. We cannot specify table schema for the cdc Source


:::note
RisingWave implements CDC via PostgresQL replication. Inspect the current progress via the [`pg_replication_slots`](https://www.postgresql.org/docs/14/view-pg-replication-slots.html) view. Remove inactive replication slots via [`pg_drop_replication_slot()`](https://www.postgresql.org/docs/current/functions-admin.html#:~:text=pg_drop_replication_slot).
RisingWave implements CDC via PostgreSQL replication. Inspect the current progress via the [`pg_replication_slots`](https://www.postgresql.org/docs/14/view-pg-replication-slots.html) view. Remove inactive replication slots via [`pg_drop_replication_slot()`](https://www.postgresql.org/docs/current/functions-admin.html#:~:text=pg_drop_replication_slot).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a note that RW won't drop replication slot, user need to drop useless slots.

@@ -181,27 +181,54 @@ All the fields listed below are required.
|database.name| Name of the database. Note that RisingWave cannot read data from a built-in MySQL database, such as `mysql`, `sys`, etc.|
|table.name| Name of the table that you want to ingest data from. |
|server.id| Optional. A numeric ID of the database client. It must be unique across all database processes that are running in the MySQL cluster. If not specified, RisingWave will generate a random ID.|
|transactional| Optional. Specify whether you want to enable transactions for the CDC table that you are about to create. For details, see [Transaction within a CDC table](/concepts/transactions.md#transactions-within-a-cdc-table).|
|transactional| Optional. Specify whether you want to enable transactions for the CDC table that you are about to create. This feature is also supported for shared CDC sources for multi-table transactions. For details, see [Transaction within a CDC table](/concepts/transactions.md#transactions-within-a-cdc-table).|

### Data format

Data is in Debezium JSON or Debezium AVRO format. [Debezium](https://debezium.io) is a log-based CDC tool that can capture row changes from various database management systems such as PostgreSQL, MySQL, and SQL Server and generate events with consistent structures in real time. The MySQL CDC connector in RisingWave supports JSON or AVRO as the serialization format for Debezium data. The data format does not need to be specified when creating a table with `mysql-cdc` as the source.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't support AVRO at all. This should be mistake.

Copy link
Contributor

@StrikeW StrikeW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@emile-00 emile-00 merged commit e931ef1 into main Jan 16, 2024
3 checks passed
@emile-00 emile-00 deleted the transaction-shared-cdc branch January 16, 2024 20:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Document: feat(cdc): support transaction for shared cdc source
4 participants