-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Doc shared CDC source transactions and refactor example section #1754
Conversation
This pull request is automatically being deployed by Amplify Hosting (learn more). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
docs/guides/ingest-from-mysql-cdc.md
Outdated
@@ -8,7 +8,7 @@ slug: /ingest-from-mysql-cdc | |||
<link rel="canonical" href="https://docs.risingwave.com/docs/current/ingest-from-mysql-cdc/" /> | |||
</head> | |||
|
|||
Change Data Capture (CDC) refers to the process of identifying and capturing data changes in a database, then delivering the changes to a downstream service in real time. | |||
Change Data Capture (CDC) refers to the process of identifying and capturing data changes in a database and then delivering the changes to a downstream service in real time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meet similar conditions like this. I guess it's Grammarly's suggestion to remove the comma and use "and". However, I personally think the comma is better. Because it makes the sentence shorter and easier to read. If use "and", the sentence is too long and a bit hard to understand, or it costs me more time to understand.
Maybe we can use both the comma and the conjunction when meeting this? like:
Change Data Capture (CDC) refers to the process of identifying and capturing data changes in a database**, and** then delivering the changes to a downstream service in real time.
This way, we meet both the requirements of grammar standards and reader-friendability.
Do we have a rendered link of the PR? This link https://pr-1754.d2fbku9n2b6wde.amplifyapp.com/ doesn't contain the modification of this PR. |
Btw I found a bug not related to this PR: we should not claim transaction support for the Citus CDC connector. The Citus connector is not full fledged like mysql and postgres and it is in limited support. |
The preview page is on v1.5 and the updates are on v1.6 of the docs. |
Got it! I can remove the option in this PR since it's a minor change. |
docs/guides/ingest-from-mysql-cdc.md
Outdated
@@ -155,7 +155,7 @@ To ensure all data changes are captured, you must create a table and specify pri | |||
### Syntax | |||
|
|||
```sql | |||
CREATE TABLE [ IF NOT EXISTS ] source_name ( | |||
CREATE {TABLE | SOURCE} [ IF NOT EXISTS ] source_name ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The syntax for CREATE SOURCE
is not correct. We cannot specify table schema for the cdc source.
Btw I think we should remove [ FORMAT DEBEZIUM ENCODE JSON ]
, it is not compatible with CREATE SOURCE
. User don't need to aware the format when they use native cdc connector.
username = 'root', | ||
password = '123456', | ||
database.name = 'mydb', | ||
server.id = 5888 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
server.id
in shared source is required, we may make it optional in future.
@@ -8,7 +8,7 @@ | |||
<link rel="canonical" href="https://docs.risingwave.com/docs/current/ingest-from-postgres-cdc/" /> | |||
</head> | |||
|
|||
Change Data Capture (CDC) refers to the process of identifying and capturing data changes in a database, then delivering the changes to a downstream service in real time. | |||
Change Data Capture (CDC) refers to the process of identifying and capturing data changes in a database, and then delivering the changes to a downstream service in real time. | |||
|
|||
RisingWave supports ingesting CDC data from PostgreSQL. Versions 10, 11, 12, 13, and 14 of PostgreSQL are supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also support version 15 of PG
|
||
## Create a table using the native CDC connector | ||
|
||
To ensure all data changes are captured, you must create a table and specify primary keys. See the [`CREATE TABLE`](/sql/commands/sql-create-table.md) command for more details. The data format must be Debezium JSON. | ||
To ensure all data changes are captured, you must create a table or source and specify primary keys. See the [`CREATE TABLE`](/sql/commands/sql-create-table.md) command for more details. The data format must be Debezium JSON. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to remind user for the data format, it should be hide from users.
|
||
### Syntax | ||
|
||
```sql | ||
CREATE TABLE [ IF NOT EXISTS ] source_name ( | ||
CREATE {TABLE | SOURCE} [ IF NOT EXISTS ] source_name ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This syntax is not correct. We cannot specify table schema for the cdc Source
|
||
:::note | ||
RisingWave implements CDC via PostgresQL replication. Inspect the current progress via the [`pg_replication_slots`](https://www.postgresql.org/docs/14/view-pg-replication-slots.html) view. Remove inactive replication slots via [`pg_drop_replication_slot()`](https://www.postgresql.org/docs/current/functions-admin.html#:~:text=pg_drop_replication_slot). | ||
RisingWave implements CDC via PostgreSQL replication. Inspect the current progress via the [`pg_replication_slots`](https://www.postgresql.org/docs/14/view-pg-replication-slots.html) view. Remove inactive replication slots via [`pg_drop_replication_slot()`](https://www.postgresql.org/docs/current/functions-admin.html#:~:text=pg_drop_replication_slot). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a note that RW won't drop replication slot, user need to drop useless slots.
docs/guides/ingest-from-mysql-cdc.md
Outdated
@@ -181,27 +181,54 @@ All the fields listed below are required. | |||
|database.name| Name of the database. Note that RisingWave cannot read data from a built-in MySQL database, such as `mysql`, `sys`, etc.| | |||
|table.name| Name of the table that you want to ingest data from. | | |||
|server.id| Optional. A numeric ID of the database client. It must be unique across all database processes that are running in the MySQL cluster. If not specified, RisingWave will generate a random ID.| | |||
|transactional| Optional. Specify whether you want to enable transactions for the CDC table that you are about to create. For details, see [Transaction within a CDC table](/concepts/transactions.md#transactions-within-a-cdc-table).| | |||
|transactional| Optional. Specify whether you want to enable transactions for the CDC table that you are about to create. This feature is also supported for shared CDC sources for multi-table transactions. For details, see [Transaction within a CDC table](/concepts/transactions.md#transactions-within-a-cdc-table).| | |||
|
|||
### Data format | |||
|
|||
Data is in Debezium JSON or Debezium AVRO format. [Debezium](https://debezium.io) is a log-based CDC tool that can capture row changes from various database management systems such as PostgreSQL, MySQL, and SQL Server and generate events with consistent structures in real time. The MySQL CDC connector in RisingWave supports JSON or AVRO as the serialization format for Debezium data. The data format does not need to be specified when creating a table with `mysql-cdc` as the source. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't support AVRO at all. This should be mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
Info
Description
For both MySQL and PG CDC:
Notes
Related code PR
Related doc issue
Resolves Document: feat(cdc): support transaction for shared cdc source #1732, Bug: refactor the MySQL and Postgres native CDC page #1726
For reviewers
Preview
Key points
Before merging
I have checked the doc site preview, and the updated parts look good.
I have acquired the approval from the owner (and optionally the reviewers) of the code PR and at least one tech writer (
CharlieSYH
,emile-00
, &hengm3467
).