
Allow kafka sink to encode as PLAIN AVRO #16912

Closed
maingoh opened this issue May 23, 2024 · 8 comments

Comments

@maingoh

maingoh commented May 23, 2024

Is your feature request related to a problem? Please describe.

I would like to export the results of a materialized view into Kafka in Avro format, but it looks like that is not possible. Is it much different from PLAIN JSON?

Bind error: connector kafka does not support format Plain with encode Avro

Describe the solution you'd like

Be able to do:

{{ config(materialized="sink") }}
CREATE SINK {{ sink }} FROM {{ mv }} WITH (
  connector = 'kafka',
  topic = '{{ sink_topic }}',
  ...
) FORMAT PLAIN ENCODE AVRO (
  ...
);

Describe alternatives you've considered

No response

Additional context

No response

@github-actions github-actions bot added this to the release-1.10 milestone May 23, 2024
@xiangjinwu xiangjinwu self-assigned this May 24, 2024
@xiangjinwu
Contributor

It is doable. Just to confirm:

  • Is your Avro schema definition stored in a Confluent schema registry (or one compatible with it, like Redpanda or Upstash)?
  • Are you going to use (a) a subset of output columns as an Avro-encoded Kafka key, (b) a single simple output column as a string/text Kafka key, or (c) no Kafka key, with round-robin partitioning?

@maingoh
Author

maingoh commented May 24, 2024

Thank you for your quick reply! I might have misunderstood the UPSERT keyword a bit; it didn't really make sense to me since Kafka is append-only. So I tried using an UPSERT AVRO sink and it actually appends messages with the same primary key, which is the behavior I want, but it looks more like PLAIN AVRO to me.

To answer your questions:

  • Yes, that is the case; however, it should not be mandatory in either case (PLAIN or UPSERT), no?
  • I actually use an "id" column from my materialized view as the key, though that should not be mandatory either?

@tabVersion
Contributor

I might have misunderstood the UPSERT keyword a bit; it didn't really make sense to me since Kafka is append-only. So I tried using an UPSERT AVRO sink and it actually appends messages with the same primary key, which is the behavior I want, but it looks more like PLAIN AVRO to me.

Yes, that is the tricky part. If the Kafka key is aligned with the downstream's primary key, it is upsert format in RisingWave. If the Kafka key is not the downstream's primary key and is used only for partitioning, it is append-only format with a specified key in RisingWave. To Kafka itself, the two methods look the same.
Besides, Kafka has a "compaction" feature that runs in the broker: if two messages have the same message key, it deletes the earlier one and keeps only the later one. So we want to confirm whether users actually want upsert semantics.
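The two cases could be sketched roughly as follows (a hypothetical sketch based on RisingWave's `CREATE SINK` syntax; `my_mv`, `my_topic`, and the broker/registry URLs are placeholders, and the exact option names should be checked against the current docs):

```sql
-- Upsert: the Kafka key carries the downstream primary key,
-- so a compacted topic keeps only the latest value per key
-- and DELETEs can be represented as tombstones.
CREATE SINK upsert_sink FROM my_mv WITH (
    connector = 'kafka',
    topic = 'my_topic',
    properties.bootstrap.server = 'broker:9092',
    primary_key = 'id'
) FORMAT UPSERT ENCODE AVRO (
    schema.registry = 'http://schema-registry:8081'
);
```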

@xiangjinwu
Contributor

  • Yes, that is the case; however, it should not be mandatory in either case (PLAIN or UPSERT), no?

Use of a schema registry is mandatory for Avro unless the schema never evolves. Unlike JSON or Protobuf, decoding an Avro message encoded with one version of a schema into another compatible version requires the definitions (avsc) of both versions to be available during decoding.

  • I actually use an "id" column from my materialized view as the key, though that should not be mandatory either?

It is not mandatory for plain, as I mentioned option (c) with round-robin partitioning. It is mandatory for upsert, as @tabVersion explained above.
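For completeness, option (c) would look something like this (a sketch with placeholder names, assuming the PLAIN AVRO support this issue requests; with no key specified, Kafka distributes messages round-robin across partitions):

```sql
-- Append-only sink with no Kafka key: partition assignment is round-robin.
CREATE SINK plain_sink FROM my_mv WITH (
    connector = 'kafka',
    topic = 'my_topic',
    properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE AVRO (
    schema.registry = 'http://schema-registry:8081'
);
```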

@maingoh
Author

maingoh commented Jun 20, 2024

Yes, that is the tricky part. If the Kafka key is aligned with the downstream's primary key, it is upsert format in RisingWave. If the Kafka key is not the downstream's primary key and is used only for partitioning, it is append-only format with a specified key in RisingWave. To Kafka itself, the two methods look the same.
Besides, Kafka has a "compaction" feature that runs in the broker: if two messages have the same message key, it deletes the earlier one and keeps only the later one. So we want to confirm whether users actually want upsert semantics.

It looks to me that compaction is a topic property and therefore Kafka's responsibility, so RW should not really care whether the downstream system is upserting or appending. So yes, in my opinion, for Kafka PLAIN and UPSERT are the same (and could be optional, because they don't really make sense?).

I think my initial issue has been fixed by #17216. Or am I missing something?

@xiangjinwu
Contributor

I think my initial issue has been fixed by #17216. Or am I missing something?

Yes, it has been fixed.

It looks to me that compaction is a topic property and therefore Kafka's responsibility, so RW should not really care whether the downstream system is upserting or appending. So yes, in my opinion, for Kafka PLAIN and UPSERT are the same (and could be optional, because they don't really make sense?).

Yes, it is Kafka's responsibility and outside RisingWave, but the RisingWave options are supposed to follow its conventions. To avoid confusion, let me focus on RisingWave's behavior difference:

  • When the stream being sunk contains no DELETE, there is no difference.
  • When there is a DELETE, upsert writes out a message with a key but no value (a tombstone), while plain (force_append_only = true) skips that entry.
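A plain sink over a stream that can contain UPDATE/DELETE requires opting in explicitly, e.g. (a sketch with hypothetical names; the placement of force_append_only in the WITH clause follows RisingWave's convention and should be checked against the current docs):

```sql
-- Drop UPDATE/DELETE events instead of rejecting the sink definition.
CREATE SINK plain_sink FROM my_mv WITH (
    connector = 'kafka',
    topic = 'my_topic',
    properties.bootstrap.server = 'broker:9092',
    force_append_only = 'true'
) FORMAT PLAIN ENCODE JSON;
```

With this option, a DELETE of a row simply produces no Kafka record, whereas the upsert sink above would emit a tombstone (key with a null value) that compaction can use to drop earlier records.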

@maingoh
Author

maingoh commented Jun 27, 2024

Thank you for the details @xiangjinwu! I understand the difference now; maybe it could be highlighted in the documentation?

@xiangjinwu
Contributor

Yes, I am working on a documentation refactor of the relevant pages:
risingwavelabs/risingwave-docs#2294
