(DEPRECATED) Apache Flink User Mailing List archive.

How to output a Table to Kafka?

Classic

List

Threaded

2 messages Options

chi ma

How to output a Table to Kafka?

Hi,

I'm a newbie to Flink. I'm trying to load data from HDFS and analyze it using Flink Table APIs.

I created a TableSource, and registered it into the StreamTableEnvironment as a table, and ran a SQL on that table through streamTableEnvironment.sqlQuery, and finally I called writeToSink to write the result to a Kafka010JsonTableSink.

But, I got an exception, "Exception in thread "main" org.apache.flink.table.api.TableException: AppendStreamTableSink requires that Table has only insert changes."

I've googled it. The probable cause is that I put a "group by" clause in the SQL statement, which makes the result table to be a retract table.

And, if I pass a SQL statement without "group by" clause to sqlQuery, everything gets ok, and I can see the result in the Kafka topic.

So, my questions are:

1. What is the classical usage for outputing a retract table?

2. Is there a unified way to process output for both append-only and not append-only table?

Thanks.

Best Regards~

Hequn Cheng

Re: How to output a Table to Kafka?

Hi chi ma,

A query with "group by" produces an updated table. Both UpsertStreamTableSink and RetractStreamTableSink can be used to emit an updated table. The main difference between UpsertStreamTableSink and RetractStreamTableSink is RetractStreamTableSink will encode an UPDATE change as a retract message for the updated (previous) row and an add message for the updating (new) row, while for UpsertStreamTableSink UPDATE changes are encoded with a single message and hence more efficient. More details can be found here[1][2].

Both UpsertStreamTableSink and RetractStreamTableSink can be used to emit an append-only table or an updated table. While AppendStreamTableSink can only be used to emit an append-only table.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/table/streaming.html

[2] https://flink.apache.org/news/2017/04/04/dynamic-tables.html

On Mon, Apr 9, 2018 at 2:43 PM, chi ma <[hidden email]> wrote:

Hi,

I'm a newbie to Flink. I'm trying to load data from HDFS and analyze it using Flink Table APIs.

I created a TableSource, and registered it into the StreamTableEnvironment as a table, and ran a SQL on that table through streamTableEnvironment.sqlQuery, and finally I called writeToSink to write the result to a Kafka010JsonTableSink.

But, I got an exception, "Exception in thread "main" org.apache.flink.table.api.TableException: AppendStreamTableSink requires that Table has only insert changes."

I've googled it. The probable cause is that I put a "group by" clause in the SQL statement, which makes the result table to be a retract table.

And, if I pass a SQL statement without "group by" clause to sqlQuery, everything gets ok, and I can see the result in the Kafka topic.

So, my questions are:
1. What is the classical usage for outputing a retract table?
2. Is there a unified way to process output for both append-only and not append-only table?

Thanks.
Best Regards~