Hi,
I have a use case where I read events from a single Kafka stream consisting of JSON messages. The requirement is to split the stream into multiple output streams based on some criteria, say the type of event, or the type plus the customer associated with the event. We could achieve the splitting of the stream using side outputs, as I have seen in the documentation.

Our business environment is such that new event types could start flowing in. Would the Flink Kafka producer create the topics dynamically based on the incoming events? I did not see any documentation saying that it could. Or must the topics always be pre-created by running a script separately? (Not a scalable practice in our case.)

Thanks,
Prasanna.
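For reference, a minimal sketch of the side-output splitting mentioned above, assuming the events arrive as JSON strings with a "type" field; the tag names and the string-based type check are illustrative only, a real job would parse the JSON:

import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class SplitByEventType {

    // Hypothetical tags for two known event types; unknown types stay on the main output.
    static final OutputTag<String> ORDER_EVENTS = new OutputTag<String>("order-events") {};
    static final OutputTag<String> PAYMENT_EVENTS = new OutputTag<String>("payment-events") {};

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        SingleOutputStreamOperator<String> main = env
                .fromElements(
                        "{\"type\":\"order\",\"customer\":\"c1\"}",
                        "{\"type\":\"payment\",\"customer\":\"c2\"}")
                .process(new ProcessFunction<String, String>() {
                    @Override
                    public void processElement(String value, Context ctx, Collector<String> out) {
                        // Naive type detection for illustration; a real job would parse the JSON.
                        if (value.contains("\"type\":\"order\"")) {
                            ctx.output(ORDER_EVENTS, value);
                        } else if (value.contains("\"type\":\"payment\"")) {
                            ctx.output(PAYMENT_EVENTS, value);
                        } else {
                            out.collect(value); // everything else stays on the main stream
                        }
                    }
                });

        main.getSideOutput(ORDER_EVENTS).print("orders");
        main.getSideOutput(PAYMENT_EVENTS).print("payments");

        env.execute("split-by-event-type");
    }
}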
Hi, Kumar
Flink supports consuming from and producing to multiple Kafka topics [1]. In your case you can implement KeyedSerializationSchema (legacy interface) or KafkaSerializationSchema [2] so that one producer instance can send data to multiple topics. There is an ITCase you can reference [3].

Best,
Leonard Xu
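As a concrete illustration of the KafkaSerializationSchema approach Leonard describes, here is a minimal sketch that routes each record to a topic derived from the event; the "events-" topic prefix and the string-based type extraction are assumptions for illustration, not anything prescribed by the connector:

import java.nio.charset.StandardCharsets;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventTypeRoutingSchema implements KafkaSerializationSchema<String> {

    @Override
    public ProducerRecord<byte[], byte[]> serialize(String element, Long timestamp) {
        // The topic name is computed per record, so one producer instance covers all event types.
        String topic = "events-" + extractEventType(element);
        return new ProducerRecord<>(topic, element.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical helper: pull the "type" field out of the JSON payload with plain string
    // matching; a real job would use a JSON parser instead.
    private String extractEventType(String json) {
        int idx = json.indexOf("\"type\":\"");
        if (idx < 0) {
            return "unknown";
        }
        int start = idx + "\"type\":\"".length();
        int end = json.indexOf('"', start);
        return end > start ? json.substring(start, end) : "unknown";
    }
}

The schema is then handed to a single FlinkKafkaProducer instance together with a fallback topic (see the wiring sketch further down the thread).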
Leonard,

Thanks for the reply; I will look into those options. But as for the original question, can we create a topic dynamically when required?

Prasanna.

On Mon, Jun 1, 2020 at 2:18 PM Leonard Xu <[hidden email]> wrote:
Prasanna,

You might want to check the Kafka broker configs: 'auto.create.topics.enable' lets the broker create a new topic whenever a message is published to a non-existent topic. I am not too sure about the pitfalls, if any.

On Mon, Jun 1, 2020 at 3:20 PM Leonard Xu <[hidden email]> wrote:
I think @brat is right. I didn't know about the Kafka property 'auto.create.topics.enable'; you can pass the property to the Kafka producer, and that should work.
Best, Leonard Xu
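For completeness, a hedged sketch of how the per-record topic schema from the earlier example is wired into a single FlinkKafkaProducer. Note that, per the Kafka documentation, 'auto.create.topics.enable' is a broker-side setting, so whether topics are auto-created on first write is decided by the broker configuration rather than by a producer property; the broker address and the schema class below are assumptions:

import java.util.Properties;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class DynamicTopicSink {

    // Attaches one producer whose target topic is chosen per record by the serialization schema.
    static void addSink(DataStream<String> events) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // assumed broker address

        events.addSink(new FlinkKafkaProducer<>(
                "fallback-topic",               // default topic; the actual topic comes from the ProducerRecord
                new EventTypeRoutingSchema(),   // hypothetical schema from the earlier sketch
                props,
                FlinkKafkaProducer.Semantic.AT_LEAST_ONCE));
    }
}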
I think "auto.create.topics.enable" is enabled by default [1]? Best, Jark On Mon, 1 Jun 2020 at 19:55, Leonard Xu <[hidden email]> wrote:
Hi Prasanna,

auto.create.topics.enable is only recommended for development clusters and not for production use cases, as one programming error could potentially flood the whole broker with a huge number of topics. I have experienced first-hand the mess it makes. I'd suggest finding a supplemental external solution for topic creation; you need to configure retention policies and ACLs on the topics in all real environments anyway.

In any case, I'd also discourage splitting the data that sits in one Kafka topic at all. I'd rather split it into separate partitions of the same topic and then consume only the respective partition. But it's usually much cheaper to simply filter irrelevant events on the original topic than, for example, to later correlate a subset of events across the split topics. Only in the original topic will you easily have a clear ordering of events happening to the same entity (key).

On Tue, Jun 2, 2020 at 10:37 AM Jark Wu <[hidden email]> wrote:
--
Arvid Heise | Senior Java Developer

Follow us @VervericaData

--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time

--
Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng
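As a rough illustration of the filtering alternative Arvid describes, each downstream job can keep consuming the original topic and simply drop the event types it does not care about; the naive string-based JSON check is for illustration only:

import org.apache.flink.streaming.api.datastream.DataStream;

public class FilterByType {

    // Keep only "order" events from the shared topic instead of splitting it into many topics.
    static DataStream<String> ordersOnly(DataStream<String> allEvents) {
        return allEvents.filter(value -> value.contains("\"type\":\"order\""));
    }
}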