Hi,
I have a use case where I read events from a single Kafka stream consisting of JSON messages. The requirement is to split the stream into multiple output streams based on some criteria, say the type of event, or the type plus the customer associated with the event. We could achieve the splitting of the stream using side outputs, as I have seen in the documentation.

Our business environment is such that new event types could start flowing in. Would the Flink Kafka producer create the topics dynamically based on the incoming events? I did not see any documentation saying that it could. Or must the topics always be pre-created by running a script separately? (Not a scalable practice in our case.)

Thanks,
Prasanna.
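For reference, a minimal sketch of the side-output splitting mentioned above, assuming the events arrive as JSON strings with a "type" field; the tag names and the string-based type check are illustrative only, a real job would parse the JSON:

import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class SplitByEventType {

    // Hypothetical tags for two known event types; unknown types stay on the main output.
    static final OutputTag<String> ORDER_EVENTS = new OutputTag<String>("order-events") {};
    static final OutputTag<String> PAYMENT_EVENTS = new OutputTag<String>("payment-events") {};

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        SingleOutputStreamOperator<String> main = env
                .fromElements(
                        "{\"type\":\"order\",\"customer\":\"c1\"}",
                        "{\"type\":\"payment\",\"customer\":\"c2\"}")
                .process(new ProcessFunction<String, String>() {
                    @Override
                    public void processElement(String value, Context ctx, Collector<String> out) {
                        // Naive type detection for illustration; a real job would parse the JSON.
                        if (value.contains("\"type\":\"order\"")) {
                            ctx.output(ORDER_EVENTS, value);
                        } else if (value.contains("\"type\":\"payment\"")) {
                            ctx.output(PAYMENT_EVENTS, value);
                        } else {
                            out.collect(value); // everything else stays on the main stream
                        }
                    }
                });

        main.getSideOutput(ORDER_EVENTS).print("orders");
        main.getSideOutput(PAYMENT_EVENTS).print("payments");

        env.execute("split-by-event-type");
    }
}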
Hi, Kumar
Flink supports consuming from and producing to multiple Kafka topics [1]. In your case you can implement KeyedSerializationSchema (legacy interface) or KafkaSerializationSchema [2] so that one producer instance can send data to multiple topics. There is an ITCase you can reference [3].

Best,
Leonard Xu
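As a concrete illustration of the KafkaSerializationSchema approach Leonard describes, here is a minimal sketch that routes each record to a topic derived from the event; the "events-" topic prefix and the string-based type extraction are assumptions for illustration, not anything prescribed by the connector:

import java.nio.charset.StandardCharsets;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventTypeRoutingSchema implements KafkaSerializationSchema<String> {

    @Override
    public ProducerRecord<byte[], byte[]> serialize(String element, Long timestamp) {
        // The topic name is computed per record, so one producer instance covers all event types.
        String topic = "events-" + extractEventType(element);
        return new ProducerRecord<>(topic, element.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical helper: pull the "type" field out of the JSON payload with plain string
    // matching; a real job would use a JSON parser instead.
    private String extractEventType(String json) {
        int idx = json.indexOf("\"type\":\"");
        if (idx < 0) {
            return "unknown";
        }
        int start = idx + "\"type\":\"".length();
        int end = json.indexOf('"', start);
        return end > start ? json.substring(start, end) : "unknown";
    }
}

The schema is then handed to a single FlinkKafkaProducer instance together with a fallback topic (see the wiring sketch further down the thread).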
Leonard,

Thanks for the reply; I will look into those options. But as for the original question, can we create a topic dynamically when required?

Prasanna.

On Mon, Jun 1, 2020 at 2:18 PM Leonard Xu <[hidden email]> wrote:
Prasanna,

You might want to check the Kafka broker configs: 'auto.create.topics.enable' lets the broker create a new topic whenever a message is published to a non-existent topic. I am not too sure about the pitfalls, if any.

On Mon, Jun 1, 2020 at 3:20 PM Leonard Xu <[hidden email]> wrote:
I think @brat is right. I didn't know about the Kafka property 'auto.create.topics.enable'; you can pass the property to the Kafka producer, and that should work.
Best, Leonard Xu
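For completeness, a hedged sketch of how the per-record topic schema from the earlier example is wired into a single FlinkKafkaProducer. Note that, per the Kafka documentation, 'auto.create.topics.enable' is a broker-side setting, so whether topics are auto-created on first write is decided by the broker configuration rather than by a producer property; the broker address and the schema class below are assumptions:

import java.util.Properties;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class DynamicTopicSink {

    // Attaches one producer whose target topic is chosen per record by the serialization schema.
    static void addSink(DataStream<String> events) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // assumed broker address

        events.addSink(new FlinkKafkaProducer<>(
                "fallback-topic",               // default topic; the actual topic comes from the ProducerRecord
                new EventTypeRoutingSchema(),   // hypothetical schema from the earlier sketch
                props,
                FlinkKafkaProducer.Semantic.AT_LEAST_ONCE));
    }
}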
I think "auto.create.topics.enable" is enabled by default [1]? Best, Jark On Mon, 1 Jun 2020 at 19:55, Leonard Xu <[hidden email]> wrote:
Hi Prasanna,

auto.create.topics.enable is only recommended for development clusters and not for production use cases, as one programming error could potentially flood the whole broker with a huge number of topics. I have experienced first-hand the mess it makes. I'd suggest finding a supplemental external solution for topic creation; you need to configure retention policies and ACLs on the topics in all real environments anyway.

In any case, I'd also discourage splitting the data that sits in one Kafka topic at all. I'd rather split it into separate partitions of the same topic and then consume only the respective partition. But it's usually much cheaper to simply filter irrelevant events on the original topic than, for example, to later correlate a subset of events across the split topics. Only in the original topic will you easily have a clear ordering of events happening to the same entity (key).

On Tue, Jun 2, 2020 at 10:37 AM Jark Wu <[hidden email]> wrote:
--
Arvid Heise | Senior Java Developer

Follow us @VervericaData

--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time

--
Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng
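As a rough illustration of the filtering alternative Arvid describes, each downstream job can keep consuming the original topic and simply drop the event types it does not care about; the naive string-based JSON check is for illustration only:

import org.apache.flink.streaming.api.datastream.DataStream;

public class FilterByType {

    // Keep only "order" events from the shared topic instead of splitting it into many topics.
    static DataStream<String> ordersOnly(DataStream<String> allEvents) {
        return allEvents.filter(value -> value.contains("\"type\":\"order\""));
    }
}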