multiple kafka topics


Aissa Elaffani
Hello Guys,
I am working on a Flink application that consumes data from Apache Kafka. The data is published to three topics of the cluster, and I need to read from all of them, so I suppose I can create three FlinkKafkaConsumer instances. The records all share the same format {Id_sensor:, Id_equipement, Date:, Value{...}, ...}; the problem is that the "Value" field changes from topic to topic. In the first topic the value is the temperature, "Value":{"temperature":26}; the second topic contains oil data, "Value":{"oil_data":26}; and in the third topic the value field is "Value": {"Pitch":, "Roll", "Yaw"}.
So I created three FlinkKafkaConsumer instances and defined a DeserializationSchema for each topic's data. The problem is that I want to run some aggregations over all of this data together in order to apply a function. So I am wondering whether it is a problem to join the three streams into one stream, aggregate by a field, apply the function, and finally sink the result. And if I do that, am I going to have a problem sinking the data, given that, as I explained, the value field differs from one topic to another? Can anyone give me an explanation, please? I would be so grateful. Thank you for your time!
Aissa 
Re: multiple kafka topics

Dmytro Dragan

Hi Aissa,

 

  1. To join 3 streams you can chain 2 CoFlatMap functions:

https://stackoverflow.com/questions/54277910/how-do-i-join-two-streams-in-apache-flink

 

  2. If your aggregation function can also be applied partially, you can chain 2 joins:

https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/joining.html

 

  3. Create tables from each stream and do one SQL join:

https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/tableApi.html#joins
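
Whichever of the three options is used, the joined stream needs a single output type that can hold the differing "Value" fields, which also answers the concern about sinking. The following is only a plain-Java sketch of that idea, not Flink API code: the classes `Reading`, `EquipmentSnapshot` and `ChainedJoinSketch` are hypothetical, the Pitch/Roll/Yaw numbers are made up for illustration, and it assumes the streams are joined on the equipment id. The `with` method stands in for the body of a join or CoFlatMap function, and `joinThree` mirrors chaining two joins.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical common record type; field names follow the post, the class is an assumption.
final class Reading {
    final String sensorId;
    final String equipmentId;
    final Map<String, Double> values; // e.g. {"temperature": 26.0}

    Reading(String sensorId, String equipmentId, Map<String, Double> values) {
        this.sensorId = sensorId;
        this.equipmentId = equipmentId;
        this.values = values;
    }
}

// Joined result for one piece of equipment: all value fields collected in one map.
final class EquipmentSnapshot {
    final String equipmentId;
    final Map<String, Double> values = new LinkedHashMap<>();

    EquipmentSnapshot(String equipmentId) {
        this.equipmentId = equipmentId;
    }

    // Returns a new snapshot that also contains the fields of one more reading.
    // This is the step a join function would perform for two matching elements.
    EquipmentSnapshot with(Reading r) {
        if (!equipmentId.equals(r.equipmentId)) {
            throw new IllegalArgumentException("key mismatch: " + r.equipmentId);
        }
        EquipmentSnapshot out = new EquipmentSnapshot(equipmentId);
        out.values.putAll(values);
        out.values.putAll(r.values);
        return out;
    }
}

final class ChainedJoinSketch {
    // Two chained merges: (temperature JOIN oil) JOIN orientation.
    static EquipmentSnapshot joinThree(Reading temperature, Reading oil, Reading orientation) {
        return new EquipmentSnapshot(temperature.equipmentId)
                .with(temperature)
                .with(oil)
                .with(orientation);
    }

    public static void main(String[] args) {
        Reading t = new Reading("s1", "eq1", Map.of("temperature", 26.0));
        Reading o = new Reading("s2", "eq1", Map.of("oil_data", 26.0));
        // Illustrative values only; the post does not give Pitch/Roll/Yaw numbers.
        Reading p = new Reading("s3", "eq1", Map.of("Pitch", 1.0, "Roll", 2.0, "Yaw", 3.0));
        System.out.println(joinThree(t, o, p).values);
    }
}
```

Because the snapshot keeps every value in one map keyed by field name, a single sink serializer can handle it regardless of which topic a field came from. A class with explicit nullable fields would work as well if the set of fields is fixed.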

 

It is important to understand which type of join suits your scenario: could one of the streams miss some data, and what is the maximum delay between them?
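
To make the two questions concrete, here is a small plain-Java model of a keyed join buffer. It is a sketch under stated assumptions, not Flink code: `JoinBuffer`, `Partial`, the source names and the millisecond timestamps are all hypothetical. `add` behaves like an inner join, producing a result only when every source has reported for a key, while `expire` shows where the maximum delay comes in: an outer join would emit the incomplete results it returns, an inner join would drop them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Plain-Java model of a keyed join buffer; all names here are hypothetical.
final class JoinBuffer {
    // Partial result for one key: which sources have arrived, and when the first one did.
    static final class Partial {
        final long firstSeenMillis;
        final Map<String, Map<String, Double>> bySource = new HashMap<>();

        Partial(long firstSeenMillis) {
            this.firstSeenMillis = firstSeenMillis;
        }
    }

    private final Set<String> requiredSources;
    private final long maxDelayMillis;
    private final Map<String, Partial> pending = new HashMap<>();

    JoinBuffer(Set<String> requiredSources, long maxDelayMillis) {
        this.requiredSources = requiredSources;
        this.maxDelayMillis = maxDelayMillis;
    }

    // Inner-join path: a result is returned only once every required source
    // has delivered a value for this key.
    Optional<Partial> add(String key, String source, Map<String, Double> values, long nowMillis) {
        Partial p = pending.computeIfAbsent(key, k -> new Partial(nowMillis));
        p.bySource.put(source, values);
        if (p.bySource.keySet().containsAll(requiredSources)) {
            pending.remove(key);
            return Optional.of(p);
        }
        return Optional.empty();
    }

    // Removes and returns incomplete results whose first element is older than
    // the maximum delay. An outer join would emit these; an inner join would discard them.
    List<Partial> expire(long nowMillis) {
        List<Partial> expired = new ArrayList<>();
        Iterator<Partial> it = pending.values().iterator();
        while (it.hasNext()) {
            Partial p = it.next();
            if (nowMillis - p.firstSeenMillis > maxDelayMillis) {
                expired.add(p);
                it.remove();
            }
        }
        return expired;
    }
}
```

If one topic can miss data for a key, the inner-join path never fires for that key, so the choice of what to do in `expire` and how large to make the maximum delay decides whether those readings reach the aggregation at all.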

 

 

From: Aissa Elaffani <[hidden email]>
Date: Monday, 10 August 2020 at 05:11
To: <[hidden email]>
Subject: multiple kafka topics

 
