Hello everyone,
I am a Flink newcomer and I would like to implement a Flink application with two Kafka sources: one for the data stream to be processed and the other for control purposes. The application should read from the control stream and then apply the control operation to the data coming from the data stream. To be clearer, I would like to have something like: if the application reads from the control source a control operation with identifier 22, then it should apply a certain transformation to all the incoming data values that are marked with id 22.

I would like to ask you if having two Kafka sources (one for the data and another for control purposes) is actually good practice. I'd also like to ask if you have any advice or suggestions regarding how to keep a queue of such active control operations.

Thank you so much.

Best,

Gabriele
Hi Gabriele,
I think this is actually a quite common pattern. Generally, you can `connect` the two streams and then use a `CoFlatMapFunction`. A `CoFlatMapFunction` allows you to keep shared (checkpointed) state between two streams. It has two callbacks, `flatMap1` and `flatMap2`, which are called whenever a record from the respective stream arrives. On each element of the control stream, you update a data structure (state) that holds which transformation to apply to which kind of element. On each element of the data stream, you apply the correct transformation based on the current state of the operator. (A short sketch of this pattern follows below this message.)

Does this make sense to you? Could you share a little bit more about the use case? In particular, it would be relevant to know whether both streams share a common key on which they can be partitioned.

Cheers,

Konstantin
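A minimal sketch of the pattern described above, in Java. The event types, their field names, and the `applyOperation` helper are hypothetical (nothing in this thread defines them); both streams are keyed by the shared id so that a control event and the data values it targets meet in the same parallel instance:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
import org.apache.flink.util.Collector;

// Hypothetical event types, shown as minimal POJOs.
class ControlEvent {
    public long id;          // which data values the operation targets
    public String operation; // identifier of the transformation to apply
}

class DataEvent {
    public long id;
    public double value;
}

class ControlledTransformation
        extends RichCoFlatMapFunction<ControlEvent, DataEvent, DataEvent> {

    // Keyed, checkpointed state: the operation currently active for this id.
    private transient ValueState<String> activeOperation;

    @Override
    public void open(Configuration parameters) {
        activeOperation = getRuntimeContext().getState(
                new ValueStateDescriptor<>("active-operation", String.class));
    }

    // Called for every control element: remember the operation for this id.
    @Override
    public void flatMap1(ControlEvent control, Collector<DataEvent> out) throws Exception {
        activeOperation.update(control.operation);
    }

    // Called for every data element: apply the active operation, if any.
    @Override
    public void flatMap2(DataEvent data, Collector<DataEvent> out) throws Exception {
        String op = activeOperation.value();
        out.collect(op == null ? data : applyOperation(op, data));
    }

    private DataEvent applyOperation(String op, DataEvent data) {
        // Placeholder for the actual transformation logic.
        return data;
    }
}
```

Wiring it up, assuming the two Kafka sources already exist as `controlStream` and `dataStream`:

```java
DataStream<DataEvent> result = controlStream
        .keyBy(c -> c.id)                        // partition control events by id
        .connect(dataStream.keyBy(d -> d.id))    // co-partition data events by id
        .flatMap(new ControlledTransformation());
```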
Hi Konstantin,
Thank you so much for your answer. Yes, I think this is exactly what I need. Thank you.

Best,

Gabriele