Kafka control source in addition to Kafka data source

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Kafka control source in addition to Kafka data source

gdibernardo
Hello everyone,

I am a Flink newcomer and I would like to implement a Flink application with two Kafka sources: one for the data stream to be processed and the other one for control purposes. The application should be able to read from the control stream and then apply the control operation to the data coming from the data stream. To be more clear, I would like to have something like: if the application reads from the control source a control operation with identifier 22, then it should apply a certain transformation to all the incoming data values that are marked with id 22.

I would like to ask you if having two Kafka sources (one for the data and another for control purposes) is actually a good practice. I’d like also to ask you if you have some advices or suggestions for me regarding how to keep a queue of such active control operations.

Thank you so much.

Best,


Gabriele

Reply | Threaded
Open this post in threaded view
|

Re: Kafka control source in addition to Kafka data source

snntr
Hi Gabriele,

I think this is actually a quite common pattern. Generally, you can
`join` the two streams and then use a `CoFlatMapFunction`. A
`CoFlatMapFunction` allows you to keep shared (checkpointed) state
between two streams. It has two callbacks `flatMap1` and `flatMap2`
which are called whenever a record from the respective stream arrives.
You can update a data structure (state), which holds the information on
which information to apply to which kind of element, on each element of
the control stream. On each element of the data stream you apply to
correct transformation based on the current state of the operator.

Does this makes sense to you? If you share a little bit about the use
case. In particular, it would be relevant if both streams share a common
key, on which they can be partitioned.

Cheers,

Konstantin

On 18.07.2017 21:14, Gabriele Di Bernardo wrote:

> Hello everyone,
>
> I am a Flink newcomer and I would like to implement a Flink application with two Kafka sources: one for the data stream to be processed and the other one for control purposes. The application should be able to read from the control stream and then apply the control operation to the data coming from the data stream. To be more clear, I would like to have something like: if the application reads from the control source a control operation with identifier 22, then it should apply a certain transformation to all the incoming data values that are marked with id 22.
>
> I would like to ask you if having two Kafka sources (one for the data and another for control purposes) is actually a good practice. I’d like also to ask you if you have some advices or suggestions for me regarding how to keep a queue of such active control operations.
>
> Thank you so much.
>
> Best,
>
>
> Gabriele
>

--
Konstantin Knauf * [hidden email] * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082
Reply | Threaded
Open this post in threaded view
|

Re: Kafka control source in addition to Kafka data source

gdibernardo
Hi Konstantin,

Thank you so much for your answer. Yes, I think this is exactly what I need.

Thank you.

Best,


Gabriele

> On 18 Jul 2017, at 21:27, Konstantin Knauf <[hidden email]> wrote:
>
> Hi Gabriele,
>
> I think this is actually a quite common pattern. Generally, you can
> `join` the two streams and then use a `CoFlatMapFunction`. A
> `CoFlatMapFunction` allows you to keep shared (checkpointed) state
> between two streams. It has two callbacks `flatMap1` and `flatMap2`
> which are called whenever a record from the respective stream arrives.
> You can update a data structure (state), which holds the information on
> which information to apply to which kind of element, on each element of
> the control stream. On each element of the data stream you apply to
> correct transformation based on the current state of the operator.
>
> Does this makes sense to you? If you share a little bit about the use
> case. In particular, it would be relevant if both streams share a common
> key, on which they can be partitioned.
>
> Cheers,
>
> Konstantin
>
> On 18.07.2017 21:14, Gabriele Di Bernardo wrote:
>> Hello everyone,
>>
>> I am a Flink newcomer and I would like to implement a Flink application with two Kafka sources: one for the data stream to be processed and the other one for control purposes. The application should be able to read from the control stream and then apply the control operation to the data coming from the data stream. To be more clear, I would like to have something like: if the application reads from the control source a control operation with identifier 22, then it should apply a certain transformation to all the incoming data values that are marked with id 22.
>>
>> I would like to ask you if having two Kafka sources (one for the data and another for control purposes) is actually a good practice. I’d like also to ask you if you have some advices or suggestions for me regarding how to keep a queue of such active control operations.
>>
>> Thank you so much.
>>
>> Best,
>>
>>
>> Gabriele
>>
>
> --
> Konstantin Knauf * [hidden email] * +49-174-3413182
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> Sitz: Unterföhring * Amtsgericht München * HRB 135082