FLINK DATASTREAM Processing Question

FLINK DATASTREAM Processing Question

Vijayendra Yadav
Hi Team,

I have a generic question.

Let's say I have two actions to take on a Flink DataStream (Kafka source):
1) Convert some data fields and write them to an external database.
2) Transform the converted fields from #1 into a different record format, say Avro.

Here are two possible approaches:

a) One map function doing both actions #1 and #2:

datastream.map(1)

b) Two map functions doing actions #1 and #2 separately:

datastream.map(1).map(2)

Which one would you prefer?

Is there a way I can see the execution plan?

Regards,
Vijay

Re: FLINK DATASTREAM Processing Question

Dawid Wysakowicz-2

Hi,

You can see the execution plan via StreamExecutionEnvironment#getExecutionPlan() and visualize it with [1]. You can also submit your job and check the execution plan in the Web UI.

As for which option is preferred, that is very subjective. As long as both maps in option b) are chained, there will not be much difference in how the two options behave: executing map(2) will be just a method call.
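The "just a method call" behavior of chained maps can be sketched outside Flink with plain function composition; the two functions below are hypothetical stand-ins for actions #1 and #2, not the actual Flink operators:

```java
import java.util.function.Function;

public class ChainSketch {
    public static void main(String[] args) {
        // Hypothetical stand-ins for the two map functions in option b).
        Function<String, String> convertFields = s -> s.toUpperCase(); // action #1
        Function<String, String> toAvro = s -> "record:" + s;          // action #2

        // Option a): one map function doing both actions.
        Function<String, String> combined = s -> "record:" + s.toUpperCase();

        // Option b): two chained map functions. When Flink chains the two
        // operators into one task, invoking the second is a nested method
        // call on the same thread, much like Function#andThen here.
        Function<String, String> chained = convertFields.andThen(toAvro);

        System.out.println(combined.apply("x")); // record:X
        System.out.println(chained.apply("x"));  // record:X
    }
}
```

Whether Flink actually chains the two operators depends on the job graph (same parallelism, default chaining settings); the visualizer in [1] shows which operators ended up in one chain.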

You could also think about completely separating the tasks:

convertedFields = datastream.map(/* convert fields */);

convertedFields.addSink(/* write to DB */);

convertedFields.map(/* convert to Avro */);

Best,

Dawid

[1] https://flink.apache.org/visualizer/

On 05/09/2020 01:13, Vijayendra Yadav wrote:

Re: FLINK DATASTREAM Processing Question

Vijayendra Yadav
In reply to this post by Vijayendra Yadav
Thank You Dawid.


> On Sep 7, 2020, at 9:03 AM, Dawid Wysakowicz <[hidden email]> wrote:

Re: FLINK DATASTREAM Processing Question

Timo Walther
In reply to this post by Dawid Wysakowicz-2
Hi Vijay,

one comment to add is that performance might suffer with multiple map() calls. For safety reasons, records passed between chained operators are serialized and deserialized so that the operators cannot influence each other. If all functions in a pipeline are guaranteed not to modify incoming objects, you can enable object reuse mode (see enableObjectReuse) [1].
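A minimal plain-Java sketch (no Flink API; the Event class is hypothetical) of why object reuse is only safe when functions do not mutate their input: without the per-record copies Flink makes between operators, a mutation in one function is visible to everything else holding the same object.

```java
import java.util.ArrayList;
import java.util.List;

public class ObjectReuseSketch {
    // Hypothetical mutable record flowing between two chained functions.
    static class Event {
        final List<String> fields = new ArrayList<>();
    }

    // A "map function" that mutates its input in place and passes it on.
    static Event addMarker(Event in) {
        in.fields.add("MUTATED");
        return in;
    }

    public static void main(String[] args) {
        Event original = new Event();
        original.fields.add("a");

        Event downstream = addMarker(original);

        // Same object: the upstream view of the record changed too.
        System.out.println(downstream == original); // true
        System.out.println(original.fields);        // [a, MUTATED]
    }
}
```

With object reuse disabled (the default), the copies made between operators isolate each function from such in-place mutations; enabling it skips those copies, which is why it is only safe under the guarantee Timo describes.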

Please correct me if I'm wrong.

Regards,
Timo

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/execution_configuration.html

On 07.09.20 18:02, Dawid Wysakowicz wrote:
