Broadcasting sets in Flink Streaming

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Broadcasting sets in Flink Streaming

Tamara Mendt
Hello,

I have been trying to use the function withBroadcastSet on a SingleOutputStreamOperator (map) the same way I would on a MapOperator for a DataSet. From what I see, this cannot be done. I wonder if there is some way to broadcast a DataSet to the tasks that are performing transformations on a DataStream?

I am basically pre-calculating some things with Flink which I later need for the transformations on the incoming data from the stream. So I want to broadcast the resulting datasets from the pre-calculations.

Any ideas on how to best approach this?

Thanks, cheers

Tamara.
Reply | Threaded
Open this post in threaded view
|

Re: Broadcasting sets in Flink Streaming

Till Rohrmann
Hi Tamara,

I think this is not officially supported by Flink yet. However, I think that Gyula had once an example where he did something comparable. Maybe he can chime in here.

Cheers,
Till

On Tue, Aug 25, 2015 at 11:15 AM, Tamara Mendt <[hidden email]> wrote:
Hello,

I have been trying to use the function withBroadcastSet on a SingleOutputStreamOperator (map) the same way I would on a MapOperator for a DataSet. From what I see, this cannot be done. I wonder if there is some way to broadcast a DataSet to the tasks that are performing transformations on a DataStream?

I am basically pre-calculating some things with Flink which I later need for the transformations on the incoming data from the stream. So I want to broadcast the resulting datasets from the pre-calculations.

Any ideas on how to best approach this?

Thanks, cheers

Tamara.

Reply | Threaded
Open this post in threaded view
|

Re: Broadcasting sets in Flink Streaming

Stephan Ewen
You can do something very similar like broadcast sets like this:

Use a Co-Map function and connect your main data set regularly ("forward" partitioning) to one input and your broadcast set via "broadcast" to the other input. You can then retrieve the data in the two map functions separately.

This approach misses the logic that the broadcast data arrives fully before the non-broadcast data (you may receive events from the main data set before all broadcast data was received), but maybe you can work around that...

On Tue, Aug 25, 2015 at 2:45 PM, Till Rohrmann <[hidden email]> wrote:
Hi Tamara,

I think this is not officially supported by Flink yet. However, I think that Gyula had once an example where he did something comparable. Maybe he can chime in here.

Cheers,
Till

On Tue, Aug 25, 2015 at 11:15 AM, Tamara Mendt <[hidden email]> wrote:
Hello,

I have been trying to use the function withBroadcastSet on a SingleOutputStreamOperator (map) the same way I would on a MapOperator for a DataSet. From what I see, this cannot be done. I wonder if there is some way to broadcast a DataSet to the tasks that are performing transformations on a DataStream?

I am basically pre-calculating some things with Flink which I later need for the transformations on the incoming data from the stream. So I want to broadcast the resulting datasets from the pre-calculations.

Any ideas on how to best approach this?

Thanks, cheers

Tamara.


Reply | Threaded
Open this post in threaded view
|

Re: Broadcasting sets in Flink Streaming

Tamara Mendt
Ok, I'll try that. Thanks a lot!

On Tue, Aug 25, 2015 at 4:19 PM, Stephan Ewen <[hidden email]> wrote:
You can do something very similar like broadcast sets like this:

Use a Co-Map function and connect your main data set regularly ("forward" partitioning) to one input and your broadcast set via "broadcast" to the other input. You can then retrieve the data in the two map functions separately.

This approach misses the logic that the broadcast data arrives fully before the non-broadcast data (you may receive events from the main data set before all broadcast data was received), but maybe you can work around that...

On Tue, Aug 25, 2015 at 2:45 PM, Till Rohrmann <[hidden email]> wrote:
Hi Tamara,

I think this is not officially supported by Flink yet. However, I think that Gyula had once an example where he did something comparable. Maybe he can chime in here.

Cheers,
Till

On Tue, Aug 25, 2015 at 11:15 AM, Tamara Mendt <[hidden email]> wrote:
Hello,

I have been trying to use the function withBroadcastSet on a SingleOutputStreamOperator (map) the same way I would on a MapOperator for a DataSet. From what I see, this cannot be done. I wonder if there is some way to broadcast a DataSet to the tasks that are performing transformations on a DataStream?

I am basically pre-calculating some things with Flink which I later need for the transformations on the incoming data from the stream. So I want to broadcast the resulting datasets from the pre-calculations.

Any ideas on how to best approach this?

Thanks, cheers

Tamara.





--
Tamara Mendt