DataSet in Streaming application under Flink

Sylvain Hotte
Hi,
I would like to know whether it is possible to load a small data set in a
streaming application under Flink.

Here's an example:
I have a data stream A and a data set B.
I need to compare every tuple of A against the tuples of B.
Since B is small, it would be loaded on every node and kept in memory (not
reloaded for every computation).

I am doing a Master's on real-time geospatial operators in Big Data, and I am
looking at different strategies to spatially distribute the stream based on
application and operation characteristics.
One of them involves comparing a data set with a data stream.

Regards,

Sylvain Hotte


Re: DataSet in Streaming application under Flink

Till Rohrmann

Hi Sylvain,

what you could do, for example, is load a static data set, e.g. from HDFS, in the open method of your comparator and cache it there. The open method is called once for each task, when the task is created. The comparator could then be a RichMapFunction implementation. By making the field that stores the small data set static, you can even share the data among all tasks that run on the same TaskManager.
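
A rough sketch of that pattern is below. It is plain Java so that it compiles without Flink on the classpath: the class only mimics the lifecycle, and in an actual job it would extend org.apache.flink.api.common.functions.RichMapFunction and override open and map. The class name NearestReferencePoint, the loadReferenceData helper, the hard-coded points and the nearest-point comparison are all assumptions made for illustration; a real loader would read B from HDFS or a similar store.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for a RichMapFunction<double[], Integer>; plain Java so it compiles alone.
class NearestReferencePoint {

    // Static: one copy per JVM, hence shared by all tasks on the same TaskManager.
    private static volatile List<double[]> referencePoints;

    // Counts loads, only to make the "loaded once" behaviour observable.
    static final AtomicInteger loadCount = new AtomicInteger();

    // Hypothetical loader: a real job would read the small data set B here.
    private static List<double[]> loadReferenceData() {
        loadCount.incrementAndGet();
        return Arrays.asList(
                new double[] {0.0, 0.0},
                new double[] {10.0, 10.0},
                new double[] {-5.0, 3.0});
    }

    // Mirrors open(): called once per task before any record is processed.
    // Double-checked locking so that parallel tasks in one JVM load B only once.
    void open() {
        if (referencePoints == null) {
            synchronized (NearestReferencePoint.class) {
                if (referencePoints == null) {
                    referencePoints = loadReferenceData();
                }
            }
        }
    }

    // Mirrors map(): compares one element of stream A against every tuple of B
    // and returns the index of the closest reference point.
    int map(double[] point) {
        List<double[]> refs = referencePoints;
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < refs.size(); i++) {
            double dx = refs.get(i)[0] - point[0];
            double dy = refs.get(i)[1] - point[1];
            double dist = dx * dx + dy * dy; // squared distance is enough for ranking
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }
}

class NearestReferencePointDemo {
    public static void main(String[] args) {
        // Two instances stand in for two parallel tasks on one TaskManager.
        NearestReferencePoint task1 = new NearestReferencePoint();
        NearestReferencePoint task2 = new NearestReferencePoint();
        task1.open();
        task2.open();
        System.out.println(task1.map(new double[] {9.0, 8.0}));
        System.out.println(task2.map(new double[] {-4.0, 2.0}));
        System.out.println(NearestReferencePoint.loadCount.get());
    }
}
```

Both instances call open, but the data is loaded only once because the field is static. Note that a static field is shared per JVM, so each TaskManager still loads its own copy.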

Cheers,
Till


On Tue, Jan 19, 2016 at 5:53 PM, Sylvain Hotte <[hidden email]> wrote: