Cross product of datastream and dataset

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Cross product of datastream and dataset

Charlie Moad
We're having trouble mapping our problem to Flink.

- For each incoming item
- Generate tuples of the item crossed with a data set
- Filter the tuples based on a condition
- Know the count of matching tuples

This seems to be mashup of DataStream and DataSet, but it appears you can't operate with those together. We are also wondering if kicking off a batch job for each incoming item is a feasible approach. We don't have time window constraints.

Any recommendations would be greatly appreciated.

- Charlie

Reply | Threaded
Open this post in threaded view
|

Re: Cross product of datastream and dataset

Fabian Hueske-2
Hi,

it is not possible to mix the DataSet and DataStream APIs at the moment.

If the DataSet is constant and not too big (which I assume, since otherwise crossing would be extremely expensive), you can load the data into a stateful MapFunction.
For that you can implement a RichFlatMapFunction and read the data in open(). For each incoming record, i.e., each call of map(), you cross it with the records in the state and immediately evaluate the condition and count. That way you don't generate too many records.

If your DataSet is slowly changing, you can think of using a stateful CoFlatmapFunction and use on input to read the stream and the other to update the dataset.

Hope this helps,
Fabian


2016-11-17 22:36 GMT+01:00 Charlie Moad <[hidden email]>:
We're having trouble mapping our problem to Flink.

- For each incoming item
- Generate tuples of the item crossed with a data set
- Filter the tuples based on a condition
- Know the count of matching tuples

This seems to be mashup of DataStream and DataSet, but it appears you can't operate with those together. We are also wondering if kicking off a batch job for each incoming item is a feasible approach. We don't have time window constraints.

Any recommendations would be greatly appreciated.

- Charlie