flink batch data processing

Paul Joireman

I'm evaluating Flink for processing batches of data.  As a simple example, say I have 2000 points which I would like to pass through an FIR filter using functionality provided by the Python scipy library.  The scipy filter is a simple function which accepts a set of coefficients and the data to filter and returns the filtered data.  Is it possible to create a transformation to handle this in Flink?  It seems Flink transformations are applied on a point-by-point basis, but I may be missing something.

Paul

Re: flink batch data processing

Ufuk Celebi

Are you using the DataSet or DataStream API?

Yes, most Flink transformations operate on single tuples, but you can
work around it:
- You could write a custom source function, which emits records that
contain X points
(https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/index.html#data-sources)
- You can use a mapPartition
(https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/dataset_transformations.html#mappartition)
or FlatMap (https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/dataset_transformations.html#flatmap)
function and create the batches manually.
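
The "create the batches manually" idea can be sketched in plain Python. This is only an illustration of the shape of a partition-level function (an iterator of points in, filtered points out), not Flink code: the names batches, fir_filter and filter_partition are hypothetical, the wiring into a Flink job is not shown, and the hand-written FIR loop stands in for the scipy call so the sketch has no dependencies.

```python
from itertools import islice


def batches(points, size):
    """Group an iterator of single points into lists of up to `size` points."""
    it = iter(points)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fir_filter(coeffs, data):
    """Direct-form FIR with zero initial state: y[n] = sum_k coeffs[k] * data[n-k].

    Assumption: with scipy installed, scipy.signal.lfilter(coeffs, 1.0, data)
    could be used here instead of this hand-written loop.
    """
    return [
        sum(c * data[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(len(data))
    ]


def filter_partition(points, coeffs, size=2000):
    """Partition-style function: consume an iterator, emit filtered points."""
    for chunk in batches(points, size):
        # Each batch is filtered on its own, so the filter state
        # restarts from zero at every batch boundary.
        for y in fir_filter(coeffs, chunk):
            yield y
```

For example, list(filter_partition(iter([2.0, 4.0, 6.0]), [0.5, 0.5])) gives [1.0, 3.0, 5.0]. Because each batch is filtered independently, the batch size should cover the whole signal (or the batches should overlap) if the output must match filtering the full sequence in one pass.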

Does this help?

On Fri, Jul 22, 2016 at 7:21 PM, Paul Joireman <[hidden email]> wrote:
> I'm evaluating Flink for processing batches of data.  As a simple example, say
> I have 2000 points which I would like to pass through an FIR filter using
> functionality provided by the Python scipy library.  The scipy filter is a
> simple function which accepts a set of coefficients and the data to filter
> and returns the filtered data.  Is it possible to create a transformation to
> handle this in Flink?  It seems Flink transformations are applied on a
> point-by-point basis, but I may be missing something.
>
> Paul