Hello!
I have a data source that emits Arrays that I collect into windows via countWindow. Rather than parallelize my subsequent operations by groups of these arrays, I'd like to parallelize my operations across the elements of the array (rows rather than columns, if you will) within each window. Some context: I'm attempting a time series analysis across some number of voxels. Each time step, I receive an Array of voxel data, but I'd like to analyze the voxels across time. It sounds like this approach mixes DataStream and DataSet concepts (where each window is a DataSet), which I know are not supported. Perhaps there is some other way to accomplish this task? Thanks! Daniel |
Hi Daniel, I'm not sure whether I grasp the whole problem, but can't you split the vector up into the different rows, group by the row index and then apply some kind of continuous aggregation or window function? Maybe it helps if you can share some of your code with the community to discuss the implementation. Cheers, Till On Fri, Nov 4, 2016 at 5:45 PM, Daniel Suo <[hidden email]> wrote:
|
This post was updated on .
So I could flatMap my incoming Arrays into (rowId, arrayElement) and gather them appropriately in a window operation. Here is brief code to describe the problem:
// Source emits Array[Double]
Now I have a windowSize (representing time) by arrayLength (representing voxels) matrix. Flink lets me parallelize by time easily, but I'd like to parallelize by voxel.
|
I was able to resolve my issue by collecting all the 'column' Arrays via countWindowAll and using flatMap to emit 'row' Arrays.
|
In order to parallelize by voxel you have to do a Glad to hear that you’ve resolved the problem :-) Cheers, On Sat, Nov 5, 2016 at 2:47 AM, danielsuo <[hidden email]> wrote: I was able to resolve my issue by collecting all the 'column' Arrays via |
Free forum by Nabble | Edit this page |