In my map functions I have an object containing a list that must be changed by executing some logic.
So, considering Java 8 parallel streams, would it be worth using them, or does IterativeStream offer better performance without the overhead of Java 8 parallel streams? Thanks |
Hi,
How would you use IterativeStream? In Flink IterativeStream is a pipeline-level concept whereas your problem seems to be scoped to one user function.
Best, Aljoscha
> On 12. Jun 2017, at 19:17, nragon <[hidden email]> wrote:
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Java-parallel-streams-vs-IterativeStream-tp13655.html
> Sent from the Apache Flink User Mailing List archive at Nabble.com.
|
Iterate until all elements have been changed, perhaps. But I just wanted to know if there are implementations out there using Java 8 streams, in cases where you want to parallelize a map function even though the work is scoped to that function.
So, in my case, if the computation for each list element is too heavy, how can one parallelize it? |
I think for that you would unpack the List of values, for example with a FlatMap<List<T>, T>. This would emit each element of the list as a separate element. Then downstream operations can operate on each element individually, and you will exploit the parallelism of the cluster.
Best, Aljoscha
> On 13. Jun 2017, at 10:49, nragon <[hidden email]> wrote:
|
That would work, but after FlatMap<List<T>, T> I would have to merge all the elements back into one downstream.
The map function would go from

    for (Attribute dimension : this.dimensions) {
        this.transformed.setParameterNoCheck(dimension.getName(), dimension.get(this.context));
    }
    return this.transformed;

to

    this.dimensions.parallelStream().forEach(dimension -> this.transformed.setParameterNoCheck(dimension.getName(), dimension.get(this.context)));
    return this.transformed;
|
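One thing to watch with the parallelStream().forEach variant: setParameterNoCheck is called on the same transformed object from several threads, so it is only safe if that method is thread-safe. A minimal sketch of an alternative, assuming a hypothetical Attribute stand-in (the real class is not shown in this thread) and a plain Map in place of the transformed object: compute the values in parallel, let the collector do the merging, and copy the results into the target from a single thread afterwards.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ParallelDimensions {

    // Hypothetical stand-in for the poster's Attribute type: a name plus a
    // (possibly expensive) computation over some context value.
    static final class Attribute {
        private final String name;
        private final Function<Integer, Integer> computation;

        Attribute(String name, Function<Integer, Integer> computation) {
            this.name = name;
            this.computation = computation;
        }

        String getName() { return name; }

        Integer get(Integer context) { return computation.apply(context); }
    }

    // Evaluate every attribute in parallel and gather the results in a map.
    // Collectors.toMap merges the per-thread partial results, so no shared
    // mutable object is written from several threads at once.
    // Note: toMap rejects duplicate names and null values.
    static Map<String, Integer> evaluate(List<Attribute> dimensions, Integer context) {
        return dimensions.parallelStream()
                .collect(Collectors.toMap(Attribute::getName, d -> d.get(context)));
    }

    public static void main(String[] args) {
        List<Attribute> dimensions = Arrays.asList(
                new Attribute("double", x -> x * 2),
                new Attribute("square", x -> x * x));
        Map<String, Integer> values = evaluate(dimensions, 7);
        // The caller can now copy the values into its own result object
        // from this single thread.
        values.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Whether this beats the plain loop depends on how heavy each dimension.get call is; for cheap computations the fork/join overhead may dominate, so it is worth measuring.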
I see. I’m afraid that is not easily possible, except by writing a custom stateful function that waits for all elements to arrive and combines them again.
Another thing you could look at is the async I/O operator: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/asyncio.html. You would not necessarily use it for I/O, but to spawn threads that process your data in parallel. Best, Aljoscha
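To illustrate only the "spawn threads for processing your data in parallel" idea, here is a minimal sketch in plain java.util.concurrent. It does not use Flink's async I/O API; FanOutSketch and processAll are hypothetical names, and the pool size is an arbitrary assumption. Each list element is submitted to a dedicated thread pool and the results are joined in the original order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class FanOutSketch {

    // Submit one task per element to the given pool, then wait for all of
    // them and return the results in the original list order.
    static <T, R> List<R> processAll(List<T> items, Function<T, R> work, ExecutorService pool) {
        List<CompletableFuture<R>> futures = new ArrayList<>();
        for (T item : items) {
            futures.add(CompletableFuture.supplyAsync(() -> work.apply(item), pool));
        }
        List<R> results = new ArrayList<>(futures.size());
        for (CompletableFuture<R> future : futures) {
            results.add(future.join()); // blocks until this element is done
        }
        return results;
    }

    public static void main(String[] args) {
        // Pool size of 4 is an arbitrary choice for the example.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Integer> squares = processAll(Arrays.asList(1, 2, 3, 4), x -> x * x, pool);
            System.out.println(squares); // prints [1, 4, 9, 16]
        } finally {
            pool.shutdown();
        }
    }
}
```

A dedicated pool keeps the work off the common fork/join pool that parallel streams use by default, which may matter when several operator instances share one JVM.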
|
I mentioned Java 8 streams because they avoid leaving the map function, which reduces network I/O when the operators are not chained, and they take advantage of multiple CPUs. I guess I will have to test it.
|
Ah yes, I forgot about Java 8 streams. That is probably your best option.
> On 14. Jun 2017, at 16:25, nragon <[hidden email]> wrote:
|