in-memory optimization

Robert Schwarzenberg
Hello,

I have a question regarding the loop-awareness of Flink wrt invariant
datasets.

Does Flink serialize the DataSet 'points' in line 85

https://github.com/apache/flink/blob/master/flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/clustering/KMeans.scala 


each iteration or are there in-memory optimization procedures in place?

Thanks for your help!

Regards,
Robert

Re: in-memory optimization

Ufuk Celebi
Loop-invariant data should be kept in Flink's managed memory in
serialized form (in a custom hash table). That means it is not read
back from the CSV file in each iteration; it stays in memory in
serialized form and needs to be deserialized again on access.
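
To make the trade-off concrete, here is a minimal sketch of the idea in plain Scala. It is not Flink code and does not use Flink's managed memory or its hash table. The names `Point`, `InvariantCacheSketch`, `readSource` and `sourceReads` are made up for illustration, and a byte array stands in for the serialized in-memory copy. The source is read and serialized once before the loop, and each iteration only pays for deserialization.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical stand-in for the point type used in the KMeans example.
final case class Point(x: Double, y: Double)

object InvariantCacheSketch {
  // Counts how often the "source" is read, to show it happens only once.
  var sourceReads = 0

  // Stand-in for parsing the CSV input (assumption: fixed toy data).
  def readSource(): Seq[Point] = {
    sourceReads += 1
    Seq(Point(0.0, 0.0), Point(1.0, 1.0), Point(9.0, 9.0), Point(10.0, 10.0))
  }

  // Write the points into a compact binary form held in memory.
  def serialize(points: Seq[Point]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new DataOutputStream(buffer)
    out.writeInt(points.size)
    points.foreach { p => out.writeDouble(p.x); out.writeDouble(p.y) }
    out.flush()
    buffer.toByteArray
  }

  // Rebuild the point objects from the in-memory bytes.
  def deserialize(bytes: Array[Byte]): Seq[Point] = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    val n = in.readInt()
    Vector.fill(n)(Point(in.readDouble(), in.readDouble()))
  }

  // Index of the centroid closest to p (squared Euclidean distance).
  def nearest(p: Point, centroids: Seq[Point]): Int =
    centroids.indices.minBy { i =>
      val c = centroids(i)
      (p.x - c.x) * (p.x - c.x) + (p.y - c.y) * (p.y - c.y)
    }

  def run(iterations: Int, initial: Seq[Point]): Seq[Point] = {
    // Loop-invariant input: read and serialized once, outside the loop.
    val cached = serialize(readSource())
    var centroids = initial
    for (_ <- 1 to iterations) {
      // Each iteration deserializes from memory instead of re-reading the source.
      val points = deserialize(cached)
      // Toy update step: a centroid with no assigned points is simply dropped.
      centroids = points
        .groupBy(p => nearest(p, centroids))
        .toSeq
        .sortBy(_._1)
        .map { case (_, ps) => Point(ps.map(_.x).sum / ps.size, ps.map(_.y).sum / ps.size) }
    }
    centroids
  }
}
```

Calling `InvariantCacheSketch.run(3, Seq(Point(0.0, 0.0), Point(10.0, 10.0)))` reads the source once and deserializes three times. So the per-iteration cost that remains is the deserialization, not the file I/O or parsing.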

CC'ing Fabian to double check...

On Mon, Apr 24, 2017 at 4:20 PM, Robert Schwarzenberg
<[hidden email]> wrote:

> Hello,
>
> I have a question regarding the loop-awareness of Flink wrt invariant
> datasets.
>
> Does Flink serialize the DataSet 'points' in line 85
>
> https://github.com/apache/flink/blob/master/flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/clustering/KMeans.scala
>
> each iteration or are there in-memory optimization procedures in place?
>
> Thanks for your help!
>
> Regards,
> Robert