Flink Scala performance


Flink Scala performance

Vinh June
I just realized that Flink programs take a long time to run. For example, the simple word count example in 0.9 takes 18s to run on my laptop (MacBook Pro, Mac OS 10.9, i5, 8 GB RAM, SSD).
Can anyone explain this or suggest a workaround?

Re: Flink Scala performance

Aljoscha Krettek
Hi,
that depends. How are you executing the program? Inside an IDE? By starting a local cluster? And then, how big is your input data?

Cheers,
Aljoscha

On Wed, 15 Jul 2015 at 23:45 Vinh June <[hidden email]> wrote:
I just realized that Flink programs take a long time to run. For example,
the simple word count example in 0.9 takes 18s to run on my laptop (MacBook Pro,
Mac OS 10.9, i5, 8 GB RAM, SSD).
Can anyone explain this or suggest a workaround?



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Scala-performance-tp2065.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.

Re: Flink Scala performance

Vinh June
I ran it locally, from the terminal.
It's the Word Count example, so the input is small.

Re: Flink Scala performance

Maximilian Michels
Hi Vinh,

If you run your program locally, Flink uses the local execution mode, which allocates only a small amount of managed memory. Flink uses managed memory to perform operations on serialized data. These operations can become slow if too little memory is allocated, because data then has to be spilled to disk. That would of course be different in a cluster environment, where you configure the memory explicitly.

When the task manager starts, it tells you how much memory it allocates. For example, in my case:

10:12:37,655 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Using 1227 MB for Flink managed memory.

How does that look in your case?

Cheers,
Max



On Thu, Jul 16, 2015 at 8:54 AM, Vinh June <[hidden email]> wrote:
I ran it locally, from the terminal.
It's the Word Count example, so the input is small.





Re: Flink Scala performance

Vinh June
Hi Max,
When I call 'flink run', it doesn't show any information like that

Re: Flink Scala performance

Ufuk Celebi
Hey Vinh,

You have to look into the logs folder and find the log of the TaskManager (something like *taskmanager*.log).

– Ufuk
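
Editorial sketch, not part of the original thread: the log check described above could be scripted roughly as follows. The regular expression is taken from the log line quoted earlier in this thread. The function names are hypothetical, and scanning every *.log file (rather than only *taskmanager*.log) is an assumption made so that the line is also found when it ends up in another log.

```python
import re
from pathlib import Path

# Pattern copied from the TaskManager log line quoted in this thread.
MANAGED_MEMORY = re.compile(r"Using (\d+) MB for Flink managed memory")


def managed_memory_mb(log_text):
    """Return every managed memory size (in MB) reported in a log text."""
    return [int(match.group(1)) for match in MANAGED_MEMORY.finditer(log_text)]


def scan_log_dir(log_dir):
    """Map each *.log file name in log_dir to the sizes it reports.

    Files that do not contain the log line are left out of the result.
    """
    results = {}
    for path in sorted(Path(log_dir).glob("*.log")):
        sizes = managed_memory_mb(path.read_text(errors="replace"))
        if sizes:
            results[path.name] = sizes
    return results
```

Calling scan_log_dir with the path of the log folder of the Flink installation (the exact path depends on the setup) returns a dictionary such as {"some-file.log": [25]} when a file reports 25 MB.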

On 16 Jul 2015, at 11:35, Vinh June <[hidden email]> wrote:

> Hi Max,
> When I call 'flink run', it doesn't show any information like that


Re: Flink Scala performance

Stephan Ewen
In reply to this post by Vinh June
Vinh,

Are you using the sample data built into the example, or are you using your own data?

On Thu, Jul 16, 2015 at 8:54 AM, Vinh June <[hidden email]> wrote:
I ran it locally, from the terminal.
It's the Word Count example, so the input is small.





Re: Flink Scala performance

Vinh June
In reply to this post by Ufuk Celebi
I found it in the JobManager log:

"21:16:54,986 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Using 25 MB for Flink managed memory."

Is there a way to set this explicitly for local mode?

Re: Flink Scala performance

Chiwan Park-2
You can increase Flink's managed memory by increasing the TaskManager JVM heap size (taskmanager.heap.mb) in flink-conf.yaml.
The options are explained in the Flink documentation [1].

Regards,
Chiwan Park

[1] https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#common-options
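
Editorial sketch, not part of the original thread: to see which heap size is currently configured, the flat "key: value" lines of flink-conf.yaml can be read with a few lines of Python. The key taskmanager.heap.mb comes from the message above. The sample content and the value 1024 are assumptions for illustration only, and read_flink_conf is a hypothetical helper that handles only simple one-line entries.

```python
def read_flink_conf(text):
    """Parse flat 'key: value' lines, skipping blank lines and '#' comment lines."""
    conf = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        conf[key.strip()] = value.strip()
    return conf


# Illustrative file content. The value 1024 is an assumption, not a
# recommendation made anywhere in this thread.
SAMPLE = """
# flink-conf.yaml (excerpt)
taskmanager.heap.mb: 1024
"""

heap_mb = int(read_flink_conf(SAMPLE)["taskmanager.heap.mb"])
print(f"configured TaskManager heap: {heap_mb} MB")
```

In practice the text would come from the flink-conf.yaml of the installation instead of SAMPLE.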

> On Jul 16, 2015, at 7:23 PM, Vinh June <[hidden email]> wrote:
>
> I found it in the JobManager log:
>
> "21:16:54,986 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Using 25 MB for Flink managed memory."
>
> Is there a way to set this explicitly for local mode?




Re: Flink Scala performance

Vinh June
In reply to this post by Stephan Ewen
@Stephan: I use the sample data that comes with the example.

Re: Flink Scala performance

Stephan Ewen
If you use the sample data from the example, there must be an issue with the setup.

In Flink's standalone mode, it runs in 100ms on my machine.

It may be that the command line client takes a long time to start up, so the program's run time only appears to be long. If startup takes that long, one reason may be slow DNS resolution.

You can check that by looking at the logs of the client process (in the "log" folder).

Stephan
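
Editorial sketch, not part of the original thread: besides reading the client logs, slow name resolution can be checked directly by timing a lookup. The helper name time_resolution is hypothetical, and which host names matter for a given setup is an assumption; the sketch simply tries "localhost" and the machine's own host name.

```python
import socket
import time


def time_resolution(hostname):
    """Return (seconds, ok) for a single name lookup of hostname."""
    start = time.perf_counter()
    try:
        socket.getaddrinfo(hostname, None)
        ok = True
    except OSError:
        # socket.gaierror is a subclass of OSError; a failed lookup still
        # tells us how long the resolver needed before giving up.
        ok = False
    return time.perf_counter() - start, ok


for name in ("localhost", socket.gethostname()):
    seconds, ok = time_resolution(name)
    print(f"{name}: {seconds:.3f}s, resolved={ok}")
```

Lookups that take seconds rather than milliseconds would point toward the resolution delay suspected above.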


On Thu, Jul 16, 2015 at 2:06 PM, Vinh June <[hidden email]> wrote:
@Stephan: I use the sample data that comes with the example.





Re: Flink Scala performance

Vinh June
Here are my logs
http://pastebin.com/AJwiy2D8
http://pastebin.com/K05H3Qur
From the client log it seems to take ~2s, but with "time flink run ...", the actual time is ~18s.

Re: Flink Scala performance

Stephan Ewen
Is it possible that it takes a long time to spawn JVMs on your system, and that this takes up all the time?

On Thu, Jul 16, 2015 at 3:34 PM, Vinh June <[hidden email]> wrote:
Here are my logs
http://pastebin.com/AJwiy2D8
http://pastebin.com/K05H3Qur
From the client log it seems to take ~2s, but with "time flink run ...", the actual
time is ~18s.





Re: Flink Scala performance

Vinh June
I just checked the web job manager. It says that the runtime of the Flink job is 349ms, but it actually takes 18s according to the "time" command in the terminal.
Should I care more about the latter timing?

Re: Flink Scala performance

Stephan Ewen
The 349ms is how long it takes to run the job. The 18s is what it takes the command line client to submit the job.

Like I said before, maybe there are very long delays on your system when you spawn JVMs, or in your DNS resolution. That way, connecting to the cluster to submit the job will take a long time...
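
Editorial sketch, not part of the original thread: the split between job runtime and client overhead can be worked out from the two figures reported here (18s wall-clock time, 349ms job runtime), and the same timing helper can be pointed at any command to see how long spawning a process takes. Both function names are hypothetical, and the commented-out java call assumes that java is on the PATH.

```python
import subprocess
import sys
import time


def wall_clock_seconds(command):
    """Run command (a list of arguments) and return its wall-clock time."""
    start = time.perf_counter()
    subprocess.run(command, check=False)
    return time.perf_counter() - start


def client_overhead_seconds(wall_seconds, job_runtime_ms):
    """Time spent outside the job itself: startup, connecting, submitting."""
    return wall_seconds - job_runtime_ms / 1000.0


# Figures reported in this thread: 18 s from "time", 349 ms in the web job manager.
overhead = client_overhead_seconds(18.0, 349)
print(f"{overhead:.3f}s outside the job ({overhead / 18.0:.0%} of the total)")

# Baseline: how long does it take to spawn a trivial process on this machine?
baseline = wall_clock_seconds([sys.executable, "-c", "pass"])
print(f"spawning a trivial process took {baseline:.3f}s")

# Assumption: java is on the PATH. Uncomment to time a bare JVM start.
# print(wall_clock_seconds(["java", "-version"]))
```

With the numbers from this thread, about 17.65s (roughly 98% of the total) is spent outside the job itself.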

On Thu, Jul 16, 2015 at 5:53 PM, Vinh June <[hidden email]> wrote:
I just checked the web job manager. It says that the runtime of the Flink job is
349ms, but it actually takes 18s according to the "time" command in the terminal.
Should I care more about the latter timing?





Re: Flink Scala performance

Vinh June
That sounds unreasonable to me, because I'm also working on other Java projects, and none of them takes that long to fire up a JVM. Strange!
Do you have any suggestions to fix this?

Re: Flink Scala performance

Michele Bertoni
Hi, actually the same happens to me on my MacBook Pro when it runs on battery instead of being plugged in,
and it takes twice as long if I am using HDFS.

In my case it seems that JVM commands have a very high latency in power-saving mode.

For example, a simple "hdfs dfs -ls /" takes about 20 seconds on battery alone, so it is not related to Flink.

Cheers


> On 18 Jul 2015, at 23:22, Vinh June <[hidden email]> wrote:
>
> That sounds unreasonable to me, because I'm also working on other Java projects,
> and none of them takes that long to fire up a JVM. Strange!
> Do you have any suggestions to fix this?