(DEPRECATED) Apache Flink User Mailing List archive.

Re: long runtime

Posted by Fabian Hueske on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/long-runtime-tp104p107.html

Hi,

How did you specify the degree of parallelism (DOP) for your program?

Via the command-line client, the system configuration, or some other way?

The JobManager log file (./log/*jobManager*.log) shows you the DOP of each task.

Best, Fabian

2014-09-24 18:41 GMT+02:00 Stephan Ewen <[hidden email]>:

Hi!

Offhand, that is not easy to say. It depends on your algorithm and on how much data replication it does...

We'd need a bit of time to look into the code. It would help if you could roughly sketch the algorithm for us and give us a breakdown of how much time is spent in which operator (like a screenshot of the runtime web monitor).

Greetings,
Stephan

On Wed, Sep 24, 2014 at 6:18 PM, Florian Hönicke <[hidden email]> wrote:
Hello :)

My Flink program is extremely slow.
I implemented a set similarity join in Flink (Mass-Join).
Furthermore, I implemented a local version in Java.
I compared both implementations.
The local version needs one minute to process a 500 MB dataset.
My Flink program needs 5 minutes (cluster: 11 nodes, 20 000 MB RAM).
I use Flink version 0.6.
What could be the cause?

I would welcome your response,
Florian Hönicke