RE: Flink logs only written to one host

Posted by Emmanuel on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Flink-logs-only-written-to-one-host-tp831p832.html

It appears actually that the slots used are all on the same host.
My guess is because I am using the default partitioning method (forward, which defaults to the same host) 

However I now tried .shuffle() and .distribute() without any luck:

I have a 
DataStream<String> text = env.socketTextStream(inHostName, inPort);
this is the one socket input stream.
Adding text.distribute().map(...)
does not seem to distribute the .map() process on the other hosts.
Is this the correct way to use .distribute() on a stream input? 
Thanks
Emmanuel


From: [hidden email]
To: [hidden email]
Subject: Flink logs only written to one host
Date: Thu, 12 Mar 2015 17:30:28 +0000

Hello,

I'm using a 3 nodes (3VMs) cluster, 3CPUs each, parallelism of 9, 
I usually only see taskmanager.out logs generated only on one of the 3 nodes when I use the System.out.println() method, to print debug info in my main processing function.

Is this expected? Or am I just doing something wrong? 
I stream from a socket with socketTextStream; I understand that this job runs on a single process, and I see that in the UI (using one slot only), but the computation task runs on 9 slots. That task includes the System.out.println() statement, yet it only shows on one host's .out log folder. 
The host is not always the same, so I have to tail all logs on all hosts, but I'm surprised of this behavior.
Am I just missing something? 
Are 'print' statement to stdout aggregated on one host somehow? If so how is this controlled? Why would that host change?

I would love to understand what is going on, and if maybe somehow the 9 slots may be running on a single host which would defeat the purpose.

Thanks for the insight

Emmanuel