Flink on YARN: Cannot connect to JobManager

Malte Schwarzer-3
Hi all,

I am trying to run a Flink job on YARN via "$/bin/flink run -m yarn-cluster
-yn 2 ..." with two nodes. But only one JobManager seems to be connected.

Flink hangs at this stage (the lookup message repeats every second):

2017-01-11 15:12:13,653 DEBUG org.apache.flink.yarn.YarnClusterClient - Looking up JobManager
2017-01-11 15:12:13,678 INFO org.apache.flink.yarn.YarnClusterClient - TaskManager status (1/2)
TaskManager status (1/2)
2017-01-11 15:12:13,929 DEBUG org.apache.flink.yarn.YarnClusterClient - Looking up JobManager
2017-01-11 15:12:14,197 DEBUG org.apache.flink.yarn.YarnClusterClient - Looking up JobManager
2017-01-11 15:12:14,451 DEBUG org.apache.hadoop.ipc.Client - IPC Client (20529812) connection to ____/10.68.17.206:8032 from user sending #104
2017-01-11 15:12:14,452 DEBUG org.apache.hadoop.ipc.Client - IPC Client (20529812) connection to ___:8032 from user got value #104
2017-01-11 15:12:14,452 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: getApplicationReport took 1ms
2017-01-11 15:12:14,462 DEBUG org.apache.flink.yarn.YarnClusterClient - Looking up JobManager
2017-01-11 15:12:14,745 DEBUG org.apache.flink.yarn.YarnClusterClient - Looking up JobManager
2017-01-11 15:12:15,014 DEBUG org.apache.flink.yarn.YarnClusterClient - Looking up JobManager
2017-01-11 15:12:15,276 DEBUG org.apache.flink.yarn.YarnClusterClient - Looking up JobManager
2017-01-11 15:12:15,322 DEBUG org.apache.hadoop.ipc.Client - IPC Client (20529812) connection to ___:8020 from user: closed
...

Any suggestions as to what could cause this?

Standard MapReduce jobs work without any problem on YARN.

Best regards,
Malte

Re: Flink on YARN: Cannot connect to JobManager

Till Rohrmann

Hi Malte,

Could it be that you are requesting more resources from your YARN cluster than are currently available? It depends a bit on your other settings, but -yn 2 means that you request 2 TaskManagers. In addition, Flink allocates one more container for the JobManager. By default, the TaskManager containers and the JobManager container are each started with 1 GB of memory. Thus, the job needs at least 3 containers with 3 GB of memory in total. Could you check whether these resources are available in your YARN cluster?
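
As a rough illustration of that arithmetic, here is a minimal Python sketch. The function name and its parameters are hypothetical and not part of Flink or YARN, and it assumes 1 GB means 1024 MB per container. YARN may also round each request up to its configured minimum allocation, which this sketch ignores.

```python
def required_yarn_resources(task_managers, tm_memory_mb=1024, jm_memory_mb=1024):
    """Estimate what a 'flink run -m yarn-cluster -yn N' submission asks YARN for.

    Hypothetical helper: assumes one container per TaskManager plus one
    container for the JobManager, each with the given memory in MB.
    """
    containers = task_managers + 1  # N TaskManagers + 1 JobManager
    total_memory_mb = task_managers * tm_memory_mb + jm_memory_mb
    return containers, total_memory_mb


# The case from this thread: -yn 2 with default memory settings.
containers, memory_mb = required_yarn_resources(task_managers=2)
print(f"{containers} containers, {memory_mb} MB in total")
# prints "3 containers, 3072 MB in total"
```

If the cluster cannot hand out that many containers or that much memory at once, the remaining TaskManager request would stay pending, which would match a status that stays at (1/2).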

If they are available, then this indicates faulty behaviour. In that case, it would be great if you could share the aggregated YARN logs for the Flink application with us (available after terminating the YARN application). This would help with further debugging of the problem.

Cheers,
Till


On Thu, Jan 12, 2017 at 4:13 PM, Malte Schwarzer <[hidden email]> wrote: