This post was updated on .
Hi,
may be i just missing smth, but i just have no more ideas where to look. here is an screen of the failed state <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1383/Bildschirmfoto_2018-11-12_um_16.png> i read messages from 2 sources, make a join based on a common key and sink it all in a kafka. val env = StreamExecutionEnvironment.getExecutionEnvironment env.setParallelism(3) ... source1 .keyBy(_.searchId) .connect(source2.keyBy(_.searchId)) .process(new SearchResultsJoinFunction) .addSink(KafkaSink.sink) so it perfectly works when i launch it locally. it also works on cluster with Parallelism set to 1, but not with 3 any more. When i deploy it to 1 job manager and 3 taskmanagers and get every Task in "RUNNING" state, after 2 minutes (when nothing is comming to sink) one of the taskmanagers gets following in log: Flat Map (1/3) (9598c11996f4b52a2e2f9f532f91ff66) switched from RUNNING to FAILED. java.io.IOException: Connecting the channel failed: Connecting to remote task manager + 'flink-taskmanager-11-dn9cj/10.81.27.84:37708' has failed. This might indicate that the remote task manager has been lost. at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.waitForChannel(PartitionRequestClientFactory.java:196) at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.access$000(PartitionRequestClientFactory.java:133) at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:85) at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:60) at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:166) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:494) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:525) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:508) at org.apache.flink.streaming.runtime.io.BarrierTracker.getNextNonBlocked(BarrierTracker.java:94) at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:209) at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager + 'flink-taskmanager-11-dn9cj/10.81.27.84:37708' has failed. This might indicate that the remote task manager has been lost. at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.operationComplete(PartitionRequestClientFactory.java:219) at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.operationComplete(PartitionRequestClientFactory.java:133) at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511) at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504) at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483) at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424) at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121) at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:269) at org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) at org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:125) at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404) at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463) at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884) ... 1 more Caused by: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: flink-taskmanager-11-dn9cj/10.81.27.84:37708 at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:267) ... 7 more 2018-11-12 15:47:57,198 INFO org.apache.flink.runtime.taskmanager.Task - Flat Map (1/3) (171e84d98f94a83e1f3e7cd598c7dbbc) switched from RUNNING to FAILED. i would appreciate any hint. thx a lot. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Hey, DId You try to run any other job on your setup? Also, could You please tell what are the sources you are trying to use, do all messages come from Kafka?? From the first look, it seems that the JobManager can't connect to one of the TaskManagers. Best Regards, Dom. pon., 12 lis 2018 o 17:12 zavalit <[hidden email]> napisał(a): Hi, |
PS. Could You also post the whole log for the application run ?? Best Regards, Dom. czw., 15 lis 2018 o 11:04 Dominik Wosiński <[hidden email]> napisał(a):
|
Hey, Dominik,
tnx for getting back. i've posted also by stackoverflow and David Anderson gave a good tipp where to look. https://stackoverflow.com/questions/53282967/run-flink-with-parallelism-more-than-1/53289840 issues is resolved, everything is running. thx. again -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Free forum by Nabble | Edit this page |