Hi Community,
I was trying to run a big batch job which use JDBCInputFormat to retrieve a large amount data from a mysql database and do some joins in flink, the environment is AWS EMR. But it always failed very fast. I'm using flink on yarn, flink 1.6.1 my cluster has 1000GB memory, my job parameter is: -yD akka.ask.timeout=60s -yD akka.framesize=300m -yn 50 -ys 2 -yjm 8192 -ytm 8192 -p 40 I have 6 data sources with different tables and most of them are set with 100 parallelism. I can only see below WARN logs from the yarn aggregated yarn logs, the whole log is too big. 2019-03-21 09:10:16,430 WARN org.apache.flink.runtime.taskmanager.Task - Task 'CHAIN DataSource (at createInput(ExecutionEnvironment.java:548) (org.apache.flink.api.java.io.jdbc.JDBCInputFormat)) -> FlatMap (where: (AND(=(VALUE, _UTF-16LE'true'), =(ATTRIBUTENAME, _UTF-16LE'QuoteCurrencyId'))), select: (QUOTEID)) (10/50)' did not react to cancelling signal for 30 seconds, but is stuck in method: java.net.SocketInputStream.socketRead0(Native Method) java.net.SocketInputStream.socketRead(SocketInputStream.java:116) java.net.SocketInputStream.read(SocketInputStream.java:171) java.net.SocketInputStream.read(SocketInputStream.java:141) sun.security.ssl.InputRecord.readFully(InputRecord.java:465) sun.security.ssl.InputRecord.read(InputRecord.java:503) sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975) sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933) sun.security.ssl.AppInputStream.read(AppInputStream.java:105) java.io.FilterInputStream.read(FilterInputStream.java:133) com.mysql.cj.protocol.FullReadInputStream.readFully(FullReadInputStream.java:64) com.mysql.cj.protocol.a.SimplePacketReader.readMessage(SimplePacketReader.java:108) com.mysql.cj.protocol.a.SimplePacketReader.readMessage(SimplePacketReader.java:45) com.mysql.cj.protocol.a.TimeTrackingPacketReader.readMessage(TimeTrackingPacketReader.java:57) com.mysql.cj.protocol.a.TimeTrackingPacketReader.readMessage(TimeTrackingPacketReader.java:41) com.mysql.cj.protocol.a.MultiPacketReader.readMessage(MultiPacketReader.java:61) com.mysql.cj.protocol.a.MultiPacketReader.readMessage(MultiPacketReader.java:44) com.mysql.cj.protocol.a.ResultsetRowReader.read(ResultsetRowReader.java:75) com.mysql.cj.protocol.a.ResultsetRowReader.read(ResultsetRowReader.java:42) com.mysql.cj.protocol.a.NativeProtocol.read(NativeProtocol.java:1685) com.mysql.cj.protocol.a.TextResultsetReader.read(TextResultsetReader.java:87) com.mysql.cj.protocol.a.TextResultsetReader.read(TextResultsetReader.java:48) com.mysql.cj.protocol.a.NativeProtocol.read(NativeProtocol.java:1698) com.mysql.cj.protocol.a.NativeProtocol.readAllResults(NativeProtocol.java:1752) com.mysql.cj.protocol.a.NativeProtocol.sendQueryPacket(NativeProtocol.java:1041) com.mysql.cj.NativeSession.execSQL(NativeSession.java:1157) com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:947) com.mysql.cj.jdbc.ClientPreparedStatement.executeQuery(ClientPreparedStatement.java:1020) org.apache.flink.api.java.io.jdbc.JDBCInputFormat.open(JDBCInputFormat.java:238) org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:170) org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) java.lang.Thread.run(Thread.java:748) 2019-03-21 09:10:16,883 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://[hidden email]:41133] has failed, address is now gated for [50] ms. Reason: [Disassociated] 2019-03-21 09:10:18,447 INFO org.apache.flink.yarn.YarnTaskExecutorRunner - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested. 2019-03-21 09:10:18,448 INFO org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager - Shutting down TaskExecutorLocalStateStoresManager. 2019-03-21 09:10:18,448 INFO org.apache.flink.runtime.blob.PermanentBlobCache - Shutting down BLOB cache 2019-03-21 09:10:18,448 INFO org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache 2019-03-21 09:08:36,913 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [/10.97.33.195:39282] failed with java.io.IOException: Connection reset by peer 2019-03-21 09:08:36,913 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://[hidden email]:45423] has failed, address is now gated for [50] ms. Reason: [Disassociated] 2019-03-21 09:08:37,020 WARN org.apache.flink.yarn.YarnResourceManager - Discard registration from TaskExecutor container_1553143971811_0015_01_000059 at (akka.tcp://[hidden email]:44685/user/taskmanager_0) because the framework did not recognize it 2019-03-21 09:08:37,020 WARN org.apache.flink.yarn.YarnResourceManager - Discard registration from TaskExecutor container_1553143971811_0015_01_000067 at (akka.tcp://[hidden email]:45931/user/taskmanager_0) because the framework did not recognize it 2019-03-21 09:08:37,008 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: ip-10-97-36-43.tr-fr-nonprod.aws-int.thomsonreuters.com/10.97.36.43:38139 Appreciate if someone can help to have a look, thanks. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Free forum by Nabble | Edit this page |