Job failing during restore in different cluster


Job failing during restore in different cluster

shashank734
Hi, I am trying to move my job from one cluster to another using a savepoint, but it fails while restoring on the new cluster.

In the error, it is still trying to connect to a URL from the old cluster. I have checked all the properties and configuration. Does Flink store the URLs in the savepoint?

Flink version: 1.5.3 (the same version is used for taking the savepoint and for the restore)
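
For reference, the way I trigger the savepoint and resume from it is roughly the following (the savepoint directory, job id, and jar name below are placeholders, not my real values):

    # On the old cluster: trigger a savepoint for the running job
    ./bin/flink savepoint <jobId> hdfs:///flink/savepoints

    # On the new cluster: submit the same jar, restoring state from that savepoint
    ./bin/flink run -s hdfs:///flink/savepoints/savepoint-<shortJobId>-<random> my-job.jar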


2018-09-27 05:28:55,112 WARN  org.apache.flink.streaming.api.operators.BackendRestorerProcedure  - Exception while restoring keyed state backend for CoStreamFlatMap_069308bcb6f685b62dae685c4647854e_(9/10) from alternative (1/1), will retry while more alternatives are available.
java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "21.newcluster.co/10.0.3.1"; destination host is: "005.oldcluster":8020;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:782)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1558)
    at org.apache.hadoop.ipc.Client.call(Client.java:1498)
    at org.apache.hadoop.ipc.Client.call(Client.java:1398)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at com.sun.proxy.$Proxy13.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:272)
    at sun.reflect.GeneratedMethodAccessor69.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
    at com.sun.proxy.$Proxy14.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1238)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1225)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:309)
    at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:274)
    at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:266)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1538)
    at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:331)
    at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:327)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:327)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:786)
    at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.open(HadoopFileSystem.java:119)
    at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.open(HadoopFileSystem.java:36)
    at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.open(SafetyNetWrapperFileSystem.java:80)
    at org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:68)
    at org.apache.flink.runtime.state.KeyGroupsStateHandle.openInputStream(KeyGroupsStateHandle.java:112)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullRestoreOperation.restoreKeyGroupsInStateHandle(RocksDBKeyedStateBackend.java:569)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBFullRestoreOperation.doRestore(RocksDBKeyedStateBackend.java:558)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.restore(RocksDBKeyedStateBackend.java:446)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.restore(RocksDBKeyedStateBackend.java:149)
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151)
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:276)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:132)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:227)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:730)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:295)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:650)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:745)
    at org.apache.hadoop.ipc.Client$Connection.access$3200(Client.java:397)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1620)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
    ... 41 more



Re: Job failing during restore in different cluster

vino yang
Hi,

Which version of Flink are you using? Also, can you give more details about how you migrated your job?

Thanks, vino.

shashank734 <[hidden email]> wrote on Fri, Sep 28, 2018 at 10:21 PM: