StandAlone job on k8s fails with "Unknown method truncate" on restore

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

StandAlone job on k8s fails with "Unknown method truncate" on restore

Vishal Santoshi
The job uses a RolllingFileSink to push data to hdfs. Run an HA standalone cluster on k8s, 

* get the job running 
* kill the pod. 

The k8s deployment relaunches the pod but fails with

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-2-16.inprogress.449e8668-e886-4f89-b5f6-45ac68e25987


Unknown method truncate called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.


The file does exist. We work with hadoop 2.6 , which does no have truncate. The previous version would see that "truncate" was not supported and drop a length file for the ,inprogress file and rename it to a valid part file.



Is this a known issue ?


Regards.




Reply | Threaded
Open this post in threaded view
|

Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

Vishal Santoshi
And yes  cannot work with RollingFleSink for hadoop 2.6 release of 1.7.1 b'coz of this. 

java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer
	at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:57)
	at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
	at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)

Any work around ?

On Thu, Feb 14, 2019 at 1:42 PM Vishal Santoshi <[hidden email]> wrote:
The job uses a RolllingFileSink to push data to hdfs. Run an HA standalone cluster on k8s, 

* get the job running 
* kill the pod. 

The k8s deployment relaunches the pod but fails with

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-2-16.inprogress.449e8668-e886-4f89-b5f6-45ac68e25987


Unknown method truncate called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.


The file does exist. We work with hadoop 2.6 , which does no have truncate. The previous version would see that "truncate" was not supported and drop a length file for the ,inprogress file and rename it to a valid part file.



Is this a known issue ?


Regards.




Reply | Threaded
Open this post in threaded view
|

Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

Yun Tang
Hi

When 'RollingSink' try to initialize state, it would first check current file system supported truncate method. If file system not supported, it would use another work-around solution, which means you should not meet the problem. Otherwise 'RollingSink' thought and found the reflection method of 'truncate' while the file system actually not support. You could try to open DEBUG level to see whether log below could  be printed:
Truncate not found. Will write a file with suffix '.valid-length' and prefix '_' to specify how many bytes in a bucket are valid.

However, from your second email, the more serious problem should be using 'Buckets' with Hadoop-2.6. From what I know the `RecoverableWriter` within 'Buckets' can only support Hadoop-2.7+ , I'm not sure whether existed work around solution.

Best
Yun Tang

From: Vishal Santoshi <[hidden email]>
Sent: Friday, February 15, 2019 3:43
To: user
Subject: Re: StandAlone job on k8s fails with "Unknown method truncate" on restore
 
And yes  cannot work with RollingFleSink for hadoop 2.6 release of 1.7.1 b'coz of this. 

java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer
	at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:57)
	at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
	at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)

Any work around ?

On Thu, Feb 14, 2019 at 1:42 PM Vishal Santoshi <[hidden email]> wrote:
The job uses a RolllingFileSink to push data to hdfs. Run an HA standalone cluster on k8s, 

* get the job running 
* kill the pod. 

The k8s deployment relaunches the pod but fails with

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-2-16.inprogress.449e8668-e886-4f89-b5f6-45ac68e25987


Unknown method truncate called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.


The file does exist. We work with hadoop 2.6 , which does no have truncate. The previous version would see that "truncate" was not supported and drop a length file for the ,inprogress file and rename it to a valid part file.



Is this a known issue ?


Regards.




Reply | Threaded
Open this post in threaded view
|

Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

Vishal Santoshi
That log does not appear. It looks like we have egg and chicken issue. 

2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.DFSClient                              - Connecting to datanode 10.246.221.10:50010

2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient  - SASL client skipping handshake in unsecured configuration for 

addr = /10.246.221.10, datanodeId = DatanodeInfoWithStorage[10.246.221.10:50010,DS-c57a7667-f697-4f03-9fb1-532c5b82a9e8,DISK]

2019-02-15 16:49:15,072 DEBUG org.apache.flink.runtime.fs.hdfs.HadoopFsFactory              - Instantiating for file system scheme hdfs Hadoop File System org.apache.hadoop.hdfs.DistributedFileSystem

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal                       - dfs.client.use.legacy.blockreader.local = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal                       - dfs.client.read.shortcircuit = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal                       - dfs.client.domain.socket.data.traffic = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal                       - dfs.domain.socket.path = 

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.io.retry.RetryUtils                         - multipleLinearRandomRetry = null

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.ipc.Client                                  - getting client out of cache: org.apache.hadoop.ipc.Client@31920ade

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil  - DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection

2019-02-15 16:49:15,080 INFO  org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  - Subtask 3 initializing its state (max part counter=58).

2019-02-15 16:49:15,081 DEBUG org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  - Subtask 3 restoring: BucketState for bucketId=ls_kraken_events/dt=2019-02-14/evt=ad_fill and bucketPath=hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill, has open part file created @ 1550247946437

2019-02-15 16:49:15,085 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (1270836494) connection to nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root sending #56

2019-02-15 16:49:15,188 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (1270836494) connection to nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root got value #56

2019-02-15 16:49:15,196 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Custom Source -> (Sink: Unnamed, Process -> Timestamps/Watermarks) (4/4) (f73403ac4763c99e6a244cba3797f7e9) switched from RUNNING to FAILED.

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-3-32.inprogress.da2a75d1-0c83-47bc-9c83-950360c55c86

        at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.<init>(HadoopRecoverableFsDataOutputStream.java:93)






I do see 


2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Current Hadoop/Kerberos user: root

2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13

2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Maximum heap size: 1204 MiBytes

2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JAVA_HOME: /docker-java-home

2019-02-15 16:47:33,585 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Hadoop version: 2.7.5



which has to be expected given that we are running the hadoop27flink 1.7.1 version. 



Does it make sense to go with a hadoop less version and inject the required jar files ?  Has that been done by anyone ? 






On Fri, Feb 15, 2019 at 2:33 AM Yun Tang <[hidden email]> wrote:
Hi

When 'RollingSink' try to initialize state, it would first check current file system supported truncate method. If file system not supported, it would use another work-around solution, which means you should not meet the problem. Otherwise 'RollingSink' thought and found the reflection method of 'truncate' while the file system actually not support. You could try to open DEBUG level to see whether log below could  be printed:
Truncate not found. Will write a file with suffix '.valid-length' and prefix '_' to specify how many bytes in a bucket are valid.

However, from your second email, the more serious problem should be using 'Buckets' with Hadoop-2.6. From what I know the `RecoverableWriter` within 'Buckets' can only support Hadoop-2.7+ , I'm not sure whether existed work around solution.

Best
Yun Tang

From: Vishal Santoshi <[hidden email]>
Sent: Friday, February 15, 2019 3:43
To: user
Subject: Re: StandAlone job on k8s fails with "Unknown method truncate" on restore
 
And yes  cannot work with RollingFleSink for hadoop 2.6 release of 1.7.1 b'coz of this. 

java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer
	at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:57)
	at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
	at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)

Any work around ?

On Thu, Feb 14, 2019 at 1:42 PM Vishal Santoshi <[hidden email]> wrote:
The job uses a RolllingFileSink to push data to hdfs. Run an HA standalone cluster on k8s, 

* get the job running 
* kill the pod. 

The k8s deployment relaunches the pod but fails with

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-2-16.inprogress.449e8668-e886-4f89-b5f6-45ac68e25987


Unknown method truncate called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.


The file does exist. We work with hadoop 2.6 , which does no have truncate. The previous version would see that "truncate" was not supported and drop a length file for the ,inprogress file and rename it to a valid part file.



Is this a known issue ?


Regards.




Reply | Threaded
Open this post in threaded view
|

Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

Vishal Santoshi
Not sure,  but it seems this https://issues.apache.org/jira/browse/FLINK-10203 may be a connected issue.

On Fri, Feb 15, 2019 at 11:57 AM Vishal Santoshi <[hidden email]> wrote:
That log does not appear. It looks like we have egg and chicken issue. 

2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.DFSClient                              - Connecting to datanode 10.246.221.10:50010

2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient  - SASL client skipping handshake in unsecured configuration for 

addr = /10.246.221.10, datanodeId = DatanodeInfoWithStorage[10.246.221.10:50010,DS-c57a7667-f697-4f03-9fb1-532c5b82a9e8,DISK]

2019-02-15 16:49:15,072 DEBUG org.apache.flink.runtime.fs.hdfs.HadoopFsFactory              - Instantiating for file system scheme hdfs Hadoop File System org.apache.hadoop.hdfs.DistributedFileSystem

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal                       - dfs.client.use.legacy.blockreader.local = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal                       - dfs.client.read.shortcircuit = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal                       - dfs.client.domain.socket.data.traffic = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal                       - dfs.domain.socket.path = 

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.io.retry.RetryUtils                         - multipleLinearRandomRetry = null

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.ipc.Client                                  - getting client out of cache: org.apache.hadoop.ipc.Client@31920ade

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil  - DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection

2019-02-15 16:49:15,080 INFO  org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  - Subtask 3 initializing its state (max part counter=58).

2019-02-15 16:49:15,081 DEBUG org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  - Subtask 3 restoring: BucketState for bucketId=ls_kraken_events/dt=2019-02-14/evt=ad_fill and bucketPath=hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill, has open part file created @ 1550247946437

2019-02-15 16:49:15,085 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (1270836494) connection to nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root sending #56

2019-02-15 16:49:15,188 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (1270836494) connection to nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root got value #56

2019-02-15 16:49:15,196 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Custom Source -> (Sink: Unnamed, Process -> Timestamps/Watermarks) (4/4) (f73403ac4763c99e6a244cba3797f7e9) switched from RUNNING to FAILED.

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-3-32.inprogress.da2a75d1-0c83-47bc-9c83-950360c55c86

        at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.<init>(HadoopRecoverableFsDataOutputStream.java:93)






I do see 


2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Current Hadoop/Kerberos user: root

2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13

2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Maximum heap size: 1204 MiBytes

2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JAVA_HOME: /docker-java-home

2019-02-15 16:47:33,585 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Hadoop version: 2.7.5



which has to be expected given that we are running the hadoop27flink 1.7.1 version. 



Does it make sense to go with a hadoop less version and inject the required jar files ?  Has that been done by anyone ? 






On Fri, Feb 15, 2019 at 2:33 AM Yun Tang <[hidden email]> wrote:
Hi

When 'RollingSink' try to initialize state, it would first check current file system supported truncate method. If file system not supported, it would use another work-around solution, which means you should not meet the problem. Otherwise 'RollingSink' thought and found the reflection method of 'truncate' while the file system actually not support. You could try to open DEBUG level to see whether log below could  be printed:
Truncate not found. Will write a file with suffix '.valid-length' and prefix '_' to specify how many bytes in a bucket are valid.

However, from your second email, the more serious problem should be using 'Buckets' with Hadoop-2.6. From what I know the `RecoverableWriter` within 'Buckets' can only support Hadoop-2.7+ , I'm not sure whether existed work around solution.

Best
Yun Tang

From: Vishal Santoshi <[hidden email]>
Sent: Friday, February 15, 2019 3:43
To: user
Subject: Re: StandAlone job on k8s fails with "Unknown method truncate" on restore
 
And yes  cannot work with RollingFleSink for hadoop 2.6 release of 1.7.1 b'coz of this. 

java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer
	at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:57)
	at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
	at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)

Any work around ?

On Thu, Feb 14, 2019 at 1:42 PM Vishal Santoshi <[hidden email]> wrote:
The job uses a RolllingFileSink to push data to hdfs. Run an HA standalone cluster on k8s, 

* get the job running 
* kill the pod. 

The k8s deployment relaunches the pod but fails with

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-2-16.inprogress.449e8668-e886-4f89-b5f6-45ac68e25987


Unknown method truncate called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.


The file does exist. We work with hadoop 2.6 , which does no have truncate. The previous version would see that "truncate" was not supported and drop a length file for the ,inprogress file and rename it to a valid part file.



Is this a known issue ?


Regards.




Reply | Threaded
Open this post in threaded view
|

Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

Vishal Santoshi
Any one ? I am sure there are hadoop 2.6 integrations with 1.7.1 OR I am overlooking something...

On Fri, Feb 15, 2019 at 2:44 PM Vishal Santoshi <[hidden email]> wrote:
Not sure,  but it seems this https://issues.apache.org/jira/browse/FLINK-10203 may be a connected issue.

On Fri, Feb 15, 2019 at 11:57 AM Vishal Santoshi <[hidden email]> wrote:
That log does not appear. It looks like we have egg and chicken issue. 

2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.DFSClient                              - Connecting to datanode 10.246.221.10:50010

2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient  - SASL client skipping handshake in unsecured configuration for 

addr = /10.246.221.10, datanodeId = DatanodeInfoWithStorage[10.246.221.10:50010,DS-c57a7667-f697-4f03-9fb1-532c5b82a9e8,DISK]

2019-02-15 16:49:15,072 DEBUG org.apache.flink.runtime.fs.hdfs.HadoopFsFactory              - Instantiating for file system scheme hdfs Hadoop File System org.apache.hadoop.hdfs.DistributedFileSystem

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal                       - dfs.client.use.legacy.blockreader.local = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal                       - dfs.client.read.shortcircuit = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal                       - dfs.client.domain.socket.data.traffic = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal                       - dfs.domain.socket.path = 

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.io.retry.RetryUtils                         - multipleLinearRandomRetry = null

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.ipc.Client                                  - getting client out of cache: org.apache.hadoop.ipc.Client@31920ade

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil  - DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection

2019-02-15 16:49:15,080 INFO  org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  - Subtask 3 initializing its state (max part counter=58).

2019-02-15 16:49:15,081 DEBUG org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  - Subtask 3 restoring: BucketState for bucketId=ls_kraken_events/dt=2019-02-14/evt=ad_fill and bucketPath=hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill, has open part file created @ 1550247946437

2019-02-15 16:49:15,085 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (1270836494) connection to nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root sending #56

2019-02-15 16:49:15,188 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (1270836494) connection to nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root got value #56

2019-02-15 16:49:15,196 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Custom Source -> (Sink: Unnamed, Process -> Timestamps/Watermarks) (4/4) (f73403ac4763c99e6a244cba3797f7e9) switched from RUNNING to FAILED.

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-3-32.inprogress.da2a75d1-0c83-47bc-9c83-950360c55c86

        at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.<init>(HadoopRecoverableFsDataOutputStream.java:93)






I do see 


2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Current Hadoop/Kerberos user: root

2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13

2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Maximum heap size: 1204 MiBytes

2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JAVA_HOME: /docker-java-home

2019-02-15 16:47:33,585 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Hadoop version: 2.7.5



which has to be expected given that we are running the hadoop27flink 1.7.1 version. 



Does it make sense to go with a hadoop less version and inject the required jar files ?  Has that been done by anyone ? 






On Fri, Feb 15, 2019 at 2:33 AM Yun Tang <[hidden email]> wrote:
Hi

When 'RollingSink' try to initialize state, it would first check current file system supported truncate method. If file system not supported, it would use another work-around solution, which means you should not meet the problem. Otherwise 'RollingSink' thought and found the reflection method of 'truncate' while the file system actually not support. You could try to open DEBUG level to see whether log below could  be printed:
Truncate not found. Will write a file with suffix '.valid-length' and prefix '_' to specify how many bytes in a bucket are valid.

However, from your second email, the more serious problem should be using 'Buckets' with Hadoop-2.6. From what I know the `RecoverableWriter` within 'Buckets' can only support Hadoop-2.7+ , I'm not sure whether existed work around solution.

Best
Yun Tang

From: Vishal Santoshi <[hidden email]>
Sent: Friday, February 15, 2019 3:43
To: user
Subject: Re: StandAlone job on k8s fails with "Unknown method truncate" on restore
 
And yes  cannot work with RollingFleSink for hadoop 2.6 release of 1.7.1 b'coz of this. 

java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer
	at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:57)
	at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
	at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)

Any work around ?

On Thu, Feb 14, 2019 at 1:42 PM Vishal Santoshi <[hidden email]> wrote:
The job uses a RolllingFileSink to push data to hdfs. Run an HA standalone cluster on k8s, 

* get the job running 
* kill the pod. 

The k8s deployment relaunches the pod but fails with

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-2-16.inprogress.449e8668-e886-4f89-b5f6-45ac68e25987


Unknown method truncate called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.


The file does exist. We work with hadoop 2.6 , which does no have truncate. The previous version would see that "truncate" was not supported and drop a length file for the ,inprogress file and rename it to a valid part file.



Is this a known issue ?


Regards.




Reply | Threaded
Open this post in threaded view
|

Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

Vishal Santoshi
We reverted back to BucketingSink and that works as expected. In conclusion RollingFileSink needs hadoop 2.7 for for hadoop File System.

On Sat, Feb 23, 2019 at 2:30 PM Vishal Santoshi <[hidden email]> wrote:
Any one ? I am sure there are hadoop 2.6 integrations with 1.7.1 OR I am overlooking something...

On Fri, Feb 15, 2019 at 2:44 PM Vishal Santoshi <[hidden email]> wrote:
Not sure,  but it seems this https://issues.apache.org/jira/browse/FLINK-10203 may be a connected issue.

On Fri, Feb 15, 2019 at 11:57 AM Vishal Santoshi <[hidden email]> wrote:
That log does not appear. It looks like we have egg and chicken issue. 

2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.DFSClient                              - Connecting to datanode 10.246.221.10:50010

2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient  - SASL client skipping handshake in unsecured configuration for 

addr = /10.246.221.10, datanodeId = DatanodeInfoWithStorage[10.246.221.10:50010,DS-c57a7667-f697-4f03-9fb1-532c5b82a9e8,DISK]

2019-02-15 16:49:15,072 DEBUG org.apache.flink.runtime.fs.hdfs.HadoopFsFactory              - Instantiating for file system scheme hdfs Hadoop File System org.apache.hadoop.hdfs.DistributedFileSystem

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal                       - dfs.client.use.legacy.blockreader.local = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal                       - dfs.client.read.shortcircuit = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal                       - dfs.client.domain.socket.data.traffic = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal                       - dfs.domain.socket.path = 

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.io.retry.RetryUtils                         - multipleLinearRandomRetry = null

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.ipc.Client                                  - getting client out of cache: org.apache.hadoop.ipc.Client@31920ade

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil  - DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection

2019-02-15 16:49:15,080 INFO  org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  - Subtask 3 initializing its state (max part counter=58).

2019-02-15 16:49:15,081 DEBUG org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  - Subtask 3 restoring: BucketState for bucketId=ls_kraken_events/dt=2019-02-14/evt=ad_fill and bucketPath=hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill, has open part file created @ 1550247946437

2019-02-15 16:49:15,085 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (1270836494) connection to nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root sending #56

2019-02-15 16:49:15,188 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (1270836494) connection to nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root got value #56

2019-02-15 16:49:15,196 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Custom Source -> (Sink: Unnamed, Process -> Timestamps/Watermarks) (4/4) (f73403ac4763c99e6a244cba3797f7e9) switched from RUNNING to FAILED.

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-3-32.inprogress.da2a75d1-0c83-47bc-9c83-950360c55c86

        at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.<init>(HadoopRecoverableFsDataOutputStream.java:93)






I do see 


2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Current Hadoop/Kerberos user: root

2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13

2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Maximum heap size: 1204 MiBytes

2019-02-15 16:47:33,582 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JAVA_HOME: /docker-java-home

2019-02-15 16:47:33,585 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Hadoop version: 2.7.5



which has to be expected given that we are running the hadoop27flink 1.7.1 version. 



Does it make sense to go with a hadoop less version and inject the required jar files ?  Has that been done by anyone ? 






On Fri, Feb 15, 2019 at 2:33 AM Yun Tang <[hidden email]> wrote:
Hi

When 'RollingSink' try to initialize state, it would first check current file system supported truncate method. If file system not supported, it would use another work-around solution, which means you should not meet the problem. Otherwise 'RollingSink' thought and found the reflection method of 'truncate' while the file system actually not support. You could try to open DEBUG level to see whether log below could  be printed:
Truncate not found. Will write a file with suffix '.valid-length' and prefix '_' to specify how many bytes in a bucket are valid.

However, from your second email, the more serious problem should be using 'Buckets' with Hadoop-2.6. From what I know the `RecoverableWriter` within 'Buckets' can only support Hadoop-2.7+ , I'm not sure whether existed work around solution.

Best
Yun Tang

From: Vishal Santoshi <[hidden email]>
Sent: Friday, February 15, 2019 3:43
To: user
Subject: Re: StandAlone job on k8s fails with "Unknown method truncate" on restore
 
And yes  cannot work with RollingFleSink for hadoop 2.6 release of 1.7.1 b'coz of this. 

java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer
	at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:57)
	at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
	at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)

Any work around ?

On Thu, Feb 14, 2019 at 1:42 PM Vishal Santoshi <[hidden email]> wrote:
The job uses a RolllingFileSink to push data to hdfs. Run an HA standalone cluster on k8s, 

* get the job running 
* kill the pod. 

The k8s deployment relaunches the pod but fails with

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-2-16.inprogress.449e8668-e886-4f89-b5f6-45ac68e25987


Unknown method truncate called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.


The file does exist. We work with hadoop 2.6 , which does no have truncate. The previous version would see that "truncate" was not supported and drop a length file for the ,inprogress file and rename it to a valid part file.



Is this a known issue ?


Regards.