(DEPRECATED) Apache Flink User Mailing List archive.

StandAlone job on k8s fails with "Unknown method truncate" on restore

Classic

List

Threaded

7 messages Options

Vishal Santoshi

StandAlone job on k8s fails with "Unknown method truncate" on restore

The job uses a RolllingFileSink to push data to hdfs. Run an HA standalone cluster on k8s,

* get the job running

* kill the pod.

The k8s deployment relaunches the pod but fails with

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-2-16.inprogress.449e8668-e886-4f89-b5f6-45ac68e25987

Unknown method truncate called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.

The file does exist. We work with hadoop 2.6 , which does no have truncate. The previous version would see that "truncate" was not supported and drop a length file for the ,inprogress file and rename it to a valid part file.

Is this a known issue ?

Regards.

Vishal Santoshi

Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

And yes cannot work with RollingFleSink for hadoop 2.6 release of 1.7.1 b'coz of this.

java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer
	at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:57)
	at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
	at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)

Any work around ?

On Thu, Feb 14, 2019 at 1:42 PM Vishal Santoshi <[hidden email]> wrote:

The job uses a RolllingFileSink to push data to hdfs. Run an HA standalone cluster on k8s,

* get the job running
* kill the pod.

The k8s deployment relaunches the pod but fails with

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-2-16.inprogress.449e8668-e886-4f89-b5f6-45ac68e25987

Unknown method truncate called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.

The file does exist. We work with hadoop 2.6 , which does no have truncate. The previous version would see that "truncate" was not supported and drop a length file for the ,inprogress file and rename it to a valid part file.

Is this a known issue ?

Regards.

Yun Tang

Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

When 'RollingSink' try to initialize state, it would first check current file system supported truncate method. If file system not supported, it would use another work-around solution, which means you should not meet the problem. Otherwise 'RollingSink' thought and found the reflection method of 'truncate' while the file system actually not support. You could try to open DEBUG level to see whether log below could be printed:

Truncate not found. Will write a file with suffix '.valid-length' and prefix '_' to specify how many bytes in a bucket are valid.

However, from your second email, the more serious problem should be using 'Buckets' with Hadoop-2.6. From what I know the `RecoverableWriter` within 'Buckets' can only support Hadoop-2.7+ , I'm not sure whether existed work around solution.

Best

Yun Tang

From: Vishal Santoshi <[hidden email]>
Sent: Friday, February 15, 2019 3:43
To: user
Subject: Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

And yes cannot work with RollingFleSink for hadoop 2.6 release of 1.7.1 b'coz of this.

java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer
	at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:57)
	at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
	at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)

Any work around ?

On Thu, Feb 14, 2019 at 1:42 PM Vishal Santoshi <[hidden email]> wrote:

The job uses a RolllingFileSink to push data to hdfs. Run an HA standalone cluster on k8s,

* get the job running

* kill the pod.

The k8s deployment relaunches the pod but fails with

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-2-16.inprogress.449e8668-e886-4f89-b5f6-45ac68e25987

Unknown method truncate called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.

The file does exist. We work with hadoop 2.6 , which does no have truncate. The previous version would see that "truncate" was not supported and drop a length file for the ,inprogress file and rename it to a valid part file.

Is this a known issue ?

Regards.

Vishal Santoshi

Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

That log does not appear. It looks like we have egg and chicken issue.

2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.DFSClient - Connecting to datanode 10.246.221.10:50010

2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient - SASL client skipping handshake in unsecured configuration for

addr = /10.246.221.10, datanodeId = DatanodeInfoWithStorage[10.246.221.10:50010,DS-c57a7667-f697-4f03-9fb1-532c5b82a9e8,DISK]

2019-02-15 16:49:15,072 DEBUG org.apache.flink.runtime.fs.hdfs.HadoopFsFactory - Instantiating for file system scheme hdfs Hadoop File System org.apache.hadoop.hdfs.DistributedFileSystem

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.client.use.legacy.blockreader.local = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.client.read.shortcircuit = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.client.domain.socket.data.traffic = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.domain.socket.path =

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.io.retry.RetryUtils - multipleLinearRandomRetry = null

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.ipc.Client - getting client out of cache: org.apache.hadoop.ipc.Client@31920ade

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil - DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection

2019-02-15 16:49:15,080 INFO org.apache.flink.streaming.api.functions.sink.filesystem.Buckets - Subtask 3 initializing its state (max part counter=58).

2019-02-15 16:49:15,081 DEBUG org.apache.flink.streaming.api.functions.sink.filesystem.Buckets - Subtask 3 restoring: BucketState for bucketId=ls_kraken_events/dt=2019-02-14/evt=ad_fill and bucketPath=hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill, has open part file created @ 1550247946437

2019-02-15 16:49:15,085 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1270836494) connection to nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root sending #56

2019-02-15 16:49:15,188 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1270836494) connection to nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root got value #56

2019-02-15 16:49:15,196 INFO org.apache.flink.runtime.taskmanager.Task - Source: Custom Source -> (Sink: Unnamed, Process -> Timestamps/Watermarks) (4/4) (f73403ac4763c99e6a244cba3797f7e9) switched from RUNNING to FAILED.

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-3-32.inprogress.da2a75d1-0c83-47bc-9c83-950360c55c86

at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.<init>(HadoopRecoverableFsDataOutputStream.java:93)

I do see

2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Current Hadoop/Kerberos user: root

2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13

2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Maximum heap size: 1204 MiBytes

2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JAVA_HOME: /docker-java-home

2019-02-15 16:47:33,585 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Hadoop version: 2.7.5

which has to be expected given that we are running the hadoop27flink 1.7.1 version.

Does it make sense to go with a hadoop less version and inject the required jar files ? Has that been done by anyone ?

On Fri, Feb 15, 2019 at 2:33 AM Yun Tang <[hidden email]> wrote:

Hi

When 'RollingSink' try to initialize state, it would first check current file system supported truncate method. If file system not supported, it would use another work-around solution, which means you should not meet the problem. Otherwise 'RollingSink' thought and found the reflection method of 'truncate' while the file system actually not support. You could try to open DEBUG level to see whether log below could be printed:

Truncate not found. Will write a file with suffix '.valid-length' and prefix '_' to specify how many bytes in a bucket are valid.

However, from your second email, the more serious problem should be using 'Buckets' with Hadoop-2.6. From what I know the `RecoverableWriter` within 'Buckets' can only support Hadoop-2.7+ , I'm not sure whether existed work around solution.

Best

Yun Tang

From: Vishal Santoshi <[hidden email]>
Sent: Friday, February 15, 2019 3:43
To: user
Subject: Re: StandAlone job on k8s fails with "Unknown method truncate" on restore
And yes cannot work with RollingFleSink for hadoop 2.6 release of 1.7.1 b'coz of this.
java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer
	at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:57)
	at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
	at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)
Any work around ?
On Thu, Feb 14, 2019 at 1:42 PM Vishal Santoshi <[hidden email]> wrote:

The job uses a RolllingFileSink to push data to hdfs. Run an HA standalone cluster on k8s,

* get the job running

* kill the pod.

The k8s deployment relaunches the pod but fails with

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-2-16.inprogress.449e8668-e886-4f89-b5f6-45ac68e25987

Unknown method truncate called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.

The file does exist. We work with hadoop 2.6 , which does no have truncate. The previous version would see that "truncate" was not supported and drop a length file for the ,inprogress file and rename it to a valid part file.

Is this a known issue ?

Regards.

Vishal Santoshi

Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

Not sure, but it seems this https://issues.apache.org/jira/browse/FLINK-10203 may be a connected issue.

On Fri, Feb 15, 2019 at 11:57 AM Vishal Santoshi <[hidden email]> wrote:

That log does not appear. It looks like we have egg and chicken issue.

2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.DFSClient - Connecting to datanode 10.246.221.10:50010

2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient - SASL client skipping handshake in unsecured configuration for

addr = /10.246.221.10, datanodeId = DatanodeInfoWithStorage[10.246.221.10:50010,DS-c57a7667-f697-4f03-9fb1-532c5b82a9e8,DISK]

2019-02-15 16:49:15,072 DEBUG org.apache.flink.runtime.fs.hdfs.HadoopFsFactory - Instantiating for file system scheme hdfs Hadoop File System org.apache.hadoop.hdfs.DistributedFileSystem

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.client.use.legacy.blockreader.local = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.client.read.shortcircuit = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.client.domain.socket.data.traffic = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.domain.socket.path =

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.io.retry.RetryUtils - multipleLinearRandomRetry = null

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.ipc.Client - getting client out of cache: org.apache.hadoop.ipc.Client@31920ade

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil - DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection

2019-02-15 16:49:15,080 INFO org.apache.flink.streaming.api.functions.sink.filesystem.Buckets - Subtask 3 initializing its state (max part counter=58).

2019-02-15 16:49:15,081 DEBUG org.apache.flink.streaming.api.functions.sink.filesystem.Buckets - Subtask 3 restoring: BucketState for bucketId=ls_kraken_events/dt=2019-02-14/evt=ad_fill and bucketPath=hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill, has open part file created @ 1550247946437

2019-02-15 16:49:15,085 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1270836494) connection to nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root sending #56

2019-02-15 16:49:15,188 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1270836494) connection to nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root got value #56

2019-02-15 16:49:15,196 INFO org.apache.flink.runtime.taskmanager.Task - Source: Custom Source -> (Sink: Unnamed, Process -> Timestamps/Watermarks) (4/4) (f73403ac4763c99e6a244cba3797f7e9) switched from RUNNING to FAILED.

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-3-32.inprogress.da2a75d1-0c83-47bc-9c83-950360c55c86

at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.<init>(HadoopRecoverableFsDataOutputStream.java:93)

I do see

2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Current Hadoop/Kerberos user: root
2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Maximum heap size: 1204 MiBytes
2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JAVA_HOME: /docker-java-home

2019-02-15 16:47:33,585 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Hadoop version: 2.7.5

which has to be expected given that we are running the hadoop27flink 1.7.1 version.

Does it make sense to go with a hadoop less version and inject the required jar files ? Has that been done by anyone ?
On Fri, Feb 15, 2019 at 2:33 AM Yun Tang <[hidden email]> wrote:
Hi

When 'RollingSink' try to initialize state, it would first check current file system supported truncate method. If file system not supported, it would use another work-around solution, which means you should not meet the problem. Otherwise 'RollingSink' thought and found the reflection method of 'truncate' while the file system actually not support. You could try to open DEBUG level to see whether log below could be printed:

Truncate not found. Will write a file with suffix '.valid-length' and prefix '_' to specify how many bytes in a bucket are valid.

However, from your second email, the more serious problem should be using 'Buckets' with Hadoop-2.6. From what I know the `RecoverableWriter` within 'Buckets' can only support Hadoop-2.7+ , I'm not sure whether existed work around solution.

Best

Yun Tang

From: Vishal Santoshi <[hidden email]>
Sent: Friday, February 15, 2019 3:43
To: user
Subject: Re: StandAlone job on k8s fails with "Unknown method truncate" on restore
And yes cannot work with RollingFleSink for hadoop 2.6 release of 1.7.1 b'coz of this.
java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer
	at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:57)
	at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
	at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)
Any work around ?
On Thu, Feb 14, 2019 at 1:42 PM Vishal Santoshi <[hidden email]> wrote:

The job uses a RolllingFileSink to push data to hdfs. Run an HA standalone cluster on k8s,

* get the job running

* kill the pod.

The k8s deployment relaunches the pod but fails with

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-2-16.inprogress.449e8668-e886-4f89-b5f6-45ac68e25987

Unknown method truncate called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.

The file does exist. We work with hadoop 2.6 , which does no have truncate. The previous version would see that "truncate" was not supported and drop a length file for the ,inprogress file and rename it to a valid part file.

Is this a known issue ?

Regards.

Vishal Santoshi

Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

Any one ? I am sure there are hadoop 2.6 integrations with 1.7.1 OR I am overlooking something...

On Fri, Feb 15, 2019 at 2:44 PM Vishal Santoshi <[hidden email]> wrote:

Not sure, but it seems this https://issues.apache.org/jira/browse/FLINK-10203 may be a connected issue.
On Fri, Feb 15, 2019 at 11:57 AM Vishal Santoshi <[hidden email]> wrote:
That log does not appear. It looks like we have egg and chicken issue.

2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.DFSClient - Connecting to datanode 10.246.221.10:50010

2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient - SASL client skipping handshake in unsecured configuration for

addr = /10.246.221.10, datanodeId = DatanodeInfoWithStorage[10.246.221.10:50010,DS-c57a7667-f697-4f03-9fb1-532c5b82a9e8,DISK]

2019-02-15 16:49:15,072 DEBUG org.apache.flink.runtime.fs.hdfs.HadoopFsFactory - Instantiating for file system scheme hdfs Hadoop File System org.apache.hadoop.hdfs.DistributedFileSystem

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.client.use.legacy.blockreader.local = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.client.read.shortcircuit = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.client.domain.socket.data.traffic = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.domain.socket.path =

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.io.retry.RetryUtils - multipleLinearRandomRetry = null

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.ipc.Client - getting client out of cache: org.apache.hadoop.ipc.Client@31920ade

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil - DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection

2019-02-15 16:49:15,080 INFO org.apache.flink.streaming.api.functions.sink.filesystem.Buckets - Subtask 3 initializing its state (max part counter=58).

2019-02-15 16:49:15,081 DEBUG org.apache.flink.streaming.api.functions.sink.filesystem.Buckets - Subtask 3 restoring: BucketState for bucketId=ls_kraken_events/dt=2019-02-14/evt=ad_fill and bucketPath=hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill, has open part file created @ 1550247946437

2019-02-15 16:49:15,085 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1270836494) connection to nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root sending #56

2019-02-15 16:49:15,188 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1270836494) connection to nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root got value #56

2019-02-15 16:49:15,196 INFO org.apache.flink.runtime.taskmanager.Task - Source: Custom Source -> (Sink: Unnamed, Process -> Timestamps/Watermarks) (4/4) (f73403ac4763c99e6a244cba3797f7e9) switched from RUNNING to FAILED.

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-3-32.inprogress.da2a75d1-0c83-47bc-9c83-950360c55c86

at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.<init>(HadoopRecoverableFsDataOutputStream.java:93)

I do see

2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Current Hadoop/Kerberos user: root
2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Maximum heap size: 1204 MiBytes
2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JAVA_HOME: /docker-java-home

2019-02-15 16:47:33,585 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Hadoop version: 2.7.5

which has to be expected given that we are running the hadoop27flink 1.7.1 version.

Does it make sense to go with a hadoop less version and inject the required jar files ? Has that been done by anyone ?
On Fri, Feb 15, 2019 at 2:33 AM Yun Tang <[hidden email]> wrote:
Hi

When 'RollingSink' try to initialize state, it would first check current file system supported truncate method. If file system not supported, it would use another work-around solution, which means you should not meet the problem. Otherwise 'RollingSink' thought and found the reflection method of 'truncate' while the file system actually not support. You could try to open DEBUG level to see whether log below could be printed:

Truncate not found. Will write a file with suffix '.valid-length' and prefix '_' to specify how many bytes in a bucket are valid.

However, from your second email, the more serious problem should be using 'Buckets' with Hadoop-2.6. From what I know the `RecoverableWriter` within 'Buckets' can only support Hadoop-2.7+ , I'm not sure whether existed work around solution.

Best

Yun Tang

From: Vishal Santoshi <[hidden email]>
Sent: Friday, February 15, 2019 3:43
To: user
Subject: Re: StandAlone job on k8s fails with "Unknown method truncate" on restore
And yes cannot work with RollingFleSink for hadoop 2.6 release of 1.7.1 b'coz of this.
java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer
	at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:57)
	at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
	at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)
Any work around ?
On Thu, Feb 14, 2019 at 1:42 PM Vishal Santoshi <[hidden email]> wrote:

The job uses a RolllingFileSink to push data to hdfs. Run an HA standalone cluster on k8s,

* get the job running

* kill the pod.

The k8s deployment relaunches the pod but fails with

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-2-16.inprogress.449e8668-e886-4f89-b5f6-45ac68e25987

Unknown method truncate called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.

The file does exist. We work with hadoop 2.6 , which does no have truncate. The previous version would see that "truncate" was not supported and drop a length file for the ,inprogress file and rename it to a valid part file.

Is this a known issue ?

Regards.

Vishal Santoshi

Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

We reverted back to BucketingSink and that works as expected. In conclusion RollingFileSink needs hadoop 2.7 for for hadoop File System.

On Sat, Feb 23, 2019 at 2:30 PM Vishal Santoshi <[hidden email]> wrote:

Any one ? I am sure there are hadoop 2.6 integrations with 1.7.1 OR I am overlooking something...
On Fri, Feb 15, 2019 at 2:44 PM Vishal Santoshi <[hidden email]> wrote:
Not sure, but it seems this https://issues.apache.org/jira/browse/FLINK-10203 may be a connected issue.
On Fri, Feb 15, 2019 at 11:57 AM Vishal Santoshi <[hidden email]> wrote:
That log does not appear. It looks like we have egg and chicken issue.

2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.DFSClient - Connecting to datanode 10.246.221.10:50010

2019-02-15 16:49:15,045 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient - SASL client skipping handshake in unsecured configuration for

addr = /10.246.221.10, datanodeId = DatanodeInfoWithStorage[10.246.221.10:50010,DS-c57a7667-f697-4f03-9fb1-532c5b82a9e8,DISK]

2019-02-15 16:49:15,072 DEBUG org.apache.flink.runtime.fs.hdfs.HadoopFsFactory - Instantiating for file system scheme hdfs Hadoop File System org.apache.hadoop.hdfs.DistributedFileSystem

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.client.use.legacy.blockreader.local = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.client.read.shortcircuit = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.client.domain.socket.data.traffic = false

2019-02-15 16:49:15,072 DEBUG org.apache.hadoop.hdfs.BlockReaderLocal - dfs.domain.socket.path =

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.io.retry.RetryUtils - multipleLinearRandomRetry = null

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.ipc.Client - getting client out of cache: org.apache.hadoop.ipc.Client@31920ade

2019-02-15 16:49:15,076 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil - DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection

2019-02-15 16:49:15,080 INFO org.apache.flink.streaming.api.functions.sink.filesystem.Buckets - Subtask 3 initializing its state (max part counter=58).

2019-02-15 16:49:15,081 DEBUG org.apache.flink.streaming.api.functions.sink.filesystem.Buckets - Subtask 3 restoring: BucketState for bucketId=ls_kraken_events/dt=2019-02-14/evt=ad_fill and bucketPath=hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill, has open part file created @ 1550247946437

2019-02-15 16:49:15,085 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1270836494) connection to nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root sending #56

2019-02-15 16:49:15,188 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1270836494) connection to nn-crunchy.bf2.tumblr.net/10.246.199.154:8020 from root got value #56

2019-02-15 16:49:15,196 INFO org.apache.flink.runtime.taskmanager.Task - Source: Custom Source -> (Sink: Unnamed, Process -> Timestamps/Watermarks) (4/4) (f73403ac4763c99e6a244cba3797f7e9) switched from RUNNING to FAILED.

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-3-32.inprogress.da2a75d1-0c83-47bc-9c83-950360c55c86

at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.<init>(HadoopRecoverableFsDataOutputStream.java:93)

I do see

2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Current Hadoop/Kerberos user: root
2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Maximum heap size: 1204 MiBytes
2019-02-15 16:47:33,582 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JAVA_HOME: /docker-java-home

2019-02-15 16:47:33,585 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Hadoop version: 2.7.5

which has to be expected given that we are running the hadoop27flink 1.7.1 version.

Does it make sense to go with a hadoop less version and inject the required jar files ? Has that been done by anyone ?
On Fri, Feb 15, 2019 at 2:33 AM Yun Tang <[hidden email]> wrote:
Hi

When 'RollingSink' try to initialize state, it would first check current file system supported truncate method. If file system not supported, it would use another work-around solution, which means you should not meet the problem. Otherwise 'RollingSink' thought and found the reflection method of 'truncate' while the file system actually not support. You could try to open DEBUG level to see whether log below could be printed:

Truncate not found. Will write a file with suffix '.valid-length' and prefix '_' to specify how many bytes in a bucket are valid.

However, from your second email, the more serious problem should be using 'Buckets' with Hadoop-2.6. From what I know the `RecoverableWriter` within 'Buckets' can only support Hadoop-2.7+ , I'm not sure whether existed work around solution.

Best

Yun Tang

From: Vishal Santoshi <[hidden email]>
Sent: Friday, February 15, 2019 3:43
To: user
Subject: Re: StandAlone job on k8s fails with "Unknown method truncate" on restore
And yes cannot work with RollingFleSink for hadoop 2.6 release of 1.7.1 b'coz of this.
java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer
	at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:57)
	at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
	at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)
Any work around ?
On Thu, Feb 14, 2019 at 1:42 PM Vishal Santoshi <[hidden email]> wrote:

The job uses a RolllingFileSink to push data to hdfs. Run an HA standalone cluster on k8s,

* get the job running

* kill the pod.

The k8s deployment relaunches the pod but fails with

java.io.IOException: Missing data in tmp file: hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-02-14/evt=ad_fill/.part-2-16.inprogress.449e8668-e886-4f89-b5f6-45ac68e25987

Unknown method truncate called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.

The file does exist. We work with hadoop 2.6 , which does no have truncate. The previous version would see that "truncate" was not supported and drop a length file for the ,inprogress file and rename it to a valid part file.

Is this a known issue ?

Regards.