HDFS checkpoints for rocksDB state backend:

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

HDFS checkpoints for rocksDB state backend:

Andrea Spina
Dear community, 
I'm trying to use HDFS checkpoints in flink-1.6.4 with the following configuration

state.backend: rocksdb
state.checkpoints.dir: hdfs://rbl1.stage.certilogo.radicalbit.io:8020/flink/checkpoint
state.savepoints.dir: hdfs://rbl1.stage.certilogo.radicalbit.io:8020/flink/savepoints

and I record the following exceptions

Caused by: java.io.IOException: Could not flush and close the file system output stream to hdfs://my.rb.biz:8020/flink/checkpoint/fd35c7145e6911e1721cd0f03656b0a8/chk-2/48502e63-cb69-4944-8561-308da2f9f26a in order to obtain the stream state handle
        at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:328)
        at org.apache.flink.runtime.state.CheckpointStreamWithResultProvider$PrimaryStreamOnly.closeAndFinalizeCheckpointStreamResult(CheckpointStreamWithResultProvider.java:77)
        at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend$HeapSnapshotStrategy$1.performOperation(HeapKeyedStateBackend.java:826)
        at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend$HeapSnapshotStrategy$1.performOperation(HeapKeyedStateBackend.java:759)
        at org.apache.flink.runtime.io.async.AbstractAsyncCallableWithResources.call(AbstractAsyncCallableWithResources.java:75)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
        ... 7 more
Caused by: java.io.IOException: DataStreamer Exception:
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:562)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hdfs.protocol.HdfsConstants
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1318)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1262)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)


or

       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
        ... 7 more
Caused by: java.io.IOException: DataStreamer Exception:
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:562)
Caused by: javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
        at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
        at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
        at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2515)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:981)
        at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1031)
        at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1251)
        at org.apache.hadoop.hdfs.protocol.HdfsConstants.<clinit>(HdfsConstants.java:76)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1318)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1262)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)


In my lib folder I have the uber jar about hdfs as usual but I am not able to let the Job checkpointing its state correctly.
I read also here [1] but is not helping.

Thank you for the precious help

[1] - https://www.cnblogs.com/chendapao/p/9170566.html
--
Andrea Spina
Head of R&D @ Radicalbit Srl
Via Giovanni Battista Pirelli 11, 20124, Milano - IT
Reply | Threaded
Open this post in threaded view
|

Re: HDFS checkpoints for rocksDB state backend:

Congxian Qiu
Hi  Andrea

As the NoClassDefFoundError, could you please verify that there exist `org.apache.hadoop.hdfs.protocol.HdfsConstants` in your jar.
Or could you use Arthas[1] to check if there exists the class when running the job?


Andrea Spina <[hidden email]> 于2019年6月27日周四 上午1:57写道:
Dear community, 
I'm trying to use HDFS checkpoints in flink-1.6.4 with the following configuration

state.backend: rocksdb
state.checkpoints.dir: hdfs://rbl1.stage.certilogo.radicalbit.io:8020/flink/checkpoint
state.savepoints.dir: hdfs://rbl1.stage.certilogo.radicalbit.io:8020/flink/savepoints

and I record the following exceptions

Caused by: java.io.IOException: Could not flush and close the file system output stream to hdfs://my.rb.biz:8020/flink/checkpoint/fd35c7145e6911e1721cd0f03656b0a8/chk-2/48502e63-cb69-4944-8561-308da2f9f26a in order to obtain the stream state handle
        at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:328)
        at org.apache.flink.runtime.state.CheckpointStreamWithResultProvider$PrimaryStreamOnly.closeAndFinalizeCheckpointStreamResult(CheckpointStreamWithResultProvider.java:77)
        at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend$HeapSnapshotStrategy$1.performOperation(HeapKeyedStateBackend.java:826)
        at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend$HeapSnapshotStrategy$1.performOperation(HeapKeyedStateBackend.java:759)
        at org.apache.flink.runtime.io.async.AbstractAsyncCallableWithResources.call(AbstractAsyncCallableWithResources.java:75)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
        ... 7 more
Caused by: java.io.IOException: DataStreamer Exception:
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:562)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hdfs.protocol.HdfsConstants
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1318)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1262)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)


or

       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
        ... 7 more
Caused by: java.io.IOException: DataStreamer Exception:
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:562)
Caused by: javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
        at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
        at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
        at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2515)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:981)
        at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1031)
        at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1251)
        at org.apache.hadoop.hdfs.protocol.HdfsConstants.<clinit>(HdfsConstants.java:76)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1318)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1262)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)


In my lib folder I have the uber jar about hdfs as usual but I am not able to let the Job checkpointing its state correctly.
I read also here [1] but is not helping.

Thank you for the precious help

[1] - https://www.cnblogs.com/chendapao/p/9170566.html
--
Andrea Spina
Head of R&D @ Radicalbit Srl
Via Giovanni Battista Pirelli 11, 20124, Milano - IT
Reply | Threaded
Open this post in threaded view
|

Re: HDFS checkpoints for rocksDB state backend:

Andrea Spina
HI Qiu, 
my jar does not contain the class `org.apache.hadoop.hdfs.protocol.HdfsConstants`, but I do expect it is contained within `flink-shaded-hadoop2-uber-1.6.4.jar` which is located in Flink cluster libs.

Il giorno gio 27 giu 2019 alle ore 04:03 Congxian Qiu <[hidden email]> ha scritto:
Hi  Andrea

As the NoClassDefFoundError, could you please verify that there exist `org.apache.hadoop.hdfs.protocol.HdfsConstants` in your jar.
Or could you use Arthas[1] to check if there exists the class when running the job?


Andrea Spina <[hidden email]> 于2019年6月27日周四 上午1:57写道:
Dear community, 
I'm trying to use HDFS checkpoints in flink-1.6.4 with the following configuration

state.backend: rocksdb
state.checkpoints.dir: hdfs://rbl1.stage.certilogo.radicalbit.io:8020/flink/checkpoint
state.savepoints.dir: hdfs://rbl1.stage.certilogo.radicalbit.io:8020/flink/savepoints

and I record the following exceptions

Caused by: java.io.IOException: Could not flush and close the file system output stream to hdfs://my.rb.biz:8020/flink/checkpoint/fd35c7145e6911e1721cd0f03656b0a8/chk-2/48502e63-cb69-4944-8561-308da2f9f26a in order to obtain the stream state handle
        at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:328)
        at org.apache.flink.runtime.state.CheckpointStreamWithResultProvider$PrimaryStreamOnly.closeAndFinalizeCheckpointStreamResult(CheckpointStreamWithResultProvider.java:77)
        at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend$HeapSnapshotStrategy$1.performOperation(HeapKeyedStateBackend.java:826)
        at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend$HeapSnapshotStrategy$1.performOperation(HeapKeyedStateBackend.java:759)
        at org.apache.flink.runtime.io.async.AbstractAsyncCallableWithResources.call(AbstractAsyncCallableWithResources.java:75)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
        ... 7 more
Caused by: java.io.IOException: DataStreamer Exception:
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:562)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hdfs.protocol.HdfsConstants
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1318)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1262)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)


or

       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
        ... 7 more
Caused by: java.io.IOException: DataStreamer Exception:
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:562)
Caused by: javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
        at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
        at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
        at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2515)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:981)
        at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1031)
        at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1251)
        at org.apache.hadoop.hdfs.protocol.HdfsConstants.<clinit>(HdfsConstants.java:76)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1318)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1262)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)


In my lib folder I have the uber jar about hdfs as usual but I am not able to let the Job checkpointing its state correctly.
I read also here [1] but is not helping.

Thank you for the precious help

[1] - https://www.cnblogs.com/chendapao/p/9170566.html
--
Andrea Spina
Head of R&D @ Radicalbit Srl
Via Giovanni Battista Pirelli 11, 20124, Milano - IT


--
Andrea Spina
Head of R&D @ Radicalbit Srl
Via Giovanni Battista Pirelli 11, 20124, Milano - IT
Reply | Threaded
Open this post in threaded view
|

Re: HDFS checkpoints for rocksDB state backend:

Yang Wang
Hi, Andrea

If you are running flink cluster on Yarn, the jar `flink-shaded-hadoop2-uber-1.6.4.jar` should exist in the lib dir of  the flink client, so that it could be uploaded to the Yarn Distributed Cache and then be available on JM and TM.
And if you are running flink standalone cluster, the jar `flink-shaded-hadoop2-uber-1.6.4.jar` should exist on each slaves which you want to start a TaskManager. 

You could check the classpath in the TaskManager log.

Andrea Spina <[hidden email]> 于2019年6月27日周四 下午3:52写道:
HI Qiu, 
my jar does not contain the class `org.apache.hadoop.hdfs.protocol.HdfsConstants`, but I do expect it is contained within `flink-shaded-hadoop2-uber-1.6.4.jar` which is located in Flink cluster libs.

Il giorno gio 27 giu 2019 alle ore 04:03 Congxian Qiu <[hidden email]> ha scritto:
Hi  Andrea

As the NoClassDefFoundError, could you please verify that there exist `org.apache.hadoop.hdfs.protocol.HdfsConstants` in your jar.
Or could you use Arthas[1] to check if there exists the class when running the job?


Andrea Spina <[hidden email]> 于2019年6月27日周四 上午1:57写道:
Dear community, 
I'm trying to use HDFS checkpoints in flink-1.6.4 with the following configuration

state.backend: rocksdb
state.checkpoints.dir: hdfs://rbl1.stage.certilogo.radicalbit.io:8020/flink/checkpoint
state.savepoints.dir: hdfs://rbl1.stage.certilogo.radicalbit.io:8020/flink/savepoints

and I record the following exceptions

Caused by: java.io.IOException: Could not flush and close the file system output stream to hdfs://my.rb.biz:8020/flink/checkpoint/fd35c7145e6911e1721cd0f03656b0a8/chk-2/48502e63-cb69-4944-8561-308da2f9f26a in order to obtain the stream state handle
        at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:328)
        at org.apache.flink.runtime.state.CheckpointStreamWithResultProvider$PrimaryStreamOnly.closeAndFinalizeCheckpointStreamResult(CheckpointStreamWithResultProvider.java:77)
        at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend$HeapSnapshotStrategy$1.performOperation(HeapKeyedStateBackend.java:826)
        at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend$HeapSnapshotStrategy$1.performOperation(HeapKeyedStateBackend.java:759)
        at org.apache.flink.runtime.io.async.AbstractAsyncCallableWithResources.call(AbstractAsyncCallableWithResources.java:75)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
        ... 7 more
Caused by: java.io.IOException: DataStreamer Exception:
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:562)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hdfs.protocol.HdfsConstants
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1318)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1262)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)


or

       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
        ... 7 more
Caused by: java.io.IOException: DataStreamer Exception:
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:562)
Caused by: javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
        at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
        at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
        at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2515)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:981)
        at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1031)
        at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1251)
        at org.apache.hadoop.hdfs.protocol.HdfsConstants.<clinit>(HdfsConstants.java:76)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1318)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1262)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)


In my lib folder I have the uber jar about hdfs as usual but I am not able to let the Job checkpointing its state correctly.
I read also here [1] but is not helping.

Thank you for the precious help

[1] - https://www.cnblogs.com/chendapao/p/9170566.html
--
Andrea Spina
Head of R&D @ Radicalbit Srl
Via Giovanni Battista Pirelli 11, 20124, Milano - IT


--
Andrea Spina
Head of R&D @ Radicalbit Srl
Via Giovanni Battista Pirelli 11, 20124, Milano - IT