Hello,
Regarding the FileSystem abstraction support: we are planning to use a distributed file system that complies with the Hadoop Compatible File System (HCFS) standard in place of standard HDFS. According to the documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/filesystems.html), persistence guarantees are listed as one of the main requirements; to be precise, they cover both visibility and durability guarantees.
My questions are:
1) Is the file system expected to support atomic rename? I believe the checkpoint mechanism involves renaming files. Will it be affected if atomic rename is not guaranteed by the underlying file system?
2) How does one certify Flink with an HCFS (in place of standard HDFS), in terms of the scenarios and use cases that need to be tested? Is there any general guidance on this?
Thanks,
Vijay |
Following up on my question regarding the requirements for the backing file system (HCFS). I would appreciate any input.
Regards,
Vijay
On Wednesday, February 15, 2017 11:28 AM, Vijay Srinivasaraghavan <[hidden email]> wrote: |
Hi,
I think atomic rename is not part of the requirements. I'll add +Stephan who recently wrote this document in case he has any additional input. Cheers, Aljoscha On Thu, 16 Feb 2017 at 23:28 Vijay Srinivasaraghavan <[hidden email]> wrote:
|
Hi Vijay,
Regarding your second question: first of all, the Flink example jobs need to pass. Secondly, I would recommend implementing a test job that uses a lot of state, different state backends (file system and RocksDB), and some artificial failures. We at data Artisans have some internal jobs for testing such workloads. I'll try to publish them on GitHub soon so that others can use them as well. Please ping me if you need them urgently :)
Regards,
Robert
On Fri, Feb 17, 2017 at 3:20 PM, Aljoscha Krettek <[hidden email]> wrote:
|
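To make the idea of a test job with state and artificial failures more concrete, here is a minimal, library-free sketch in Python. It is not Flink code and not one of the data Artisans test jobs mentioned above; every name in it (`run_job`, `write_checkpoint`, `latest_checkpoint`, `InjectedFailure`) is hypothetical. It assumes the file system under test can be reached through an ordinary directory path (for example a local mount), which may not hold for every HCFS. Each checkpoint is written to its own uniquely named file, so the sketch does not rely on rename at all. It only checks that a closed file is visible in a listing and readable after a simulated failure.

```python
import json
import os
import tempfile


class InjectedFailure(Exception):
    """Artificial failure raised to force a restore from the last checkpoint."""


def write_checkpoint(directory, checkpoint_id, state):
    # Each checkpoint goes to its own uniquely named file; no rename is used.
    path = os.path.join(directory, f"chk-{checkpoint_id:06d}.json")
    with open(path, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to persist the data before relying on it
    return path


def latest_checkpoint(directory):
    # Visibility check: a completed checkpoint must appear in a directory listing.
    # Zero-padded ids make lexicographic order match numeric order.
    names = sorted(n for n in os.listdir(directory) if n.startswith("chk-"))
    if not names:
        return None
    with open(os.path.join(directory, names[-1])) as f:
        return json.load(f)


def run_job(directory, records, checkpoint_every, fail_at):
    """Sum `records`, checkpointing periodically and failing once at each offset in `fail_at`.

    Returns (total, number_of_restarts).
    """
    pending_failures = set(fail_at)
    restarts = 0
    while True:
        # On (re)start, restore from the newest checkpoint or begin from scratch.
        state = latest_checkpoint(directory) or {"offset": 0, "total": 0, "checkpoint_id": 0}
        try:
            while state["offset"] < len(records):
                if state["offset"] in pending_failures:
                    pending_failures.discard(state["offset"])
                    raise InjectedFailure(state["offset"])
                state["total"] += records[state["offset"]]
                state["offset"] += 1
                if state["offset"] % checkpoint_every == 0:
                    state["checkpoint_id"] += 1
                    write_checkpoint(directory, state["checkpoint_id"], state)
            return state["total"], restarts
        except InjectedFailure:
            restarts += 1


if __name__ == "__main__":
    # Replace the temporary directory with a path on the file system under test.
    checkpoint_dir = tempfile.mkdtemp()
    print(run_job(checkpoint_dir, list(range(1, 101)), checkpoint_every=10, fail_at=[5, 37, 90]))
```

If the final sum matches the sum computed without failures, every restore read back a complete and correct checkpoint. A real certification run would still need actual Flink jobs with the file system and RocksDB state backends, as suggested above. This sketch only illustrates the shape of such a test.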