StreamingFileSink with hdfs less than 2.7


Rinat
Hi mates, I decided to enable state persistence for our Flink jobs that write data into HDFS, but ran into some trouble.

I’m trying to use StreamingFileSink with Cloudera Hadoop, version 2.6.5, which doesn’t provide the truncate method.

So the job fails immediately on startup, while initializing HadoopRecoverableWriter, because it only works with Hadoop file systems of version 2.7 or later.

Do you have any plans to support recovery on Hadoop file systems that don’t provide the truncate method, or is there a way to work around this limitation?

If no workaround exists, then the following behaviour would be good enough:

  1. get the path of the file that should be restored
  2. get the valid-length from the state
  3. create a temporary directory and copy the stream from the file being restored into tmp until the valid-length is reached
  4. replace the file being restored with the file from the tmp directory
  5. move the file to its final state
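
To make steps 1 to 4 concrete, here is a minimal sketch of the copy-and-replace idea. It is only an illustration under assumptions: it uses java.nio.file on a local file system so that it runs standalone, and the class and method names (TruncateByCopy, truncateByCopy) are hypothetical, not part of Flink or Hadoop. On HDFS the same steps would presumably map to FileSystem.open, FileSystem.create and FileSystem.rename. Step 5 (moving the file to its final state) is left to the sink's normal commit logic.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: emulates truncate(file, validLength) without a truncate call.
class TruncateByCopy {

    /** Rewrites {@code file} so that only its first {@code validLength} bytes remain. */
    static void truncateByCopy(Path file, long validLength) throws IOException {
        if (validLength < 0) {
            throw new IllegalArgumentException("validLength must be >= 0");
        }
        if (validLength > Files.size(file)) {
            throw new IOException("file is shorter than the valid length from the state");
        }

        // Step 3: temporary directory next to the file, so the later move
        // stays on the same file system.
        Path parent = file.toAbsolutePath().getParent();
        Path tmpDir = Files.createTempDirectory(parent, "truncate-");
        Path tmpFile = tmpDir.resolve(file.getFileName());

        try {
            // Step 3 (continued): copy bytes until validLength is reached.
            try (InputStream in = Files.newInputStream(file);
                 OutputStream out = Files.newOutputStream(tmpFile)) {
                byte[] buffer = new byte[8192];
                long remaining = validLength;
                while (remaining > 0) {
                    int toRead = (int) Math.min(buffer.length, remaining);
                    int read = in.read(buffer, 0, toRead);
                    if (read < 0) {
                        throw new IOException("unexpected end of file while copying");
                    }
                    out.write(buffer, 0, read);
                    remaining -= read;
                }
            }
            // Step 4: replace the original file with the shortened copy.
            Files.move(tmpFile, file, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // Clean up whatever is left of the temporary location.
            Files.deleteIfExists(tmpFile);
            Files.deleteIfExists(tmpDir);
        }
    }
}
```

The obvious cost of this approach is that recovery time grows with the size of the in-progress file, since the valid prefix has to be rewritten in full.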

What do you think about it?

Sincerely yours,
Rinat Sharipov
Software Engineer at 1DMP CORE Team

mobile: +7 (925) 416-37-26

CleverDATA
make your data clever