Re: Flink + S3
Posted by Till Rohrmann
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Flink-S3-tp6190p6192.html
Hi Michael-Keith,
you can use S3 as the checkpoint directory for the filesystem state backend. This means that whenever a checkpoint is performed, the state data will be written to that directory.
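As a rough sketch, the corresponding entries in flink-conf.yaml might look like this (the exact key names depend on your Flink version, and the bucket/path are placeholders):

```yaml
# Use the filesystem state backend and point its checkpoint
# directory at an S3 bucket (bucket and path are placeholders).
state.backend: filesystem
state.backend.fs.checkpointdir: s3://my-bucket/flink/checkpoints
```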
The same holds true for the ZooKeeper recovery storage directory. This directory will contain the submitted but not yet finished jobs as well as some metadata for the checkpoints. With this information it is possible to restore running jobs if the JobManager dies.
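A minimal sketch of the corresponding recovery settings in flink-conf.yaml, again with version-dependent key names and placeholder values:

```yaml
# ZooKeeper-based recovery; quorum address and S3 path are placeholders.
recovery.mode: zookeeper
recovery.zookeeper.quorum: zk-host:2181
recovery.zookeeper.storageDir: s3://my-bucket/flink/recovery
```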
As far as I know, Flink relies on Hadoop's file system wrapper classes to support S3. Flink has built-in support for HDFS, MapRFS and the local file system; for everything else, it tries to find a Hadoop class. Therefore, I fear that you need at least Hadoop's S3 file system class on your classpath, plus a core-site.xml or hdfs-site.xml stored at the location specified by fs.hdfs.hdfsdefault in Flink's configuration. In one of these files you then have to add an XML entry specifying the S3 file system class. But the easiest way would be to simply install Hadoop.
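Roughly, such a core-site.xml entry might look like the following. Hadoop's NativeS3FileSystem is one such implementation; the credentials shown are placeholders, and you would also point fs.hdfs.hdfsdefault in flink-conf.yaml at this file:

```xml
<configuration>
  <!-- Map the s3n:// scheme to Hadoop's native S3 file system. -->
  <property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  </property>
  <!-- Placeholder credentials; substitute your own AWS keys. -->
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```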
I'm not aware of any Puppet scripts, but I might be missing something here. If you do complete a Puppet script, it would definitely be a valuable addition to Flink :-)
Cheers,
Till