Hi Mailing List,

I want to write and read intermediates to/from disk. The following pseudocode snippet may illustrate my intention:

public void mapPartition(Iterable<T> tuples, Collector<T> out) {
    for (T tuple : tuples) {
        if (condition)
            out.collect(tuple);
        else
            writeTupleToDisk(tuple);
    }
    while (tuplesRemainOnDisk())
        out.collect(readNextTupleFromDisk());
}

I am wondering whether Flink provides an integrated class for this purpose. I also have to identify the files holding the intermediates precisely, because of the parallelism of mapPartition.

Thank you in advance!
Robert
Flink has support for spillable intermediate results. Currently they are only used if necessary to avoid pipeline deadlocks. You can force this via

env.getConfig().setExecutionMode(ExecutionMode.BATCH);

This will write shuffles to disk, but you don't get the fine-grained control you probably need for your use case.

– Ufuk
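A minimal sketch of forcing the batch exchange mode in a DataSet program; the data source and the rebalance() shuffle are just placeholders to make it self-contained:

import org.apache.flink.api.common.ExecutionMode;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class BatchExchangeExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Force every data exchange to be materialized instead of pipelined.
        env.getConfig().setExecutionMode(ExecutionMode.BATCH);

        DataSet<String> input = env.fromElements("a", "b", "c"); // placeholder source
        input.rebalance() // a full shuffle; in BATCH mode its result is written to disk
             .print();    // print() triggers execution of the program
    }
}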
I do not know if I understand this completely, but I would create a new DataSet by filtering on the condition and then persist that DataSet. So:

DataSet ds2 = ds1.filter(condition);
ds2.output(...);

– Vikram
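Applied to the original snippet, that suggestion could look roughly as follows; matchesCondition and the paths are hypothetical placeholders:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.core.fs.FileSystem;

public class SplitByCondition {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> tuples = env.readTextFile("hdfs:///path/to/input"); // placeholder input

        // Tuples that satisfy the condition stay in the pipeline;
        // the rest are persisted by Flink's own sink, not by hand.
        DataSet<String> kept = tuples.filter(t -> matchesCondition(t));
        DataSet<String> rest = tuples.filter(t -> !matchesCondition(t));

        kept.writeAsText("hdfs:///path/to/kept", FileSystem.WriteMode.OVERWRITE);
        rest.writeAsText("hdfs:///path/to/rest", FileSystem.WriteMode.OVERWRITE);

        env.execute("split by condition"); // one job serves both sinks
    }

    // Hypothetical stand-in for the condition in the original snippet.
    private static boolean matchesCondition(String t) {
        return !t.isEmpty();
    }
}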
Hey,
thank you for your answers, and sorry for my late response :-(

The intention was to store some of the data on disk when main memory fills up, i.e. when my temporary ArrayList<Tuple> reaches a pre-defined size.

I used com.opencsv.CSVReader and com.opencsv.CSVWriter for this task, and getRuntimeContext().getIndexOfThisSubtask() to distinguish my filenames from those of other tasks running on the same machine. Fortunately, that is no longer necessary for my work.
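A rough sketch of what that approach could look like; the record type (String[]), the spill directory, and the condition are assumptions, and the opencsv calls follow the 3.x API:

import java.io.FileReader;
import java.io.FileWriter;

import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;
import org.apache.flink.api.common.functions.RichMapPartitionFunction;
import org.apache.flink.util.Collector;

public class SpillingMapPartition extends RichMapPartitionFunction<String[], String[]> {

    @Override
    public void mapPartition(Iterable<String[]> tuples, Collector<String[]> out) throws Exception {
        // The subtask index makes the spill file unique per parallel
        // instance, even when several run on the same machine.
        int subtask = getRuntimeContext().getIndexOfThisSubtask();
        String spillFile = "/tmp/spill-" + subtask + ".csv"; // placeholder directory

        try (CSVWriter writer = new CSVWriter(new FileWriter(spillFile))) {
            for (String[] tuple : tuples) {
                if (matchesCondition(tuple)) {
                    out.collect(tuple);       // emit directly
                } else {
                    writer.writeNext(tuple);  // spill to disk instead of buffering
                }
            }
        }

        // Second pass: re-emit everything that was spilled.
        try (CSVReader reader = new CSVReader(new FileReader(spillFile))) {
            String[] line;
            while ((line = reader.readNext()) != null) {
                out.collect(line);
            }
        }
    }

    // Hypothetical stand-in for the condition in the original snippet.
    private boolean matchesCondition(String[] tuple) {
        return tuple.length > 0 && !tuple[0].isEmpty();
    }
}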
Best,
Robert