Writing Intermediates to disk


Paschek, Robert

Hi Mailing List,

 

I want to write and read intermediates to/from disk.

The following pseudocode snippet may illustrate my intention:

 

public void mapPartition(Iterable<T> tuples, Collector<T> out) {
    for (T tuple : tuples) {
        if (condition)
            out.collect(tuple);
        else
            writeTupleToDisk(tuple);
    }

    while (tupleOnDisk())
        out.collect(readNextTupleFromDisk());
}

I'm wondering if Flink provides an integrated class for this purpose. I also have to precisely identify the files containing the intermediates, due to the parallelism of mapPartition.
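Outside of Flink, the spill-then-replay pattern sketched above can be written in plain Java. This is a minimal sketch, not a Flink facility: the tuples are plain Strings, the spill file is a temp file written with java.nio, and the condition is an ordinary Predicate.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class SpillExample {
    // Emit tuples matching the condition immediately; spill the rest
    // to a temp file, then replay the spilled tuples at the end.
    static void mapPartition(Iterable<String> tuples,
                             Predicate<String> condition,
                             Consumer<String> out) throws IOException {
        Path spill = Files.createTempFile("spill-", ".txt");
        try (BufferedWriter w = Files.newBufferedWriter(spill)) {
            for (String tuple : tuples) {
                if (condition.test(tuple)) {
                    out.accept(tuple);       // emit directly
                } else {
                    w.write(tuple);          // spill to disk
                    w.newLine();
                }
            }
        }
        try (BufferedReader r = Files.newBufferedReader(spill)) {
            String line;
            while ((line = r.readLine()) != null) {
                out.accept(line);            // replay spilled tuples
            }
        } finally {
            Files.deleteIfExists(spill);
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> result = new ArrayList<>();
        mapPartition(Arrays.asList("a", "bb", "c", "dd"),
                     s -> s.length() == 1, result::add);
        System.out.println(result);  // short tuples first, spilled ones after
    }
}
```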

 

 

Thank you in advance!

Robert


Re: Writing Intermediates to disk

Ufuk Celebi
Flink has support for spillable intermediate results. Currently they
are only used if necessary to avoid pipeline deadlocks.

You can force this via

env.getConfig().setExecutionMode(ExecutionMode.BATCH);

This will write shuffles to disk, but you don't get the fine-grained
control you probably need for your use case.

– Ufuk

On Thu, May 5, 2016 at 3:29 PM, Paschek, Robert
<[hidden email]> wrote:


Re: Writing Intermediates to disk

Vikram Saxena
I am not sure I understand completely, but I would create a new DataSet by filtering on the condition and then persist that DataSet.

So:

DataSet<T> ds2 = dataSet1.filter(condition);
ds2.output(...);




On Mon, May 9, 2016 at 11:09 AM, Ufuk Celebi <[hidden email]> wrote:


Re: Writing Intermediates to disk

Paschek, Robert

Hey,

 

thank you for your answers, and sorry for my late response :-(

 

The intention was to store some of the data on disk when main memory gets full, i.e. when my temporary ArrayList<Tuple> reaches a pre-defined size.

 

I used com.opencsv.CSVReader and com.opencsv.CSVWriter for this task, and getRuntimeContext().getIndexOfThisSubtask() to distinguish my filenames from those of other tasks running on the same machine.

Fortunately, that is no longer necessary for my work.
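For reference, the per-subtask file naming can be sketched like this. A minimal sketch under stated assumptions: the subtask index is a plain parameter standing in for Flink's getRuntimeContext().getIndexOfThisSubtask(), the "spill-%d.csv" pattern is made up for illustration, and plain java.nio is used in place of opencsv.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class SubtaskFiles {
    // Build a spill-file path that is unique per parallel subtask,
    // so concurrent tasks on the same machine never collide.
    static Path spillFileFor(Path dir, int subtaskIndex) {
        return dir.resolve(String.format("spill-%d.csv", subtaskIndex));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("spills");
        // Two "subtasks" write to distinct files.
        Files.write(spillFileFor(dir, 0), List.of("a,1"));
        Files.write(spillFileFor(dir, 1), List.of("b,2"));
        System.out.println(Files.readAllLines(spillFileFor(dir, 0)));
        System.out.println(Files.readAllLines(spillFileFor(dir, 1)));
    }
}
```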

 

Best

Robert





From: Vikram Saxena <[hidden email]>
Sent: Monday, May 9, 2016 12:15
To: [hidden email]
Subject: Re: Writing Intermediates to disk
 