write data set to a single file

Mihail Vieru
Hi,

I need to write a data set to a single file without setting the parallelism of the whole program to 1.
How can I achieve this?

Cheers,
Mihail

P.S.: It's for persisting intermediate results in loops and reading them back in the next iteration.
By the way, such loops do work for higher iteration counts with explicit persistence.

Re: write data set to a single file

Stephan Ewen
To write a single file, you need to write it with a single task. You can still run the program with parallelism 100 and set only the sink operator to parallelism 1.

You can set the parallelism of each individual operator by calling "setParallelism()" on the result of that operation, for example "result.writeAsText(path).setParallelism(1)".
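The reasoning behind this answer is that each parallel instance of a sink writes its own output file, so one file means one sink task. The stdlib-only Java sketch below is not Flink code: it is a hypothetical simulation (the class `ParallelSinkSimulation` and its `write` method are made up for illustration) in which every simulated sink task writes one file named after its task number. The round-robin assignment of records to tasks and the file naming are assumptions of the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stdlib-only simulation (NOT the Flink API): each parallel
// "sink task" owns exactly one output file, so #files == sink parallelism.
public class ParallelSinkSimulation {

    // Writes `records` with `parallelism` simulated sink tasks and returns the files produced.
    // Assumption: records are assigned round-robin; task i writes the file "i+1" in outDir.
    public static List<Path> write(List<String> records, int parallelism, Path outDir)
            throws Exception {
        Files.createDirectories(outDir);
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Path>> futures = new ArrayList<>();
            for (int task = 0; task < parallelism; task++) {
                final int taskIndex = task;
                // Each task collects its share of the records and writes its own file.
                futures.add(pool.submit(() -> {
                    List<String> partition = new ArrayList<>();
                    for (int i = taskIndex; i < records.size(); i += parallelism) {
                        partition.add(records.get(i));
                    }
                    Path file = outDir.resolve(Integer.toString(taskIndex + 1));
                    Files.write(file, partition, StandardCharsets.UTF_8);
                    return file;
                }));
            }
            // Wait for all tasks and collect the files they wrote, in task order.
            List<Path> files = new ArrayList<>();
            for (Future<Path> f : futures) {
                files.add(f.get());
            }
            return files;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> records = List.of("a", "b", "c", "d", "e");
        Path base = Files.createTempDirectory("sink-sim");
        System.out.println("sink parallelism 4: " + write(records, 4, base.resolve("p4")).size() + " file(s)");
        System.out.println("sink parallelism 1: " + write(records, 1, base.resolve("p1")).size() + " file(s)");
    }
}
```

Only the number of files matters here: in a real job the upstream operators can keep a higher parallelism while the single sink task receives all records and writes one file.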


(In reply to Mihail Vieru, Wed, May 13, 2015 at 8:02 PM)

Re: write data set to a single file

Mihail Vieru
Awesome, it works. Thanks! :)

(In reply to Stephan Ewen, 13.05.2015 20:05)