OutputFormat in streaming job

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

OutputFormat in streaming job

Andrea Sella
Hi,

I created a custom OutputFormat to send data to a remote actor, there are issues to use an OutputFormat into a stream job? Or it will treat like a Sink?

I prefer to use it in order to create a custom ActorSystem per TM in the configure method.

Cheers,
Andrea
Reply | Threaded
Open this post in threaded view
|

Re: OutputFormat in streaming job

Fabian Hueske-2
Hi Andrea,

you can use any OutputFormat to emit data from a DataStream using the writeUsingOutputFormat() method.
However, this method does not guarantee exactly-once processing. In case of a failure, it might emit some records a second time. Hence the results will be written at least once.

Hope this helps,
Fabian

2016-05-06 12:45 GMT+02:00 Andrea Sella <[hidden email]>:
Hi,

I created a custom OutputFormat to send data to a remote actor, there are issues to use an OutputFormat into a stream job? Or it will treat like a Sink?

I prefer to use it in order to create a custom ActorSystem per TM in the configure method.

Cheers,
Andrea

Reply | Threaded
Open this post in threaded view
|

Re: OutputFormat in streaming job

Andrea Sella
Hi Fabian,

ATM I am not interesting to guarantee exactly-once processing, thank you for the clarification. 

As far as I know, it is not present a similar method as OutputFormat's configure for RichSinkFunction, correct? So I am not able to instantiate an ActorSystem per TM but I have to instantiate an ActorSystem per TaskSlot, which it is unsuitable because ActorSystem is very heavy.

Example:
Outputformat => 2 TM => 16 parallelism => 2 ActorSystem (instantiate into OutputFormat's configure method)
Sink => 2 TM => 16 parallelism => 16 Actor System  (instantiate into RichSinkFunction's open method)

Am I wrong?

Thanks again,
Andrea

2016-05-06 13:47 GMT+02:00 Fabian Hueske <[hidden email]>:
Hi Andrea,

you can use any OutputFormat to emit data from a DataStream using the writeUsingOutputFormat() method.
However, this method does not guarantee exactly-once processing. In case of a failure, it might emit some records a second time. Hence the results will be written at least once.

Hope this helps,
Fabian

2016-05-06 12:45 GMT+02:00 Andrea Sella <[hidden email]>:
Hi,

I created a custom OutputFormat to send data to a remote actor, there are issues to use an OutputFormat into a stream job? Or it will treat like a Sink?

I prefer to use it in order to create a custom ActorSystem per TM in the configure method.

Cheers,
Andrea


Reply | Threaded
Open this post in threaded view
|

Re: OutputFormat in streaming job

Fabian Hueske-2
Hi Andrea,

actually, OutputFormat.configure() will also be invoked per task. So you would also end up with 16 ActorSystems.
However, I think you can use synchronized singleton object to start one ActorSystem per TM (each TM and all tasks run in a single JVM).

So from the point of view of configure(), I think it does not make a difference whether to use an OutputFormat or a RichSinkFunction.
I would rather go for the SinkFunction, which is better suited for streaming jobs.

Cheers, Fabian

2016-05-06 14:10 GMT+02:00 Andrea Sella <[hidden email]>:
Hi Fabian,

ATM I am not interesting to guarantee exactly-once processing, thank you for the clarification. 

As far as I know, it is not present a similar method as OutputFormat's configure for RichSinkFunction, correct? So I am not able to instantiate an ActorSystem per TM but I have to instantiate an ActorSystem per TaskSlot, which it is unsuitable because ActorSystem is very heavy.

Example:
Outputformat => 2 TM => 16 parallelism => 2 ActorSystem (instantiate into OutputFormat's configure method)
Sink => 2 TM => 16 parallelism => 16 Actor System  (instantiate into RichSinkFunction's open method)

Am I wrong?

Thanks again,
Andrea

2016-05-06 13:47 GMT+02:00 Fabian Hueske <[hidden email]>:
Hi Andrea,

you can use any OutputFormat to emit data from a DataStream using the writeUsingOutputFormat() method.
However, this method does not guarantee exactly-once processing. In case of a failure, it might emit some records a second time. Hence the results will be written at least once.

Hope this helps,
Fabian

2016-05-06 12:45 GMT+02:00 Andrea Sella <[hidden email]>:
Hi,

I created a custom OutputFormat to send data to a remote actor, there are issues to use an OutputFormat into a stream job? Or it will treat like a Sink?

I prefer to use it in order to create a custom ActorSystem per TM in the configure method.

Cheers,
Andrea



Reply | Threaded
Open this post in threaded view
|

Re: OutputFormat in streaming job

Andrea Sella
Hi Fabian,

So I misunderstood the behaviour of configure(), thank you.

Andrea

2016-05-06 14:17 GMT+02:00 Fabian Hueske <[hidden email]>:
Hi Andrea,

actually, OutputFormat.configure() will also be invoked per task. So you would also end up with 16 ActorSystems.
However, I think you can use synchronized singleton object to start one ActorSystem per TM (each TM and all tasks run in a single JVM).

So from the point of view of configure(), I think it does not make a difference whether to use an OutputFormat or a RichSinkFunction.
I would rather go for the SinkFunction, which is better suited for streaming jobs.

Cheers, Fabian

2016-05-06 14:10 GMT+02:00 Andrea Sella <[hidden email]>:
Hi Fabian,

ATM I am not interesting to guarantee exactly-once processing, thank you for the clarification. 

As far as I know, it is not present a similar method as OutputFormat's configure for RichSinkFunction, correct? So I am not able to instantiate an ActorSystem per TM but I have to instantiate an ActorSystem per TaskSlot, which it is unsuitable because ActorSystem is very heavy.

Example:
Outputformat => 2 TM => 16 parallelism => 2 ActorSystem (instantiate into OutputFormat's configure method)
Sink => 2 TM => 16 parallelism => 16 Actor System  (instantiate into RichSinkFunction's open method)

Am I wrong?

Thanks again,
Andrea

2016-05-06 13:47 GMT+02:00 Fabian Hueske <[hidden email]>:
Hi Andrea,

you can use any OutputFormat to emit data from a DataStream using the writeUsingOutputFormat() method.
However, this method does not guarantee exactly-once processing. In case of a failure, it might emit some records a second time. Hence the results will be written at least once.

Hope this helps,
Fabian

2016-05-06 12:45 GMT+02:00 Andrea Sella <[hidden email]>:
Hi,

I created a custom OutputFormat to send data to a remote actor, there are issues to use an OutputFormat into a stream job? Or it will treat like a Sink?

I prefer to use it in order to create a custom ActorSystem per TM in the configure method.

Cheers,
Andrea