Batch Processing as Streaming

tambunanw
Hi All,

I see that the way batch processing works in Flink is quite different from Spark: Flink runs batch jobs on its streaming engine.

I have a couple of questions:

1. Is checkpointing also supported for batch processing, or only for streaming?

2. What is the operator lifecycle? Are operators short-lived or long-lived? Are there any docs where I can read more about this?


Cheers

Re: Batch Processing as Streaming

Stephan Ewen
Hi!

I am actually working on getting more docs out there. Agreed, they are lacking right now.

Concerning your questions:

(1) Batch programs basically recover from the data sources right now. Checkpointing as in the streaming case does not happen for batch programs. We have branches that materialize the intermediate streams and apply backtracking logic for batch programs, but they are not merged into the master at this point.

(2) Streaming operators and user functions are long-lived. They are started once and live until the end of the stream or a machine failure.
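
To make the two points above concrete, here is a toy simulation in plain Python. It is not Flink code and does not use Flink's API: `CountingOperator`, `run_batch`, `run_streaming`, `checkpoint_every`, and `fail_at` are all hypothetical names made up for illustration, and the single-process checkpoint is a simplification of what a distributed engine does. It only sketches the contrast between recovering by re-reading the source and recovering from a checkpoint, with an operator that is opened once and then handles every record.

```python
class CountingOperator:
    """Toy long-lived operator: opened once, then handles many records."""

    def __init__(self):
        self.count = 0  # operator state: number of records processed

    def open(self):
        pass  # one-time setup would go here

    def process(self, record):
        self.count += 1

    def snapshot(self):
        return self.count

    def restore(self, state):
        self.count = state


def run_batch(source, fail_at=None):
    """Batch-style recovery (hypothetical): on failure, re-read the source from the start."""
    records_read = 0
    while True:
        op = CountingOperator()
        op.open()
        try:
            for i, record in enumerate(source):
                records_read += 1
                if fail_at is not None and i == fail_at:
                    fail_at = None  # simulate a single failure only
                    raise RuntimeError("simulated machine failure")
                op.process(record)
            return op.count, records_read
        except RuntimeError:
            continue  # recover by starting over from the data source


def run_streaming(source, checkpoint_every, fail_at=None):
    """Streaming-style recovery (hypothetical): resume from the last checkpoint."""
    op = CountingOperator()
    op.open()
    checkpoint = (0, op.snapshot())  # (source offset, operator state)
    records_read = 0
    offset = 0
    while offset < len(source):
        records_read += 1
        if fail_at is not None and offset == fail_at:
            fail_at = None  # simulate a single failure only
            offset, state = checkpoint
            op = CountingOperator()  # replacement instance after the failure
            op.open()
            op.restore(state)
            continue
        op.process(source[offset])
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (offset, op.snapshot())
    return op.count, records_read


records = list(range(10))
print(run_batch(records, fail_at=7))         # (10, 18)
print(run_streaming(records, 5, fail_at=7))  # (10, 13)
```

Both runs end with all 10 records counted. In this sketch the batch run re-reads everything after the failure (18 reads in total), while the streaming run only replays from the checkpoint taken at offset 5 (13 reads in total).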

Greetings,
Stephan


[In reply to tambunanw's message of Thu, Jul 2, 2015 at 11:48 AM]

--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Batch-Processing-as-Streaming-tp1909.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.


Re: Batch Processing as Streaming

tambunanw
Thanks Stephan,

That's clear!

Cheers

[In reply to Stephan Ewen's message of Thu, Jul 2, 2015 at 6:13 PM]