Alter Flink's execution graph at run-time

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Alter Flink's execution graph at run-time

Xtra Coder
Hello, 

I'm curious about ability to alter processing of streams in Flink at run-time.

Potential use-case may look like following:

1. I have a stream already running (i.e. data processing is already started) in the Flink's cluster

2. At some point of time I decide that I need to add some more steps in the data-processing graph, for example i'd like to add logging of intermediate results after some 'flatMap()' function.Roughly this may look like adding another `flatMap()` into original graph - it just pipes 'in' to 'out' without modification (thus not affecting logic of 'current' flow and already generated state) and internally writes payload to log or sends to Kafka for further processing.

I guess, same way it should be possible to remove 'transformationless' functions from the execution graph at run-time without stopping the execution.

Is this possible? or planed to be possible?

Thanks.
Reply | Threaded
Open this post in threaded view
|

Re: Alter Flink's execution graph at run-time

Ufuk Celebi
Hey, currently this is not possible. You can use savepoints
(https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/savepoints.html)
to stop the job and then resume with the altered job version. There
are plans to allow dynamic rescaling of the execution graph, but I
think they will also work by first stopping and then resuming the job.
Ideally, for a neglectable amount of time. Till implemented the
initial version of dynamic scaling, but he has not started working on
the execution graph side yet as far as I know. CC'ing him here, maybe
he already has some ideas. ;)

On Tue, May 31, 2016 at 8:04 AM, Xtra Coder <[hidden email]> wrote:

> Hello,
>
> I'm curious about ability to alter processing of streams in Flink at
> run-time.
>
> Potential use-case may look like following:
>
> 1. I have a stream already running (i.e. data processing is already started)
> in the Flink's cluster
>
> 2. At some point of time I decide that I need to add some more steps in the
> data-processing graph, for example i'd like to add logging of intermediate
> results after some 'flatMap()' function.Roughly this may look like adding
> another `flatMap()` into original graph - it just pipes 'in' to 'out'
> without modification (thus not affecting logic of 'current' flow and already
> generated state) and internally writes payload to log or sends to Kafka for
> further processing.
>
> I guess, same way it should be possible to remove 'transformationless'
> functions from the execution graph at run-time without stopping the
> execution.
>
> Is this possible? or planed to be possible?
>
> Thanks.
Reply | Threaded
Open this post in threaded view
|

Re: Alter Flink's execution graph at run-time

Xtra Coder
Thanks, altering via pause/update/resume is OK, at least for now. Will try it on practice.

Just in case - question was inspired by Apache NiFi. If you haven't seen this https://www.youtube.com/watch?v=sQCgtCoZyFQ - at 29:10.
I would say such thing is a must have feature in "production" where stoppage of processing of live stream is not acceptable.