Executing detached data stream programs

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Executing detached data stream programs

jganoff
Hi,

I plan to leveraging Flink data stream programs within a larger application. I’d like to be able to execute a data stream program in detached mode directly from the StreamExecutionEnvironment similar to how I can execute a program in blocking mode. I was expecting to find StreamExecutionEnvironment.executeDetached(). What’s the best practice for executing a detached data stream program and monitoring it’s progress?

From what I can tell one approach could be to refactor the client configuration and interaction logic from CliFrontend into a more embed-friendly API.

Thoughts?

Thanks,
Jordan Ganoff

smime.p7s (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Executing detached data stream programs

rmetzger0
Hi Jordan,

the ./bin/flink client's run command has a -d / --detached flag for detached execution.
However, this doesn't allow you to programatically control the running job.
What you probably have to do is using the RemoteEnvironment submitting the job in a blocking way using a separate thread.
Then, another thread can monitor the application's progress (using the jobmanager's REST APIs) or some custom monitoring.

I know that this is not very handy right now, and there are more arguments for having a nice API for other features like metrics, accumulators, state queries or data samples. But I think such features will be added to Flink eventually.


On Mon, May 30, 2016 at 4:14 PM, Jordan Ganoff <[hidden email]> wrote:
Hi,

I plan to leveraging Flink data stream programs within a larger application. I’d like to be able to execute a data stream program in detached mode directly from the StreamExecutionEnvironment similar to how I can execute a program in blocking mode. I was expecting to find StreamExecutionEnvironment.executeDetached(). What’s the best practice for executing a detached data stream program and monitoring it’s progress?

From what I can tell one approach could be to refactor the client configuration and interaction logic from CliFrontend into a more embed-friendly API.

Thoughts?

Thanks,
Jordan Ganoff

Reply | Threaded
Open this post in threaded view
|

Re: Executing detached data stream programs

jganoff
Hi Robert,

Thanks for the suggestion. Threading out a blocking RemoteStreamEnvironment.execute() call and polling the monitoring REST API will work for now. Once the job transitions to running I will kill the thread and monitor the job through the REST API.

As for metrics, accumulators, and other job state I think I can get away with monitoring jobs through a combination of my own metrics system and the data exposed through the Flink Monitoring REST APIs [0].

I didn't find any JIRA issues related to my original request for being able to easily submit detached jobs through an ExecutionEnvironment. Does that sound like something the Flink team would be open to discussing?

Thanks!

[0] https://ci.apache.org/projects/flink/flink-docs-master/internals/monitoring_rest_api.html
Reply | Threaded
Open this post in threaded view
|

Re: Executing detached data stream programs

rmetzger0
Hi Jordan,

the community is definitively open to discuss this further (in particular if users start asking for the feature)
Here is the related JIRA issue: https://issues.apache.org/jira/browse/FLINK-2313

On Tue, May 31, 2016 at 5:19 PM, jganoff <[hidden email]> wrote:
Hi Robert,

Thanks for the suggestion. Threading out a blocking
RemoteStreamEnvironment.execute() call and polling the monitoring REST API
will work for now. Once the job transitions to running I will kill the
thread and monitor the job through the REST API.

As for metrics, accumulators, and other job state I think I can get away
with monitoring jobs through a combination of my own metrics system and the
data exposed through the Flink Monitoring REST APIs [0].

I didn't find any JIRA issues related to my original request for being able
to easily submit detached jobs through an ExecutionEnvironment. Does that
sound like something the Flink team would be open to discussing?

Thanks!

[0]
https://ci.apache.org/projects/flink/flink-docs-master/internals/monitoring_rest_api.html



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Executing-detached-data-stream-programs-tp7268p7291.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.