Job scheduling

7 messages

Job scheduling

Flavio Pompermaier
Hi all,

I'd like to know if there's an example of how to schedule a job in Flink.
Do we still need something like Oozie or Quartz, or can we avoid them?

Best,
Flavio

Re: Job scheduling

Fabian Hueske
Hi Flavio,

What exactly do you mean by scheduling?
Do you want to run a job at regular intervals or execute a complex workflow?

Oozie is primarily used to orchestrate the execution of MapReduce workflows. Since MR is a rather inflexible programming model, complex tasks need to be split up into multiple dependent jobs, each of which is executed once its predecessors have finished. Oozie orchestrates this execution.
In Flink, you can build a complex analysis flow as a single program and execute it. Hence, there is no need for a workflow scheduler such as Oozie.
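For comparison, the kind of coordination described here for Oozie (start each job only after all of its predecessors have finished) can be sketched in a few lines of plain Java. This is not Oozie's or Flink's API: `WorkflowRunner` is a hypothetical helper, each job is just a `Runnable` that in practice would submit a job and wait for it, and jobs must be registered before their dependencies are declared.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical mini-orchestrator (not an Oozie or Flink API): runs each job
 * only after all of its predecessors have finished, using Kahn's
 * topological-sort algorithm.
 */
class WorkflowRunner {
    private final Map<String, Runnable> jobs = new LinkedHashMap<>();
    private final Map<String, List<String>> successors = new HashMap<>();
    private final Map<String, Integer> predecessorCount = new HashMap<>();

    /** Registers a job; call this before declaring its dependencies. */
    void addJob(String name, Runnable body) {
        jobs.put(name, body);
        successors.putIfAbsent(name, new ArrayList<>());
        predecessorCount.putIfAbsent(name, 0);
    }

    /** Declares that {@code after} may only start once {@code before} has finished. */
    void addDependency(String before, String after) {
        if (!jobs.containsKey(before) || !jobs.containsKey(after)) {
            throw new IllegalArgumentException("register both jobs first");
        }
        successors.get(before).add(after);
        predecessorCount.merge(after, 1, Integer::sum);
    }

    /** Runs all jobs one at a time in dependency order and returns that order. */
    List<String> run() {
        Map<String, Integer> remaining = new HashMap<>(predecessorCount);
        Deque<String> ready = new ArrayDeque<>();
        for (String name : jobs.keySet()) {
            if (remaining.get(name) == 0) {
                ready.add(name); // no predecessors: can start immediately
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String name = ready.poll();
            jobs.get(name).run(); // in practice: submit the job and block until it finishes
            order.add(name);
            for (String next : successors.get(name)) {
                // One predecessor fewer to wait for; start the successor when none are left.
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != jobs.size()) {
            throw new IllegalStateException("workflow contains a dependency cycle");
        }
        return order;
    }
}
```

For example, registering `report`, `extract` and `clean` with the dependencies extract → clean → report makes `run()` return `[extract, clean, report]`. A Flink program that expresses the same steps as operators of one data flow needs no such external ordering.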

If you want to run a job at regular intervals, you can configure a cron job that executes the CLI client, or implement a Java or Scala program that submits jobs at certain points in time.
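The second option, a Java program that submits jobs at certain points in time, might look roughly like the sketch below. It assumes the job is started through the CLI client as `<flink-dir>/bin/flink run <jar>`; `PeriodicJobSubmitter` is a hypothetical helper, not part of Flink, and any paths you pass in are placeholders.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of Flink): a long-running JVM process that
 * periodically submits a job through the Flink CLI client, as an in-process
 * alternative to a cron entry.
 */
class PeriodicJobSubmitter {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** Builds the CLI invocation "<flinkBin> run <jarPath>". */
    static List<String> flinkRunCommand(String flinkBin, String jarPath) {
        return List.of(flinkBin, "run", jarPath);
    }

    /** Starts the command, waits for it to exit and returns its exit code. */
    static int execute(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    /** Runs {@code task} every {@code period} units, starting after {@code initialDelay}. */
    ScheduledFuture<?> every(long initialDelay, long period, TimeUnit unit, Runnable task) {
        return scheduler.scheduleAtFixedRate(task, initialDelay, period, unit);
    }

    /** Submits the given jar via the CLI every {@code period} units, starting now. */
    ScheduledFuture<?> submitJarEvery(String flinkBin, String jarPath, long period, TimeUnit unit) {
        List<String> command = flinkRunCommand(flinkBin, jarPath);
        return every(0, period, unit, () -> {
            // Handle failures here: scheduleAtFixedRate suppresses all later
            // runs if one execution throws.
            try {
                int exitCode = execute(command);
                if (exitCode != 0) {
                    System.err.println("flink run exited with code " + exitCode);
                }
            } catch (IOException | RuntimeException e) {
                System.err.println("could not run the Flink CLI: " + e.getMessage());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    /** Stops scheduling further submissions. */
    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

A call such as `submitJarEvery("/opt/flink/bin/flink", "/jobs/daily-report.jar", 24, TimeUnit.HOURS)` (hypothetical paths) would then behave much like a daily crontab entry running the same command. With `scheduleAtFixedRate`, a run that takes longer than the period delays the next one rather than overlapping with it.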

Best, Fabian

In reply to Flavio Pompermaier (2014-09-11 15:36 GMT+02:00).


Re: Job scheduling

Flavio Pompermaier
Of course, with Flink I could in principle execute almost everything as a single job but, in general, I could write two different jobs and decide case by case when the second one should run.
That's also why Meteor scripts are very useful :)
From what I know, there was a scheduler in Stratosphere that used RabbitMQ, right?

I would like to avoid running Linux commands and instead use a REST interface to trigger or schedule jobs.
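Triggering jobs over REST instead of running Linux commands could be approximated by a small HTTP service in front of the job-submission code. The sketch below uses only the JDK's built-in `com.sun.net.httpserver` package; the `JobTriggerServer` class, the `POST /jobs/<name>` route and the job names are hypothetical, not a Flink API. Each registered job is a plain `Runnable`, which in practice might invoke the Flink CLI client.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical trigger service (not a Flink API): "POST /jobs/<name>" starts
 * the job registered under <name> on a background worker.
 */
class JobTriggerServer {
    private final Map<String, Runnable> jobs = new ConcurrentHashMap<>();
    private final ExecutorService workers = Executors.newCachedThreadPool();
    private final HttpServer server;

    /** Binds to localhost; pass 0 to let the OS pick a free port. */
    JobTriggerServer(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/jobs/", this::handle);
    }

    void register(String name, Runnable job) { jobs.put(name, job); }

    int port() { return server.getAddress().getPort(); }

    void start() { server.start(); }

    void stop() {
        server.stop(0);
        workers.shutdown();
    }

    private void handle(HttpExchange exchange) throws IOException {
        String name = exchange.getRequestURI().getPath().substring("/jobs/".length());
        Runnable job = jobs.get(name);
        int status;
        String body;
        if (!"POST".equals(exchange.getRequestMethod())) {
            status = 405;
            body = "use POST";
        } else if (job == null) {
            status = 404;
            body = "unknown job: " + name;
        } else {
            workers.submit(job); // run asynchronously; the caller only learns it was accepted
            status = 202;
            body = "triggered " + name;
        }
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
    }
}
```

Replying `202 Accepted` and running the job on a worker pool keeps each request short, but the caller only learns that the job was accepted. Deciding when a second job should run would additionally need a completion signal (for example a status endpoint), which this sketch leaves out.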

Best,
Flavio

In reply to Fabian Hueske (Thu, Sep 11, 2014 at 4:07 PM).


Re: Job scheduling

rmetzger0
Are you referring to this project? https://github.com/TU-Berlin/dopa-scheduler
It's not an official repository of the Flink (Stratosphere) project. I think a PhD student at TU Berlin wrote that code.



In reply to Flavio Pompermaier (Thu, Sep 11, 2014 at 4:29 PM).



Re: Job scheduling

Flavio Pompermaier
Yes, I was referring exactly to that; I was also involved in the Dopa project :)
So, at the moment, what is the suggested way to schedule jobs with Flink?

In reply to Robert Metzger (Thu, Sep 18, 2014 at 9:48 AM).



Re: Job scheduling

rmetzger0
I don't think we have a suggested way.

If I had that requirement, I would look into Oozie. I think it's quite easy to add additional services (=Flink) to Oozie. In addition, it seems to have a REST interface and some other features.

If you want, you could also implement such a scheduler yourself and contribute it back to Flink.

In reply to Flavio Pompermaier (Thu, Sep 18, 2014 at 10:11 AM).




Re: Job scheduling

Flavio Pompermaier
I think it could be a useful feature to implement in Stockholm, if I can make it there :)

In reply to Robert Metzger (Thu, Sep 18, 2014 at 10:23 AM).