Scheduling sources

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Scheduling sources

Averell
Hi everyone,

I have 2 file sources, which I want to start reading them in a specified
order (e.g: source2 should only start 5 minutes after source1 has started).
I could not find any Flink document mentioning this capability, and I also
tried to search the mailing list, without any success.
However, on Flink GUI there's a Timeline tab which shows start-time of each
operator. And this gives me a hope that there is something that can help
with my requirement.
(http://localhost:20888/proxy/application_1537700592704_0026/#/jobs/0360094da093e36299273329f9dec19d/timeline)

Could you please help give some help?

Thanks and best regards,
Averell



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Scheduling sources

Till Rohrmann
Hi Averell,

such a feature is currently not supported by Flink. The scheduling works by starting all sources at the same time. Depending whether it is a batch or streaming job, you either start deploying consumers once producers have produced some results or right away.

Cheers,
Till

On Tue, Sep 25, 2018 at 8:16 AM Averell <[hidden email]> wrote:
Hi everyone,

I have 2 file sources, which I want to start reading them in a specified
order (e.g: source2 should only start 5 minutes after source1 has started).
I could not find any Flink document mentioning this capability, and I also
tried to search the mailing list, without any success.
However, on Flink GUI there's a Timeline tab which shows start-time of each
operator. And this gives me a hope that there is something that can help
with my requirement.
(http://localhost:20888/proxy/application_1537700592704_0026/#/jobs/0360094da093e36299273329f9dec19d/timeline)

Could you please help give some help?

Thanks and best regards,
Averell



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Scheduling sources

Averell
Thank you Till.

My use case is like this: I  have two streams, one is raw data (1), the
other is enrichment data (2), which in turn consists of two component:
initial enrichment data (2a) which comes from an RDBMS table, and
incremental data (2b) which comes from a Kafka stream. To ensure that (1)
gets enriched properly, I want to have (2a) loaded properly into memory
before starting to process (1).

Is there any walkaround solution for me in this case?

Thanks and best regards,
Averell



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Scheduling sources

Kostas Kloudas
Hi Averell,

If the 2a fits in memory, then you can load the data to all TMs in the open() method 
of any rich function, eg. ProcessFunction [1]. The open() runs before any data is allowed 
to flow in your pipeline from the sources.

Cheers, 
Kostas


On Sep 26, 2018, at 2:04 AM, Averell <[hidden email]> wrote:

Thank you Till.

My use case is like this: I  have two streams, one is raw data (1), the
other is enrichment data (2), which in turn consists of two component:
initial enrichment data (2a) which comes from an RDBMS table, and
incremental data (2b) which comes from a Kafka stream. To ensure that (1)
gets enriched properly, I want to have (2a) loaded properly into memory
before starting to process (1).

Is there any walkaround solution for me in this case?

Thanks and best regards,
Averell



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Reply | Threaded
Open this post in threaded view
|

Re: Scheduling sources

Averell
Hi Kostas,

So that means my 2a will be broadcasted to all TMs? Is that possible to
partition that? As I'm using CoProcessFunction to join 1 with 2.

Thanks and best regards,
Averell



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Scheduling sources

tison
Hi Averell,

As Till pointed out, currently Flink doesn't provide such a flexible schedule strategy.

However, our team internally implemented a mechanism that allow user-define schedule plugin(flexible schedule strategy). It could fit your case by setting a timer on schedule start and trigger (1) after a fixed delay; or when (2) loaded, a execution event be sent to the schedule plugin then it schedules (1).

This work is on going to contribute back to Flink master, you can follow this JIRA[1] for more information. We are glad if this work could help you.



Averell <[hidden email]> 于2018年9月26日周三 下午3:46写道:
Hi Kostas,

So that means my 2a will be broadcasted to all TMs? Is that possible to
partition that? As I'm using CoProcessFunction to join 1 with 2.

Thanks and best regards,
Averell



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Scheduling sources

Averell
Hi Tison,

"/setting a timer on schedule start and trigger (1) after a fixed delay/"
would be quite sufficient for me.
Looking forward to the change of that Jira ticket's status.

Thanks for your help.

Regards,
Averell



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/