(DEPRECATED) Apache Flink User Mailing List archive.

Scheduling sources

Classic

List

Threaded

7 messages Options

Averell

Scheduling sources

Hi everyone,

I have 2 file sources, which I want to start reading them in a specified
order (e.g: source2 should only start 5 minutes after source1 has started).
I could not find any Flink document mentioning this capability, and I also
tried to search the mailing list, without any success.
However, on Flink GUI there's a Timeline tab which shows start-time of each
operator. And this gives me a hope that there is something that can help
with my requirement.
(http://localhost:20888/proxy/application_1537700592704_0026/#/jobs/0360094da093e36299273329f9dec19d/timeline)

Could you please help give some help?

Thanks and best regards,
Averell

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Till Rohrmann

Re: Scheduling sources

Hi Averell,

such a feature is currently not supported by Flink. The scheduling works by starting all sources at the same time. Depending whether it is a batch or streaming job, you either start deploying consumers once producers have produced some results or right away.

Cheers,

Till

On Tue, Sep 25, 2018 at 8:16 AM Averell <[hidden email]> wrote:

Hi everyone,

I have 2 file sources, which I want to start reading them in a specified
order (e.g: source2 should only start 5 minutes after source1 has started).
I could not find any Flink document mentioning this capability, and I also
tried to search the mailing list, without any success.
However, on Flink GUI there's a Timeline tab which shows start-time of each
operator. And this gives me a hope that there is something that can help
with my requirement.
(http://localhost:20888/proxy/application_1537700592704_0026/#/jobs/0360094da093e36299273329f9dec19d/timeline)

Could you please help give some help?

Thanks and best regards,
Averell

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Averell

Re: Scheduling sources

Thank you Till.

My use case is like this: I have two streams, one is raw data (1), the
other is enrichment data (2), which in turn consists of two component:
initial enrichment data (2a) which comes from an RDBMS table, and
incremental data (2b) which comes from a Kafka stream. To ensure that (1)
gets enriched properly, I want to have (2a) loaded properly into memory
before starting to process (1).

Is there any walkaround solution for me in this case?

Thanks and best regards,
Averell

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Kostas Kloudas

Re: Scheduling sources

Hi Averell,

If the 2a fits in memory, then you can load the data to all TMs in the open() method

of any rich function, eg. ProcessFunction [1]. The open() runs before any data is allowed

to flow in your pipeline from the sources.

Cheers,

Kostas

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html

On Sep 26, 2018, at 2:04 AM, Averell <[hidden email]> wrote:

Thank you Till.

My use case is like this: I have two streams, one is raw data (1), the
other is enrichment data (2), which in turn consists of two component:
initial enrichment data (2a) which comes from an RDBMS table, and
incremental data (2b) which comes from a Kafka stream. To ensure that (1)
gets enriched properly, I want to have (2a) loaded properly into memory
before starting to process (1).

Is there any walkaround solution for me in this case?

Thanks and best regards,
Averell

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Averell

Re: Scheduling sources

Hi Kostas,

So that means my 2a will be broadcasted to all TMs? Is that possible to
partition that? As I'm using CoProcessFunction to join 1 with 2.

Thanks and best regards,
Averell

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

tison

Re: Scheduling sources

Hi Averell,

As Till pointed out, currently Flink doesn't provide such a flexible schedule strategy.

However, our team internally implemented a mechanism that allow user-define schedule plugin(flexible schedule strategy). It could fit your case by setting a timer on schedule start and trigger (1) after a fixed delay; or when (2) loaded, a execution event be sent to the schedule plugin then it schedules (1).

This work is on going to contribute back to Flink master, you can follow this JIRA[1] for more information. We are glad if this work could help you.

Best,

tison.

[1] https://issues.apache.org/jira/browse/FLINK-10240

Averell <[hidden email]> 于2018年9月26日周三下午3:46写道：

Hi Kostas,

So that means my 2a will be broadcasted to all TMs? Is that possible to
partition that? As I'm using CoProcessFunction to join 1 with 2.

Thanks and best regards,
Averell

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Averell

Re: Scheduling sources

Hi Tison,

"/setting a timer on schedule start and trigger (1) after a fixed delay/"
would be quite sufficient for me.
Looking forward to the change of that Jira ticket's status.

Thanks for your help.

Regards,
Averell

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/