Hi everyone,
I have 2 file sources, which I want to start reading them in a specified order (e.g: source2 should only start 5 minutes after source1 has started). I could not find any Flink document mentioning this capability, and I also tried to search the mailing list, without any success. However, on Flink GUI there's a Timeline tab which shows start-time of each operator. And this gives me a hope that there is something that can help with my requirement. (http://localhost:20888/proxy/application_1537700592704_0026/#/jobs/0360094da093e36299273329f9dec19d/timeline) Could you please help give some help? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Hi Averell, such a feature is currently not supported by Flink. The scheduling works by starting all sources at the same time. Depending whether it is a batch or streaming job, you either start deploying consumers once producers have produced some results or right away. Cheers, Till On Tue, Sep 25, 2018 at 8:16 AM Averell <[hidden email]> wrote: Hi everyone, |
Thank you Till.
My use case is like this: I have two streams, one is raw data (1), the other is enrichment data (2), which in turn consists of two component: initial enrichment data (2a) which comes from an RDBMS table, and incremental data (2b) which comes from a Kafka stream. To ensure that (1) gets enriched properly, I want to have (2a) loaded properly into memory before starting to process (1). Is there any walkaround solution for me in this case? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Hi Averell,
If the 2a fits in memory, then you can load the data to all TMs in the open() method of any rich function, eg. ProcessFunction [1]. The open() runs before any data is allowed to flow in your pipeline from the sources. Cheers, Kostas [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html
|
Hi Kostas,
So that means my 2a will be broadcasted to all TMs? Is that possible to partition that? As I'm using CoProcessFunction to join 1 with 2. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Hi Averell, As Till pointed out, currently Flink doesn't provide such a flexible schedule strategy. However, our team internally implemented a mechanism that allow user-define schedule plugin(flexible schedule strategy). It could fit your case by setting a timer on schedule start and trigger (1) after a fixed delay; or when (2) loaded, a execution event be sent to the schedule plugin then it schedules (1). This work is on going to contribute back to Flink master, you can follow this JIRA[1] for more information. We are glad if this work could help you. Averell <[hidden email]> 于2018年9月26日周三 下午3:46写道: Hi Kostas, |
Hi Tison,
"/setting a timer on schedule start and trigger (1) after a fixed delay/" would be quite sufficient for me. Looking forward to the change of that Jira ticket's status. Thanks for your help. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Free forum by Nabble | Edit this page |