Hi,
Is there a way to temporarily stop consuming one Kafka source in streaming mode?
Use case: I have to consume 2 topics, but one of them has higher priority. One topic is dedicated to ingesting data from the db (change data capture), and the other is dedicated to synchronization (a dump, i.e. a SELECT ... from the db). At the moment the dump is performed by a separate Flink job, and we start it manually after stopping the CDC job. I want to merge these 2 modes and automatically stop consumption of the CDC topic while a dump is being performed. How can I handle that with Flink in a streaming way? Backpressure? ...
Thanks in advance for your insights,
David
Hi,
I'd like to share my opinion here. It seems that you need the two Kafka consumers to communicate with each other: when you begin the dump process, notify the CDC-topic consumer to sit idle.

Best,
Terry Wang
Are you asking how to detect from within the job whether the dump is complete, or how to combine these 2 jobs?

If you had a way to notice whether the dump is complete, then I would suggest creating a custom source that wraps the 2 Kafka sources and switches between them at will based on your conditions.
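That wrapping-and-switching idea can be sketched in plain Java. Everything here is a hypothetical stand-in: the two queues model the Kafka sources, and `poll`/`dumpComplete` stand in for the real consumer wiring inside a Flink source function.

```java
import java.util.Optional;
import java.util.Queue;

/** Wraps two sources; emits only from the dump source until it is drained, then switches to CDC. */
class SwitchingSource {
    private final Queue<String> dumpTopic; // stand-in for the dump Kafka source
    private final Queue<String> cdcTopic;  // stand-in for the CDC Kafka source
    private boolean dumpComplete = false;

    SwitchingSource(Queue<String> dumpTopic, Queue<String> cdcTopic) {
        this.dumpTopic = dumpTopic;
        this.cdcTopic = cdcTopic;
    }

    /** Pull the next record, preferring the dump source until it signals completion. */
    Optional<String> poll() {
        if (!dumpComplete) {
            String rec = dumpTopic.poll();
            if (rec != null) {
                return Optional.of(rec);
            }
            dumpComplete = true; // dump drained: switch over to the CDC source
        }
        return Optional.ofNullable(cdcTopic.poll());
    }
}
```

In a real job, "dump complete" would come from an explicit end-of-dump marker or external signal rather than an empty poll, but the control flow is the same.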
Hi,
Thanks for your replies. Yes Terry, you are right, I can try to create a custom source. But according to my use case, I figured out I can perhaps use a technical field in my data. It is a timestamp, and I think I just have to ignore late events, either with watermarks or later in the pipeline based on metadata stored in Flink state. I'm testing it now...
Thanks,

David
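The late-event idea can be sketched as a simple filter step. This is a plain-Java model, not Flink API: the record shape and the `dumpTimestampMillis` field are hypothetical, and in the real pipeline that threshold would live in Flink state.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Drops CDC events whose timestamp predates the last completed dump. */
class LateEventFilter {
    record Event(String key, long timestampMillis) {}

    private final long dumpTimestampMillis; // would be read from Flink state in the real job

    LateEventFilter(long dumpTimestampMillis) {
        this.dumpTimestampMillis = dumpTimestampMillis;
    }

    /** Keep only events at or after the dump timestamp; older CDC events are superseded by the dump. */
    List<Event> apply(List<Event> events) {
        return events.stream()
                .filter(e -> e.timestampMillis() >= dumpTimestampMillis)
                .collect(Collectors.toList());
    }
}
```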
My naive solution can't work because a dump can be quite long.
So yes, I have to find a way to stop the consumption of the topic used for streaming mode while a dump is running :(
Terry, I'm trying to implement something based on your reply and on this thread: https://stackoverflow.com/questions/59201503/flink-kafka-gracefully-close-flink-consuming-messages-from-kafka-source-after-a
Any suggestions are welcome, thanks.

David
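The pause/resume mechanic behind that approach can be sketched with a lock and condition in plain Java. The class name and methods are hypothetical; a real Flink source would call `awaitResumed()` at the top of its consume loop and would also have to respect the checkpoint lock.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Gates a consume loop: while paused, the loop blocks instead of emitting records. */
class ConsumptionGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition resumed = lock.newCondition();
    private boolean paused = false;

    /** Close the gate, e.g. when a dump starts. */
    void pause() {
        lock.lock();
        try {
            paused = true;
        } finally {
            lock.unlock();
        }
    }

    /** Reopen the gate, e.g. when the dump is complete. */
    void resume() {
        lock.lock();
        try {
            paused = false;
            resumed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Called at the top of the consume loop; blocks while the gate is closed. */
    void awaitResumed() throws InterruptedException {
        lock.lock();
        try {
            while (paused) {
                resumed.await();
            }
        } finally {
            lock.unlock();
        }
    }
}
```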
I'd second Chesnay's suggestion to use a custom source. It would be a piece of cake with FLIP-27 [1], but unfortunately we are not there yet. It will probably land in Flink 1.11 (mid-year) if you can wait. The current way would be a source that wraps the two Kafka consumers and blocks the normal (CDC) consumer from outputting elements. Here is a quick and dirty solution that I threw together: https://gist.github.com/AHeise/d7a8662f091e5a135c5ccfd6630634dd
Awesome! I'm going to implement it. Thanks a lot, Arvid.