[Question] enable end2end Kafka exactly once processing

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

[Question] enable end2end Kafka exactly once processing

Eleanore Jin
Hi experts, 

My application is using Apache Beam and with Flink to be the runner. My source and sink are kafka topics, and I am using KafkaIO connector provided by Apache Beam to consume and publish. 


It looks like Beam does not support Flink Runner for EOS, can someone please shad some lights on how to enable exactly once processing with Apache Beam?

Thanks a lot!
Eleanore
Reply | Threaded
Open this post in threaded view
|

Re: [Question] enable end2end Kafka exactly once processing

Arvid Heise-3
Hi Eleanore,

the flink runner is maintained by the Beam developers, so it's best to ask on their user list.

The documentation is, however, very clear. "Flink runner is one of the runners whose checkpoint semantics are not compatible with current implementation (hope to provide a solution in near future)."
So, Beam uses a different approach to EOS than Flink and there is currently no way around it. Maybe, you could use the EOS Kafka Sink of Flink directly and use that in Beam somehow.

I'm not aware of any work with the Beam devs to actually make it work. Independently, we started to improve our interfaces for two phase commit sinks (which is our approach). It might coincidentally help Beam.

Best,

Arvid

On Sun, Mar 1, 2020 at 8:23 PM Jin Yi <[hidden email]> wrote:
Hi experts, 

My application is using Apache Beam and with Flink to be the runner. My source and sink are kafka topics, and I am using KafkaIO connector provided by Apache Beam to consume and publish. 


It looks like Beam does not support Flink Runner for EOS, can someone please shad some lights on how to enable exactly once processing with Apache Beam?

Thanks a lot!
Eleanore