In https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/cli.html it is shown that for gracefully stopping a job you need to implement the StoppableFunction interface. This appears not (yet) to be implemented for the Kafka consumers. Am I missing something, or is there a different way to gracefully stop a job using a Kafka source, so we can restart it later without losing any in-flight events?

- bart
Hi Bart,

you're right that Flink currently does not support a graceful stop mechanism for the Kafka source. The community already has a good idea of how to solve it in the general case and will hopefully add it to Flink soon.

Concerning the StoppableFunction: this interface was introduced quite some time ago and currently only works for some batch sources. In order to make it work with streaming, we need to add more functionality to the engine so that it can properly stop a job and take a savepoint.

Cheers,
Till
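For reference, StoppableFunction is a one-method interface (void stop()) that a source implements in addition to SourceFunction. Below is a minimal sketch against the Flink 1.4 APIs; the counting source is purely hypothetical, and the point is only the contrast between stop() and cancel(). This is not the Kafka consumer, which is exactly the gap discussed in this thread.

import org.apache.flink.api.common.functions.StoppableFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class StoppableCountingSource implements SourceFunction<Long>, StoppableFunction {

    private volatile boolean running = true;
    private long counter = 0;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        while (running) {
            // Hold the checkpoint lock so emission and checkpointing do not interleave.
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(counter++);
            }
        }
    }

    @Override
    public void cancel() {
        // Hard cancel: the job is torn down immediately, in-flight data may be dropped.
        running = false;
    }

    @Override
    public void stop() {
        // Graceful stop: finish cleanly so the pipeline can drain without losing records.
        running = false;
    }
}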
Thanks for the reply; is there a FLIP for this?
- bart
Hmm, I did not realize that.

When upgrading a job (consuming from Kafka), I was planning to cancel it with a savepoint and then start it back from that savepoint. This savepoint mechanism was giving me the (apparently false?) feeling that I would not lose anything. My understanding was that I might process some events twice in this case, but certainly not miss events entirely. Did I misunderstand this thread? If not, this sounds pretty annoying. Do people have some sort of workaround for this?

Thanks,
--
Christophe
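For reference, the cancel-with-savepoint workflow described above maps onto the CLI from the page linked at the top of the thread (Flink 1.4); the target directory, job id, and jar are placeholders:

# Take a savepoint and cancel the job in one step:
bin/flink cancel -s [targetDirectory] <jobID>

# Later, resume the (possibly upgraded) job from that savepoint:
bin/flink run -s <savepointPath> <jobJar> [arguments]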
Hi Christophe,

yes, I think you misunderstood the thread. Cancel with savepoint will never cause any data loss. The only problem which might arise is if you have an operator which writes data to an external system immediately: then you might see some data in the external system which originates from after the savepoint. You can solve this by implementing the interaction with the external system accordingly, for example by only flushing on the checkpoint-complete notification. The bottom line is that if you don't do it like this, then you might see some duplicate data, but you will never lose any. The Kafka exactly-once sink, for example, is implemented such that it takes care of this problem and gives you exactly-once guarantees.

Cheers,
Till
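To illustrate the flush-on-checkpoint-complete pattern described above, here is a minimal sketch (not the actual Kafka sink code) of a sink that buffers writes and only pushes them to the external system once the covering checkpoint has completed; the external write call is hypothetical:

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.CheckpointListener;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class BufferOnCheckpointSink extends RichSinkFunction<String>
        implements CheckpointedFunction, CheckpointListener {

    // Elements not yet covered by any checkpoint.
    private List<String> buffer = new ArrayList<>();
    // Elements frozen under a checkpoint id, awaiting its completion.
    private final Map<Long, List<String>> pendingByCheckpoint = new HashMap<>();
    // Operator state so unflushed elements survive a failure and restore.
    private transient ListState<String> checkpointedState;

    @Override
    public void invoke(String value) {
        buffer.add(value); // do NOT write to the external system yet
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // Freeze everything seen so far under this checkpoint's id.
        pendingByCheckpoint.put(context.getCheckpointId(), buffer);
        buffer = new ArrayList<>();
        // Persist all still-unflushed elements as part of the checkpoint.
        checkpointedState.clear();
        for (List<String> pending : pendingByCheckpoint.values()) {
            for (String element : pending) {
                checkpointedState.add(element);
            }
        }
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        checkpointedState = context.getOperatorStateStore().getListState(
                new ListStateDescriptor<>("unflushed-elements", String.class));
        if (context.isRestored()) {
            // Restored elements may already have been flushed before the failure:
            // this pattern is at-least-once, hence the note about duplicates above.
            for (String element : checkpointedState.get()) {
                buffer.add(element);
            }
        }
    }

    @Override
    public void notifyCheckpointComplete(long checkpointId) {
        Iterator<Map.Entry<Long, List<String>>> it = pendingByCheckpoint.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, List<String>> entry = it.next();
            if (entry.getKey() <= checkpointId) {
                // Only now is it safe to make the data visible externally.
                // externalSystem.writeBatch(entry.getValue()); // hypothetical call
                it.remove();
            }
        }
    }
}

For exactly-once, Flink 1.4's FlinkKafkaProducer011 goes a step further and builds on TwoPhaseCommitSinkFunction: it pre-commits a Kafka transaction when the checkpoint is taken and commits it on the checkpoint-complete notification.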
@Bart, I think there is no FLIP yet for a proper stop-with-savepoint implementation. My gut feeling is that the community will address this problem soon, since it's a heavily requested feature.

Cheers,
Till
OK. Thanks a lot for the clarification. That was my initial understanding, but then I was confused by the "losing in-flight events" wording.

--
Christophe