Flink - Once and once processing

6 messages

Flink - Once and once processing

M Singh
Hi:

I have a use case where we need to update a counter in a database, and for this we need to guarantee once-only processing. If a batch partially updates the counters and then fails, and Flink retries the processing for that batch, some of the counters will be updated twice (the ones which succeeded in the first attempt).

I think that in order to guarantee once-only processing, I will have to set the buffer size to zero (i.e., send one item at a time).

Is there any alternative configuration, or a suggestion on how I can achieve once-only updates in batch mode with partial failures?

Thanks

Mans

Re: Flink - Once and once processing

snntr
Hi Mans,

depending on the number of operations and the particular database, you might be able to use transactions.

Maybe you can also find a data model which is more resilient to this kind of failure.
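One such model (a minimal sketch, not from the thread; SQLite and the table/column names are illustrative assumptions) records each event's ID in the same transaction as the counter update, so a retried batch skips rows it has already applied:

```python
import sqlite3

# In-memory DB for illustration; a real deployment would use the target database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER NOT NULL);
    CREATE TABLE applied_events (event_id TEXT PRIMARY KEY);
    INSERT INTO counters VALUES ('clicks', 0);
""")

def apply_event(event_id: str, counter: str, delta: int) -> bool:
    """Apply an increment exactly once; return False if already applied."""
    try:
        with conn:  # one transaction: event marker + counter update together
            conn.execute("INSERT INTO applied_events VALUES (?)", (event_id,))
            conn.execute("UPDATE counters SET value = value + ? WHERE name = ?",
                         (delta, counter))
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: the primary key rejects the re-insert

apply_event("e1", "clicks", 1)
apply_event("e1", "clicks", 1)  # retry of the same event is a no-op
print(conn.execute("SELECT value FROM counters WHERE name = 'clicks'")
          .fetchone()[0])  # -> 1
```

Because the marker insert and the counter update commit atomically, a crash between them can never leave the marker without the update (or vice versa), which is exactly the partial-failure case described above.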

Cheers,

Konstantin

--
Konstantin Knauf * [hidden email] * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082



Re: Flink - Once and once processing

M Singh
Thanks Konstantin.

Just to clarify - unless the target database is resilient to duplicates, Flink's once-only configuration will not avoid duplicate updates.

Mans



Re: Flink - Once and once processing

milind parikh

Flink operates in conjunction with sources and sinks. So, yes, there are things that an underlying sink (or source) must support, in conjunction with Flink, to enable a particular processing semantic.




Re: Flink - Once and once processing

Till Rohrmann
Hi Mans,

Milind is right that, in general, external systems have to play along if you want exactly-once processing guarantees when writing to them: either by supporting idempotent operations or by allowing their state to be rolled back.

In the batch world, this usually means completely overwriting the data from a previously failed execution run, or using a unique key which does not change across runs.
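As a toy illustration of the unique-key approach (an in-memory dict stands in for the database; the key and value names are hypothetical), keying the output by a deterministic, run-independent key makes a retry overwrite the earlier attempt rather than double-count:

```python
# Results keyed by (window, counter): the key does not change across runs,
# so a retried run overwrites the previous attempt instead of adding to it.
results = {}

def write_result(window: str, counter: str, value: int) -> None:
    results[(window, counter)] = value  # idempotent upsert

write_result("2016-07-29", "clicks", 40)   # partial result from a failed run
write_result("2016-07-29", "clicks", 100)  # retry recomputes and overwrites
print(results[("2016-07-29", "clicks")])   # -> 100
```

The same idea in SQL would be an upsert (e.g. `INSERT ... ON CONFLICT ... DO UPDATE`) keyed on something derived from the input data, never from the attempt number.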

In the streaming case, we can achieve exactly-once guarantees by buffering the data between checkpoints and committing it to the external system only after a checkpoint has been taken. This guarantees that the changes are only materialized once we are sure we can roll back to a checkpoint which has already seen all the elements that might have produced the sink output. You can take a look at the CassandraSink, where we do exactly this.
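The buffering-until-checkpoint idea can be sketched like this (a simplified toy, not Flink's actual sink interface; the method names only loosely mirror Flink's checkpoint hooks):

```python
class BufferingSink:
    """Toy checkpoint-committed sink: records are buffered and only written
    to the external system once the enclosing checkpoint completes."""

    def __init__(self, external_store: list):
        self.external_store = external_store  # stands in for e.g. Cassandra
        self.pending = []                     # records since the last barrier
        self.awaiting_commit = {}             # checkpoint_id -> sealed buffer

    def invoke(self, record):
        self.pending.append(record)           # buffer, don't write yet

    def snapshot(self, checkpoint_id: int):
        # Checkpoint barrier arrives: seal the current buffer
        self.awaiting_commit[checkpoint_id] = self.pending
        self.pending = []

    def notify_checkpoint_complete(self, checkpoint_id: int):
        # Only now are the changes materialized in the external system
        self.external_store.extend(self.awaiting_commit.pop(checkpoint_id))

    def restore(self):
        # Failure before commit: drop the buffers; the records are replayed
        # from the last completed checkpoint, so nothing is written twice
        self.pending = []
        self.awaiting_commit.clear()

store = []
sink = BufferingSink(store)
sink.invoke("a"); sink.invoke("b")
sink.snapshot(checkpoint_id=1)
print(store)                        # -> [] (nothing committed yet)
sink.notify_checkpoint_complete(1)
print(store)                        # -> ['a', 'b']
```

If the job fails after `snapshot` but before `notify_checkpoint_complete`, the uncommitted buffer is discarded and the same records arrive again on replay, so the external system never sees a partial or duplicated write.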

Cheers,
Till





Re: Flink - Once and once processing

M Singh
Thanks Till. I will take a look at your pointers.

Mans

