Cassandra sink wrt Counters

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Cassandra sink wrt Counters

milind parikh

Given FLINK 3311 & 3332, I am wondering it would be possible,  without idempotent counters in Cassandra, to deliver on an exactly once sink into Cassandra. I do note that the verbiage/disc2 in 3332 does warn the user that this is not exactly "exactly once" sink.

However my question  has to do with whether having idempotent counters and a Data model that enables all other idempotent operations are a necessary prerequisite to exactly once semantics in flink.

Asked a different way, what source and sink would enable a end-to-end exactly - once semantics,  in the current state-of-the-art, with Flink in the middle.

Thanks
Milind

Reply | Threaded
Open this post in threaded view
|

Re: Cassandra sink wrt Counters

Chesnay Schepler
Hello Milind,

I'm not entirely sure i fully understood your question, but I'll try anyway :)

There is now way to provide exactly-once semantics for Cassandra's counters. As such we (will) only provide exactly-once semantics for a subset of Cassandra operations; idempotent inserts/updates.

There are several things that would allow exactly-once semantics:
  • transactions
    • rather obvious i think
  • replaying/rollback to a given state
    • replay for sources/rollback for sinks upon failure
  • an atomic idempotent update across 2 tables.
    • allows tracking every read/write made; selectively re-read/write upon failure
One of the key requisites is proper failure reporting though; if an update fails we need to know. As far as i know Cassandra doesn't make this guarantee.

Regards,
Chesnay Schepler

On 10.05.2016 07:48, milind parikh wrote:

Given FLINK 3311 & 3332, I am wondering it would be possible,  without idempotent counters in Cassandra, to deliver on an exactly once sink into Cassandra. I do note that the verbiage/disc2 in 3332 does warn the user that this is not exactly "exactly once" sink.

However my question  has to do with whether having idempotent counters and a Data model that enables all other idempotent operations are a necessary prerequisite to exactly once semantics in flink.

Asked a different way, what source and sink would enable a end-to-end exactly - once semantics,  in the current state-of-the-art, with Flink in the middle.

Thanks
Milind


Reply | Threaded
Open this post in threaded view
|

Re: Cassandra sink wrt Counters

milind parikh

Hi Chesnay

Sorry for asking the question in a confusing manner. Being new to flink, there are many questions swirling around in my head.

Thanks for the details in your answers. Here's the facts , as I see them:

(a) Cassandra Counters are not idempotent
(b) The failures, in context of Cassandra, are not the typical failures of an ACID transaction. The failure indicate that the operation was not able to continue at the specified transaction level; meaning that at least one of the nodes didn't ack back in the requisite amount of time the reads or the writes. This failure is NOT indicative of the fact that some node (or many ) might have seen and processed the reads or writes; just that at least one of the nodes did not. There is no rollback either. The antientropy features of Cassandra will kick in and attempt to correct the situation internal to Cassandra. From an external system, though, the situation is different....if such failure occurs, one could try to retry the operation (specifically writes) again outside of Cassandra; provided one has the ability to do so through an intermediate layer (think flink)and the write is specifically modeled to be idempotent in the data model (specifically Rowkey design).

One could model the data model so as to make Flink work exceptionally well with Cassandra; except counter tables. There is no way in Cassandra currently to model an idempotent counter table that I know of. Therefore an event replay that affects a counter might end up double counting.

When will the Cassandra sink be released?  I am ready to test it out even now.

Hello Milind,

I'm not entirely sure i fully understood your question, but I'll try anyway :)

There is now way to provide exactly-once semantics for Cassandra's counters. As such we (will) only provide exactly-once semantics for a subset of Cassandra operations; idempotent inserts/updates.

There are several things that would allow exactly-once semantics:
  • transactions
    • rather obvious i think
  • replaying/rollback to a given state
    • replay for sources/rollback for sinks upon failure
  • an atomic idempotent update across 2 tables.
    • allows tracking every read/write made; selectively re-read/write upon failure
One of the key requisites is proper failure reporting though; if an update fails we need to know. As far as i know Cassandra doesn't make this guarantee.

Regards,
Chesnay Schepler

On 10.05.2016 07:48, milind parikh wrote:

Given FLINK 3311 & 3332, I am wondering it would be possible,  without idempotent counters in Cassandra, to deliver on an exactly once sink into Cassandra. I do note that the verbiage/disc2 in 3332 does warn the user that this is not exactly "exactly once" sink.

However my question  has to do with whether having idempotent counters and a Data model that enables all other idempotent operations are a necessary prerequisite to exactly once semantics in flink.

Asked a different way, what source and sink would enable a end-to-end exactly - once semantics,  in the current state-of-the-art, with Flink in the middle.

Thanks
Milind


Reply | Threaded
Open this post in threaded view
|

Re: Cassandra sink wrt Counters

Ufuk Celebi
On Tue, May 10, 2016 at 5:36 PM, milind parikh <[hidden email]> wrote:
> When will the Cassandra sink be released?  I am ready to test it out even
> now.

You can work with Chesnay's branch here:
https://github.com/apache/flink/pull/1771

Clone his repo via Git, check out the branch, and then build it from
source (https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/building.html).