Fwd: How to reprocess certain events in Flink?

Posted by Pooja Agrawal
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Fwd-How-to-reprocess-certain-events-in-Flink-tp31687.html


Hi,

I have a use case where we are reading events from a Kinesis stream. An event can look like this:
Event {
  event_id,
  transaction_id,
  key1,
  key2,
  value,
  timestamp,
  some other fields...
}

We want to aggregate the values per key over all events we have seen so far (as simple as "select key1, key2, sum(value) from events group by key1, key2"). For this I have created a simple Flink job which uses the flink-kinesis connector and applies keyBy() and sum() on the incoming events. I am facing two challenges here:
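For reference, this is roughly what the job looks like right now (the stream name, region and EventDeserializationSchema are just placeholders; Event is a POJO with the fields shown above):

import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;

public class PerKeySumJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties consumerConfig = new Properties();
        consumerConfig.setProperty(AWSConfigConstants.AWS_REGION, "us-east-1");

        // EventDeserializationSchema is our own deserializer that turns the raw
        // Kinesis record bytes into an Event POJO.
        DataStream<Event> events = env.addSource(
                new FlinkKinesisConsumer<>("events", new EventDeserializationSchema(), consumerConfig));

        // Running sum of `value` per (key1, key2) over all events seen so far.
        events
            .keyBy("key1", "key2")
            .sum("value")
            .print();

        env.execute("per-key-running-sum");
    }
}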

1) The incoming events can have duplicates. How do I maintain exactly-once processing here, since processing duplicate events would give me erroneous results? The field transaction_id will be unique for each event, so if two events have the same transaction_id, we can say they are duplicates. (By duplicates here, I don't just mean the retried ones. The same message can be present in Kinesis with different sequence numbers. I am not sure if the flink-kinesis connector can handle that, as it uses the KCL underneath, which I assume doesn't take care of it.)
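To make it concrete, the kind of idempotency logic I had in mind is the sketch below (key the stream by transaction_id and drop anything already seen), inserted before the keyBy/sum above. I'm not sure this is the right approach:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Keyed by transaction_id; forwards an event only the first time that id is seen.
public class DedupByTransactionId extends KeyedProcessFunction<String, Event, Event> {

    private transient ValueState<Boolean> seen;

    @Override
    public void open(Configuration parameters) {
        seen = getRuntimeContext().getState(
                new ValueStateDescriptor<>("seen", Boolean.class));
    }

    @Override
    public void processElement(Event event, Context ctx, Collector<Event> out) throws Exception {
        if (seen.value() == null) {
            seen.update(true);
            out.collect(event);
        }
        // else: same transaction_id already processed, drop the duplicate
    }
}

It would be wired in as events.keyBy(e -> e.transaction_id).process(new DedupByTransactionId()) before the aggregation. One worry is that this keyed state grows forever, since we never know when a transaction_id can no longer reappear.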

2) There can be cases where the value for a key has been updated after we have already processed the event, and we may want to reprocess those events with the new value. Since this is just a value change, the transaction_id will be the same, so the idempotency logic will not allow those events to be reprocessed. What are the ways to handle such scenarios in Flink?
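One direction I wondered about (just a sketch; copyWithValue is a made-up helper on my Event class, and the imports are the same as in the dedup sketch above) is to remember the last value per transaction_id and, when the same transaction_id arrives with a different value, emit only the difference so the downstream sum() ends up reflecting the new value. I don't know whether this is an idiomatic way to do it in Flink:

// Variant of the dedup function: instead of a boolean flag it keeps the last
// value seen per transaction_id and turns a changed value into a correction.
public class DedupOrCorrectByTransactionId extends KeyedProcessFunction<String, Event, Event> {

    private transient ValueState<Double> lastValue;

    @Override
    public void open(Configuration parameters) {
        lastValue = getRuntimeContext().getState(
                new ValueStateDescriptor<>("last-value", Double.class));
    }

    @Override
    public void processElement(Event event, Context ctx, Collector<Event> out) throws Exception {
        Double previous = lastValue.value();
        if (previous == null) {
            // First time this transaction_id is seen: pass the event through.
            lastValue.update(event.value);
            out.collect(event);
        } else if (previous != event.value) {
            // Same transaction_id, new value: forward only the delta so the
            // running sum moves from the old value to the new one.
            lastValue.update(event.value);
            out.collect(event.copyWithValue(event.value - previous));
        }
        // else: exact duplicate, drop it.
    }
}

Of course this only helps if the updated event is actually written to the stream again, so I'd still like to know the recommended way to handle such corrections.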

Thanks
Pooja


--
Warm Regards,
Pooja Agrawal