Re: How to reprocess certain events in Flink?
Posted by Zhu Zhu
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Fwd-How-to-reprocess-certain-events-in-Flink-tp31687p31697.html
Hi Pooja,
I'm a bit confused: 1) says "If two events have same transaction_id, we can say that they are duplicates", while 2) says "Since this is just a value change, the transaction_id will be same". These two statements look conflicting to me. Usually, in scenarios like 2), the value-update event is treated as a new event that does not share its unique id with prior events.
If each event has a unique transaction_id, you can de-duplicate the events by keeping a set of the transaction_ids that have already been processed. Under that uniqueness assumption, 2) would not be a problem.
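To make the set-based de-duplication concrete, here is a minimal plain-Java sketch of the logic only; it is not Flink API code. The class DedupSketch, the reduced Event shape, and the method names are made up for illustration. In an actual Flink job the "seen" set would presumably have to live in checkpointed keyed state (for example, keyed by transaction_id) rather than in a heap HashSet, so that it survives restarts.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration of de-duplicating by transaction_id before summing.
// All names here are hypothetical; this is not the Flink API.
class DedupSketch {

    // Hypothetical event shape, reduced to the fields this sketch needs.
    static final class Event {
        final String transactionId;
        final String key1;
        final String key2;
        final long value;

        Event(String transactionId, String key1, String key2, long value) {
            this.transactionId = transactionId;
            this.key1 = key1;
            this.key2 = key2;
            this.value = value;
        }
    }

    // transaction_ids that have already been applied to the sums.
    private final Set<String> seenTransactionIds = new HashSet<>();
    // Running sum(value) per (key1, key2).
    private final Map<List<String>, Long> sums = new HashMap<>();

    /** Returns true if the event was applied, false if it was dropped as a duplicate. */
    boolean process(Event e) {
        // Set.add returns false when the id is already present.
        if (!seenTransactionIds.add(e.transactionId)) {
            return false;
        }
        sums.merge(List.of(e.key1, e.key2), e.value, Long::sum);
        return true;
    }

    long sumFor(String key1, String key2) {
        return sums.getOrDefault(List.of(key1, key2), 0L);
    }

    public static void main(String[] args) {
        DedupSketch sketch = new DedupSketch();
        sketch.process(new Event("t1", "a", "b", 5));
        sketch.process(new Event("t1", "a", "b", 5)); // duplicate, ignored
        sketch.process(new Event("t2", "a", "b", 7));
        System.out.println(sketch.sumFor("a", "b"));
    }
}
```

Note that the set grows with every distinct transaction_id, so a real job would likely need some expiry policy for old ids; what that policy should be depends on how late duplicates can arrive.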
Thanks,
Zhu Zhu
Hi,
I have a use case where we are reading events from a Kinesis stream. An event can look like this:
Event {
event_id,
transaction_id,
key1,
key2,
value,
timestamp,
some other fields...
}
We want to aggregate the values per key for all events we have seen so far (as simple as "select key1, key2, sum(value) from events group by key1, key2"). For this I have created a simple Flink job which uses the flink-kinesis connector and applies keyBy() and sum() on the incoming events. I am facing two challenges here:
1) The incoming events can contain duplicates. How can we maintain exactly-once processing here, given that processing duplicate events would give erroneous results? The field transaction_id will be unique for each event. If two events have same transaction_id, we can say that they are duplicates. (By duplicates I don't just mean the retried ones: the same message can be present in Kinesis with different sequence numbers. I am not sure whether the flink-kinesis connector can handle that, as it uses KCL underneath, which I assume doesn't take care of it.)
2) There can be cases where the value has been updated for a key after the event was processed, and we may want to reprocess those events with the new value. Since this is just a value change, the transaction_id will be same. The idempotency logic will not allow the events to be reprocessed. What are the ways to handle such scenarios in Flink?
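One possible way to reconcile 1) and 2), sketched here in plain Java rather than Flink API code, is to remember the last value applied per transaction_id and add only the difference to the running sum. An exact duplicate then contributes zero, and a value change contributes the correction. This rests on assumptions that are not stated above: an update keeps the same key1 and key2 as the original event, and the most recently received value is the correct one. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of delta-based handling of duplicates and value updates.
// All names here are hypothetical; this is not the Flink API.
class UpdateSketch {

    // Last value applied to the sums for each transaction_id.
    private final Map<String, Long> lastValueByTransaction = new HashMap<>();
    // Running sum(value) per (key1, key2).
    private final Map<List<String>, Long> sums = new HashMap<>();

    void process(String transactionId, String key1, String key2, long value) {
        // Map.put returns the previous value, or null if the id is new.
        Long previous = lastValueByTransaction.put(transactionId, value);
        // New id: add the full value. Known id: add only the change (0 for a duplicate).
        long delta = (previous == null) ? value : value - previous;
        sums.merge(List.of(key1, key2), delta, Long::sum);
    }

    long sumFor(String key1, String key2) {
        return sums.getOrDefault(List.of(key1, key2), 0L);
    }

    public static void main(String[] args) {
        UpdateSketch sketch = new UpdateSketch();
        sketch.process("t1", "a", "b", 5); // first sighting
        sketch.process("t1", "a", "b", 5); // exact duplicate, delta is 0
        sketch.process("t1", "a", "b", 8); // value update, delta is 3
        sketch.process("t2", "a", "b", 2); // different transaction
        System.out.println(sketch.sumFor("a", "b"));
    }
}
```

If updates can arrive out of order, comparing the event timestamp before overwriting the stored value would probably be needed as well; that is left out of this sketch.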
Thanks
Pooja
--
Warm Regards,
Pooja Agrawal