Reprocessing the data after config change

Tomasz Dobrzycki
Hi all,

I'm currently working on a system that windows data made up of browser events and extracts metrics from it. The data is processed according to config loaded from an external application.
One of the main requirements of the system is to reprocess historical data whenever that config changes (within reason; I've currently settled on 7 days because of the Kinesis Streams retention period).
My line of attack was to keep one job processing live data and start another one from past checkpoints that reprocesses the data until it catches up with the live one (I still need to think about which metrics to use to determine that; any suggestions are welcome :) ).
Am I on the right track with this, or is there a better way of approaching the problem?

Kind Regards,
Tomasz
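
On the open question of how to tell that the reprocessing job has caught up, one possible approach is to compare how far each job has progressed in event time and treat the reprocessing job as caught up once the gap falls under a threshold. The sketch below is only an illustration of that comparison, not something from this thread: the function name, its parameters, and the 60-second default are hypothetical, and how each job's event-time progress is read (for example from watermarks or from the MillisBehindLatest value that Kinesis returns with GetRecords) is left out.

```python
def has_caught_up(reprocess_event_time_ms: int,
                  live_event_time_ms: int,
                  max_lag_ms: int = 60_000) -> bool:
    """Return True once the reprocessing job is within max_lag_ms of the live job.

    Both timestamps are event-time progress markers in epoch milliseconds.
    All names and the default threshold are assumptions for illustration.
    """
    lag_ms = live_event_time_ms - reprocess_event_time_ms
    return lag_ms <= max_lag_ms
```

A job restarted from a 7-day-old position would report a lag of roughly 604,800,000 ms at first, and the check only flips to True once that gap has shrunk below the chosen threshold.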

Re: Reprocessing the data after config change

Fabian Hueske-2
Hi Tomasz,

That sounds like a sound design.
You have to make sure that the output of the application is idempotent, so that the reprocessing job overwrites all output data of the earlier job.

Best, Fabian
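
To make the idempotency requirement concrete, here is a minimal sketch, assuming each result row is keyed by a deterministic (metric name, window start) pair and written as an upsert. The in-memory sink, the function names, and the event layout are all hypothetical stand-ins for whatever store and schema the real job uses. The sketch also emits a row for every window that contains any event, even when the count is zero, so that a stricter config cannot leave stale rows behind.

```python
from typing import Callable, Dict, Iterable, Tuple

Event = Tuple[int, str]      # (event time in ms, event type); assumed layout
RowKey = Tuple[str, int]     # (metric name, window start in ms)


class UpsertSink:
    """In-memory stand-in for a key-value store that supports overwrites."""

    def __init__(self) -> None:
        self.rows: Dict[RowKey, int] = {}

    def upsert(self, metric: str, window_start_ms: int, value: int) -> None:
        # Same key means same row: a later write replaces the earlier one.
        self.rows[(metric, window_start_ms)] = value


def run_job(events: Iterable[Event], window_ms: int,
            matches: Callable[[str], bool], sink: UpsertSink) -> None:
    """Count matching events per tumbling window and upsert one row per window."""
    counts: Dict[int, int] = {}
    for ts, event_type in events:
        start = ts - ts % window_ms
        counts.setdefault(start, 0)   # emit a row even if nothing matches
        if matches(event_type):
            counts[start] += 1
    for start, count in counts.items():
        sink.upsert("event_count", start, count)


events = [(1_000, "click"), (1_500, "scroll"), (61_000, "click")]
sink = UpsertSink()

# Earlier job: old config counts clicks and scrolls.
run_job(events, 60_000, lambda t: t in {"click", "scroll"}, sink)
# Reprocessing job: new config counts clicks only, overwriting the same keys.
run_job(events, 60_000, lambda t: t == "click", sink)
```

Because the keys depend only on the metric and the window, running the reprocessing job once or several times leaves the sink in the same state, which is the property the reply asks for.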



(In reply to Tomasz Dobrzycki's message of 2017-10-23 16:24 GMT+02:00.)