I'm looking into a situation where our checkpoint sizes keep growing steadily over time on their own. I haven't been able to pinpoint why, and it would help a lot to see how much checkpoint space is attributable to each operator so I can narrow it down. Are there any tools or methods for introspecting checkpoint data to determine where the space is going?
The pipeline in question consumes from Kinesis and batches the data using windows. I suspected I was doing something wrong with the windowing, but I'm returning FIRE_AND_PURGE from the trigger and also setting a max end timestamp. The Kinesis consumer is not emitting watermarks at the moment, but as far as I know that's not necessary for proper checkpointing (only for exactly-once behavior).
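To make the FIRE vs. FIRE_AND_PURGE distinction concrete, here is a toy, in-memory model. It is not Flink's API: `ToyWindowOperator`, its count-based trigger, and `state_size()` are hypothetical stand-ins, and the enum only borrows the names of Flink's `TriggerResult` values. It sketches the assumption that purging on fire is what keeps buffered window contents (and therefore checkpointed state) bounded, while firing without purging lets the buffer keep growing.

```python
from collections import defaultdict
from enum import Enum


class TriggerResult(Enum):
    # Borrows names from Flink's TriggerResult; this is a toy model, not Flink's API.
    CONTINUE = "continue"
    FIRE = "fire"
    FIRE_AND_PURGE = "fire_and_purge"


class ToyWindowOperator:
    """Hypothetical stand-in for a windowed operator's keyed window state."""

    def __init__(self, window_size_ms, max_count, purge=True):
        self.window_size_ms = window_size_ms
        self.max_count = max_count
        self.purge = purge
        self.buffers = defaultdict(list)  # (key, window_start) -> buffered elements
        self.emitted = []  # (key, window_start, elements) for every firing

    def _trigger(self, buffer):
        # Assumed count-based trigger: fire once max_count elements are buffered.
        if len(buffer) >= self.max_count:
            return TriggerResult.FIRE_AND_PURGE if self.purge else TriggerResult.FIRE
        return TriggerResult.CONTINUE

    def process(self, key, value, timestamp_ms):
        # Assign the element to a tumbling window by its timestamp.
        window_start = timestamp_ms - (timestamp_ms % self.window_size_ms)
        slot = (key, window_start)
        self.buffers[slot].append(value)
        result = self._trigger(self.buffers[slot])
        if result in (TriggerResult.FIRE, TriggerResult.FIRE_AND_PURGE):
            self.emitted.append((key, window_start, list(self.buffers[slot])))
        if result is TriggerResult.FIRE_AND_PURGE:
            # Purging drops the window's contents, so they no longer count as state.
            del self.buffers[slot]
        return result

    def state_size(self):
        # Buffered element count across all windows: a rough proxy for state size.
        return sum(len(b) for b in self.buffers.values())
```

Feeding nine elements for one key into a single one-minute window, the purging operator ends with nothing buffered after three firings, while the FIRE-only operator still holds all nine elements and re-fires on every element after the third.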