Hi Luis,
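with event-time windows, you should be able to generate some test data with fixed timestamps and get predictable results, because elements are assigned to windows by their timestamps rather than by the time at which they are processed.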
Flink itself uses similar tests internally to ensure the correctness of the windowing implementation (for example, the EventTimeWindowCheckpointingITCase).
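
To illustrate why the results are predictable, here is a minimal plain-Java sketch. It does not use the Flink API: the class and method names are made up for illustration, and it assumes tumbling windows aligned to the epoch while ignoring watermarks, late data and triggers. It only shows that a window result depends on the timestamps in the test data, not on the order in which the elements arrive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; none of these names are part of Flink.
class EventTimeWindowSketch {

    // Start of the tumbling window (aligned to the epoch) that contains the timestamp.
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    // Sums the values per window. Each event is a pair {timestampMs, value}.
    static Map<Long, Long> sumPerWindow(List<long[]> events, long windowSizeMs) {
        Map<Long, Long> sums = new TreeMap<>();
        for (long[] event : events) {
            sums.merge(windowStart(event[0], windowSizeMs), event[1], Long::sum);
        }
        return sums;
    }

    // Hypothetical test data with fixed event timestamps.
    static List<long[]> testData() {
        return new ArrayList<>(Arrays.asList(
                new long[] {1000L, 1L},
                new long[] {4000L, 2L},
                new long[] {5000L, 3L},
                new long[] {9999L, 4L},
                new long[] {10000L, 5L}));
    }

    public static void main(String[] args) {
        List<long[]> events = testData();
        Map<Long, Long> inOrder = sumPerWindow(events, 5000L);

        // Shuffle to mimic out-of-order arrival; the per-window sums stay the same.
        Collections.shuffle(events);
        Map<Long, Long> shuffled = sumPerWindow(events, 5000L);

        System.out.println(inOrder);
        System.out.println(inOrder.equals(shuffled));
    }
}
```

In a real Flink test you would assign the same fixed timestamps to the elements of your test source and compare the window output against expected values computed in this way.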
Regards,
Robert