Flink SQL continuous join checkpointing

Posted by Taras Moisiuk on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Flink-SQL-continuous-join-checkpointing-tp40267.html

Hi everyone!
I'm using Flink 1.12.0 with SQL API.
My streaming job is basically a join of two dynamic tables (backed by Kafka topics), with the result inserted into a PostgreSQL table.
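To give a rough idea, the query has this shape (table, column, and sink names below are placeholders, not the real ones):

INSERT INTO postgres_sink  -- placeholder sink table
SELECT t1.id, t1.payload, t2.payload
FROM table1 t1
JOIN table2 t2
  ON t1.id = t2.id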

I have enabled watermarking based on the Kafka record timestamp (metadata column) for each table in the join:

CREATE TABLE table1 (
    ...,
    kafka_time TIMESTAMP(3) METADATA FROM 'timestamp',
    WATERMARK FOR kafka_time AS kafka_time - INTERVAL '2' HOUR
) WITH (
    'format' = 'json',
    ...
)

However, the checkpoint data size for the join task keeps growing, despite the watermarks on both tables and the "Low Watermark" value shown in the web UI.

As far as I understand, outdated records from both tables should be dropped from the join state (and therefore from the checkpoint) after 2 hours, but it looks like the job keeps all state accumulated since the beginning.

Should I enable this clean-up behavior for outdated records explicitly, or do I need to modify the join query?
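If modifying the query is the way to go, I assume it would have to become an interval join with explicit time bounds on the event-time columns, something like the following (again with placeholder names):

SELECT t1.id, t1.payload, t2.payload
FROM table1 t1
JOIN table2 t2
  ON t1.id = t2.id
  -- time bound so the planner can clean up state once watermarks pass
  AND t1.kafka_time BETWEEN t2.kafka_time - INTERVAL '2' HOUR
                        AND t2.kafka_time + INTERVAL '2' HOUR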
Thank you!
