join state TTL

lec ssmi
Hi,

When one stream is joined with another stream, the buffered stream data is saved as state and deleted as the watermark advances.

I found that there is also a way to set a state expiration time: StateTtlConfig in the DataStream API and TableConfig in the Table API & SQL. This setting takes effect for the state of GROUP BY operators, and the state TTL seems to be based on processing time. If the configured TTL has been reached but the watermark has not yet advanced to the boundary, will the join state be cleared? What is the relationship between StateTtlConfig and TableConfig? If I use StateTtlConfig while programming with the Table API, does the configuration take effect?

Best regards,
Lec Ssmi

Re: join state TTL

Jark Wu-3
Hi Lec,

StateTtlConfig in the DataStream API is configured on a specific state; it is not a job-level configuration.
TableConfig#setIdleStateRetentionTime in the Table API & SQL is a job-level configuration that enables state TTL for the state of all non-time-based operators.
In the blink planner, TableConfig#setIdleStateRetentionTime is implemented on top of StateTtlConfig.
Time-based operators include window aggregations, time-windowed joins, and so on.

StateTtlConfig is a state TTL mechanism that only works on processing time, not event time.
In both the Table API & SQL and the DataStream API, window aggregations and time-windowed joins clear expired state using timers that are triggered by the watermark.
So time-based operators don't use StateTtlConfig to clear expired state.
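
To make the difference between the two cleanup paths concrete, here is a small, hypothetical Java model using only the JDK (no Flink classes). ProcTimeTtlStore, WatermarkTimerStore and TtlVsWatermarkDemo are illustrative names, not Flink APIs. The first store drops entries once they have been idle longer than a processing-time TTL; the second mimics a time-based operator, where each buffered row carries an event-time cleanup timestamp and is removed only when the watermark reaches it. It is a sketch of the two policies described above under those assumptions, not how Flink implements them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model (not Flink code): an entry expires once it has not been
// written for longer than ttlMillis of *processing* time.
class ProcTimeTtlStore {
    private final long ttlMillis;
    private final Map<String, Long> lastWrite = new HashMap<>();

    ProcTimeTtlStore(long ttlMillis) { this.ttlMillis = ttlMillis; }

    void write(String key, long procTime) { lastWrite.put(key, procTime); }

    // Drops every entry idle for more than ttlMillis; watermarks play no role here.
    void advanceProcessingTime(long procTime) {
        lastWrite.values().removeIf(t -> procTime - t > ttlMillis);
    }

    boolean contains(String key) { return lastWrite.containsKey(key); }
}

// Hypothetical model of a time-based operator: each buffered row has an
// event-time cleanup timestamp and is removed only when the watermark reaches it.
class WatermarkTimerStore {
    private final Map<String, Long> cleanupTime = new HashMap<>();

    void buffer(String key, long eventTimeCleanup) { cleanupTime.put(key, eventTimeCleanup); }

    // "Fires" every timer whose timestamp is <= the new watermark.
    void advanceWatermark(long watermark) {
        cleanupTime.values().removeIf(t -> t <= watermark);
    }

    boolean contains(String key) { return cleanupTime.containsKey(key); }
}

class TtlVsWatermarkDemo {
    public static void main(String[] args) {
        long hour = 3_600_000L;
        ProcTimeTtlStore ttl = new ProcTimeTtlStore(24 * hour);
        WatermarkTimerStore windowed = new WatermarkTimerStore();

        ttl.write("k", 0L);
        windowed.buffer("k", 1 * hour); // row is kept until event time 1h

        // 48 hours of processing time pass, but the watermark is stuck at 0.
        ttl.advanceProcessingTime(48 * hour);
        windowed.advanceWatermark(0L);
        System.out.println(ttl.contains("k") + " " + windowed.contains("k"));

        // Once the watermark reaches the cleanup time, the timer clears the row.
        windowed.advanceWatermark(1 * hour);
        System.out.println(windowed.contains("k"));
    }
}
```

Running main prints `false true` and then `false`: the processing-time TTL entry expires after 48 hours even though the watermark never moved, while the watermark-driven entry survives until the watermark reaches its cleanup time.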

Best,
Jark


In reply to lec ssmi (Tue, 28 Apr 2020 at 14:48)

Re: join state TTL

Jark Wu-3
If the state for 'uu' in stream A has not been updated for more than 24 hours, it will be cleared (blink planner).
The state expiration policy is "not updated for more than x time".
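
A rough model of this answer applied to LakeShen's example, again plain JDK and hypothetical (JoinSideState and UuExample are illustrative names, not Flink classes). It assumes the 24-hour minimum retention acts as the effective TTL, as in the reply above, and ignores the 25-hour maximum. It also assumes that rows from B probing A's state only read it, so a probe does not count as an update; only a new 'uu' row arriving on A resets the clock.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not Flink's implementation: the state for a key on one
// join side is treated as expired once it has not been *updated* for longer
// than the retention time. Assumption: probes from the other side are reads.
class JoinSideState {
    private final long retentionMillis;
    private final Map<String, Long> lastUpdate = new HashMap<>();

    JoinSideState(long retentionMillis) { this.retentionMillis = retentionMillis; }

    // A new row for this key arrives on this side: state is written, clock resets.
    void onRowArrived(String key, long procTime) { lastUpdate.put(key, procTime); }

    // A row from the other stream looks up this key; read-only, so no refresh.
    boolean probe(String key, long procTime) { return isAlive(key, procTime); }

    // Visible only if the last update is within the retention time.
    boolean isAlive(String key, long procTime) {
        Long t = lastUpdate.get(key);
        return t != null && procTime - t <= retentionMillis;
    }
}

class UuExample {
    public static void main(String[] args) {
        long hour = 3_600_000L;
        JoinSideState streamA = new JoinSideState(24 * hour);
        streamA.onRowArrived("uu", 0L);                       // last 'uu' row on A
        System.out.println(streamA.probe("uu", 20 * hour));   // within 24h of the update
        System.out.println(streamA.isAlive("uu", 29 * hour)); // idle for 29h
    }
}
```

main prints `true` and then `false`: the probe at hour 20 still finds 'uu', but since nothing wrote to that key after hour 0, it has been idle for 29 hours at hour 29 and is treated as expired.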

Best,
Jark

On Wed, 29 Apr 2020 at 10:19, LakeShen <[hidden email]> wrote:
Hi Jark,

I am a little confused about how the state of a regular two-stream join (not a window join) is cleared.

For example, there are two streams, A and B, and the SQL is:

select a, b from A join B on A.a = B.b

Suppose I configure the idle state retention time with a minimum of 24 hours and a maximum of 25 hours.

There is a key 'uu' that has not been joined with B for 29 hours. Will Flink clear the state for key 'uu' in stream A?

Thanks for your reply.

Best,
LakeShen

In reply to Jark Wu (Tue, 28 Apr 2020 at 7:47 PM)

Re: join state TTL

LakeShen
Thank you for the clarification, Jark.

In reply to Jark Wu (Wed, 29 Apr 2020 at 10:39 AM)