Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys

Vishal Santoshi
Hey folks, 

      Was looking at this very specific metric "session_aggregate.merging-window-set.rocksdb_estimate-num-keys".  Does this metric also represent session windows ( it is a session window ) that have lateness on them ? In essence if the session window was closed but has a lateness of a few hours would those keys still be counted against this metric.  

I think they should as it is an estimate keys for the Column Family for the operator and if the window has not been GCed then the key for those Windows should be in RocksDB but wanted to be sure. 

Regards.


Reply | Threaded
Open this post in threaded view
|

Re: Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys

Vishal Santoshi
The reason I ask is that I have a "Process Window Function" on that Session  Window  and I keep key scoped Global State.  I maintain a TTL on that state ( that is outside the Window state )  that is roughly the current WM + lateness. 

I would imagine that keys for that custom state are roughly equal to the number of keys in the "merging-window-set" . It seems twice that number but does follow the slope. I am trying to figure out why this deviation. 

public void process(KEY key,
ProcessWindowFunction<KeyedSession<KEY, VALUE>, KeyedSessionWithSessionID<KEY, VALUE>, KEY, TimeWindow>.Context context,
Iterable<KeyedSession<KEY, VALUE>> elements, Collector<KeyedSessionWithSessionID<KEY, VALUE>> out)
throws Exception {
// scoped to the key
if (state.value() == null) {
this.newKeysInState.inc();
state.update(new IntervalList());
}else{
this.existingKeysInState.inc();
}

On Sun, Mar 14, 2021 at 3:32 PM Vishal Santoshi <[hidden email]> wrote:
Hey folks, 

      Was looking at this very specific metric "session_aggregate.merging-window-set.rocksdb_estimate-num-keys".  Does this metric also represent session windows ( it is a session window ) that have lateness on them ? In essence if the session window was closed but has a lateness of a few hours would those keys still be counted against this metric.  

I think they should as it is an estimate keys for the Column Family for the operator and if the window has not been GCed then the key for those Windows should be in RocksDB but wanted to be sure. 

Regards.


Reply | Threaded
Open this post in threaded view
|

Re: Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys

Vishal Santoshi
All I can think is, that any update on a state key, which I do in my ProcessFunction, creates an update ( essentially an append on rocksdb ) which does render the previous value for the key, a  tombstone , but that need not reflect on the count  ( as double or triple counts ) atomically, thus the called as an "estimate" , but was not anticipating this much difference ...

On Sun, Mar 14, 2021 at 5:32 PM Vishal Santoshi <[hidden email]> wrote:
The reason I ask is that I have a "Process Window Function" on that Session  Window  and I keep key scoped Global State.  I maintain a TTL on that state ( that is outside the Window state )  that is roughly the current WM + lateness. 

I would imagine that keys for that custom state are roughly equal to the number of keys in the "merging-window-set" . It seems twice that number but does follow the slope. I am trying to figure out why this deviation. 

public void process(KEY key,
ProcessWindowFunction<KeyedSession<KEY, VALUE>, KeyedSessionWithSessionID<KEY, VALUE>, KEY, TimeWindow>.Context context,
Iterable<KeyedSession<KEY, VALUE>> elements, Collector<KeyedSessionWithSessionID<KEY, VALUE>> out)
throws Exception {
// scoped to the key
if (state.value() == null) {
this.newKeysInState.inc();
state.update(new IntervalList());
}else{
this.existingKeysInState.inc();
}

On Sun, Mar 14, 2021 at 3:32 PM Vishal Santoshi <[hidden email]> wrote:
Hey folks, 

      Was looking at this very specific metric "session_aggregate.merging-window-set.rocksdb_estimate-num-keys".  Does this metric also represent session windows ( it is a session window ) that have lateness on them ? In essence if the session window was closed but has a lateness of a few hours would those keys still be counted against this metric.  

I think they should as it is an estimate keys for the Column Family for the operator and if the window has not been GCed then the key for those Windows should be in RocksDB but wanted to be sure. 

Regards.


Reply | Threaded
Open this post in threaded view
|

Re: Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys

Yun Tang
Hi,

Could you describe what you observed in details? Which states you compare with the session window state "merging-window-set", the "newKeysInState" or "existingKeysInState"?

BTW, since we use list state as main state for window operator and we use RocksDB's merge operation for window state add operations, this would cause the estimating of number keys inaccurate [1]:
  // Estimation will be inaccurate when:
  // (1) there exist merge keys
  // (2) keys are directly overwritten
  // (3) deletion on non-existing keys
  // (4) low number of samples




Best
Yun Tang

From: Vishal Santoshi <[hidden email]>
Sent: Monday, March 15, 2021 5:48
To: user <[hidden email]>
Subject: Re: Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys
 
All I can think is, that any update on a state key, which I do in my ProcessFunction, creates an update ( essentially an append on rocksdb ) which does render the previous value for the key, a  tombstone , but that need not reflect on the count  ( as double or triple counts ) atomically, thus the called as an "estimate" , but was not anticipating this much difference ...

On Sun, Mar 14, 2021 at 5:32 PM Vishal Santoshi <[hidden email]> wrote:
The reason I ask is that I have a "Process Window Function" on that Session  Window  and I keep key scoped Global State.  I maintain a TTL on that state ( that is outside the Window state )  that is roughly the current WM + lateness. 

I would imagine that keys for that custom state are roughly equal to the number of keys in the "merging-window-set" . It seems twice that number but does follow the slope. I am trying to figure out why this deviation. 

public void process(KEY key,
ProcessWindowFunction<KeyedSession<KEY, VALUE>, KeyedSessionWithSessionID<KEY, VALUE>, KEY, TimeWindow>.Context context,
Iterable<KeyedSession<KEY, VALUE>> elements, Collector<KeyedSessionWithSessionID<KEY, VALUE>> out)
throws Exception {
// scoped to the key
if (state.value() == null) {
this.newKeysInState.inc();
state.update(new IntervalList());
}else{
this.existingKeysInState.inc();
}

On Sun, Mar 14, 2021 at 3:32 PM Vishal Santoshi <[hidden email]> wrote:
Hey folks, 

      Was looking at this very specific metric "session_aggregate.merging-window-set.rocksdb_estimate-num-keys".  Does this metric also represent session windows ( it is a session window ) that have lateness on them ? In essence if the session window was closed but has a lateness of a few hours would those keys still be counted against this metric.  

I think they should as it is an estimate keys for the Column Family for the operator and if the window has not been GCed then the key for those Windows should be in RocksDB but wanted to be sure. 

Regards.


Reply | Threaded
Open this post in threaded view
|

Re: Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys

Vishal Santoshi
Neither those are metrics metrics on a ValueState<IntervalList>, which is updated at least once every call to process.  The metric is the the number of these ValueState<IntervalList>s  scoped to a key ( am using session windows ).


On Mon, Mar 15, 2021 at 11:29 PM Yun Tang <[hidden email]> wrote:
Hi,

Could you describe what you observed in details? Which states you compare with the session window state "merging-window-set", the "newKeysInState" or "existingKeysInState"?

BTW, since we use list state as main state for window operator and we use RocksDB's merge operation for window state add operations, this would cause the estimating of number keys inaccurate [1]:
  // Estimation will be inaccurate when:
  // (1) there exist merge keys
  // (2) keys are directly overwritten
  // (3) deletion on non-existing keys
  // (4) low number of samples




Best
Yun Tang

From: Vishal Santoshi <[hidden email]>
Sent: Monday, March 15, 2021 5:48
To: user <[hidden email]>
Subject: Re: Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys
 
All I can think is, that any update on a state key, which I do in my ProcessFunction, creates an update ( essentially an append on rocksdb ) which does render the previous value for the key, a  tombstone , but that need not reflect on the count  ( as double or triple counts ) atomically, thus the called as an "estimate" , but was not anticipating this much difference ...

On Sun, Mar 14, 2021 at 5:32 PM Vishal Santoshi <[hidden email]> wrote:
The reason I ask is that I have a "Process Window Function" on that Session  Window  and I keep key scoped Global State.  I maintain a TTL on that state ( that is outside the Window state )  that is roughly the current WM + lateness. 

I would imagine that keys for that custom state are roughly equal to the number of keys in the "merging-window-set" . It seems twice that number but does follow the slope. I am trying to figure out why this deviation. 

public void process(KEY key,
ProcessWindowFunction<KeyedSession<KEY, VALUE>, KeyedSessionWithSessionID<KEY, VALUE>, KEY, TimeWindow>.Context context,
Iterable<KeyedSession<KEY, VALUE>> elements, Collector<KeyedSessionWithSessionID<KEY, VALUE>> out)
throws Exception {
// scoped to the key
if (state.value() == null) {
this.newKeysInState.inc();
state.update(new IntervalList());
}else{
this.existingKeysInState.inc();
}

On Sun, Mar 14, 2021 at 3:32 PM Vishal Santoshi <[hidden email]> wrote:
Hey folks, 

      Was looking at this very specific metric "session_aggregate.merging-window-set.rocksdb_estimate-num-keys".  Does this metric also represent session windows ( it is a session window ) that have lateness on them ? In essence if the session window was closed but has a lateness of a few hours would those keys still be counted against this metric.  

I think they should as it is an estimate keys for the Column Family for the operator and if the window has not been GCed then the key for those Windows should be in RocksDB but wanted to be sure. 

Regards.