'Custom' mapping function on keyed WindowedStream

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

'Custom' mapping function on keyed WindowedStream

Marchant, Hayden
I would like to create a custom aggregator function for a windowed KeyedStream which I have complete control over - i.e. instead of implementing an AggregatorFunction, I would like to control the lifecycle of the flink state by implementing the CheckpointedFunction interface, though I still want this state to be per-key, per-window.

I am not sure which function I should be calling on the WindowedStream in order to invoke this custom functionality. I see from the documentation that CheckpointedFunction is for non-keyed state - which I guess eliminates this option.

A little background - I have logic that needs to hold a very large state in the operator - lots of counts by sub-key. Since only a sub-set of these aggregations are updated, I was interesting in trying out incremental checkpointing in RocksDB. However, I don't want to be hitting RocksDB I/O on every update of state since we need very low latency, and instead wanted to hold the state in Java Heap and then update the Flink state on checkpoint - i.e something like CheckpointedFunction.
My assumption is that any update I make to RocksDB backed state will hit the local disk - if this is wrong then I'll be happy

What other options do I have?

Thanks,
Hayden Marchant

Reply | Threaded
Open this post in threaded view
|

Re: 'Custom' mapping function on keyed WindowedStream

swiesman
I had to solve a similar problem, we use a process function with rocksdb and map state for the sub keys. So while we hit rocks on every element, only the specified sub keys are ever read from disk.

Seth Wiesman| Software Engineer4 World Trade Center, 46th Floor, New York, NY [hidden email] <mailto:[hidden email]>
 

On 2/26/18, 6:32 AM, "Marchant, Hayden " <[hidden email]> wrote:

    I would like to create a custom aggregator function for a windowed KeyedStream which I have complete control over - i.e. instead of implementing an AggregatorFunction, I would like to control the lifecycle of the flink state by implementing the CheckpointedFunction interface, though I still want this state to be per-key, per-window.
   
    I am not sure which function I should be calling on the WindowedStream in order to invoke this custom functionality. I see from the documentation that CheckpointedFunction is for non-keyed state - which I guess eliminates this option.
   
    A little background - I have logic that needs to hold a very large state in the operator - lots of counts by sub-key. Since only a sub-set of these aggregations are updated, I was interesting in trying out incremental checkpointing in RocksDB. However, I don't want to be hitting RocksDB I/O on every update of state since we need very low latency, and instead wanted to hold the state in Java Heap and then update the Flink state on checkpoint - i.e something like CheckpointedFunction.
    My assumption is that any update I make to RocksDB backed state will hit the local disk - if this is wrong then I'll be happy
   
    What other options do I have?
   
    Thanks,
    Hayden Marchant