Simulating Time-based and Count-based Custom Windows with ProcessFunction

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Simulating Time-based and Count-based Custom Windows with ProcessFunction

m@xi
Hello Flinkers!

Around here and there one may find some post for sliding windows in Flink. I
have read that default sliding windows of Flink, the system maintains each
window separately in memory, which in my case is prohibitive.

Therefore, I want to implement my own sliding windows through
ProcessFunction() via onTimer() function. Specifically, let's assume that
the data does not contain any timestamps. So, if anyone can help providing
concrete answers or even *code skeletons* on the following bullets, it is
more than welcome :

1 -- How do I assign timestamps into my data tuples? What type of
time...process, event or ingestion?

2 -- How to simulate count-based windows? In this case, what would be the
best artificial timestamps to append? Just increasing integers?

Thanks in advance.

Best,
Max




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Simulating Time-based and Count-based Custom Windows with ProcessFunction

Tzu-Li (Gordon) Tai
Hi Max!

Before we jump into the custom ProcessFunction approach:
Have you also checked out using the RocksDB state backend, and whether or not it is suitable for your use case?
For state that would not fit into memory, that is usually the to-go state backend to use.

If you’re sure a custom ProcessFunction is still the way to go, then some answers to your questions:

1 -- How do I assign timestamps into my data tuples? What type of 
time...process, event or ingestion? 

This should be done via the `.assignTimestampsAndWatermarks` method on the output of an operator prior to the process function.
The timestamps assigned using this method is for event time processing.
Timestamps for processing and ingestion time processing are determined by the system.

2 -- How to simulate count-based windows? In this case, what would be the 
best artificial timestamps to append? Just increasing integers? 

I’m not sure of your actual use case here, but if you want to implement count-based windows, then timers should not need to be part of the implementation. On each processElement, you should fire results of a window’s state if the window’s element count has reached the target count.

- Gordon

On 13 April 2018 at 5:36:02 PM, m@xi ([hidden email]) wrote:

Hello Flinkers!

Around here and there one may find some post for sliding windows in Flink. I
have read that default sliding windows of Flink, the system maintains each
window separately in memory, which in my case is prohibitive.

Therefore, I want to implement my own sliding windows through
ProcessFunction() via onTimer() function. Specifically, let's assume that
the data does not contain any timestamps. So, if anyone can help providing
concrete answers or even *code skeletons* on the following bullets, it is
more than welcome :

1 -- How do I assign timestamps into my data tuples? What type of
time...process, event or ingestion?

2 -- How to simulate count-based windows? In this case, what would be the
best artificial timestamps to append? Just increasing integers?

Thanks in advance.

Best,
Max




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/