Hi Folks,
I have a question regarding Global Windows.
I have a stream with a large number of records. The records have a key which has a very high cardinality. They also have a state ( start, status, finish).
I need to do some processing where I look at the records separated into windows using the ‘state’ property.
From the documentation, I believe I should be using a Global Window with a custom trigger to identify the windows….
I have this implemented.. the Trigger returns ‘CONTINUE” for ‘start', and FIRE_AND_PURGE for ‘finish'.
I also need to avoid running out of memory since sometimes I don’t get ‘finish’ records… so I added a timer to the Trigger which PURGE’s if it fires..
Is this the correct approach?
I say this since I do in fact see a memory leak … is there anything else I need to be aware of?
Thanks
Steve
|
Hi Steve,
are you sure a GlobalWindows assigner fits your needs? This may be the case if all your events always come in order and you do not ever have overlapping sessions since a GlobalWindows assigner simply puts all events (per key) into a single window (per key). If you have overlapping sessions, you may need your own window assigner that handles multiple windows (see the EventTimeSessionWindows assigner for our take on event-time session windows). Regarding the timer: if you set it via `#registerEventTimeTimer()`, it only fires if a watermark passes the given timestamp, so you need to make sure your sources create them (see [1] and its sub-topics). Depending on your further constraints in your application, it may be ok to use `registerProcessingTimeTimer()` instead. Does this help already? If not, we'd need some (minimal) example of how your using these things to debug further into your memory issues. Nico [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/ event_time.html On Monday, 14 August 2017 06:32:01 CEST Steve Jerman wrote: > Hi Folks, > > I have a question regarding Global Windows. > > I have a stream with a large number of records. The records have a key which > has a very high cardinality. They also have a state ( start, status, > finish). > I need to do some processing where I look at the records separated into > windows using the ‘state’ property. > From the documentation, I believe I should be using a Global Window with a > custom trigger to identify the windows…. > I have this implemented.. the Trigger returns ‘CONTINUE” for ‘start', and > FIRE_AND_PURGE for ‘finish'. > I also need to avoid running out of memory since sometimes I don’t get > ‘finish’ records… so I added a timer to the Trigger which PURGE’s if it > fires.. > Is this the correct approach? > > I say this since I do in fact see a memory leak … is there anything else I > need to be aware of? > Thanks > > Steve signature.asc (201 bytes) Download Attachment |
Thank you Nico.
I *think* I should have one stream per key... the stream I get is pretty fast and there may be some corner cases I'm not aware of. However, I really need to process as a single window per key.
I am worried about the cardinality of the key ... I wanted to use a timeout to remove the window for a key. If not the memory requirements would grow quickly (which I think is what is happening). The stream has 60K unique keys per 5 minutes window (maybe 1/2 million total unique per day...).
Anyway I'll write a test to investigate further...
Thanks for your thoughts
Steve
From: Nico Kruber <[hidden email]>
Sent: Wednesday, August 16, 2017 3:22:41 AM To: [hidden email] Cc: Steve Jerman Subject: Re: Question about Global Windows. Hi Steve,
are you sure a GlobalWindows assigner fits your needs? This may be the case if all your events always come in order and you do not ever have overlapping sessions since a GlobalWindows assigner simply puts all events (per key) into a single window (per key). If you have overlapping sessions, you may need your own window assigner that handles multiple windows (see the EventTimeSessionWindows assigner for our take on event-time session windows). Regarding the timer: if you set it via `#registerEventTimeTimer()`, it only fires if a watermark passes the given timestamp, so you need to make sure your sources create them (see [1] and its sub-topics). Depending on your further constraints in your application, it may be ok to use `registerProcessingTimeTimer()` instead. Does this help already? If not, we'd need some (minimal) example of how your using these things to debug further into your memory issues. Nico [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/
event_time.html On Monday, 14 August 2017 06:32:01 CEST Steve Jerman wrote: > Hi Folks, > > I have a question regarding Global Windows. > > I have a stream with a large number of records. The records have a key which > has a very high cardinality. They also have a state ( start, status, > finish). > I need to do some processing where I look at the records separated into > windows using the ‘state’ property. > From the documentation, I believe I should be using a Global Window with a > custom trigger to identify the windows…. > I have this implemented.. the Trigger returns ‘CONTINUE” for ‘start', and > FIRE_AND_PURGE for ‘finish'. > I also need to avoid running out of memory since sometimes I don’t get > ‘finish’ records… so I added a timer to the Trigger which PURGE’s if it > fires.. > Is this the correct approach? > > I say this since I do in fact see a memory leak … is there anything else I > need to be aware of? > Thanks > > Steve |
From: Steve Jerman <[hidden email]>
Sent: Thursday, August 17, 2017 11:34:09 AM To: Nico Kruber; [hidden email] Subject: Re: Question about Global Windows. Thank you Nico.
I *think* I should have one stream per key... the stream I get is pretty fast and there may be some corner cases I'm not aware of. However, I really need to process as a single window per key.
I am worried about the cardinality of the key ... I wanted to use a timeout to remove the window for a key. If not the memory requirements would grow quickly (which I think is what is happening). The stream has 60K unique keys per 5 minutes window (maybe 1/2 million total unique per day...).
Anyway I'll write a test to investigate further...
Thanks for your thoughts
Steve
From: Nico Kruber <[hidden email]>
Sent: Wednesday, August 16, 2017 3:22:41 AM To: [hidden email] Cc: Steve Jerman Subject: Re: Question about Global Windows. Hi Steve,
are you sure a GlobalWindows assigner fits your needs? This may be the case if all your events always come in order and you do not ever have overlapping sessions since a GlobalWindows assigner simply puts all events (per key) into a single window (per key). If you have overlapping sessions, you may need your own window assigner that handles multiple windows (see the EventTimeSessionWindows assigner for our take on event-time session windows). Regarding the timer: if you set it via `#registerEventTimeTimer()`, it only fires if a watermark passes the given timestamp, so you need to make sure your sources create them (see [1] and its sub-topics). Depending on your further constraints in your application, it may be ok to use `registerProcessingTimeTimer()` instead. Does this help already? If not, we'd need some (minimal) example of how your using these things to debug further into your memory issues. Nico [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/
event_time.html On Monday, 14 August 2017 06:32:01 CEST Steve Jerman wrote: > Hi Folks, > > I have a question regarding Global Windows. > > I have a stream with a large number of records. The records have a key which > has a very high cardinality. They also have a state ( start, status, > finish). > I need to do some processing where I look at the records separated into > windows using the ‘state’ property. > From the documentation, I believe I should be using a Global Window with a > custom trigger to identify the windows…. > I have this implemented.. the Trigger returns ‘CONTINUE” for ‘start', and > FIRE_AND_PURGE for ‘finish'. > I also need to avoid running out of memory since sometimes I don’t get > ‘finish’ records… so I added a timer to the Trigger which PURGE’s if it > fires.. > Is this the correct approach? > > I say this since I do in fact see a memory leak … is there anything else I > need to be aware of? > Thanks > > Steve |
Free forum by Nabble | Edit this page |