Streaming window : count with timeout ?

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Streaming window : count with timeout ?

LINZ, Arnaud

Hello,

 

The data in my stream have a timestamp that may be slightly out of order, but I need to process the data in the proper order. To do this, I use a windowing function and sort the items in a flatMap.

 

However, the source may sometimes send data in “bulk batches” and sometimes “on the fly”. If I choose a time window, it will suits well the “on the fly” behavior but when processing bulks I may have too many elements to sort in the time interval specified.

 

If I choose a “count.of” window, I will process batches efficiently but I may need to wait forever in the “on the fly” behavior until the count is reached.

 

What I need is then a “count window with time out” or a “time window with max element” : I would like to specify both a max count and a max time to fit the source behavior.

 

Do you have any idea how I can do that ?

 

Best regards,

Arnaud

 




L'intégrité de ce message n'étant pas assurée sur internet, la société expéditrice ne peut être tenue responsable de son contenu ni de ses pièces jointes. Toute utilisation ou diffusion non autorisée est interdite. Si vous n'êtes pas destinataire de ce message, merci de le détruire et d'avertir l'expéditeur.

The integrity of this message cannot be guaranteed on the Internet. The company that sent this message cannot therefore be held liable for its content nor attachments. Any unauthorized use or dissemination is prohibited. If you are not the intended recipient of this message, then please delete it and notify the sender.
Reply | Threaded
Open this post in threaded view
|

Re: Streaming window : count with timeout ?

Matthias J. Sax
You can implement an custom window policy (I guess this should be
flexible enough for your case).

See documentation:
https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#policy-based-windowing

If you have further question after reading, just go ahead :)

-Matthias


On 07/17/2015 05:30 PM, LINZ, Arnaud wrote:

> Hello,
>
>  
>
> The data in my stream have a timestamp that may be slightly out of
> order, but I need to process the data in the proper order. To do this, I
> use a windowing function and sort the items in a flatMap.
>
>  
>
> However, the source may sometimes send data in “bulk batches” and
> sometimes “on the fly”. If I choose a time window, it will suits well
> the “on the fly” behavior but when processing bulks I may have too many
> elements to sort in the time interval specified.
>
>  
>
> If I choose a “count.of” window, I will process batches efficiently but
> I may need to wait forever in the “on the fly” behavior until the count
> is reached.
>
>  
>
> What I need is then a “count window with time out” or a “time window
> with max element” : I would like to specify both a max count and a max
> time to fit the source behavior.
>
>  
>
> Do you have any idea how I can do that ?
>
>  
>
> Best regards,
>
> Arnaud
>
>  
>
>
> ------------------------------------------------------------------------
>
> L'intégrité de ce message n'étant pas assurée sur internet, la société
> expéditrice ne peut être tenue responsable de son contenu ni de ses
> pièces jointes. Toute utilisation ou diffusion non autorisée est
> interdite. Si vous n'êtes pas destinataire de ce message, merci de le
> détruire et d'avertir l'expéditeur.
>
> The integrity of this message cannot be guaranteed on the Internet. The
> company that sent this message cannot therefore be held liable for its
> content nor attachments. Any unauthorized use or dissemination is
> prohibited. If you are not the intended recipient of this message, then
> please delete it and notify the sender.


signature.asc (836 bytes) Download Attachment