Re: finite subset of an infinite data stream

Posted by Stephan Ewen on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/finite-subset-of-an-infinite-data-stream-tp3400p3404.html

Hi!

If you want to work on subsets of streams, the answer is usually to use windows, for example "stream.keyBy(...).timeWindow(Time.of(1, TimeUnit.MINUTES))".
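
To make the idea of a window as a finite buffer concrete, here is a minimal sketch in plain Java (deliberately not the Flink API) of what a one-minute tumbling window does conceptually: each timestamped element is assigned to the one-minute interval that contains it, so every window holds a finite subset of the stream. The class name TumblingWindowSketch, the Event type, and the method names are hypothetical, and the sketch assumes non-negative timestamps in milliseconds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TumblingWindowSketch {

    // Window length: one minute, expressed in milliseconds.
    static final long WINDOW_SIZE_MILLIS = 60_000L;

    // Hypothetical event type: a value plus the time it occurred.
    static final class Event {
        final long timestampMillis;
        final String value;

        Event(long timestampMillis, String value) {
            this.timestampMillis = timestampMillis;
            this.value = value;
        }
    }

    // Start of the one-minute window that contains the given timestamp
    // (assumes timestampMillis >= 0).
    static long windowStart(long timestampMillis) {
        return timestampMillis - (timestampMillis % WINDOW_SIZE_MILLIS);
    }

    // Groups event values by window start; each map entry is one finite buffer.
    static Map<Long, List<String>> assignToWindows(List<Event> events) {
        Map<Long, List<String>> windows = new TreeMap<>();
        for (Event e : events) {
            windows.computeIfAbsent(windowStart(e.timestampMillis), k -> new ArrayList<>())
                   .add(e.value);
        }
        return windows;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event(5_000L, "a"),
                new Event(59_999L, "b"),
                new Event(60_000L, "c"),
                new Event(125_000L, "d"));

        for (Map.Entry<Long, List<String>> w : assignToWindows(events).entrySet()) {
            System.out.println("window starting at " + w.getKey() + " ms: " + w.getValue());
        }
    }
}
```

In a real streaming job the windowing operator does this grouping incrementally and hands each completed window to a window function, instead of collecting everything into a map first.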

Do the transformations that you want to make fit into a window function?

There are thoughts about introducing something like global time windows across the entire stream, inside which you could work in a more batch-like style, but that would be quite an extensive change to the core.

Greetings,
Stephan


On Sun, Nov 8, 2015 at 5:15 PM, rss rss <[hidden email]> wrote:

Hello,

 

I need to extract a finite subset, like a data buffer, from an infinite data stream. Ideally, I would obtain a finite stream containing the data accumulated over, for example, the previous minute. But I have not found any existing technique to do it.

 

As possible ways to get something close to a subset of a stream, I see the following options:

-          a transformation operation like ‘take_while’ that produces a new stream but is able to switch it to the FINISHED state. Unfortunately, I have not found how to switch the state of a stream from the user code of transformation functions;

-          new DataStream or StreamSource constructors that allow connecting a data processing chain to the source stream. It could be something like the take_while transformation mentioned above, or a modified StreamSource.run method fed with data from the source stream.
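
For illustration, the take_while semantics described above can be sketched with the standard java.util.stream API (Java 9 or later). This is plain Java, not Flink, and it does not show how to finish a Flink DataStream; it only shows the intended behaviour: an unbounded source becomes a finite stream once the predicate first fails. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TakeWhileSketch {

    // Takes elements from an unbounded counter 0, 1, 2, ... while they are below the limit.
    static List<Long> takeWhileBelow(long limit) {
        return Stream.iterate(0L, n -> n + 1)     // unbounded source
                     .takeWhile(n -> n < limit)   // stream ends at the first element that fails the test
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(takeWhileBelow(5)); // prints [0, 1, 2, 3, 4]
    }
}
```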

 

So I have two questions.

1)      Is there any technique to extract accumulated data from a stream as a stream (to union it with another stream)? This is like a pure buffer mode.

2)      If the answer to the first question is negative, is there something like a take_while transformation, or should I think about a custom implementation of it? Is it possible to implement it without modifying the core of Flink?

 

Regards,

Roman