Hi,
I've noticed that SplitStream class is marked as deprecated, although split method of DataStream is not. Also there is no alternative proposed in SplitStream doc for it. In my use case I will have a stream of events that I have to split into two separate streams based on some function. Events with field that meets some condition should go to the first stream, where all other should go to the different stream. Later both streams should be processed in a different manner. I was planing to use approach presented here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/ SplitStream<Integer> split = someDataStream.split(new OutputSelector<Integer>() { @Override public Iterable<String> select(Integer value) { List<String> output = new ArrayList<String>(); if (value % 2 == 0) { output.add("even"); } else { output.add("odd"); } return output; } }); But it turns out that SplitStream is deprecated. Also I've found similar question on SO https://stackoverflow.com/questions/53588554/apache-flink-using-filter-or-split-to-split-a-stream I don't fink filter and SideOutputs are good choice here. I will be thankful for an any suggestion. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Hi Kristoff,
The recommended alternative is to use SideOutputs as described in [1]. Could you elaborate why you think side outputs are not a good choice for your usecase? Cheers, Kostas [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/side_output.html On Thu, Dec 19, 2019 at 5:13 PM KristoffSC <[hidden email]> wrote: > > Hi, > I've noticed that SplitStream class is marked as deprecated, although split > method of DataStream is not. > Also there is no alternative proposed in SplitStream doc for it. > > In my use case I will have a stream of events that I have to split into two > separate streams based on some function. Events with field that meets some > condition should go to the first stream, where all other should go to the > different stream. > > Later both streams should be processed in a different manner. > > I was planing to use approach presented here: > https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/ > > SplitStream<Integer> split = someDataStream.split(new > OutputSelector<Integer>() { > @Override > public Iterable<String> select(Integer value) { > List<String> output = new ArrayList<String>(); > if (value % 2 == 0) { > output.add("even"); > } > else { > output.add("odd"); > } > return output; > } > }); > > But it turns out that SplitStream is deprecated. > Also I've found similar question on SO > https://stackoverflow.com/questions/53588554/apache-flink-using-filter-or-split-to-split-a-stream > > I don't fink filter and SideOutputs are good choice here. > > I will be thankful for an any suggestion. > > > > > -- > Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Kostas, thank you for your response,
Well although the Side Outputs would do the job, I was just surprised that those are the replacements for stream splitting. The thing is, and this is might be only a subjective opinion, it that I would assume that Side Outputs should be used only to produce something.... aside of the main processing function like control messages or some leftovers. In my case, I wanted to simply split the stream into two new streams based on some condition. With side outputs I will have to "treat" the second stream as a something additional to the main processing result. Like it is written in the docs: "*In addition* to the main stream that results from DataStream operations(...)" or "The type of data in the result streams does not have to match the type of data in the *main *stream and the types of the different side outputs can also differ. " I'm my case I don't have any "addition" to my main stream and actually both spitted streams are equally important :) So by writing that side outputs are not good for my use case I meant that they are not fitting conceptually, at least in my opinion. Regards, Krzysztof -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Hi Krzysztof,
If I get it correctly, your main reason behind not using side-outputs is that it seems that "side-output", by the name, seems to be a "second class citizen" compared to the main output. I see your point but in terms of functionality, there is no difference between the different outputs from Flink's perspective. Both create DataStreams that are full integrated with Flink's fault-tolerant state handling (if checkpointing is enabled) and event-time handling. So I believe it is safe to use them for your usecase. I hope this helps, Kostas On Thu, Dec 19, 2019 at 10:30 PM KristoffSC <[hidden email]> wrote: > > Kostas, thank you for your response, > > Well although the Side Outputs would do the job, I was just surprised that > those are the replacements for stream splitting. > > The thing is, and this is might be only a subjective opinion, it that I > would assume that Side Outputs should be used only to produce something.... > aside of the main processing function like control messages or some > leftovers. > > In my case, I wanted to simply split the stream into two new streams based > on some condition. > With side outputs I will have to "treat" the second stream as a something > additional to the main processing result. > > Like it is written in the docs: > "*In addition* to the main stream that results from DataStream > operations(...)" > > or > "The type of data in the result streams does not have to match the type of > data in the *main *stream and the types of the different side outputs can > also differ. " > > > I'm my case I don't have any "addition" to my main stream and actually both > spitted streams are equally important :) > > So by writing that side outputs are not good for my use case I meant that > they are not fitting conceptually, at least in my opinion. > > Regards, > Krzysztof > > > > > -- > Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Hi Kostas,
Thank you for the answer and clarification. If Side-outputs are treated in the same way and there is no significant performance penalty then it seems that they are ok for my use case. I can accept the name mismatch ;) Regards, Krzysztof -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Free forum by Nabble | Edit this page |