Deprecated SplitStream class - what should be use instead.

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Deprecated SplitStream class - what should be use instead.

KristoffSC
Hi,
I've noticed that SplitStream class is marked as deprecated, although split
method of DataStream is not.
Also there is no alternative proposed in SplitStream doc for it.

In my use case I will have a stream of events that I have to split into two
separate streams based on some function. Events with field that meets some
condition should go to the first stream, where all other should go to the
different stream.

Later both streams should be processed in a different manner.

I was planing to use approach presented here:
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/

SplitStream<Integer> split = someDataStream.split(new
OutputSelector<Integer>() {
    @Override
    public Iterable<String> select(Integer value) {
        List<String> output = new ArrayList<String>();
        if (value % 2 == 0) {
            output.add("even");
        }
        else {
            output.add("odd");
        }
        return output;
    }
});

But it turns out that SplitStream is deprecated.
Also I've found similar question on SO
https://stackoverflow.com/questions/53588554/apache-flink-using-filter-or-split-to-split-a-stream 

I don't fink filter and SideOutputs are good choice here.

I will be thankful for an any suggestion.




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Deprecated SplitStream class - what should be use instead.

Kostas Kloudas-5
Hi Kristoff,

The recommended alternative is to use SideOutputs as described in [1].
Could you elaborate why you think side outputs are not a good choice
for your usecase?

Cheers,
Kostas

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/side_output.html

On Thu, Dec 19, 2019 at 5:13 PM KristoffSC
<[hidden email]> wrote:

>
> Hi,
> I've noticed that SplitStream class is marked as deprecated, although split
> method of DataStream is not.
> Also there is no alternative proposed in SplitStream doc for it.
>
> In my use case I will have a stream of events that I have to split into two
> separate streams based on some function. Events with field that meets some
> condition should go to the first stream, where all other should go to the
> different stream.
>
> Later both streams should be processed in a different manner.
>
> I was planing to use approach presented here:
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/
>
> SplitStream<Integer> split = someDataStream.split(new
> OutputSelector<Integer>() {
>     @Override
>     public Iterable<String> select(Integer value) {
>         List<String> output = new ArrayList<String>();
>         if (value % 2 == 0) {
>             output.add("even");
>         }
>         else {
>             output.add("odd");
>         }
>         return output;
>     }
> });
>
> But it turns out that SplitStream is deprecated.
> Also I've found similar question on SO
> https://stackoverflow.com/questions/53588554/apache-flink-using-filter-or-split-to-split-a-stream
>
> I don't fink filter and SideOutputs are good choice here.
>
> I will be thankful for an any suggestion.
>
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Deprecated SplitStream class - what should be use instead.

KristoffSC
Kostas, thank you for your response,

Well although the Side Outputs would do the job, I was just surprised that
those are the replacements for stream splitting.

The thing is, and this is might be only a subjective opinion, it that I
would assume that Side Outputs should be used only to produce something....
aside of the main processing function like control messages or some
leftovers.

In my case, I wanted to simply split the stream into two new streams based
on some condition.
With side outputs I will have to "treat" the second stream as a something
additional to the main processing result.

Like it is written in the docs:
"*In addition* to the main stream that results from DataStream
operations(...)"

or
"The type of data in the result streams does not have to match the type of
data in the *main *stream and the types of the different side outputs can
also differ. "


I'm my case I don't have any "addition" to my main stream and actually both
spitted streams are equally important :)

So by writing that side outputs are not good for my use case I meant that
they are not fitting conceptually, at least in my opinion.

Regards,
Krzysztof




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Deprecated SplitStream class - what should be use instead.

Kostas Kloudas-2
Hi Krzysztof,

If I get it correctly, your main reason behind not using side-outputs
is that it seems that "side-output", by the name, seems to be a
"second class citizen"  compared to the main output.
I see your point but in terms of functionality, there is no difference
between the different outputs from Flink's perspective. Both create
DataStreams that are full integrated with Flink's fault-tolerant state
handling (if checkpointing is enabled) and event-time handling. So I
believe it is safe to use them for your usecase.

I hope this helps,
Kostas

On Thu, Dec 19, 2019 at 10:30 PM KristoffSC
<[hidden email]> wrote:

>
> Kostas, thank you for your response,
>
> Well although the Side Outputs would do the job, I was just surprised that
> those are the replacements for stream splitting.
>
> The thing is, and this is might be only a subjective opinion, it that I
> would assume that Side Outputs should be used only to produce something....
> aside of the main processing function like control messages or some
> leftovers.
>
> In my case, I wanted to simply split the stream into two new streams based
> on some condition.
> With side outputs I will have to "treat" the second stream as a something
> additional to the main processing result.
>
> Like it is written in the docs:
> "*In addition* to the main stream that results from DataStream
> operations(...)"
>
> or
> "The type of data in the result streams does not have to match the type of
> data in the *main *stream and the types of the different side outputs can
> also differ. "
>
>
> I'm my case I don't have any "addition" to my main stream and actually both
> spitted streams are equally important :)
>
> So by writing that side outputs are not good for my use case I meant that
> they are not fitting conceptually, at least in my opinion.
>
> Regards,
> Krzysztof
>
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Deprecated SplitStream class - what should be use instead.

KristoffSC
Hi Kostas,
Thank you for the answer and clarification.

If Side-outputs are treated in the same way and there is no significant
performance penalty then it seems that they are ok for my use case.

I can accept the name mismatch ;)

Regards,
Krzysztof



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/