Dummy DataStream

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Dummy DataStream

Duck
I have a project where i am reading in on a single DataStream from Kafka, then sending to a variable number of handlers based on content of the recieved data, after that i want to join them all. Since i do not know how many different streams this will create, i cannot have a single "base" to performa a Join operation on. So my question is, can i create a "dummy / empty" DataStream<MyObject> to use as a join basis?

Example:
1) DataStream<MyObject> all = ..
2) Create a List<DataStream<MyObject>> myList;
3) Then i split the "all" datastream based on content, and add each stream to "myList"
4) I now parse each of the different streams....
5) I now want to join my list of streams, "myList" to a DataStream<MyObject> all_joined_again;

/Duck


Reply | Threaded
Open this post in threaded view
|

RE: Dummy DataStream

Radu Tudoran

Hi Duck,

 

I am not 100% sure I understand your exact scenario but I will try to give you some pointers, maybe it will help.

 

Typically when you do the split you have some knowledge about the criterion to do the split.

For example if you follow the example from the website

https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html

 

SplitStream<Integer> split = someDataStream.split(new OutputSelector<Integer>() {

    @Override

    public Iterable<String> select(Integer value) {

        List<String> output = new ArrayList<String>();

        if (value % 2 == 0) {

            output.add("even");

        }

        else {

            output.add("odd");

        }

        return output;

    }

});

 

You would know you have a stream for even and odd and then you can collect them in your list by doing

 

myList.add(split.select("even"));

myList.add(split.select("odd"));

 

for that matter, the SplitStream object kind of does the same.

 

I would say that you have 2 options from this to get your full stream back:

You can use the option from the website:

DataStream<Integer> all = split.select("even","odd");

Which I believe does not work as you might have some operations performed on the splits.

The other option is to use union, which aggregates the independent streams without a specific condition like a join.

 

You could do something like

For(DataStream stream:myList)

                allStream = allStream.union(stream)

 

 

 

From: Duck [mailto:[hidden email]]
Sent: Thursday, January 26, 2017 9:08 PM
To: [hidden email]
Subject: Dummy DataStream

 

I have a project where i am reading in on a single DataStream from Kafka, then sending to a variable number of handlers based on content of the recieved data, after that i want to join them all. Since i do not know how many different streams this will create, i cannot have a single "base" to performa a Join operation on. So my question is, can i create a "dummy / empty" DataStream<MyObject> to use as a join basis?

 

Example:

1) DataStream<MyObject> all = ..

2) Create a List<DataStream<MyObject>> myList;

3) Then i split the "all" datastream based on content, and add each stream to "myList"

4) I now parse each of the different streams....

5) I now want to join my list of streams, "myList" to a DataStream<MyObject> all_joined_again;

 

/Duck