I have a project where i am reading in on a single DataStream from Kafka, then sending to a variable number of handlers based on content of the recieved data, after that i want to join them all. Since i do not know how many different streams this will create, i cannot have a single "base" to performa a Join operation on. So my question is, can i create a "dummy / empty" DataStream<MyObject> to use as a join basis? Example: 1) DataStream<MyObject> all = .. 2) Create a List<DataStream<MyObject>> myList; 3) Then i split the "all" datastream based on content, and add each stream to "myList" 4) I now parse each of the different streams.... 5) I now want to join my list of streams, "myList" to a DataStream<MyObject> all_joined_again; /Duck |
Hi Duck, I am not 100% sure I understand your exact scenario but I will try to give you some pointers, maybe it will help. Typically when you do the split you have some knowledge about the criterion to do the split.
For example if you follow the example from the website https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html
SplitStream<Integer>
split
=
someDataStream.split(new
OutputSelector<Integer>()
{
@Override
public
Iterable<String>
select(Integer
value)
{
List<String>
output
=
new
ArrayList<String>();
if
(value
%
2
==
0)
{
output.add("even");
}
else
{
output.add("odd");
}
return
output;
} }); You would know you have a stream for even and odd and then you can collect them in your list by doing myList.add(split.select("even")); myList.add(split.select("odd")); for that matter, the SplitStream object kind of does the same. I would say that you have 2 options from this to get your full stream back: You can use the option from the website: DataStream<Integer>
all = split.select("even","odd"); Which I believe does not work as you might have some operations performed on the splits. The other option is to use union, which aggregates the independent streams without a specific condition like a join. You could do something like For(DataStream stream:myList) allStream = allStream.union(stream) From: Duck [mailto:[hidden email]]
I have a project where i am reading in on a single DataStream from Kafka, then sending to a variable number of handlers based on content of the recieved data, after that i want to join them all. Since i do not know how many different streams
this will create, i cannot have a single "base" to performa a Join operation on. So my question is, can i create a "dummy / empty" DataStream<MyObject> to use as a join basis? Example: 1) DataStream<MyObject> all = .. 2) Create a List<DataStream<MyObject>> myList; 3) Then i split the "all" datastream based on content, and add each stream to "myList" 4) I now parse each of the different streams.... 5) I now want to join my list of streams, "myList" to a DataStream<MyObject> all_joined_again; /Duck |
Free forum by Nabble | Edit this page |