2 Broadcast streams to a Single Keyed Stream....how to?

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

2 Broadcast streams to a Single Keyed Stream....how to?

Vishal Santoshi
I have 2 broadcast streams that carry rules to be applied to a third keyed  stream. The connect method of the keyed stream only takes a single broadcast stream. How do I connect the 2 broadcast stream to that single keyed stream.

  Do I have 2 connects and thus 2 instances of BroadcastConnextedStream, union them and then apply process through a single SingleOutpitStreamOperator ? The issue I see there are 2 keyBy calls and an additional shuffle before connect is called.

To be precise, is there a simple example of applying 2 dissimilar rules through 2 broadcast streams, thus 2 different MapStateDiscriptors, to a single keyed stream without any unnecessary overhead...


 
Reply | Threaded
Open this post in threaded view
|

Re: 2 Broadcast streams to a Single Keyed Stream....how to?

Vishal Santoshi
I think I can connect the broadcast streams to the single keyed stream (as in 2 connects ) and access the 2 or n broadcast states through the key in the Broadcast  Map in processBroadcastElement , using a single instance of the Processor function. In essence I need to merge the results of the 2 rules careying broadcast stream in a single keyed brooadcast function .....

On Tue, Sep 18, 2018, 8:59 AM Vishal Santoshi <[hidden email]> wrote:
I have 2 broadcast streams that carry rules to be applied to a third keyed  stream. The connect method of the keyed stream only takes a single broadcast stream. How do I connect the 2 broadcast stream to that single keyed stream.

  Do I have 2 connects and thus 2 instances of BroadcastConnextedStream, union them and then apply process through a single SingleOutpitStreamOperator ? The issue I see there are 2 keyBy calls and an additional shuffle before connect is called.

To be precise, is there a simple example of applying 2 dissimilar rules through 2 broadcast streams, thus 2 different MapStateDiscriptors, to a single keyed stream without any unnecessary overhead...


 
Reply | Threaded
Open this post in threaded view
|

Re: 2 Broadcast streams to a Single Keyed Stream....how to?

Xingcan Cui
In reply to this post by Vishal Santoshi
Hi Vishal,

You could try 1) merging these two rule streams first with the `union` method if they get the same type or 2) connecting them and encapsulate the records from both sides to a unified type (e.g., scala Either).

Best,
Xingcan

> On Sep 18, 2018, at 8:59 PM, Vishal Santoshi <[hidden email]> wrote:
>
> I have 2 broadcast streams that carry rules to be applied to a third keyed  stream. The connect method of the keyed stream only takes a single broadcast stream. How do I connect the 2 broadcast stream to that single keyed stream.
>
>   Do I have 2 connects and thus 2 instances of BroadcastConnextedStream, union them and then apply process through a single SingleOutpitStreamOperator ? The issue I see there are 2 keyBy calls and an additional shuffle before connect is called.
>
> To be precise, is there a simple example of applying 2 dissimilar rules through 2 broadcast streams, thus 2 different MapStateDiscriptors, to a single keyed stream without any unnecessary overhead...
>
>
>  

Reply | Threaded
Open this post in threaded view
|

Re: 2 Broadcast streams to a Single Keyed Stream....how to?

Vishal Santoshi
I could do that, but I was under the impression that 2 or more disparate broadcast states could be provided to a keyed stream, referenced through a key in the Map State...That would be cleaner as in the fact that 2 different set of rules are to be applied are explictely declared rather then carries inside the datums of a unioned stream...... I will look at second option...

On Tue, Sep 18, 2018, 9:15 AM Xingcan Cui <[hidden email]> wrote:
Hi Vishal,

You could try 1) merging these two rule streams first with the `union` method if they get the same type or 2) connecting them and encapsulate the records from both sides to a unified type (e.g., scala Either).

Best,
Xingcan

> On Sep 18, 2018, at 8:59 PM, Vishal Santoshi <[hidden email]> wrote:
>
> I have 2 broadcast streams that carry rules to be applied to a third keyed  stream. The connect method of the keyed stream only takes a single broadcast stream. How do I connect the 2 broadcast stream to that single keyed stream.
>
>   Do I have 2 connects and thus 2 instances of BroadcastConnextedStream, union them and then apply process through a single SingleOutpitStreamOperator ? The issue I see there are 2 keyBy calls and an additional shuffle before connect is called.
>
> To be precise, is there a simple example of applying 2 dissimilar rules through 2 broadcast streams, thus 2 different MapStateDiscriptors, to a single keyed stream without any unnecessary overhead...
>
>


Reply | Threaded
Open this post in threaded view
|

Re: 2 Broadcast streams to a Single Keyed Stream....how to?

Xingcan Cui
Hi Vishal,

Actually, you could provide multiple MapStateDescriptors for the `broadcast()` method and then use them, separately.

Best,
Xingcan

On Sep 18, 2018, at 9:29 PM, Vishal Santoshi <[hidden email]> wrote:

I could do that, but I was under the impression that 2 or more disparate broadcast states could be provided to a keyed stream, referenced through a key in the Map State...That would be cleaner as in the fact that 2 different set of rules are to be applied are explictely declared rather then carries inside the datums of a unioned stream...... I will look at second option...

On Tue, Sep 18, 2018, 9:15 AM Xingcan Cui <[hidden email]> wrote:
Hi Vishal,

You could try 1) merging these two rule streams first with the `union` method if they get the same type or 2) connecting them and encapsulate the records from both sides to a unified type (e.g., scala Either).

Best,
Xingcan

> On Sep 18, 2018, at 8:59 PM, Vishal Santoshi <[hidden email]> wrote:
>
> I have 2 broadcast streams that carry rules to be applied to a third keyed  stream. The connect method of the keyed stream only takes a single broadcast stream. How do I connect the 2 broadcast stream to that single keyed stream.
>
>   Do I have 2 connects and thus 2 instances of BroadcastConnextedStream, union them and then apply process through a single SingleOutpitStreamOperator ? The issue I see there are 2 keyBy calls and an additional shuffle before connect is called.
>
> To be precise, is there a simple example of applying 2 dissimilar rules through 2 broadcast streams, thus 2 different MapStateDiscriptors, to a single keyed stream without any unnecessary overhead...
>
>



Reply | Threaded
Open this post in threaded view
|

Re: 2 Broadcast streams to a Single Keyed Stream....how to?

Vishal Santoshi
Options ( here A is the non broadcast stream type  and X and Y are the streams to be broadcast types). 

Note also the the Broadcasts of X and Y happen at different intervals and we require the 2 Rules to be applied in a single execution and run some merge routine on the output in the processElement method, without leaving it for downstream operators.

1. ConnectedStream(X,Y)  connectedStream =  BroadcastStream<X>.connect(BroadcastStream<Y>);
    connectedStream.broadcast() ; 
 // broadcast not allowed on connected stream

2.  DataStream<X> unionedStream  =  DataStream<X>.union(DatStream<X>) ;//here X.type = Y.type
     B = unionedStream.broadcast( ..) ; 
     DataStream<A>.keyBy(..) connect(B);
    // But here we unifying to a single Descriptor, thus the hint as to which X or Y the rule came from has to be implicit in the the data, not at all declararitive.
    

3. DataStream<A>.keyBy().connect(DataStream<X>.broadcast( ..) ).process ( new () ) ; 
    DataStream<A>.keyBy().connect(DataStream<Y>.broadcast( ..) ).process ( new () ) 
    // 2 keyBy overhead and a subsequent merge step?
  
4. MapStateDescriptor<X> one= ..;
MapStateDescriptor<Y> two=..; 
BroadcastStream<X> oneBroadCastStream = DataStream<X>.broadcast(one);
BroadcastStream<Y> twoBroadCastStream = DataStream<Y>.broadcast(two);
DataStream<A>.keyBy()
      .connect(oneBroadCastStream).
      .connect(twoBroadCastStream) // not possible
     ).process(new KeyedBroadcastProcessFunction(){
      processElement(..);
      processBrodcastElement( Context, Out, X ,Y....) {
            // here figure out which local operator state to replace 
     }
}))

I do not see a clean way at all to out in 2 StateDescriptors to a single KeyedBroadcastProcessFunction from 2 ( or more ) MapDescriptors even though I evidently can broadcast each stream and connect each stream independently to the non broadcast stream and access the states independently.






I am not sure, that without reducing the 2 Rules to a single type 
























  


On Tue, Sep 18, 2018 at 9:38 AM Xingcan Cui <[hidden email]> wrote:
Hi Vishal,

Actually, you could provide multiple MapStateDescriptors for the `broadcast()` method and then use them, separately.

Best,
Xingcan

On Sep 18, 2018, at 9:29 PM, Vishal Santoshi <[hidden email]> wrote:

I could do that, but I was under the impression that 2 or more disparate broadcast states could be provided to a keyed stream, referenced through a key in the Map State...That would be cleaner as in the fact that 2 different set of rules are to be applied are explictely declared rather then carries inside the datums of a unioned stream...... I will look at second option...

On Tue, Sep 18, 2018, 9:15 AM Xingcan Cui <[hidden email]> wrote:
Hi Vishal,

You could try 1) merging these two rule streams first with the `union` method if they get the same type or 2) connecting them and encapsulate the records from both sides to a unified type (e.g., scala Either).

Best,
Xingcan

> On Sep 18, 2018, at 8:59 PM, Vishal Santoshi <[hidden email]> wrote:
>
> I have 2 broadcast streams that carry rules to be applied to a third keyed  stream. The connect method of the keyed stream only takes a single broadcast stream. How do I connect the 2 broadcast stream to that single keyed stream.
>
>   Do I have 2 connects and thus 2 instances of BroadcastConnextedStream, union them and then apply process through a single SingleOutpitStreamOperator ? The issue I see there are 2 keyBy calls and an additional shuffle before connect is called.
>
> To be precise, is there a simple example of applying 2 dissimilar rules through 2 broadcast streams, thus 2 different MapStateDiscriptors, to a single keyed stream without any unnecessary overhead...
>
>



Reply | Threaded
Open this post in threaded view
|

Re: 2 Broadcast streams to a Single Keyed Stream....how to?

Vishal Santoshi
of course 

2.  DataStream<X> unionedStream  =  DataStream<X>.union(DatStream<X>) ;//here X.type = Y.type
     B = unionedStream.broadcast( ..) ; 
     DataStream<A>.keyBy(..) connect(B);
    // But here we unifying to a single Descriptor, thus the hint as to which X or Y the rule came from has to be implicit in the the data, not at all declararitive.


should work.

On Tue, Sep 18, 2018 at 10:51 AM Vishal Santoshi <[hidden email]> wrote:
Options ( here A is the non broadcast stream type  and X and Y are the streams to be broadcast types). 

Note also the the Broadcasts of X and Y happen at different intervals and we require the 2 Rules to be applied in a single execution and run some merge routine on the output in the processElement method, without leaving it for downstream operators.

1. ConnectedStream(X,Y)  connectedStream =  BroadcastStream<X>.connect(BroadcastStream<Y>);
    connectedStream.broadcast() ; 
 // broadcast not allowed on connected stream

2.  DataStream<X> unionedStream  =  DataStream<X>.union(DatStream<X>) ;//here X.type = Y.type
     B = unionedStream.broadcast( ..) ; 
     DataStream<A>.keyBy(..) connect(B);
    // But here we unifying to a single Descriptor, thus the hint as to which X or Y the rule came from has to be implicit in the the data, not at all declararitive.
    

3. DataStream<A>.keyBy().connect(DataStream<X>.broadcast( ..) ).process ( new () ) ; 
    DataStream<A>.keyBy().connect(DataStream<Y>.broadcast( ..) ).process ( new () ) 
    // 2 keyBy overhead and a subsequent merge step?
  
4. MapStateDescriptor<X> one= ..;
MapStateDescriptor<Y> two=..; 
BroadcastStream<X> oneBroadCastStream = DataStream<X>.broadcast(one);
BroadcastStream<Y> twoBroadCastStream = DataStream<Y>.broadcast(two);
DataStream<A>.keyBy()
      .connect(oneBroadCastStream).
      .connect(twoBroadCastStream) // not possible
     ).process(new KeyedBroadcastProcessFunction(){
      processElement(..);
      processBrodcastElement( Context, Out, X ,Y....) {
            // here figure out which local operator state to replace 
     }
}))

I do not see a clean way at all to out in 2 StateDescriptors to a single KeyedBroadcastProcessFunction from 2 ( or more ) MapDescriptors even though I evidently can broadcast each stream and connect each stream independently to the non broadcast stream and access the states independently.






I am not sure, that without reducing the 2 Rules to a single type 
























  


On Tue, Sep 18, 2018 at 9:38 AM Xingcan Cui <[hidden email]> wrote:
Hi Vishal,

Actually, you could provide multiple MapStateDescriptors for the `broadcast()` method and then use them, separately.

Best,
Xingcan

On Sep 18, 2018, at 9:29 PM, Vishal Santoshi <[hidden email]> wrote:

I could do that, but I was under the impression that 2 or more disparate broadcast states could be provided to a keyed stream, referenced through a key in the Map State...That would be cleaner as in the fact that 2 different set of rules are to be applied are explictely declared rather then carries inside the datums of a unioned stream...... I will look at second option...

On Tue, Sep 18, 2018, 9:15 AM Xingcan Cui <[hidden email]> wrote:
Hi Vishal,

You could try 1) merging these two rule streams first with the `union` method if they get the same type or 2) connecting them and encapsulate the records from both sides to a unified type (e.g., scala Either).

Best,
Xingcan

> On Sep 18, 2018, at 8:59 PM, Vishal Santoshi <[hidden email]> wrote:
>
> I have 2 broadcast streams that carry rules to be applied to a third keyed  stream. The connect method of the keyed stream only takes a single broadcast stream. How do I connect the 2 broadcast stream to that single keyed stream.
>
>   Do I have 2 connects and thus 2 instances of BroadcastConnextedStream, union them and then apply process through a single SingleOutpitStreamOperator ? The issue I see there are 2 keyBy calls and an additional shuffle before connect is called.
>
> To be precise, is there a simple example of applying 2 dissimilar rules through 2 broadcast streams, thus 2 different MapStateDiscriptors, to a single keyed stream without any unnecessary overhead...
>
>