Hi,
I read the Side Inputs design document. How does it compare to using ConnectedStreams with respect to handling the ordering of streams transparently? One of the challenges I have with ConnectedStreams is I need to buffer main input if the rules stream has not arrived yet. Does this automatically go away with Side Inputs? Will the call to String sideValue = getRuntimeContext().getSideInput(filterString); block if the side input is not available yet? And is the reverse also true? Alternatively, if my rules are not large in number and I want to broadcast them to all nodes is the below equivalent to using SideInputs where side inputs are broadcast to all nodes and ensure that the side input is evaluated before the main input: DataStream ds4 = ds3.connect(dsSide.broadcast()); Will the above ensure that dsSide is always available before ds3 elements arrive on the connected stream. Am I correct in assuming that ds2 changes will continue to be broadcast to ds3 (with no ordering guarantees between ds3 and dsSide, ofcourse). Thanks, Sameer |
Hi Sameer, the semantics of side inputs are not fully fledged out yet, as far as I know. The design doc says that one first waits for some data on the side input before starting processing the main input. Therefore, I would assume that you have some temporal ordering of the side input and main input compared to connected streams. But don't take this as guaranteed since this feature is still under development and the semantics are not fully decided yet. Cheers, Till On Mon, Oct 3, 2016 at 2:31 PM, Sameer W <[hidden email]> wrote:
|
> Will the above ensure that dsSide is always available before ds3 elements arrive on the connected stream. Am I correct in assuming that ds2 changes will continue to be broadcast to ds3 (with no ordering guarantees between ds3 and dsSide, ofcourse).
Broadcasting is just a partition strategy. If you broadcast a stream, all records part of the stream will be replicated to all partitions. This is in contrast with a keyBy which would spread records among partitions. So yes, in the example `DataStream ds4 = ds3.connect(dsSide.broadcast());`, you can't assume dsSide is made available first. The elements will arrive as they are sent over the wire. On Tue, Oct 4, 2016 at 11:45 AM, Till Rohrmann <[hidden email]> wrote: > Hi Sameer, > > the semantics of side inputs are not fully fledged out yet, as far as I > know. > > The design doc says that one first waits for some data on the side input > before starting processing the main input. Therefore, I would assume that > you have some temporal ordering of the side input and main input compared to > connected streams. > > But don't take this as guaranteed since this feature is still under > development and the semantics are not fully decided yet. > > Cheers, > Till > > On Mon, Oct 3, 2016 at 2:31 PM, Sameer W <[hidden email]> wrote: >> >> Hi, >> >> I read the Side Inputs design document. How does it compare to using >> ConnectedStreams with respect to handling the ordering of streams >> transparently? >> >> One of the challenges I have with ConnectedStreams is I need to buffer >> main input if the rules stream has not arrived yet. Does this automatically >> go away with Side Inputs? Will the call to String sideValue = >> >> getRuntimeContext().getSideInput(filterString); >> >> block if the side input is not available yet? And is the reverse also >> true? >> >> Alternatively, if my rules are not large in number and I want to broadcast >> them to all nodes is the below equivalent to using SideInputs where side >> inputs are broadcast to all nodes and ensure that the side input is >> evaluated before the main input: >> >> DataStream ds4 = ds3.connect(dsSide.broadcast()); >> >> Will the above ensure that dsSide is always available before ds3 elements >> arrive on the connected stream. Am I correct in assuming that ds2 changes >> will continue to be broadcast to ds3 (with no ordering guarantees between >> ds3 and dsSide, ofcourse). >> >> >> Thanks, >> Sameer >> >> >> > |
Free forum by Nabble | Edit this page |