Three input stream operator and back pressure


Dmitry Golubets
Hi,

there are only two interfaces defined at the moment: OneInputStreamOperator and TwoInputStreamOperator.

Is there any way to define an operator with an arbitrary number of inputs?

Another concern is how to maintain backpressure in the operator. Let's say I read events from two Kafka sources, both of which are ordered by time. I want to merge them while keeping the global order. But to do that, I need to block one input if the other has no data yet.

Best regards,
Dmitry

Re: Three input stream operator and back pressure

Timo Walther
Hi Dmitry,

the runtime supports an arbitrary number of inputs; however, the API currently does not provide a convenient way to use them. You could use the "union" operator to reduce the number of inputs. Otherwise, I think you have to implement your own operator. That depends on your use case, though.

You can maintain backpressure by using Flink's operator state. But have you also thought about a Window Join instead?
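The buffering idea can be sketched roughly like this (plain Python, not the actual Flink operator API; element routing, checkpointing, and state backends are elided):

```python
from collections import deque

class MergeOperator:
    """Toy two-input merge: buffer each input (as a real operator
    would in state) and emit in timestamp order only while both
    sides have data, since only then is the global minimum known."""
    def __init__(self):
        self.buffers = (deque(), deque())
        self.output = []

    def process(self, input_id, event):
        # event is assumed to be a (timestamp, payload) tuple.
        self.buffers[input_id].append(event)
        self._drain()

    def _drain(self):
        left, right = self.buffers
        while left and right:
            src = left if left[0][0] <= right[0][0] else right
            self.output.append(src.popleft())

op = MergeOperator()
for input_id, event in [(0, (1, "a1")), (1, (2, "b1")),
                        (1, (3, "b2")), (0, (4, "a2"))]:
    op.process(input_id, event)
print(op.output)  # (4, "a2") stays buffered until input 1 catches up
```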

I hope that helps.

Timo







Re: Three input stream operator and back pressure

Dmitry Golubets
Hi Timo,

I don't have any key to join on, so I'm not sure a Window Join would work for me.

Is there any way I can implement my own "low-level" operator? I would appreciate a hint or a link to an example of how to do it.



Best regards,
Dmitry





Re: Three input stream operator and back pressure

Stephan Ewen
Hi Dmitry!

The streaming runtime makes a conscious decision not to merge streams as in an ordered merge. The reason is that, at large scale, this is typically bad for scalability and network performance. Also, in certain DAGs, it may lead to deadlocks.

Even the two-input operator delivers records, at a low level, in first-come-first-served order, as driven by network (NIO) events.

Flink's operators tolerate out-of-order records to compensate for that. Overall, that seemed the more scalable design to us.
Can your use case follow a similar approach?

Stephan








Re: Three input stream operator and back pressure

Dmitry Golubets
Hi Stephan,

In one of our components we have to process events in order, due to business logic requirements.
That certainly introduces a bottleneck, but the other aspects are fine.

I'm not talking about actually re-sorting data, just about consuming it in the right order.
I.e., if two streams are already in order, all that has to be done is to consume the one that has the minimum element at its head and backpressure the other.

What I could do, of course, is create a custom Source for it. But I would prefer not to mix source-dependent logic (e.g. Kafka connections, etc.) with merging logic.

Best regards,
Dmitry







Re: Three input stream operator and back pressure

Stephan Ewen
Hi!

Just to avoid confusion: the DataStream network readers currently do not support backpressuring only one input, as this conflicts with other design aspects. (The DataSet network readers do support that, FYI.)

How about simply "correcting" the order later? If you have pre-sorted data per stream, you can generate frequent watermarks trivially (every 100 ms, based on the event timestamp you would use for the merge) and then apply event-time windows of, say, 100 ms, inside which you sort and emit the elements. The windows are evaluated strictly in order, so the resulting stream is sorted. This would be similar to an incremental "bucketing" sort over the merged stream.

That will give you a sorted stream easily, and it may not even be too expensive.
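The bucketing sort can be sketched outside Flink (plain Python over a finite list for simplicity; the 100 ms window size and the (timestamp, payload) event layout are assumptions, and the watermark handling a real pipeline needs is elided):

```python
from collections import defaultdict

def bucket_sort_stream(events, window_ms=100):
    """Group locally out-of-order events into fixed event-time
    windows, sort inside each window, and emit windows in order.
    Assumes no event arrives later than one full window; in Flink,
    watermarks would enforce this bound."""
    buckets = defaultdict(list)
    for ts, payload in events:
        buckets[ts // window_ms].append((ts, payload))
    out = []
    for key in sorted(buckets):       # windows fire strictly in order
        out.extend(sorted(buckets[key]))  # sort within each window
    return out

# A merged stream that is only out of order within a window.
stream = [(30, "x"), (10, "y"), (120, "z"), (105, "w")]
print(bucket_sort_stream(stream))
```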

Stephan

