Side effects or multiple sinks on streaming jobs?

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Side effects or multiple sinks on streaming jobs?

Luis Mariano Guerra
hi,

 I'm migrating some samza jobs to flink streaming, and on samza we sent the errors to a kafka topic to make it easier to display on dashboards, I would like to do the same on flink, what do you recommend?


Reply | Threaded
Open this post in threaded view
|

Re: Side effects or multiple sinks on streaming jobs?

rmetzger0
Hi Luis,

You can define as many data sinks as you want in a Flink job topology.
So its not a problem for your use case to define two Kafka sinks, sending data to different topics.

Regards,
Robert


On Tue, Oct 25, 2016 at 3:30 PM, Luis Mariano Guerra <[hidden email]> wrote:
hi,

 I'm migrating some samza jobs to flink streaming, and on samza we sent the errors to a kafka topic to make it easier to display on dashboards, I would like to do the same on flink, what do you recommend?



Reply | Threaded
Open this post in threaded view
|

Re: Side effects or multiple sinks on streaming jobs?

Luis Mariano Guerra
do I have to send the errors "in band"?

that is, return maybe more than one tuple in my operations then flatmap and use a KeyedSerializationSchema?

or is there a way to emit a tuple to another sink from within operations directly?

On Wed, Oct 26, 2016 at 9:20 AM, Robert Metzger <[hidden email]> wrote:
Hi Luis,

You can define as many data sinks as you want in a Flink job topology.
So its not a problem for your use case to define two Kafka sinks, sending data to different topics.

Regards,
Robert


On Tue, Oct 25, 2016 at 3:30 PM, Luis Mariano Guerra <[hidden email]> wrote:
hi,

 I'm migrating some samza jobs to flink streaming, and on samza we sent the errors to a kafka topic to make it easier to display on dashboards, I would like to do the same on flink, what do you recommend?




Reply | Threaded
Open this post in threaded view
|

Re: Side effects or multiple sinks on streaming jobs?

rmetzger0
Yes, you need to send the errors in band.

There are two options how you implement it:

A) Using the KeyedSerializationSchema, then you need to define only one Kafka Producer, because you can specify the target topic using the KeyedSerializationSchema.getTargetTopic() method.


(operator generating errors) --> stream with errors --> sink

B) Using two sinks (then, you probably don't need to use the KeyedSerializationSchema)

(operator generating errors) --> (filter) --> stream without errors --> sink
                             --> (filter) --> error stream  --> sink

To distinguish between error / regular data, you could use for example a Tuple2<>.

Regards,
Robert 


On Wed, Oct 26, 2016 at 10:27 AM, Luis Mariano Guerra <[hidden email]> wrote:
do I have to send the errors "in band"?

that is, return maybe more than one tuple in my operations then flatmap and use a KeyedSerializationSchema?

or is there a way to emit a tuple to another sink from within operations directly?

On Wed, Oct 26, 2016 at 9:20 AM, Robert Metzger <[hidden email]> wrote:
Hi Luis,

You can define as many data sinks as you want in a Flink job topology.
So its not a problem for your use case to define two Kafka sinks, sending data to different topics.

Regards,
Robert


On Tue, Oct 25, 2016 at 3:30 PM, Luis Mariano Guerra <[hidden email]> wrote:
hi,

 I'm migrating some samza jobs to flink streaming, and on samza we sent the errors to a kafka topic to make it easier to display on dashboards, I would like to do the same on flink, what do you recommend?