general design questions when using flink

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

general design questions when using flink

igor.berman
1. Suppose I have stream of different events(A,B,C). Each event will need it's own processing pipeline.
what is recommended approach of splitting pipelines per each event? I can do some filter operator at the beginning. I can setup different jobs per each event. I can hold every such event in different topic.

2. Suppose I have some bug in business logic operator. Currently when such operator throws NPE or any other error the whole job fails(something like : unable to forward event to next operator )
So my question is - what is best approach to sink such events to somewhere else so that flink job will be more robust. I can wrap operator of Event-> EventTag into Event->EventTagOrError and then install filter that will filter out Errors into log/special-topic etc but then I'll need this for every operator.
another approach that I thought of - might be I can extend map/flatMapFunction to have kind of try-catch wrappers over business logic operator, but then it's not clear how to pass messages that caused such error into some special sink

would like to hear what are "best" practices regarding those cases or at least hear some thoughts 

thanks in advance
Reply | Threaded
Open this post in threaded view
|

Re: general design questions when using flink

Aljoscha Krettek
Hi,
if it is a fixed number of event types and logical pipelines I would probably split them into several jobs to achieve good isolation. There are, however people who go a different way and integrate everything into a general-purpose job that can be dynamically modified and also deals with errors in user code. See for example this recent blog post by King.com (a guest post on the data Artisans blog): http://data-artisans.com/rbea-scalable-real-time-analytics-at-king/

Cheers,
Aljoscha

On Fri, 6 May 2016 at 13:32 Igor Berman <[hidden email]> wrote:
1. Suppose I have stream of different events(A,B,C). Each event will need it's own processing pipeline.
what is recommended approach of splitting pipelines per each event? I can do some filter operator at the beginning. I can setup different jobs per each event. I can hold every such event in different topic.

2. Suppose I have some bug in business logic operator. Currently when such operator throws NPE or any other error the whole job fails(something like : unable to forward event to next operator )
So my question is - what is best approach to sink such events to somewhere else so that flink job will be more robust. I can wrap operator of Event-> EventTag into Event->EventTagOrError and then install filter that will filter out Errors into log/special-topic etc but then I'll need this for every operator.
another approach that I thought of - might be I can extend map/flatMapFunction to have kind of try-catch wrappers over business logic operator, but then it's not clear how to pass messages that caused such error into some special sink

would like to hear what are "best" practices regarding those cases or at least hear some thoughts 

thanks in advance