Trapping Streaming Errors


Trapping Streaming Errors

Joe Olson
If I am processing a stream in the following manner:

val stream = env.addSource(consumer).name("KafkaStream")
                    .keyBy(x => (x.obj.ID1(), x.obj.ID2(), x.obj.ID3()))
                    .flatMap(new FlatMapProcessor)

and the IDs bomb out because of deserialization issues, my job crashes with a 'Could not extract key' error. How can I trap this cleanly? The only thing I can think of is to validate the IDs in the deserialization class argument that is used in the KafkaConsumer constructor, and trap any issues there. Is that the preferred way? Is there a better way?

Re: Trapping Streaming Errors

Fabian Hueske-2
Hi Joe,

You can also insert a MapFunction between the Kafka source and the keyBy to validate the IDs.
The mapper will be chained to the source and should add only minimal overhead. If you want to keep the events that were incorrectly deserialized, you can use split() to route them somewhere else.
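The validate-then-split idea can be sketched as follows. This is a self-contained Scala illustration without Flink dependencies; the event type, field names, and key format are hypothetical. In an actual job, the validate logic would live in a chained MapFunction placed between the Kafka source and the keyBy, and partitioning into valid/invalid would be done with split() (or, in newer Flink versions, side outputs).

```scala
// Hypothetical raw event type: ID fields arrive as strings that may fail to parse.
case class RawEvent(id1: String, id2: String, id3: String)

// Validated key, mirroring the tuple used in keyBy.
case class EventKey(id1: Long, id2: Long, id3: Long)

// Validate one event: Right(event with its key) if all IDs parse,
// Left(event) otherwise. Because this runs before the keyBy, the
// 'Could not extract key' failure can no longer occur downstream.
def validate(e: RawEvent): Either[RawEvent, (RawEvent, EventKey)] =
  try {
    Right((e, EventKey(e.id1.toLong, e.id2.toLong, e.id3.toLong)))
  } catch {
    case _: NumberFormatException => Left(e)
  }

// Partition a batch the way split() would partition the stream:
// invalid events can be rerouted (e.g. to a dead-letter sink),
// valid ones continue on to keyBy/flatMap.
def partitionEvents(events: Seq[RawEvent])
    : (Seq[RawEvent], Seq[(RawEvent, EventKey)]) = {
  val (bad, good) = events.map(validate).partition(_.isLeft)
  (bad.collect { case Left(e) => e },
   good.collect { case Right(v) => v })
}
```

The key point of this design is that validation happens while the record is still an ordinary element of the stream, so a bad record becomes data you can route rather than an exception that kills the job.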

Validation in the deserialization code works as well, of course, but would not allow you to reroute invalid events.
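The deserialization-side alternative looks roughly like this. The snippet is a self-contained sketch (the CSV wire format is an assumption for illustration): parsing failures are signalled with None instead of an exception. Flink's DeserializationSchema uses the same convention with null, in which case the Kafka consumer skips the corrupted record, but, as noted above, the event is then dropped rather than rerouted.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical parsed event with the three numeric IDs.
case class Parsed(id1: Long, id2: Long, id3: Long)

// Assumed wire format: "id1,id2,id3" as UTF-8 text.
// Returns None instead of throwing on malformed input, which is the
// behavior you would implement inside the deserialization schema
// passed to the Kafka consumer constructor.
def safeDeserialize(bytes: Array[Byte]): Option[Parsed] = {
  val text = new String(bytes, StandardCharsets.UTF_8)
  text.split(",") match {
    case Array(a, b, c) =>
      try Some(Parsed(a.toLong, b.toLong, c.toLong))
      catch { case _: NumberFormatException => None }
    case _ => None
  }
}
```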

Best, Fabian

2017-02-16 5:03 GMT+01:00 Joe Olson <[hidden email]>