Hi again,

Something that I can't find (easily) in the documentation is the recommended way to discard data from the stream.

DataStream<MyType> stream =

But that seems a bit misleading, as the casual observer will get the idea that MyFunction 'branches' out, but it doesn't.

The other "obvious" choice is to return null and follow with a filter...

DataStream<MyType> stream =

BUT that doesn't work with Java 8 method references like above, so I have to create my own filter to give Flink the correct type information:

DataStream<MyType> stream =

And in my opinion, that ends up looking ugly, as the streams/pipelines (not used to the terminology yet) quickly accumulate many transformations and branches, and having a null check after each seems to put the burden of knowledge in the wrong spot ("Can this function return null?").

Throwing an exception shuts down the entire stream, which seems overly aggressive for many data-related discards.

Any other choices?

Cheers
--
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java
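For readers skimming the thread, the two discard styles under discussion (emitting nothing from a flatMap, versus mapping to null and filtering afterwards) can be sketched in plain Java. This is only an analogy using `java.util.stream`, not Flink code: the class name `DiscardSketch` and the helper `parseOrNull` are hypothetical, and the Flink type-information issue with method references mentioned in the post does not arise here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DiscardSketch {

    // Hypothetical parser: returns null for input that should be discarded.
    static Integer parseOrNull(String raw) {
        try {
            return Integer.valueOf(raw.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Style 1: flatMap emits zero elements for bad input and one otherwise,
    // so no null ever travels downstream.
    static List<Integer> viaFlatMap(List<String> input) {
        return input.stream()
                .flatMap(raw -> {
                    Integer parsed = parseOrNull(raw);
                    return parsed == null ? Stream.<Integer>empty() : Stream.of(parsed);
                })
                .collect(Collectors.toList());
    }

    // Style 2: map may yield null, and a following filter drops the nulls.
    // Every reader of the pipeline has to know the mapper can return null.
    static List<Integer> viaNullAndFilter(List<String> input) {
        return input.stream()
                .map(DiscardSketch::parseOrNull)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1", "oops", " 3 ", "");
        System.out.println(viaFlatMap(input));
        System.out.println(viaNullAndFilter(input));
    }
}
```

Both methods keep the same elements. The difference is where the knowledge lives: in the first style the discard decision is contained in one function, while the second relies on a separate null check after the mapper.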
Hi Niclas,

I'd add a Filter to directly discard bad records. That should make the behavior explicit.

If you need to do complex transformations that you don't want to do twice, the FlatMap approach would be the most efficient.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/side_output.html

2018-02-19 10:29 GMT+01:00 Niclas Hedhman <[hidden email]>:
On Mon, Feb 19, 2018 at 8:46 PM, Fabian Hueske <[hidden email]> wrote:
--
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java