Hi all,
we came across some interesting behaviour today. We enabled object reuse on a streaming job that looks like this:

    stream = env.addSource(source)
    stream.map(mapFnA).addSink(sinkA)
    stream.map(mapFnB).addSink(sinkB)

Operator chaining is enabled, so the optimizer fuses all operations into a single slot. The same object reference gets passed to both mapFnA and mapFnB. This makes sense when I think about the internal implementation, but it still came as a bit of a surprise, since the object reuse docs (for batch - there are no official ones for streaming, right?) don't really deal with splitting the DataSet/DataStream.

I guess my case is *technically* covered by the documented warning that it is unsafe to reuse an object that has already been collected, only in this case the reuse is "hidden" behind the stream definition DSL.

Is this the expected behaviour? Is object reuse for DataStreams encouraged at all, or is it more of a "hidden beta" feature until FLIP-21 is officially finished?

Best,
Urs

--
Urs Schönenberger - [hidden email]
TNG Technology Consulting GmbH, Beta-Straße 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Dr. Robert Dahlke, Gerhard Müller
Sitz: Unterföhring * Amtsgericht München * HRB 135082
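To make the setup concrete, here is a minimal sketch of the kind of fan-out job described above with object reuse switched on. The Event POJO, the source elements, and the print sinks are illustrative placeholders, not taken from the original job:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ObjectReuseFanOutSketch {

        // Simple mutable record; with object reuse enabled, both chained map
        // operators may receive the very same instance of it.
        public static class Event {
            public String payload;
            public Event() {}
            public Event(String payload) { this.payload = payload; }
            @Override
            public String toString() { return payload; }
        }

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // This is the switch that triggers the behaviour described above.
            env.getConfig().enableObjectReuse();

            DataStream<Event> stream = env.fromElements(new Event("a"), new Event("b"));

            // Branch A mutates the event in place ...
            stream.map(new MapFunction<Event, Event>() {
                @Override
                public Event map(Event e) {
                    e.payload = e.payload + "-A";
                    return e;
                }
            }).print("sinkA");

            // ... and because both branches are chained and handed the same
            // reference, branch B can observe the mutation made by branch A.
            stream.map(new MapFunction<Event, Event>() {
                @Override
                public Event map(Event e) {
                    e.payload = e.payload + "-B";
                    return e;
                }
            }).print("sinkB");

            env.execute("object-reuse fan-out sketch");
        }
    }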
Hi Urs,

"Object reuse" is runtime behaviour, and its configuration switch belongs to `ExecutionConfig` (that class is shared by batch and streaming), so it takes effect for both batch and streaming [1]. The object reuse feature is not perfect; the main use case is batch [2], and FLIP-21 tries to promote this feature properly, but its current state is "under discussion".

Thanks, vino.
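For reference, a short sketch of that point: the object reuse switch lives on the same `ExecutionConfig`, whichever environment it is obtained from (the enclosing class and main() below are illustrative only):

    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ObjectReuseConfigSketch {
        public static void main(String[] args) {
            // Batch API: DataSet programs toggle reuse via ExecutionConfig.
            ExecutionEnvironment batchEnv = ExecutionEnvironment.getExecutionEnvironment();
            batchEnv.getConfig().enableObjectReuse();

            // Streaming API: the switch is the same class and method, which is
            // why the runtime behaviour discussed in this thread also applies
            // to DataStream jobs.
            StreamExecutionEnvironment streamEnv =
                    StreamExecutionEnvironment.getExecutionEnvironment();
            streamEnv.getConfig().enableObjectReuse();
        }
    }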