Greetings,
Is there a means of maintaining a stream's partitioning after running it through an operation such as map or filter? I have a pipeline stage S that operates on a stream partitioned by an ID field. S flat maps objects of type A to type B, which both have an "ID" field, and where each instance of B that S outputs has the same ID as its input instance of A. I hope to add a pipeline stage T immediately after S that operates using the same partitioning as S, so that I can avoid the expense of re-keying the instances of type B. If I am understanding the DataStream API correctly this is not feasible with Flink, as map(), filter() etc. all outputĀ SingleOutputStreamOperator. But I am hoping that I am missing something. Thank you, Ryan |
Hello,
I think if you have multiple keyBy() transformations with identical parallelism the partitioning should be "preserved". The second keyBy() will still go through the partitioning process, but since both the key and parallelism are identical the resulting partition should be identical as well. resulting in no data being shuffled around. We aren't really preserving the partitioning, but re-creating the original one. Regards, Chesnay On 12.04.2017 21:37, Ryan Conway wrote: > Greetings, > > Is there a means of maintaining a stream's partitioning after running > it through an operation such as map or filter? > > I have a pipeline stage S that operates on a stream partitioned by an > ID field. S flat maps objects of type A to type B, which both have an > "ID" field, and where each instance of B that S outputs has the same > ID as its input instance of A. I hope to add a pipeline stage T > immediately after S that operates using the same partitioning as S, so > that I can avoid the expense of re-keying the instances of type B. > > If I am understanding the DataStream API correctly this is not > feasible with Flink, as map(), filter() etc. all > output SingleOutputStreamOperator. But I am hoping that I am missing > something. > > Thank you, > Ryan |
Thank you, Chesnay. My hope is to keep things computationally inexpensive, and if I understand you correctly, that is satisfied even with this rekeying. Ryan On Sat, Apr 15, 2017 at 4:22 AM, Chesnay Schepler <[hidden email]> wrote: Hello, |
Free forum by Nabble | Edit this page |