Hello all,
We have a kafka topic with lots of partitions where data is partitioned by an upstream publisher on "session". In flink we read this topic and another single partition topic which contains configuration definitions for a little flatMap based operation. We also do a little bit of processing on the incoming data before combining it with the configuration in a soft join-like operation. forgive the ascii art: N-partition Kafka "Data" source -> Map -----\ |-> connect -> CoFlatMap -> ... 1-partition Kafka "Config" source -> global-/ What we *think* are seeing is that this is broken up into 3 tasks and that the ordering of events in the kafka "Data" source isn't maintained when we see it in the CoFlatMap. I have tried to add a custom partitioner on "Session" before the connect and it seemed to not help. I can't use KeyBy because the Config stream has no "Session key" Should we be able to assume anything about ordering of events without explicitly windowing/sorting/chaining? -Bart ________________________________ This e-mail may contain CONFIDENTIAL AND PROPRIETARY INFORMATION and/or PRIVILEGED AND CONFIDENTIAL COMMUNICATION intended solely for the recipient and, therefore, may not be retransmitted to any party outside of the recipient's organization without the prior written consent of the sender. If you have received this e-mail in error please notify the sender immediately by telephone or reply e-mail and destroy the original message without making a copy. Deep Silver, Inc. accepts no liability for any losses or damages resulting from infected e-mail transmissions and viruses in e-mail attachments. |
Your approach of using a CoFlatMap with a slowly changing second input
(the config source) is a common pattern and a good choice for this. The ordering of events between the sources and the CoFlatMap should be maintained if the parallelism matches. There is no repartitioning going on (the CoFlatMap inputs are connected via `FORWARD` edges). Does the parallelism change between operators? Can you verify that the ordering of events in the Kafka source is actually correct (test with a source-flatMap topology)? @Gyula: am I overlooking something here? On Fri, Aug 12, 2016 at 10:48 PM, Bart Wyatt <[hidden email]> wrote: > Hello all, > > > > We have a kafka topic with lots of partitions where data is partitioned by an upstream publisher on "session". > > > > In flink we read this topic and another single partition topic which contains configuration definitions for a little flatMap based operation. We also do a little bit of processing on the incoming data before combining it with the configuration in a soft join-like operation. > > > > forgive the ascii art: > > > N-partition Kafka "Data" source -> Map -----\ > |-> connect -> CoFlatMap -> ... > 1-partition Kafka "Config" source -> global-/ > > What we *think* are seeing is that this is broken up into 3 tasks and that the ordering of events in the kafka "Data" source isn't maintained when we see it in the CoFlatMap. I have tried to add a custom partitioner on "Session" before the connect and it seemed to not help. I can't use KeyBy because the Config stream has no "Session key" > > Should we be able to assume anything about ordering of events without explicitly windowing/sorting/chaining? > > -Bart > > > > > ________________________________ > This e-mail may contain CONFIDENTIAL AND PROPRIETARY INFORMATION and/or PRIVILEGED AND CONFIDENTIAL COMMUNICATION intended solely for the recipient and, therefore, may not be retransmitted to any party outside of the recipient's organization without the prior written consent of the sender. If you have received this e-mail in error please notify the sender immediately by telephone or reply e-mail and destroy the original message without making a copy. Deep Silver, Inc. accepts no liability for any losses or damages resulting from infected e-mail transmissions and viruses in e-mail attachments. |
Free forum by Nabble | Edit this page |