Hi, is it possible to share state across operators in Flink?
Missed one point - I'm using Managed Operator state (and not Keyed state - as my data streams are not keyed).
On Tue, Dec 5, 2017 at 4:28 PM, Shailesh Jain <[hidden email]> wrote:
In reply to this post by Shailesh Jain
Hi Shailesh,
Sharing state across operators is not possible. However, you could emit the state (or parts of it) as a stream element to downstream operators by having a function that emits a type like "Either<MyElement,List<MyState>>". Another option would be to use side outputs to send state to downstream operators [0].

Maybe you can tell us a bit more about what you want to achieve?

Regards,
Timo

[0] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/side_output.html

On 12/5/17 at 11:58 AM, Shailesh Jain wrote:
> Hi,
>
> Is it possible to share state across operators in Flink?
>
> I have a CoFlatMap operator which maintains a ListState and returns a
> DataStream. And downstream there is a KafkaSink operator for the same
> DataStream which needs to access the ListState.
>
> Thanks,
> Shailesh
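A minimal sketch of the Either idea suggested above, in plain Java without any Flink dependency. The `Either` class, `StateEmittingFunction`, `StateAwareSink`, and `EitherSketch` are hypothetical names made up for illustration; Flink itself ships an `org.apache.flink.types.Either` type that could take the place of the hand-written one. The upstream function here emits a snapshot of its list "state" on the same output as the regular elements, and the sink tells the two apart by the tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical minimal tagged union, written only to keep the sketch dependency-free. */
final class Either<L, R> {
    private final L left;
    private final R right;
    private final boolean isLeft;

    private Either(L left, R right, boolean isLeft) {
        this.left = left;
        this.right = right;
        this.isLeft = isLeft;
    }

    static <A, B> Either<A, B> ofLeft(A value) { return new Either<>(value, null, true); }
    static <A, B> Either<A, B> ofRight(B value) { return new Either<>(null, value, false); }

    boolean isLeft() { return isLeft; }

    L left() {
        if (!isLeft) throw new IllegalStateException("not a Left");
        return left;
    }

    R right() {
        if (isLeft) throw new IllegalStateException("not a Right");
        return right;
    }
}

/** Stands in for the upstream function: keeps a list as "state" and emits it with each element. */
final class StateEmittingFunction {
    private final List<String> state = new ArrayList<>();

    void process(String element, Consumer<Either<String, List<String>>> out) {
        state.add(element);
        // Emit a copy of the state first, then the element itself, on the same output.
        out.accept(Either.<String, List<String>>ofRight(new ArrayList<>(state)));
        out.accept(Either.<String, List<String>>ofLeft(element));
    }
}

/** Stands in for the sink: remembers the latest state snapshot and records what it would write. */
final class StateAwareSink implements Consumer<Either<String, List<String>>> {
    private List<String> latestState = new ArrayList<>();
    final List<String> written = new ArrayList<>();

    @Override
    public void accept(Either<String, List<String>> value) {
        if (value.isLeft()) {
            written.add(value.left() + " (state size " + latestState.size() + ")");
        } else {
            latestState = value.right();
        }
    }
}

final class EitherSketch {
    static List<String> run(String... elements) {
        StateEmittingFunction fn = new StateEmittingFunction();
        StateAwareSink sink = new StateAwareSink();
        for (String e : elements) {
            fn.process(e, sink);
        }
        return sink.written;
    }

    public static void main(String[] args) {
        System.out.println(run("a", "b"));
    }
}
```

In a real job the `Consumer` would be the operator's `Collector`, and sending a full snapshot with every element is only done here to keep the example short; emitting the state only when it changes would likely be cheaper.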
Thanks, Timo. Either<L, R> works for me.

On Tue, Dec 5, 2017 at 4:55 PM, Timo Walther <[hidden email]> wrote:
> Hi Shailesh,
In reply to this post by Timo Walther
Hey Timo!
I am using Java for my implementation, and I found this Stack Overflow question [1] on simulating Either<A,B> in Java.

In my case, I have a coordinator instance (parallelism = 1) that needs to distribute incoming tuples in a specific way, and also needs to redistribute some of the state it maintains (previously distributed tuples) to other downstream operators. I am wondering whether I should use side outputs or Either<A,B> to achieve this, and whether the two proposed workarounds for sharing state differ in efficiency or effectiveness.

Also, in the Either<A,B> case, I first distribute all the Bs (parts of the state), and afterwards, in the same .processElement() call, I distribute the newly arrived tuples to the downstream operators. Given that my coordinator's parallelism is 1, I assume that the order of my collect() calls is preserved, meaning that the state redistribution is settled in the downstream operators first, and the processing of new tuples happens afterwards. Is that correct? (I do not know whether things get reordered somehow inside the network buffers.)

Thanks a lot in advance!

Best,
Max

[1] https://stackoverflow.com/questions/9975836/how-can-i-simulate-haskells-either-a-b-in-java

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Hey Flinker!
Anyone? Anybody? Someone with experience or any idea on the question above?

Best,
Max
Hi Max,
I would go with the Either<A, B> approach if you want to ensure that the initial state and the first element arrive in the right order. Performance-wise, there should not be a big difference between the two approaches. Side outputs are meant more for having a side channel next to the main stream that is not keyed anymore (e.g. for late elements that need special treatment next to the main processing).

Regards,
Timo

On 12.03.18 at 10:17, m@xi wrote:
> Hey Flinker!
>
> Anyone? Anybody? Someone with experience or any idea on the question above?
>
> Best,
> Max
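The ordering argument can be illustrated with a small plain-Java model. This is a sketch under an assumption, not Flink code: it treats the connection between one sending subtask and one receiving subtask as a single FIFO queue, which matches Flink's behavior of keeping record order between such a pair. `Tagged` and `OrderingSketch` are hypothetical names. Because state and tuples share one tagged stream, whatever the coordinator emits first is also seen first downstream; with side outputs the two kinds of records would travel as separate streams, so their relative order at a consumer of both is not implied by the emit order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

/** Hypothetical tagged record: either a piece of redistributed state or a new tuple. */
final class Tagged {
    final boolean isState;
    final String payload;

    Tagged(boolean isState, String payload) {
        this.isState = isState;
        this.payload = payload;
    }
}

final class OrderingSketch {
    /** Coordinator side: put the redistributed state on the channel before the new tuple. */
    static void emit(Queue<Tagged> channel, List<String> stateToRedistribute, String newTuple) {
        for (String s : stateToRedistribute) {
            channel.add(new Tagged(true, s));
        }
        channel.add(new Tagged(false, newTuple));
    }

    /** Downstream side: drain the channel in arrival order and log what was seen. */
    static List<String> drain(Queue<Tagged> channel) {
        List<String> seen = new ArrayList<>();
        Tagged t;
        while ((t = channel.poll()) != null) {
            seen.add((t.isState ? "state:" : "tuple:") + t.payload);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Assumption: one FIFO queue models one sender-to-receiver channel.
        Queue<Tagged> channel = new ArrayDeque<>();
        emit(channel, Arrays.asList("s1", "s2"), "t1");
        System.out.println(drain(channel));
    }
}
```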
Thanks a lot, Timo!
Best,
Max