Share state across operators

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Share state across operators

Shailesh Jain
Hi,

Is it possible to share state across operators in Flink?

I have CoFlatMap operator which maintains a ListState and returns a DataStream. And downstream there is a KafkaSink operator for the same DataStream which needs to access the ListState.

Thanks,
Shailesh
Reply | Threaded
Open this post in threaded view
|

Re: Share state across operators

Shailesh Jain
Missed one point - I'm using Managed Operator state (and not Keyed state - as my data streams are not keyed).

On Tue, Dec 5, 2017 at 4:28 PM, Shailesh Jain <[hidden email]> wrote:
Hi,

Is it possible to share state across operators in Flink?

I have CoFlatMap operator which maintains a ListState and returns a DataStream. And downstream there is a KafkaSink operator for the same DataStream which needs to access the ListState.

Thanks,
Shailesh

Reply | Threaded
Open this post in threaded view
|

Re: Share state across operators

Timo Walther
In reply to this post by Shailesh Jain
Hi Shailesh,

sharing state across operators is not possible. However, you could emit
the state (or parts of it) as a stream element to downstream operators
by having a function that emits a type like
"Either<MyElement,List<MyState>>".

Another option would be to use side outputs to send state to downstream
operators [0].

Maybe you can tell use a bit more about what you want to achieve?

Regards,
Timo

[0]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/side_output.html


Am 12/5/17 um 11:58 AM schrieb Shailesh Jain:

> Hi,
>
> Is it possible to share state across operators in Flink?
>
> I have CoFlatMap operator which maintains a ListState and returns a
> DataStream. And downstream there is a KafkaSink operator for the same
> DataStream which needs to access the ListState.
>
> Thanks,
> Shailesh


Reply | Threaded
Open this post in threaded view
|

Re: Share state across operators

Shailesh Jain
Thanks, Timo.

Either<L, R> works for me.

On Tue, Dec 5, 2017 at 4:55 PM, Timo Walther <[hidden email]> wrote:
Hi Shailesh,

sharing state across operators is not possible. However, you could emit the state (or parts of it) as a stream element to downstream operators by having a function that emits a type like "Either<MyElement,List<MyState>>".

Another option would be to use side outputs to send state to downstream operators [0].

Maybe you can tell use a bit more about what you want to achieve?

Regards,
Timo

[0] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/side_output.html


Am 12/5/17 um 11:58 AM schrieb Shailesh Jain:

Hi,

Is it possible to share state across operators in Flink?

I have CoFlatMap operator which maintains a ListState and returns a DataStream. And downstream there is a KafkaSink operator for the same DataStream which needs to access the ListState.

Thanks,
Shailesh



Reply | Threaded
Open this post in threaded view
|

Re: Share state across operators

m@xi
In reply to this post by Timo Walther
Hey Timo!

I am using Java for my implementation and I have found this article [1] in
stackoverflow for simulating the Either<A,B> in Java.

Now, for my case, I have a coordinator instance (parallelism = 1) that needs
to both distribute incoming tuples in a specific way, but also needs to
redistribute some state (previously distributed tuples) that maintains to
other downstream operators.

I am wondering whether I should use SideOutputs or Either<A,B> to succeed
the aforementioned goal. Nevertheless, I am wondering whether there are any
differences regarding efficiency/effectiveness of the above 2 proposed
workarounds to share state.

Also, in the case of Either<A,B>, I am first doing all my distribution of Bs
(parts of the state) and afterwards in the same .processElement() function I
am distributing the newly arrived tuples to the downstream operators. So,
given that my coordinator's parallelism is 1, I assume that the order of my
collect()-s is preserved, meaning the first the state redistribution will
get settled in the downstream operators, and afterwards the processing of
new tuples will happen. Is that correct? (I do not no if inside the network
buffers things get twisted somehow).

Thanks a lot in advance!

Best,
Max

[1] --
https://stackoverflow.com/questions/9975836/how-can-i-simulate-haskells-either-a-b-in-java




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Share state across operators

m@xi
Hey Flinker!

Anyone? Anybody? Someone with experience or any idea on the question above?

Best,
Max



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Share state across operators

Timo Walther
Hi Max,

I would go with the Either<A, B> approach if you want to ensure that the
initital state and the first element arrive in the right order.
Performance-wise there should not be a big different between both
approaches. The side outputs are more meant for have a side channel
beside the main stream that is not keyed anymore (e.g. for late elements
that need a special treatment next to the main processing).

Regards,
Timo

Am 12.03.18 um 10:17 schrieb m@xi:

> Hey Flinker!
>
> Anyone? Anybody? Someone with experience or any idea on the question above?
>
> Best,
> Max
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Reply | Threaded
Open this post in threaded view
|

Re: Share state across operators

m@xi