Accessing Global State when processing KeyedStreams

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Accessing Global State when processing KeyedStreams

Scott Sue
Hi,

In our application, we receive Orders and Prices via two KafkaSources.  What
I want to do is to perform calculations for a given Order against a stream
of Prices for the same securityId, i.e. same identifier between the Order
and stream of Prices.  Naturally this is a perfect fit for a KeyedStream
against the securityId for both Orders and Prices.

I have currently connected these two streams together and then processing by
ordersKeyStream.connect(pricesKeyStream).process(MyCoProcessFunction) and
this fits for 95% of the time.  However part of my requirement is for
certain Orders, I need to be able to connect prices from a different
securityId (aka different key) to perform more calculations.  From what I
can see, by the time I get to my CoProcessFunction, I am only able to see
the Orders and Prices for a single securityId, I won't be able to cross over
to another KeyedStream of Prices for me to perform this extra calculation.
In terms of this extra calculation, it is not a hard requirement to be able
to cross over to another KeyedStream of Prices, this is more ideal.  

Things that I have thought about to get around this as it would be
acceptable to have a slightly older price for the securityId I require so:
1) I could connect to an external source of information to get this Price,
or
2) Periodically broadcast out a price that the ProcessFunction could consume
to perform this extra calculation.

This seems like something Flink should be easily able to handle, I just feel
as though I'm missing something here to allow this.

Just as something as a more non functional requirement.  The number of
prices I receive per second can reach 10's of 000's per second, so that is
also something that I am very wary of as well

Is there anything that could be suggested to help me out on this?


Thanks in advance!
Scott



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Accessing Global State when processing KeyedStreams

Rong Rong
Hi Scott,

Your use case seems to be a perfect fit for the Broadcast state pattern [1]. 

--
Rong


On Wed, Sep 19, 2018 at 7:11 AM Scott Sue <[hidden email]> wrote:
Hi,

In our application, we receive Orders and Prices via two KafkaSources.  What
I want to do is to perform calculations for a given Order against a stream
of Prices for the same securityId, i.e. same identifier between the Order
and stream of Prices.  Naturally this is a perfect fit for a KeyedStream
against the securityId for both Orders and Prices.

I have currently connected these two streams together and then processing by
ordersKeyStream.connect(pricesKeyStream).process(MyCoProcessFunction) and
this fits for 95% of the time.  However part of my requirement is for
certain Orders, I need to be able to connect prices from a different
securityId (aka different key) to perform more calculations.  From what I
can see, by the time I get to my CoProcessFunction, I am only able to see
the Orders and Prices for a single securityId, I won't be able to cross over
to another KeyedStream of Prices for me to perform this extra calculation.
In terms of this extra calculation, it is not a hard requirement to be able
to cross over to another KeyedStream of Prices, this is more ideal. 

Things that I have thought about to get around this as it would be
acceptable to have a slightly older price for the securityId I require so:
1) I could connect to an external source of information to get this Price,
or
2) Periodically broadcast out a price that the ProcessFunction could consume
to perform this extra calculation.

This seems like something Flink should be easily able to handle, I just feel
as though I'm missing something here to allow this.

Just as something as a more non functional requirement.  The number of
prices I receive per second can reach 10's of 000's per second, so that is
also something that I am very wary of as well

Is there anything that could be suggested to help me out on this?


Thanks in advance!
Scott



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Accessing Global State when processing KeyedStreams

Scott Sue
Hi Rong,

Thanks for your suggestion, I'll give that a go.  I just found a great
article on this as well that explains the functionality

https://data-artisans.com/blog/a-practical-guide-to-broadcast-state-in-apache-flink


Regards,
Scott



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/