Best way to access a Flink state entry from another Flink application

moe_hoss

Hi all,


We have a network of Flink applications. The cluster as a whole receives 'state-update' messages from outside, and one Flink application in the cluster 'merges' these updates into the current, most up-to-date state of each 'data-object' and passes that state on to the next process. It does this with stateful stream processing in a `KeyedProcessFunction`. Further along in our processing logic, some nodes need to access the current state of the objects when they receive one or more 'object-id's from the previous Flink application. We do not propagate the full current state of the objects, because not every object type is relevant to every process in the cluster; this saves us some network and storage overhead.

The question is: in such a scenario, what is the best way to expose Flink state to another Flink application? I am aware of 'Queryable State', but I am not sure whether this feature was designed for, and is suitable for, our use case.
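To make the setup concrete, the merging step can be modelled in plain Java. This is a minimal sketch, assuming each 'state-update' is a partial set of attributes for one object id; `ObjectStateMerger`, `onUpdate`, `lookup`, the attribute-map layout and the "newer value wins" merge rule are all hypothetical and not Flink API. In a real job, the per-id entry would live in keyed state inside the `KeyedProcessFunction`, and `lookup` corresponds roughly to the point query by key that Queryable State offers to clients outside the job.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model (no Flink dependency) of the 'merging' application.
// In a real job, each map entry would be the keyed state that the
// KeyedProcessFunction holds for one object id. The attribute-map layout and
// the "newer field value wins" merge rule are hypothetical placeholders.
final class ObjectStateMerger {
    // object id -> current merged state (attribute name -> attribute value)
    private final Map<String, Map<String, String>> stateById = new HashMap<>();

    // Merge one 'state-update' message into the state of its object and return
    // a copy of the new, most up-to-date state (what would be sent downstream).
    Map<String, String> onUpdate(String objectId, Map<String, String> partialUpdate) {
        Map<String, String> current =
                stateById.computeIfAbsent(objectId, id -> new HashMap<>());
        current.putAll(partialUpdate); // hypothetical rule: newer values overwrite older ones
        return new HashMap<>(current); // copy, so callers cannot mutate the stored state
    }

    // Point lookup by object id, i.e. the access a downstream node needs when
    // it receives only the id. Returns null if no update has been seen yet.
    Map<String, String> lookup(String objectId) {
        Map<String, String> current = stateById.get(objectId);
        return current == null ? null : new HashMap<>(current);
    }
}
```

For example, `onUpdate("obj-1", {status=draft, owner=alice})` followed by `onUpdate("obj-1", {status=published})` leaves `lookup("obj-1")` holding both fields, with the newer status.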


Thank you very much in advance.


BR, Moe

--

Mohammad Hosseinian
Software Developer
Information Design One AG


Phone +49-69-244502-0
Fax +49-69-244502-10
Web www.id1.de



Information Design One AG, Baseler Strasse 10, 60329 Frankfurt am Main, Germany
Registration: Amtsgericht Frankfurt am Main, HRB 52596
Executive Board: Robert Peters, Benjamin Walther, Supervisory Board: Christian Hecht

Re: Best way to access a Flink state entry from another Flink application

Протченко Алексей
Hi Mohammad,

Which type of applications do you mean, streaming or batch? For streaming applications, I think queues such as Kafka or RabbitMQ between the applications would be the best way.

Best regards,
Alex


Tuesday, 6 August 2019, 12:21 +02:00, from Mohammad Hosseinian <[hidden email]>:

--
Протченко Алексей

Re: Best way to access a Flink state entry from another Flink application

moe_hoss
Hi Alex,

Thanks for your reply. The application is streaming. The problem with using messaging channels for this kind of communication is the 'race condition'. What I mean is that when you have parallel communication channels (one for the main flow of your streaming application and one for bringing 'stated/current' objects to the processing nodes that need them), the order of messages is not preserved, which can lead to incorrect results in your application. That is why I was wondering whether there is any 'synchronous' way of accessing Flink state.
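The race can be made concrete with a small deterministic model. This is a minimal sketch in plain Java, with no Flink or message-broker API involved: the arrival order at the consumer is passed in explicitly instead of coming from real channels, and `TwoChannelRace`, `Msg` and `versionsSeen` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Deterministic model of the race: state snapshots and object-id requests reach
// a consumer over two independent channels, so the arrival order can differ
// from the order in which the producer emitted them.
final class TwoChannelRace {

    // A message is either a state snapshot (object id + version) or a request
    // that needs the current state of an object id.
    static final class Msg {
        final boolean isSnapshot;
        final String objectId;
        final int version; // only meaningful for snapshots

        private Msg(boolean isSnapshot, String objectId, int version) {
            this.isSnapshot = isSnapshot;
            this.objectId = objectId;
            this.version = version;
        }

        static Msg snapshot(String objectId, int version) {
            return new Msg(true, objectId, version);
        }

        static Msg request(String objectId) {
            return new Msg(false, objectId, -1);
        }
    }

    // Process messages in the given arrival order. For every request, record the
    // version of the latest snapshot received so far for that id (0 = none yet).
    static List<Integer> versionsSeen(List<Msg> arrivalOrder) {
        Map<String, Integer> latestVersion = new HashMap<>();
        List<Integer> seen = new ArrayList<>();
        for (Msg m : arrivalOrder) {
            if (m.isSnapshot) {
                latestVersion.put(m.objectId, m.version);
            } else {
                seen.add(latestVersion.getOrDefault(m.objectId, 0));
            }
        }
        return seen;
    }
}
```

If the consumer sees the producer's order (v1, v2, then a request that depends on v2), `versionsSeen` returns `[2]`. If the snapshot channel lags so that the consumer sees v1, the request, then v2, it returns `[1]`: the request is served from stale state. Both runs contain the same messages; only the interleaving differs, and two independent channels give no guarantee against the second one.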

BR, Moe


On 06/08/2019 13:25, Протченко Алексей wrote:

Re: Best way to access a Flink state entry from another Flink application

Oytun Tez
Hi Mohammad,


As far as I know, Queryable State is the only way to access Flink's state from outside the job, until the Savepoint API arrives in 1.9.

---
Oytun Tez

M O T A W O R D
The World's Fastest Human Translation Platform.


On Tue, Aug 6, 2019 at 9:52 AM Mohammad Hosseinian <[hidden email]> wrote:

Re: Best way to access a Flink state entry from another Flink application

moe_hoss
Hi Oytun,

Thanks, and good to know about your planned features.

BR, Moe


On 06/08/2019 16:14, Oytun Tez wrote: