Best way of doing some global initialization

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Best way of doing some global initialization

scgupta
Hi,

I need to do set/initialize some config of a framework/util that is used in my Flink stream processing app. Basically, a piece of code that needs to be executed exactly once before anything else. Clearly doing it in the main flink processor function will not suffice, as apart from the client, the same needs to be done on other node before topology is executed.

I have gone through Flink best practices, and one way I can think about it to check whether init has been done in the open() all Rich Functions, and if not then call the initialization code. But that seems to be "right", basically to add any operator, I must do this initillazation call boilerplate code.

Is there anyway to define a global initializations in Flink, or to define an operator that is guaranteed to be called the first thing on all nodes?

Thanks,
+satish
Reply | Threaded
Open this post in threaded view
|

Re: Best way of doing some global initialization

Aljoscha Krettek
Hi,
I'm afraid this is not possible right now if you don't want to go with completely custom sources/operators.

If you want to go the custom source route you would have only one true source in your job that does the global initialisation and then emits one element. Your other sources would be operators that have this one source as input and only start producing data once that one element arrived. In this way, you would block all other sources until the first source has done the initialisation.

Cheers,
Aljoscha 

On Thu, 3 Nov 2016 at 09:26 Satish Chandra Gupta <[hidden email]> wrote:
Hi,

I need to do set/initialize some config of a framework/util that is used in my Flink stream processing app. Basically, a piece of code that needs to be executed exactly once before anything else. Clearly doing it in the main flink processor function will not suffice, as apart from the client, the same needs to be done on other node before topology is executed.

I have gone through Flink best practices, and one way I can think about it to check whether init has been done in the open() all Rich Functions, and if not then call the initialization code. But that seems to be "right", basically to add any operator, I must do this initillazation call boilerplate code.

Is there anyway to define a global initializations in Flink, or to define an operator that is guaranteed to be called the first thing on all nodes?

Thanks,
+satish
Reply | Threaded
Open this post in threaded view
|

Re: Best way of doing some global initialization

scgupta
Hi Aljoscha,

Thanks for quick reply. I didn't quite understand your suggestion. Say I have three Kafka Stream sources that my Flink program consumes. How can I modify those three sources to be Kafka source as well as consumer of this single element?

Thanks,
+satish

On Thu, Nov 3, 2016 at 6:37 PM, Aljoscha Krettek <[hidden email]> wrote:
Hi,
I'm afraid this is not possible right now if you don't want to go with completely custom sources/operators.

If you want to go the custom source route you would have only one true source in your job that does the global initialisation and then emits one element. Your other sources would be operators that have this one source as input and only start producing data once that one element arrived. In this way, you would block all other sources until the first source has done the initialisation.

Cheers,
Aljoscha 

On Thu, 3 Nov 2016 at 09:26 Satish Chandra Gupta <[hidden email]> wrote:
Hi,

I need to do set/initialize some config of a framework/util that is used in my Flink stream processing app. Basically, a piece of code that needs to be executed exactly once before anything else. Clearly doing it in the main flink processor function will not suffice, as apart from the client, the same needs to be done on other node before topology is executed.

I have gone through Flink best practices, and one way I can think about it to check whether init has been done in the open() all Rich Functions, and if not then call the initialization code. But that seems to be "right", basically to add any operator, I must do this initillazation call boilerplate code.

Is there anyway to define a global initializations in Flink, or to define an operator that is guaranteed to be called the first thing on all nodes?

Thanks,
+satish

Reply | Threaded
Open this post in threaded view
|

Re: Best way of doing some global initialization

Aljoscha Krettek
Hi,
you would not be able to modify the three sources, you would basically have to reimplement yourself a Flink Kafka Source that is at the same time an operator that listens to this one element.

Cheers,
Aljoscha

On Thu, 3 Nov 2016 at 15:10 Satish Chandra Gupta <[hidden email]> wrote:
Hi Aljoscha,

Thanks for quick reply. I didn't quite understand your suggestion. Say I have three Kafka Stream sources that my Flink program consumes. How can I modify those three sources to be Kafka source as well as consumer of this single element?

Thanks,
+satish

On Thu, Nov 3, 2016 at 6:37 PM, Aljoscha Krettek <[hidden email]> wrote:
Hi,
I'm afraid this is not possible right now if you don't want to go with completely custom sources/operators.

If you want to go the custom source route you would have only one true source in your job that does the global initialisation and then emits one element. Your other sources would be operators that have this one source as input and only start producing data once that one element arrived. In this way, you would block all other sources until the first source has done the initialisation.

Cheers,
Aljoscha 

On Thu, 3 Nov 2016 at 09:26 Satish Chandra Gupta <[hidden email]> wrote:
Hi,

I need to do set/initialize some config of a framework/util that is used in my Flink stream processing app. Basically, a piece of code that needs to be executed exactly once before anything else. Clearly doing it in the main flink processor function will not suffice, as apart from the client, the same needs to be done on other node before topology is executed.

I have gone through Flink best practices, and one way I can think about it to check whether init has been done in the open() all Rich Functions, and if not then call the initialization code. But that seems to be "right", basically to add any operator, I must do this initillazation call boilerplate code.

Is there anyway to define a global initializations in Flink, or to define an operator that is guaranteed to be called the first thing on all nodes?

Thanks,
+satish