Shared Object Instance over different RichMapFunctions


Duck
Hi there,

I was wondering how my caching object would behave in the scenario below.

1) I create an instance of an object that performs lookups against an external resource and caches the results.
2) I have a DataStream that I perform a map function on (with a custom RichMapFunction).
3) I have a second DataStream that I perform a map function on (with a custom RichMapFunction).
4) I set the job parallelism to 2.

Will using the object in multiple functions, together with the parallelism, duplicate it in any way, or will it still behave as a "shared object instance"? I'm asking because this "cacheloader" will talk to external resources, and for performance reasons on the external resource side I do not want it to be duplicated.

Sent from ProtonMail, Swiss-based encrypted email.
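The setup described above can be modelled in plain Java to see what happens to the cache. This is a sketch, not Flink code: it assumes that each parallel instance of a function receives its own serialized-then-deserialized copy of that function (Flink ships user functions using Java serialization), and `CacheLoader`, `LookupMapFunction`, `roundTrip` and `countCacheCopies` are hypothetical names standing in for the real classes.

```java
import java.io.*;
import java.util.*;

public class SharedCacheDemo {
    // Hypothetical caching object; in the real job it would call an external resource.
    static class CacheLoader implements Serializable {
        private final Map<String, String> cache = new HashMap<>();

        String lookup(String key) {
            // Stand-in for an expensive external lookup, cached per CacheLoader object.
            return cache.computeIfAbsent(key, k -> "value-of-" + k);
        }
    }

    // Hypothetical stand-in for a custom RichMapFunction that holds a reference to the cache.
    static class LookupMapFunction implements Serializable {
        final CacheLoader loader;

        LookupMapFunction(CacheLoader loader) { this.loader = loader; }

        String map(String in) { return loader.lookup(in); }
    }

    // Serializes and deserializes an object, roughly what happens when a function
    // (and everything it references) is shipped to a parallel task.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    // Counts distinct CacheLoader objects across all parallel copies of both functions.
    static int countCacheCopies(int parallelism) {
        CacheLoader shared = new CacheLoader();            // 1) one instance on the client
        List<LookupMapFunction> functions = List.of(
            new LookupMapFunction(shared),                 // 2) first map function
            new LookupMapFunction(shared));                // 3) second map function
        Set<CacheLoader> copies = Collections.newSetFromMap(new IdentityHashMap<>());
        for (LookupMapFunction fn : functions) {
            for (int i = 0; i < parallelism; i++) {        // 4) one copy per parallel instance
                copies.add(roundTrip(fn).loader);
            }
        }
        return copies.size();
    }

    public static void main(String[] args) {
        System.out.println("CacheLoader copies: " + countCacheCopies(2)); // prints 4
    }
}
```

With two map functions and a parallelism of 2, the model reports four distinct `CacheLoader` objects, one per parallel instance of each function, even though only one was created on the client.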



Re: Shared Object Instance over different RichMapFunctions

Aljoscha Krettek
Hi,
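Flink will serialise user functions when distributing work across the cluster. Therefore your shared objects will no longer be shared once your program executes: every parallel instance of each operator deserialises its own copy of the function, together with any objects the function references. In your scenario (two map functions, parallelism 2) that means four copies of the cache. You still get some object sharing, though, because only one instance of your function is used to process all the data on one parallel instance of an operation.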
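Because each parallel instance reuses a single function object for all of its records, a common pattern is to build the cache in `open()` and keep it in a `transient` field, so it is created on the worker rather than serialised with the function. The sketch below mimics that lifecycle without Flink on the classpath; `LookupFunctionSketch` and `externalLookup` are hypothetical, and in a real job the class would extend `RichMapFunction` and override its `open` method instead.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class PerInstanceCacheDemo {
    // Counts calls to the (simulated) external resource across all instances.
    static final AtomicInteger EXTERNAL_CALLS = new AtomicInteger();

    // Hypothetical stand-in for the expensive external lookup.
    static String externalLookup(String key) {
        EXTERNAL_CALLS.incrementAndGet();
        return "value-of-" + key;
    }

    // Mimics a RichMapFunction: open() is called once per parallel instance,
    // then map() is called for every record that instance processes.
    static class LookupFunctionSketch implements Serializable {
        // transient: not written when the function is serialized; built in open() instead.
        private transient Map<String, String> cache;

        void open() {
            cache = new HashMap<>();
        }

        String map(String key) {
            // Only a cache miss reaches the external resource.
            return cache.computeIfAbsent(key, PerInstanceCacheDemo::externalLookup);
        }
    }

    public static void main(String[] args) {
        // Two parallel instances of the same function, each with its own cache.
        LookupFunctionSketch instance0 = new LookupFunctionSketch();
        LookupFunctionSketch instance1 = new LookupFunctionSketch();
        instance0.open();
        instance1.open();

        for (String key : new String[] {"a", "b", "a", "a"}) instance0.map(key); // 2 external calls
        for (String key : new String[] {"a", "a"}) instance1.map(key);            // 1 external call

        System.out.println("External calls: " + EXTERNAL_CALLS.get()); // prints 3
    }
}
```

Within one instance, repeated keys are served from the cache; across instances, each cache still makes its own first lookup, which is the remaining duplication described above.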

Cheers,
Aljoscha

In reply to Duck's message of Wed, 4 Jan 2017 at 21:05.