RichMapFunction parameters in the Streaming API


Colin Williams
I was looking for withParameters(config) in the Streaming API today. I stumbled across the following thread.

It appears that some of the Streaming API developers are in favor of removing the parameters from RichMapFunction's open(). However, the best practices article shows examples of using both the global configuration (where parameters are available from open()) and withParameters(config) (which doesn't work in the Streaming API).

I'm trying to decide whether to use global parameters in my Flink streaming jobs.

Is using the global configuration a good idea for parameters in the Streaming API, or is this best practice only suggested for the Batch API?

Is there a reason behind the view that the Configuration parameter should be removed from open()?




Re: RichMapFunction parameters in the Streaming API

Chesnay Schepler
The Configuration parameter in open() is a relic of the previous Java API, where operators were instantiated generically.

Nowadays this is no longer the case: operators are serialized instead, which simplifies passing parameters, since you can simply store them in a field of your UDF.
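
To illustrate why serialization makes a field sufficient, here is a minimal plain-Java sketch. It does not use any Flink classes: PrefixMapper, roundTrip and SerializedFieldDemo are hypothetical names, and the round trip through java.io only imitates a function instance being shipped to another JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// Plain-Java sketch (assumption: no Flink involved, names are illustrative only).
class SerializedFieldDemo {

    /** Hypothetical user function: the parameter is an ordinary field set by the constructor. */
    static class PrefixMapper implements Function<String, String>, Serializable {
        private static final long serialVersionUID = 1L;
        private final String prefix;

        PrefixMapper(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public String apply(String value) {
            return prefix + value;
        }
    }

    /** Serializes an object to bytes and reads it back, imitating shipping it elsewhere. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // The deserialized copy still carries the constructor argument in its field.
        PrefixMapper shipped = roundTrip(new PrefixMapper("id-"));
        System.out.println(shipped.apply("42")); // prints "id-42"
    }
}
```

Because the whole object is serialized, whatever the constructor stored in the field is available wherever the copy runs, with no separate configuration channel.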

In the Streaming API, the Configuration object passed to open() is always empty, and we don't plan to implement it, since it provides little value given the above.

As such, we suggest passing either the ParameterTool, a Configuration instance, or the specific parameters through the constructor of your user-defined function and storing them in a field. This applies to both the Batch and Streaming APIs.

Personally, I would stay away from the global configuration option, as it is more brittle than the constructor approach, which makes it explicit that the function requires these parameters.
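
The difference between the two styles can be sketched in plain Java. Everything here is a stand-in and an assumption: RichFn loosely imitates a rich function with an open() hook, GLOBAL_PARAMS imitates a job-wide configuration looked up by string key, and none of these names come from Flink.

```java
import java.io.Serializable;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Plain-Java sketch (assumption: all types below are hypothetical stand-ins, not Flink classes).
class ConstructorVsGlobalDemo {

    /** Stand-in for a rich function: open() runs once before the first map() call. */
    abstract static class RichFn<I, O> implements Serializable {
        void open() {}
        abstract O map(I value);
    }

    /** Stand-in for job-wide parameters that functions look up by string key. */
    static final Map<String, String> GLOBAL_PARAMS = new ConcurrentHashMap<>();

    /** Global style: nothing at the call site reveals that "prefix" must be set. */
    static class GlobalPrefixMapper extends RichFn<String, String> {
        private transient String prefix;

        @Override
        void open() {
            prefix = GLOBAL_PARAMS.get("prefix"); // null if nobody set the key
            if (prefix == null) {
                throw new IllegalStateException("missing global parameter: prefix");
            }
        }

        @Override
        String map(String value) {
            return prefix + value;
        }
    }

    /** Constructor style: the function cannot be created without its parameter. */
    static class ConstructorPrefixMapper extends RichFn<String, String> {
        private final String prefix;

        ConstructorPrefixMapper(String prefix) {
            this.prefix = Objects.requireNonNull(prefix, "prefix");
        }

        @Override
        String map(String value) {
            return prefix + value;
        }
    }

    /** Returns false if open() rejects the function because a parameter is missing. */
    static boolean opensCleanly(RichFn<?, ?> fn) {
        try {
            fn.open();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The constructor version states its requirement where the function is created.
        RichFn<String, String> explicit = new ConstructorPrefixMapper("id-");
        explicit.open();
        System.out.println(explicit.map("42"));

        // The global version only works if some other code has set the key beforehand.
        System.out.println(opensCleanly(new GlobalPrefixMapper()));
    }
}
```

In the global style a missing or misspelled key only shows up when open() runs, whereas the constructor style surfaces the requirement at the point where the function is built.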

(In reply to the message Colin Williams posted on 11.10.2017 at 00:36.)





Re: RichMapFunction parameters in the Streaming API

Aljoscha Krettek
I think we should remove that part from the best-practices documentation. I'll quickly open a PR.

(In reply to the message Chesnay Schepler posted on 11 Oct 2017 at 10:46.)






Re: RichMapFunction parameters in the Streaming API

Colin Williams
In reply to this post by Chesnay Schepler
Thanks for the detailed explanation of the reasoning behind not using open()'s Configuration parameter!
