Visualizing Flink Statefun's "Logical Co-location, Physical Separation" Properties

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Visualizing Flink Statefun's "Logical Co-location, Physical Separation" Properties

flint-stone
Hello!

I'm trying to understand the internal mechanism used by Flink Statefun to dispatch functions to Flink cluster. In particular, I was trying to find a good example demonstrating Statefun's "Logical Co-location, Physical Separation" properties (as pointed out by [1]).

My understanding based on the doc was that there are three modes to deploy statefun to Flink cluster ranging from remote functions to embedded functions (decreasing storage locality and increasing efficiency). Therefore, in the scenario of remote functions, functions are deployed with its state and message independent from flink processes. And function should be able to be executed in any Flink process as if it is a stateless application. I have tried out a couple of examples from the statefun but judging by allocation result the subtask of the job seems to bind statically with each task slot in the Flink cluster (I'm assuming the example such as DataStream uses embedded function instead?).

I also came across this tutorial [2] demonstrating the usage of remote function. The README indicates [3] that "Since the functions are stateless, and running in their own container, we can redeploy and rescale it independently of the rest of the infrastructure." which seems to indicate that the function performs scaling manually by the user that could occupy arbitrary resources (e.g., task slots) from the Flink cluster on demand. But I wasn't sure how to explicitly specify the amount of parallelism for each function dynamically.
Is there a good example to visualize statefun "physical separation" behavior by forcing the same function to be invoked at different task slots / machines (either on-demand or automatically)?

Any help will be appreciated!

Thanks!

Le


Reply | Threaded
Open this post in threaded view
|

Re: Visualizing Flink Statefun's "Logical Co-location, Physical Separation" Properties

Igal Shilman
Hi Le,
Let me try to answer to your multiple questions, one by one:
 
I'm trying to understand the internal mechanism used by Flink Statefun to dispatch functions to Flink cluster. In particular, I was trying to find a good example demonstrating Statefun's "Logical Co-location, Physical Separation" properties (as pointed out by [1]).

I'm not sure that I understand what dispatch functions to the Flink cluster mean here, but I will try to give you a general description of how things work with StateFun, and please follow up
with any clarifications :-)

In general, StateFun is a very specific Flink streaming job, and as such it will be running on a Flink cluster. Now, a remote function is a function that runs in a different process
that executes (for now) an HTTP server and runs the StateFun SDK. These processes can be located at the same machine as the Flink's TaskManagers and communicate via a unix domain socket, or they can be at a different machine, or they can even be deployed behind a load balancer, and autoscaled up and down on demand.
Now, as long as StateFun knows how to translate a function logical address to an HTTP endpoint that serves it, StateFun can dispatch function calls to these remote function processes.
By logical co-location, physical separation: a Flink worker that executes the StateFun job, is responsible for the state and messaging of a specific key (address) but the function itself can be running at a different physical process.
A good example of this kind of deployment you can find Gordon's talk [1], that demonstrates deploying the remote functions on AWS lambda.


My understanding based on the doc was that there are three modes to deploy statefun to Flink cluster ranging from remote functions to embedded functions (decreasing storage locality and increasing efficiency). Therefore, in the scenario of remote functions, functions are deployed with its state and message independent from flink processes. And function should be able to be executed in any Flink process as if it is a stateless application.
 
Remote functions are indeed "effectively stateless" and state is being provided as part of an invocation request. But the state is managed in a fault tolerant way by Flink.
 
I have tried out a couple of examples from the statefun but judging by allocation result the subtask of the job seems to bind statically with each task slot in the Flink cluster (I'm assuming the example such as DataStream uses embedded function instead?).

You are correct, the StateFun job has a fixed topology independent of the number of functions or function types. Therefore you can have many different function types and many billions of function instances.
A single FunctionDispatcher operator, will handle transparently the multiplexing of different function types and instances behind the scenes.

I hope that clarifies a bit.

Igal.




On Tue, Dec 29, 2020 at 10:58 PM Le Xu <[hidden email]> wrote:
Hello!

I'm trying to understand the internal mechanism used by Flink Statefun to dispatch functions to Flink cluster. In particular, I was trying to find a good example demonstrating Statefun's "Logical Co-location, Physical Separation" properties (as pointed out by [1]).

My understanding based on the doc was that there are three modes to deploy statefun to Flink cluster ranging from remote functions to embedded functions (decreasing storage locality and increasing efficiency). Therefore, in the scenario of remote functions, functions are deployed with its state and message independent from flink processes. And function should be able to be executed in any Flink process as if it is a stateless application. I have tried out a couple of examples from the statefun but judging by allocation result the subtask of the job seems to bind statically with each task slot in the Flink cluster (I'm assuming the example such as DataStream uses embedded function instead?).

I also came across this tutorial [2] demonstrating the usage of remote function. The README indicates [3] that "Since the functions are stateless, and running in their own container, we can redeploy and rescale it independently of the rest of the infrastructure." which seems to indicate that the function performs scaling manually by the user that could occupy arbitrary resources (e.g., task slots) from the Flink cluster on demand. But I wasn't sure how to explicitly specify the amount of parallelism for each function dynamically.
Is there a good example to visualize statefun "physical separation" behavior by forcing the same function to be invoked at different task slots / machines (either on-demand or automatically)?

Any help will be appreciated!

Thanks!

Le