Statefun 2.0 questions


Statefun 2.0 questions

Wouter Zorgdrager
Hi all,

I've been using Flink for quite some time now, and for a university project I'm planning to experiment with statefun. During the walkthrough I've run into some issues that I hope you can help me with.

1) Is it correct that the Docker image of statefun is not yet published? I couldn't find it anywhere, but I was able to run it by building the image myself.
2) The example project using the Python SDK uses Flask to expose a function via POST. Is there also a way to serve GET requests, so that you can trigger a stateful function from, for instance, your browser?
3) Do you expect a lot of performance loss when using the Python SDK compared to Java?

Thanks in advance!

Regards,
Wouter

Re: Statefun 2.0 questions

Igal Shilman
Hi Wouter!

Glad to read that you have been using Flink for quite some time, and that you're also exploring StateFun!

1) Yes, that is correct; you can follow the Docker Hub contribution PR at [1].

2) I'm not sure I understand what you mean by triggering from the browser.
If you mean triggering the function independently of StateFun for testing / illustration purposes, you would need to write some JavaScript and perform the POST (assuming CORS is enabled).
Let me know if you'd like further information on how to do that.
Broadly speaking, GET is traditionally used to get data from a resource and POST to send data (the data being the invocation batch in our case).

An easier workaround for you would be to expose another endpoint in your Flask application and call your stateful function directly from there (possibly populating the function argument with values taken from the query params).
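For illustration, a minimal sketch of such an extra endpoint. The route, query parameter, and greeting logic below are hypothetical placeholders, not part of the StateFun walkthrough; the point is only that a browser-friendly GET route can reuse the same Python logic your stateful function calls, independently of StateFun.

```python
# Hypothetical sketch: a GET route next to the walkthrough's POST /statefun
# route, so the logic can be poked from a browser for testing. It bypasses
# StateFun entirely; names and ports are made up.
from flask import Flask, request, jsonify

app = Flask(__name__)

def build_greeting(name, seen):
    # Shared logic; the StateFun-bound function would call this too,
    # with `seen` coming from its persisted state.
    return "Hello {}, I have seen you {} time(s).".format(name, seen)

@app.route("/greet", methods=["GET"])
def greet_from_browser():
    # e.g. http://localhost:8000/greet?name=Wouter
    name = request.args.get("name", "world")
    return jsonify(greeting=build_greeting(name, seen=0))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```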

3) I would expect a performance loss when going from the embedded SDK to the remote one, simply because the remote function runs in a different process and a round trip is required. There are different deployment options even for remote functions.
For example, they can be co-located with the task managers and communicate via the loopback device / a Unix domain socket, or they can be deployed behind a load balancer with an auto-scaler, reacting to higher request rates / latency increases by spinning up new instances (something that is not yet supported with the embedded API).

Good luck,
Igal.

Re: Statefun 2.0 questions

Wouter Zorgdrager
Hi Igal,

Thanks for your quick reply. Getting back to point 2, I was wondering whether you could indeed trigger a stateful function directly from Flask and also get the reply there, instead of using Kafka in between. We want to experiment with running stateful functions behind a front-end (which should be able to trigger a function), but we're a bit afraid that Kafka doesn't scale well if, on the frontend side, a user has to consume all Kafka messages to find the correct reply/output for a certain request/input. Any thoughts?

Thanks in advance,
Wouter

Re: Statefun 2.0 questions

Tzu-Li (Gordon) Tai
Hi,

Correct me if I'm wrong, but from the discussion so far it seems like what
Wouter is looking for is an HTTP-based ingress / egress.

We have been thinking about this in the past. The specifics of the
implementation are still to be discussed, but to be able to ensure
exactly-once processing semantics, behind the scenes of an HTTP-based
ingress, external messages / responses will still likely be routed through
durable messaging systems such as Kafka / Pulsar / etc.

Gordon

Re: Statefun 2.0 questions

Wouter Zorgdrager
Hi Igal, all,

In the meantime we found a way to serve Flink stateful functions to a frontend. We decided to add another (set of) Flask application(s) which link to Kafka topics. These Kafka topics then serve as ingress and egress for the statefun cluster. However, we're wondering how we can scale this cluster. The documentation page provides some nice figures for different setups, but no implementation details. In our case we are using the remote setup, so we have a Docker instance containing the `python-stateful-function` and of course the Flink cluster containing a `master` and a `worker`. If I understood correctly, in a remote setting we can scale both the Flink cluster and the `python-stateful-function`. Scaling the Flink cluster is trivial, because I can just add more workers/task managers (providing more task slots) by scaling the worker instance. However, how can I scale the stateful function while also ensuring that it ends up in the correct Flink job (because we need shared state there)? I tried scaling the Docker instance as well, but that didn't seem to work.
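For illustration, such a Flask-to-Kafka bridge could look roughly like this. It is only a sketch: the topic name, route, port, and the kafka-python client are placeholders, and the real ingress expects whatever message type the module.yaml declares.

```python
# Rough sketch of a Flask app producing into the StateFun Kafka ingress topic.
# All names are placeholders; the value must be serialized in whatever format
# the ingress is configured to expect.
from flask import Flask, request
from kafka import KafkaProducer

app = Flask(__name__)
producer = KafkaProducer(bootstrap_servers="kafka:9092")

@app.route("/invoke/<user_id>", methods=["POST"])
def invoke(user_id):
    # The Kafka ingress routes records by key, so the key determines which
    # function instance (and therefore which state) receives the message.
    producer.send("statefun-ingress",
                  key=user_id.encode("utf-8"),
                  value=request.data)
    producer.flush()
    return {"status": "queued"}, 202
```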

Hope you can give me some leads there. 
Thanks in advance!

Kind regards,
Wouter


Re: Statefun 2.0 questions

Igal Shilman
Hi Wouter,

Triggering a stateful function from a frontend indeed requires an ingress between them, so the way you've approached this is also the way we were thinking of.
As Gordon mentioned, a potential improvement might be an HTTP ingress, which would allow triggering stateful functions directly from the front-end servers.
But this kind of ingress is not implemented yet.

Regarding scaling: your understanding is correct, you can scale both the Flink cluster and the remote "python-stateful-function" cluster independently.
Scaling the Flink cluster, though, requires taking a savepoint, bumping the job parallelism, and starting the cluster with more workers from the previously taken savepoint.

Scaling "python-stateful-function" workers can be done transparently to the Flink cluster, but the exact details are deployment specific.
- For example the python workers are a k8s service.
- Or the python workers are deployed behind a load balancer
- Or you add new entries to the DNS record of your python worker. 
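For the k8s case, a minimal sketch of what that could look like (names, labels, image, and port are hypothetical): Flink only ever sees the stable Service name configured in module.yaml, and you scale by changing the Deployment's replica count.

```yaml
# Hypothetical sketch: scale the remote workers by bumping `replicas`;
# the endpoint in module.yaml keeps pointing at the Service name, so Flink
# is unaware of how many pods sit behind it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-stateful-function
spec:
  replicas: 3
  selector:
    matchLabels:
      app: python-stateful-function
  template:
    metadata:
      labels:
        app: python-stateful-function
    spec:
      containers:
        - name: worker
          image: my-registry/python-stateful-function:latest  # hypothetical image
          ports:
            - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: python-stateful-function
spec:
  selector:
    app: python-stateful-function
  ports:
    - port: 8000
      targetPort: 8000
```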

I didn't understand "ensuring that it ends up in the correct Flink job"; can you please clarify?
Flink is the one contacting the remote workers, not the other way around. So as long as the new instances
are visible to Flink, they will be reached with the same shared state.

I'd recommend watching [1] and the demo at the end, and [2] for a demo using stateful functions on AWS Lambda.


It seems like you are on the correct path!
Good luck!
Igal.



Re: Statefun 2.0 questions

Wouter Zorgdrager
Dear Igal, all,

Thanks a lot, this is very helpful. I understand the architecture a bit better now: we can just scale the stateful functions and put a load balancer in front, and Flink will contact them. The only part of the scaling I don't understand yet is how to scale the 'Flink side'. If I understand correctly, the Kafka ingress/egress part runs on the Flink cluster and contacts the remote workers through HTTP. How can I scale this Kafka part then? For a normal Flink job I would just change the parallelism, but I couldn't really find that option yet. Is there some value I need to set in the module.yaml?

Once again, thanks for the help so far. It has been useful.

Regards,
Wouter


Re: Statefun 2.0 questions

Igal Shilman
Hi,
I'm glad things are getting clearer; I'm looking forward to seeing how statefun works out for you :-)
 
To change the parallelism you can simply set the "parallelism.default" [1] key in flink-conf.yaml.
It is located in the statefun container at /opt/flink/conf/flink-conf.yaml.
To avoid rebuilding the container you can mount flink-conf.yaml externally, and if you are using Kubernetes
you can simply define it as a ConfigMap.
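A hedged sketch of that ConfigMap (the name and the parallelism value are placeholders; keep whatever other entries your existing flink-conf.yaml already has):

```yaml
# Hypothetical ConfigMap carrying flink-conf.yaml with a higher default
# parallelism, to be mounted at /opt/flink/conf instead of rebuilding the image.
apiVersion: v1
kind: ConfigMap
metadata:
  name: flink-config
data:
  flink-conf.yaml: |
    parallelism.default: 4
    # ...keep the rest of your existing flink-conf.yaml entries here...
```

The statefun master and worker pods would then mount this ConfigMap as a volume at /opt/flink/conf, so the containers pick up the new parallelism without an image rebuild.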


Good luck,
Igal.
