Flink docker in session cluster mode - is a local distribution needed?


Manas Kale
Hi,

I have a project consisting of 6 jobs, 4 of which are written in Java and 2 in PyFlink. I want to dockerize these so that all 6 can run in a single Flink session cluster.

I have been able to successfully set up the JobManager and TaskManager containers as per [1], after creating a custom Docker image that includes Python.
For the last step, the guide asks us to submit the job using a local distribution of Flink:
$ ./bin/flink run ./examples/streaming/TopSpeedWindowing.jar
I am probably missing something here, because I have the following questions:
- Why do I need a local distribution to submit a job?
- Why can't I use the Flink distribution that already exists within the images?
- How do I submit a job using the Docker image's distribution?




Re: Flink docker in session cluster mode - is a local distribution needed?

Till Rohrmann
Hi Manas,

I think the documentation assumes that you first start a session cluster and then submit jobs from outside the Docker images. If your jobs are included in the Docker image, then you could log into the master process and start the jobs from within the Docker image.

Cheers,
Till
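[A minimal sketch of what Till describes, assuming the JobManager container is named `jobmanager` and that the image follows the official Flink image layout (`/opt/flink` as the Flink home, with the bundled examples); both names are assumptions, not something stated in the thread:]

```shell
# Open a shell in the running JobManager container and submit a job
# using the Flink distribution bundled inside the image itself:
docker exec -it jobmanager /opt/flink/bin/flink run \
    /opt/flink/examples/streaming/TopSpeedWindowing.jar
```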





Re: Flink docker in session cluster mode - is a local distribution needed?

Manas Kale
Hi Till,
Oh, I see. I managed to do what you suggested using a series of docker exec commands. However, this solution feels quite hacky; it could be improved by providing a simple command to submit jobs using the Flink runtime inside the Docker images. I believe that would achieve full containerization: the host system would not need a Flink runtime at all, since everything would live within the Docker images.

Thanks a lot!  
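[One way to avoid a host-side distribution entirely is to run a throwaway client container from the same image; a sketch, assuming the cluster runs on a user-defined Docker network named `flink-network`, the JobManager container is reachable as `jobmanager`, and `my-job.jar` stands in for an actual job artifact (all hypothetical names):]

```shell
# Run a one-off client container on the cluster's network; the job jar
# is mounted from the host, and the image's own Flink client submits it
# to the JobManager's REST endpoint (port 8081 by default):
docker run --rm --network flink-network \
    -v "$(pwd)/my-job.jar:/opt/my-job.jar" \
    flink:latest \
    flink run -m jobmanager:8081 /opt/my-job.jar
```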





Re: Flink docker in session cluster mode - is a local distribution needed?

Till Rohrmann
Yes, agreed. This could be streamlined better. If you want to help with this, feel free to open a JIRA issue for it.

Cheers,
Till





Re: Flink docker in session cluster mode - is a local distribution needed?

Yang Wang
I am not aware of any simple way to use the Flink runtime inside the Docker images other than "docker run/exec".
So if we want to provide an easier command for submitting Flink jobs, I think it would also have to be a wrapper around "docker run/exec".

Best,
Yang
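[Such a wrapper could be as small as the following sketch; the script name, environment variable, and default container name are all hypothetical, and it assumes the official image layout with Flink under `/opt/flink`:]

```shell
#!/bin/sh
# flink-submit.sh: hypothetical wrapper that forwards a job submission
# to the Flink distribution inside a running JobManager container.
# Usage: ./flink-submit.sh <path-to-jar-inside-container> [job args...]
CONTAINER="${FLINK_JM_CONTAINER:-jobmanager}"
exec docker exec "$CONTAINER" /opt/flink/bin/flink run "$@"
```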
