(DEPRECATED) Apache Flink User Mailing List archive.

separation of JVMs for different applications


Manu Zhang

separation of JVMs for different applications

Hi all,

It seems tasks of different Flink applications can end up in the same JVM (TaskManager) in standalone mode. Isn't this fragile, since errors in one application could crash another? I checked FLIP-6 but didn't find any mention of changing this in the future.

Any thoughts, or have I missed anything?

Thanks,

Manu Zhang

Fabian Hueske-2

Re: separation of JVMs for different applications

Hi Manu,

As far as I know, there are no plans to change the stand-alone deployment.

FLIP-6 focuses on deployments via resource providers (YARN, Mesos, etc.), which allow starting Flink processes per job.

Till (in CC) is more familiar with the FLIP-6 effort and might be able to add more detail.

Best,

Fabian

2016-12-01 4:16 GMT+01:00 Manu Zhang <[hidden email]>:

Hi all,

It seems tasks of different Flink applications can end up in the same JVM (TaskManager) in standalone mode. Isn't this fragile, since errors in one application could crash another? I checked FLIP-6 but didn't find any mention of changing this in the future.

Any thoughts, or have I missed anything?

Thanks,
Manu Zhang

Till Rohrmann

Re: separation of JVMs for different applications

Hi Manu,

With FLIP-6 we will be able to support stricter application isolation by starting a dedicated JobManager for each job, which will execute its tasks on TMs reserved solely for this job. At the same time, we will continue supporting the multi-tenant cluster mode, where tasks belonging to multiple jobs share the same set of TMs and thus might share information between them.
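
To make the difference between the two modes concrete, here is a toy placement model in Python. It is only a sketch under stated assumptions and not Flink's scheduler: the function names `place_jobs` and `shared_taskmanagers`, the job names, and the "fill slots in order, add TaskManagers as needed" policy are all made up for illustration.

```python
from collections import defaultdict


def place_jobs(jobs, slots_per_tm, dedicated):
    """Toy placement: returns {TaskManager index: set of job names}.

    jobs         -- mapping of job name to the number of slots it needs
    slots_per_tm -- slots offered by each (hypothetical) TaskManager JVM
    dedicated    -- if True, every job starts on a fresh TaskManager
    """
    placement = defaultdict(set)
    tm, used = 0, 0  # current TaskManager index and slots used on it
    for job, needed in jobs.items():
        if dedicated and used > 0:
            # Reserve TaskManagers per job: never reuse a partly filled JVM.
            tm, used = tm + 1, 0
        for _ in range(needed):
            if used == slots_per_tm:
                # Current TaskManager is full, move on to the next one.
                tm, used = tm + 1, 0
            placement[tm].add(job)
            used += 1
    return dict(placement)


def shared_taskmanagers(placement):
    """TaskManagers (JVMs) that host tasks of more than one job."""
    return sorted(tm for tm, names in placement.items() if len(names) > 1)


jobs = {"job-a": 3, "job-b": 3}
print(shared_taskmanagers(place_jobs(jobs, slots_per_tm=2, dedicated=False)))
print(shared_taskmanagers(place_jobs(jobs, slots_per_tm=2, dedicated=True)))
```

In this model the shared mode leaves one JVM hosting both jobs, while the dedicated mode uses one more TaskManager and shares none. That is the trade-off discussed in this thread: isolation costs extra processes.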

Cheers,

Till

On Fri, Dec 2, 2016 at 11:19 AM, Fabian Hueske <[hidden email]> wrote:

Hi Manu,

As far as I know, there are no plans to change the stand-alone deployment.
FLIP-6 focuses on deployments via resource providers (YARN, Mesos, etc.), which allow starting Flink processes per job.

Till (in CC) is more familiar with the FLIP-6 effort and might be able to add more detail.

Best,
Fabian

Manu Zhang

Re: separation of JVMs for different applications

Thanks Fabian and Till.

We have customers who are interested in using Flink but are very concerned that "multiple jobs share the same set of TMs". I joined the community only recently, so I'm not sure whether the "multi-tenant cluster mode" has been discussed before.

The con is that one job's or user's failure may crash another, which is unacceptable in a multi-tenant scenario.

What are the pros? Do the pros outweigh the cons?

Manu

On Fri, Dec 2, 2016 at 7:06 PM Till Rohrmann <[hidden email]> wrote:

Hi Manu,

With FLIP-6 we will be able to support stricter application isolation by starting a dedicated JobManager for each job, which will execute its tasks on TMs reserved solely for this job. At the same time, we will continue supporting the multi-tenant cluster mode, where tasks belonging to multiple jobs share the same set of TMs and thus might share information between them.

Cheers,
Till

Till Rohrmann

Re: separation of JVMs for different applications

The pro for the multi-tenant cluster mode is that you can share data between jobs and you don't have to spin up a new cluster for each job. This might be helpful for scenarios where you want to run many short-lived and light-weight jobs.

But the important part is that you don't have to use this method. You can also start a new Flink cluster per job which will then execute the job isolated from any other jobs (given that you don't submit other jobs to this cluster).

Cheers,

Till

On Sat, Dec 3, 2016 at 2:50 PM, Manu Zhang <[hidden email]> wrote:

Thanks Fabian and Till.

We have customers who are interested in using Flink but are very concerned that "multiple jobs share the same set of TMs". I joined the community only recently, so I'm not sure whether the "multi-tenant cluster mode" has been discussed before.

The con is that one job's or user's failure may crash another, which is unacceptable in a multi-tenant scenario.
What are the pros? Do the pros outweigh the cons?

Manu

Manu Zhang

Re: separation of JVMs for different applications

The pro for the multi-tenant cluster mode is that you can share data between jobs and you don't have to spin up a new cluster for each job.

I don't think we have to spin up a new cluster for each job if every job gets its own JVMs. For example, Storm will launch a new worker (JVM) for a new job when free slots are available. How can we share data between jobs, and why?

On Mon, Dec 5, 2016 at 6:27 PM, Till Rohrmann <[hidden email]> wrote:

The pro for the multi-tenant cluster mode is that you can share data between jobs and you don't have to spin up a new cluster for each job. This might be helpful for scenarios where you want to run many short-lived and light-weight jobs.

But the important part is that you don't have to use this method. You can also start a new Flink cluster per job which will then execute the job isolated from any other jobs (given that you don't submit other jobs to this cluster).

Cheers,
Till

Stephan Ewen

Re: separation of JVMs for different applications

Hi!

Are your customers using YARN? In that case, the default configuration will start a new YARN application per Flink job, so no JVMs are shared between jobs. By default, even each slot has its own JVM.

Greetings,

Stephan

PS: I think the "spawning new JVMs" is what Till referred to when saying "spinning up a new cluster". Keep in mind that Flink is also a batch processor, and it handles sequences of short batch jobs (as issued for example by interactive shells) and it pre-allocates and manages a lot of memory for batch jobs.

On Mon, Dec 5, 2016 at 3:48 PM, Manu Zhang <[hidden email]> wrote:

The pro for the multi-tenant cluster mode is that you can share data between jobs and you don't have to spin up a new cluster for each job.

I don't think we have to spin up a new cluster for each job if every job gets its own JVMs. For example, Storm will launch a new worker (JVM) for a new job when free slots are available. How can we share data between jobs, and why?

Manu Zhang

Re: separation of JVMs for different applications

Thanks Stephan,

They don't use YARN now, but I think they will consider it. Do you think it would be beneficial to provide an option such as "separate-jvm" in stand-alone mode for streaming processors and long-running services? Or do you think it would introduce too much complexity?

Manu

On Tue, Dec 6, 2016 at 1:04 AM Stephan Ewen <[hidden email]> wrote:

Hi!

Are your customers using YARN? In that case, the default configuration will start a new YARN application per Flink job, so no JVMs are shared between jobs. By default, even each slot has its own JVM.

Greetings,
Stephan

PS: I think the "spawning new JVMs" is what Till referred to when saying "spinning up a new cluster". Keep in mind that Flink is also a batch processor, and it handles sequences of short batch jobs (as issued for example by interactive shells) and it pre-allocates and manages a lot of memory for batch jobs.

Stephan Ewen

Re: separation of JVMs for different applications

Hi!

We are currently changing the resource and process model quite a bit: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077

As part of that, I think it makes sense to introduce something like that.

What you can do today is set TaskManagers to use only one slot and then start multiple TaskManagers per machine. That makes sure that JVMs are never shared between jobs.

If you use the "start-cluster.sh" script from Flink, you can enter the same hostname multiple times in the workers file, and it will start multiple TaskManagers on a machine.
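
As a rough illustration of this workaround, the Python sketch below renders the two pieces of configuration involved. `taskmanager.numberOfTaskSlots` is Flink's setting for the number of slots per TaskManager; the helper `render_standalone_files` and the host names `worker-1` and `worker-2` are hypothetical. Where the output goes (for example `conf/flink-conf.yaml` and the workers file) depends on the Flink version, so treat those names as assumptions.

```python
def render_standalone_files(hosts, taskmanagers_per_host):
    """Return (config text, workers-file text) for a one-slot-per-JVM setup.

    hosts                 -- machine names (hypothetical in this example)
    taskmanagers_per_host -- how many TaskManager JVMs to start per machine
    """
    if taskmanagers_per_host < 1:
        raise ValueError("need at least one TaskManager per host")
    # One slot per TaskManager, so a TaskManager JVM never serves two jobs.
    config_text = "taskmanager.numberOfTaskSlots: 1\n"
    # Listing a host N times makes start-cluster.sh start N TaskManagers there.
    workers = [host for host in hosts for _ in range(taskmanagers_per_host)]
    return config_text, "\n".join(workers) + "\n"


config_text, workers_text = render_standalone_files(["worker-1", "worker-2"], 2)
print(config_text, end="")
print(workers_text, end="")
```

With two TaskManagers per host, each host name appears twice in the workers text. Memory per TaskManager should be sized down accordingly, since several JVMs now share one machine.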

Best,

Stephan

On Tue, Dec 6, 2016 at 3:51 AM, Manu Zhang <[hidden email]> wrote:

Thanks Stephan,

They don't use YARN now, but I think they will consider it. Do you think it would be beneficial to provide an option such as "separate-jvm" in stand-alone mode for streaming processors and long-running services? Or do you think it would introduce too much complexity?

Manu

Manu Zhang

Re: separation of JVMs for different applications

Good to know that.

Is it the "standalone setup v2.0" section? The wiki page has no Google-Doc-like change history.

Are there any JIRAs opened for that? I'm not sure it will be noticed, given that FLIP-6 is almost finished.

Thanks,

Manu

On Tue, Dec 6, 2016 at 11:55 PM Stephan Ewen <[hidden email]> wrote:

Hi!

We are currently changing the resource and process model quite a bit: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
As part of that, I think it makes sense to introduce something like that.

What you can do today is set TaskManagers to use only one slot and then start multiple TaskManagers per machine. That makes sure that JVMs are never shared between jobs.
If you use the "start-cluster.sh" script from Flink, you can enter the same hostname multiple times in the workers file, and it will start multiple TaskManagers on a machine.

Best,
Stephan

Manu Zhang

Re: separation of JVMs for different applications

If there isn't an existing JIRA for standalone v2.0, may I open a new one?

Thanks,

Manu

On Wed, Dec 7, 2016 at 12:39 PM Manu Zhang <[hidden email]> wrote:

Good to know that.

Is it the "standalone setup v2.0" section? The wiki page has no Google-Doc-like change history.
Are there any JIRAs opened for that? I'm not sure it will be noticed, given that FLIP-6 is almost finished.

Thanks,
Manu

Till Rohrmann

Re: separation of JVMs for different applications

Hi Manu,

AFAIK there is no JIRA for standalone v2.0 yet, so feel free to open one.

Just a small correction: FLIP-6 is not almost finished yet. But we're working on it and are happy about every helping hand :-)

Cheers,

Till

On Fri, Dec 9, 2016 at 2:27 AM, Manu Zhang <[hidden email]> wrote:

If there isn't an existing JIRA for standalone v2.0, may I open a new one?

Thanks,
Manu

On Wed, Dec 7, 2016 at 12:39 PM Manu Zhang <[hidden email]> wrote:
Good to know that.

Is it the "standalone setup v2.0" section ? The wiki page has no Google-Doc-like change histories.
Any jiras opened for that ? Not sure that will be noticed given FLIP-6 is almost finished.

Thanks,
Manu

On Tue, Dec 6, 2016 at 11:55 PM Stephan Ewen <[hidden email]> wrote:
Hi!

We are currently changing the resource and process model quite a bit: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
As part of that, I think it makes sense to introduce something like that.

What you can do today is to set TaskManagers to use one slot only, and then start multiple TaskManagers per machine. That makes sure that JVMs are never shared across machines.
If you use the "start-cluster.sh" script from Flink, you can enter the same hostname multiple times in the workers file, and it will start multiple TaskManagers on a machine.

Best,
Stephan

On Tue, Dec 6, 2016 at 3:51 AM, Manu Zhang <[hidden email]> wrote:
Thanks Stephan,

They don't use YARN now but I think they will consider it. Do you think it would be beneficial to provide such an option as "separate-jvm" in stand-alone mode for streaming processor and long running services ? Or do you think it would introduce too much complexity ?

Manu

On Tue, Dec 6, 2016 at 1:04 AM Stephan Ewen <[hidden email]> wrote:
Hi!

Are your customers using YARN? In that case, the default configuration will start a new YARN application per Flink job, so no JVMs are shared between jobs. By default, even each slot has its own JVM.

Greetings,
Stephan

PS: I think the "spawning new JVMs" is what Till referred to when saying "spinning up a new cluster". Keep in mind that Flink is also a batch processor, and it handles sequences of short batch jobs (as issued for example by interactive shells) and it pre-allocates and manages a lot of memory for batch jobs.

On Mon, Dec 5, 2016 at 3:48 PM, Manu Zhang <[hidden email]> wrote:
"The pro for the multi-tenant cluster mode is that you can share data between jobs and you don't have to spin up a new cluster for each job."

I don't think we have to spin up a new cluster for each job if every job gets its own JVMs. For example, Storm launches a new worker (JVM) for a new job when free slots are available. How can we share data between jobs, and why would we?

On Mon, Dec 5, 2016 at 6:27 PM, Till Rohrmann <[hidden email]> wrote:
The pro for the multi-tenant cluster mode is that you can share data between jobs and you don't have to spin up a new cluster for each job. This might be helpful for scenarios where you want to run many short-lived and light-weight jobs.

But the important part is that you don't have to use this method. You can also start a new Flink cluster per job which will then execute the job isolated from any other jobs (given that you don't submit other jobs to this cluster).

Cheers,
Till

On Sat, Dec 3, 2016 at 2:50 PM, Manu Zhang <[hidden email]> wrote:
Thanks Fabian and Till.

We have customers who are interested in using Flink but are very concerned that "multiple jobs share the same set of TMs". I only joined the community recently, so I'm not sure whether the "multi-tenant cluster mode" has been discussed before.

The con is that one job's or user's failure may crash another's, which is unacceptable in a multi-tenant scenario.
What are the pros? Do the pros outweigh the cons?

Manu

On Fri, Dec 2, 2016 at 7:06 PM Till Rohrmann <[hidden email]> wrote:
Hi Manu,

With FLIP-6 we will be able to support stricter application isolation by starting a dedicated JobManager for each job, which will execute its tasks on TMs reserved solely for this job. But at the same time we will continue supporting the multi-tenant cluster mode, where tasks belonging to multiple jobs share the same set of TMs and, thus, might share information between them.

Cheers,
Till

On Fri, Dec 2, 2016 at 11:19 AM, Fabian Hueske <[hidden email]> wrote:
Hi Manu,

As far as I know, there are no plans to change the stand-alone deployment.
FLIP-6 focuses on deployments via resource providers (YARN, Mesos, etc.), which allow starting Flink processes per job.

Till (in CC) is more familiar with the FLIP-6 effort and might be able to add more detail.

Best,
Fabian

2016-12-01 4:16 GMT+01:00 Manu Zhang <[hidden email]>:
Hi all,

It seems that tasks of different Flink applications can end up in the same JVM (TaskManager) in standalone mode. Isn't this fragile, since errors in one application could crash another? I checked FLIP-6 but didn't find any mention of changing this in the future.

Any thoughts, or have I missed anything?

Thanks,
Manu Zhang

Manu Zhang

Re: separation of JVMs for different applications

Created https://issues.apache.org/jira/browse/FLINK-5312.

Thanks,

Manu
