separation of JVMs for different applications

separation of JVMs for different applications

Manu Zhang
Hi all,

It seems tasks of different Flink applications can end up in the same JVM (TaskManager) in standalone mode. Isn't this fragile, since errors in one application could crash another? I checked FLIP-6 but didn't find any mention of changing this in the future.

Any thoughts, or have I missed anything?

Thanks,
Manu Zhang

Re: separation of JVMs for different applications

Fabian Hueske-2
Hi Manu,

As far as I know, there are no plans to change the stand-alone deployment.
FLIP-6 is focusing on deployments via resource providers (YARN, Mesos, etc.), which allow Flink processes to be started per job.

Till (in CC) is more familiar with the FLIP-6 effort and might be able to add more detail.

Best,
Fabian

Re: separation of JVMs for different applications

Till Rohrmann
Hi Manu,

With FLIP-6 we will be able to support stricter application isolation by starting a dedicated JobManager for each job, which will execute its tasks on TMs reserved solely for this job. At the same time, we will continue to support the multi-tenant cluster mode, where tasks belonging to multiple jobs share the same set of TMs and thus might share information with each other.

Cheers,
Till

Re: separation of JVMs for different applications

Manu Zhang
Thanks Fabian and Till.

We have customers who are interested in using Flink but are very concerned that "multiple jobs share the same set of TMs". I've only recently joined the community, so I'm not sure whether the "multi-tenant cluster mode" has been discussed before.

The con is that one job's or user's failure may crash another's, which is unacceptable in a multi-tenant scenario.
What are the pros? Do the pros outweigh the cons?

Manu

Re: separation of JVMs for different applications

Till Rohrmann
The pro for the multi-tenant cluster mode is that you can share data between jobs and you don't have to spin up a new cluster for each job. This might be helpful for scenarios where you want to run many short-lived and light-weight jobs. 

But the important part is that you don't have to use this method. You can also start a new Flink cluster per job which will then execute the job isolated from any other jobs (given that you don't submit other jobs to this cluster).

Cheers,
Till

Re: separation of JVMs for different applications

Manu Zhang
"The pro for the multi-tenant cluster mode is that you can share data between jobs and you don't have to spin up a new cluster for each job."

I don't think we have to spin up a new cluster for each job if every job gets its own JVMs. For example, Storm will launch a new worker (JVM) for a new job when free slots are available. How can we share data between jobs, and why?



Re: separation of JVMs for different applications

Stephan Ewen
Hi!

Are your customers using YARN? In that case, the default configuration will start a new YARN application per Flink job, so no JVMs are shared between jobs. By default, even each slot has its own JVM.

Greetings,
Stephan

PS: I think the "spawning new JVMs" is what Till referred to when saying "spinning up a new cluster". Keep in mind that Flink is also a batch processor, and it handles sequences of short batch jobs (as issued for example by interactive shells) and it pre-allocates and manages a lot of memory for batch jobs.



Re: separation of JVMs for different applications

Manu Zhang
Thanks Stephan,

They don't use YARN now, but I think they will consider it. Do you think it would be beneficial to provide an option such as "separate-jvm" in stand-alone mode for stream processing and long-running services? Or do you think it would introduce too much complexity?

Manu

Re: separation of JVMs for different applications

Stephan Ewen
Hi!

We are currently changing the resource and process model quite a bit: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
As part of that, I think it makes sense to introduce something like that.

What you can do today is set TaskManagers to use only one slot, and then start multiple TaskManagers per machine. That makes sure that JVMs are never shared between jobs.
If you use the "start-cluster.sh" script from Flink, you can enter the same hostname multiple times in the workers file, and it will start multiple TaskManagers on a machine.

Best,
Stephan
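
A minimal sketch of the workaround Stephan describes above, written in Python: it renders a workers file in which each hostname is repeated once per TaskManager to be started on that machine, plus the one-slot setting. The hostnames and both helper names are hypothetical. `taskmanager.numberOfTaskSlots` is assumed to be the relevant key in flink-conf.yaml, and the name and location of the workers file depend on the Flink version, so please check both against your installation.

```python
def render_workers_file(task_managers_per_host):
    """Return workers-file text with one line per TaskManager.

    task_managers_per_host maps a hostname to the number of TaskManagers
    (JVMs) that should be started on that machine. Repeating a hostname
    N times is what makes the start script launch N TaskManagers there.
    """
    lines = []
    for host, count in task_managers_per_host.items():
        if count < 1:
            raise ValueError(f"need at least one TaskManager on {host}")
        lines.extend([host] * count)
    return "\n".join(lines) + "\n"


def render_slot_setting(slots_per_task_manager=1):
    """Return the config line that limits every TaskManager to N slots.

    Assumption: taskmanager.numberOfTaskSlots is the key your Flink
    version reads from flink-conf.yaml.
    """
    return f"taskmanager.numberOfTaskSlots: {slots_per_task_manager}\n"


if __name__ == "__main__":
    # Hypothetical layout: three TaskManagers on one machine, two on another.
    layout = {"worker-1.example.com": 3, "worker-2.example.com": 2}
    print(render_workers_file(layout), end="")
    print(render_slot_setting(), end="")
```

The first output would go into the workers file and the second into flink-conf.yaml before running start-cluster.sh. With one slot per TaskManager, the number of slots in the cluster equals the number of entries in the workers file.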



Re: separation of JVMs for different applications

Manu Zhang
Good to know that. 

Is it the "standalone setup v2.0" section? The wiki page has no Google-Doc-like change history.
Have any JIRA issues been opened for that? I'm not sure it will be noticed, given that FLIP-6 is almost finished.

Thanks,
Manu

Re: separation of JVMs for different applications

Manu Zhang
If there isn't an existing JIRA issue for standalone v2.0, may I open a new one?

Thanks,
Manu

Re: separation of JVMs for different applications

Till Rohrmann
Hi Manu,

AFAIK there is no JIRA issue for standalone v2.0 yet, so feel free to open one.

Just a small correction: FLIP-6 is not almost finished yet. But we're working on it and are happy about every helping hand :-)

Cheers,
Till

On Fri, Dec 9, 2016 at 2:27 AM, Manu Zhang <[hidden email]> wrote:
If there are not any existing jira for standalone v2.0, may I open a new one ?

Thanks,
Manu

On Wed, Dec 7, 2016 at 12:39 PM Manu Zhang <[hidden email]> wrote:
Good to know that. 

Is it the "standalone setup v2.0" section ? The wiki page has no Google-Doc-like change histories.  
Any jiras opened for that ? Not sure that will be noticed given FLIP-6 is almost finished.

Thanks,
Manu

On Tue, Dec 6, 2016 at 11:55 PM Stephan Ewen <[hidden email]> wrote:
Hi!

We are currently changing the resource and process model quite a bit: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
As part of that, I think it makes sense to introduce something like that.

What you can do today is to set TaskManagers to use one slot only, and then start multiple TaskManagers per machine. That makes sure that JVMs are never shared between jobs.
If you use the "start-cluster.sh" script from Flink, you can enter the same hostname multiple times in the workers file, and it will start multiple TaskManagers on a machine.
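
A minimal sketch of that setup, assuming the slot count is set through the `taskmanager.numberOfTaskSlots` key in flink-conf.yaml and that the workers file takes one hostname per line. The helper name `render_workers_file` and the host names are hypothetical, chosen only for illustration.

```python
# Line to put in flink-conf.yaml so that each TaskManager JVM offers one slot.
SINGLE_SLOT_SETTING = "taskmanager.numberOfTaskSlots: 1"


def render_workers_file(hosts, taskmanagers_per_host):
    """Return workers-file content that lists each host several times.

    Each repeated line makes start-cluster.sh launch one more TaskManager
    on that host. `hosts` and `taskmanagers_per_host` are illustrative
    parameters, not part of Flink itself.
    """
    if taskmanagers_per_host < 1:
        raise ValueError("taskmanagers_per_host must be at least 1")
    lines = []
    for host in hosts:
        # Repeat the hostname once per TaskManager wanted on that machine.
        lines.extend([host] * taskmanagers_per_host)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical hosts: two machines, three single-slot TaskManagers each.
    print(SINGLE_SLOT_SETTING)
    print(render_workers_file(["worker-host-1", "worker-host-2"], 3), end="")
```

With one slot per TaskManager, the number of repeated hostnames is also the number of slots available on that machine, so the repeat count should be sized against the memory each JVM needs.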

Best,
Stephan



On Tue, Dec 6, 2016 at 3:51 AM, Manu Zhang <[hidden email]> wrote:
Thanks Stephan,

They don't use YARN now, but I think they will consider it. Do you think it would be beneficial to provide an option such as "separate-jvm" in stand-alone mode for stream processing and long-running services? Or do you think it would introduce too much complexity?

Manu

On Tue, Dec 6, 2016 at 1:04 AM Stephan Ewen <[hidden email]> wrote:
Hi!

Are your customers using YARN? In that case, the default configuration will start a new YARN application per Flink job, so no JVMs are shared between jobs. By default, each slot even has its own JVM.

Greetings,
Stephan

PS: I think "spawning new JVMs" is what Till referred to when he said "spinning up a new cluster". Keep in mind that Flink is also a batch processor: it handles sequences of short batch jobs (as issued, for example, by interactive shells), and it pre-allocates and manages a lot of memory for batch jobs.



On Mon, Dec 5, 2016 at 3:48 PM, Manu Zhang <[hidden email]> wrote:
"The pro for the multi-tenant cluster mode is that you can share data between jobs and you don't have to spin up a new cluster for each job."

I don't think we have to spin up a new cluster for each job if every job gets its own JVMs. For example, Storm will launch a new worker (JVM) for a new job when free slots are available. How can we share data between jobs, and why would we?



On Mon, Dec 5, 2016 at 6:27 PM, Till Rohrmann <[hidden email]> wrote:
The pro for the multi-tenant cluster mode is that you can share data between jobs and you don't have to spin up a new cluster for each job. This might be helpful for scenarios where you want to run many short-lived and lightweight jobs.

But the important part is that you don't have to use this method. You can also start a new Flink cluster per job, which will then execute the job in isolation from any other jobs (given that you don't submit other jobs to this cluster).

Cheers,
Till

On Sat, Dec 3, 2016 at 2:50 PM, Manu Zhang <[hidden email]> wrote:
Thanks Fabian and Till.

We have customers who are interested in using Flink but are very concerned that "multiple jobs share the same set of TMs". I've only recently joined the community, so I'm not sure whether there has been a discussion of the "multi-tenant cluster mode" before.

The con is that one job's or user's failure may crash another, which is unacceptable in a multi-tenant scenario.
What are the pros? Do the pros outweigh the cons?

Manu

On Fri, Dec 2, 2016 at 7:06 PM Till Rohrmann <[hidden email]> wrote:
Hi Manu,

With FLIP-6 we will be able to support stricter application isolation by starting a dedicated JobManager for each job, which will execute its tasks on TaskManagers (TMs) reserved solely for that job. At the same time, we will continue to support the multi-tenant cluster mode, where tasks belonging to multiple jobs share the same set of TMs and thus might share information between them.

Cheers,
Till


Re: separation of JVMs for different applications

Manu Zhang
