Taskmanager memory

Kruse, Sebastian

Hi everyone,


I am currently looking into how Flink can coexist and interoperate with other frameworks in a cluster, such as plain single-machine processes or Spark. Tachyon seems to be a nice solution for exchanging data between them.


However, I think it is a problem that Flink's taskmanagers allocate their managed memory upfront, in contrast to Spark, as far as I know. If I want a taskmanager to yield its main memory, so that another process can use that memory, is there any other option besides shutting that taskmanager down? Would it be beneficial to use YARN?

Thanks for your help!


Cheers,

Sebastian


Re: Taskmanager memory

Fabian Hueske-2
Hi Sebastian,

There is no way to return memory from a Flink process except shutting the process down.
I think YARN could help in your setup. In a YARN setup, you can flexibly start and stop Flink sessions with different configurations (memory, TMs, slots) or run a single job. When running a single job, Flink will allocate resources and free them after the job is done.

Best, Fabian

2015-12-09 9:46 GMT+01:00 Kruse, Sebastian <[hidden email]>:


Re: Taskmanager memory

Stephan Ewen
@Sebastian: Getting memory away from the JVM is always tricky, completely independent of whether managed memory is pre-allocated or lazily allocated.

But here is something that may work:
  - Start Flink in streaming mode. That makes it allocate managed memory lazily.
  - Set the managed memory to off-heap memory. That way the JVM heap is small. Off-heap memory is deallocated and returned once it is no longer used, which releases memory much better than the JVM shrinking its heap.
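
The two points above can be sketched in plain Java. This is only an illustration of the idea, not Flink's memory manager: the class `LazyOffHeapPool`, its methods, and the segment sizes are hypothetical names chosen for this example. It allocates each segment lazily, on request, as a direct (off-heap) buffer, and keeps no reference after a segment is released, so the native memory can be reclaimed once the caller drops its own reference.

```java
import java.nio.ByteBuffer;
import java.util.Objects;

/** Illustrative sketch only (hypothetical class, not part of Flink). */
final class LazyOffHeapPool {
    private final int segmentSize;  // bytes per segment
    private final int maxSegments;  // upper bound on segments handed out
    private int inUse;              // segments currently handed out

    LazyOffHeapPool(int segmentSize, int maxSegments) {
        this.segmentSize = segmentSize;
        this.maxSegments = maxSegments;
        // Nothing is allocated here: this is the "lazy" part.
    }

    /** Allocates one off-heap segment at the moment it is requested. */
    ByteBuffer request() {
        if (inUse == maxSegments) {
            throw new IllegalStateException("pool exhausted");
        }
        inUse++;
        return ByteBuffer.allocateDirect(segmentSize);
    }

    /**
     * Marks a segment as returned. The pool keeps no reference to it, so the
     * native memory becomes reclaimable once the caller drops its reference.
     */
    void release(ByteBuffer segment) {
        Objects.requireNonNull(segment, "segment");
        inUse--;
    }

    int segmentsInUse() {
        return inUse;
    }

    public static void main(String[] args) {
        LazyOffHeapPool pool = new LazyOffHeapPool(32 * 1024, 4);
        ByteBuffer segment = pool.request();
        System.out.println("direct buffer: " + segment.isDirect());
        System.out.println("segments in use: " + pool.segmentsInUse());
        pool.release(segment);
        System.out.println("segments in use: " + pool.segmentsInUse());
    }
}
```

A pre-allocating pool would instead create all `maxSegments` buffers in the constructor and hold on to them for the lifetime of the process, which is why it cannot give memory back.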



On Wed, Dec 9, 2015 at 10:06 AM, Fabian Hueske <[hidden email]> wrote:

Re: Taskmanager memory

Fabian Hueske-2
Streaming mode with on-heap memory won't help because the JVM allocates all memory but doesn't convert it to managed memory internally, right?

Is off-heap memory actually freed after it has been allocated as managed memory? Does this happen after a job finishes?

2015-12-09 10:44 GMT+01:00 Stephan Ewen <[hidden email]>:

Re: Taskmanager memory

Stephan Ewen
Off-heap memory is freed when the memory-consuming operators release it.

The Java process then releases that memory on the next GC, as far as I know.
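
One way to observe this behavior on a plain JVM, independent of Flink, is to read the JVM's "direct" buffer pool statistics before and after dropping a buffer. The sketch below is an assumption-laden illustration: the class name `DirectMemoryProbe` and the 16 MB size are made up for this example, and `System.gc()` is only a hint, so the second number is not guaranteed to drop on every JVM or with every GC setting.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

/** Illustrative sketch only (hypothetical class, not part of Flink). */
final class DirectMemoryProbe {

    /**
     * Returns the bytes the JVM reports for its "direct" buffer pool,
     * or -1 if no such pool bean is available.
     */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool
                : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        ByteBuffer segment = ByteBuffer.allocateDirect(16 * 1024 * 1024);
        System.out.println("while held:    " + directBytesUsed() + " bytes");

        segment = null;  // the "operator" releases its memory
        System.gc();     // only a hint; reclamation is not guaranteed here
        System.out.println("after GC hint: " + directBytesUsed() + " bytes");
    }
}
```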

On Wed, Dec 9, 2015 at 11:01 AM, Fabian Hueske <[hidden email]> wrote:

Re: Taskmanager memory

Kruse, Sebastian

Thanks for your answers. So the problem with on-heap memory would be that the JVM would not shrink its already allocated heap even if it is largely unused?
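
To make the question concrete, here is a small sketch of how one can compare the heap a JVM has committed with the heap it actually uses. The class name `HeapFootprint` is hypothetical. Whether a JVM hands committed but unused heap back to the operating system depends on the garbage collector and its settings, so this only shows how to measure the gap, not what any particular setup will do.

```java
/** Illustrative sketch only (hypothetical class, not part of Flink). */
final class HeapFootprint {

    /** Heap currently reserved by the JVM, whether in use or not. */
    static long committedBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    /** Heap currently occupied by objects (live or not yet collected). */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Upper bound the heap may grow to (for example, set via -Xmx). */
    static long maxBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("committed: " + committedBytes() + " bytes");
        System.out.println("used:      " + usedBytes() + " bytes");
        System.out.println("max:       " + maxBytes() + " bytes");
    }
}
```

A large, persistent gap between committed and used heap is the situation described here: memory that the process holds but other processes cannot use.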

Regarding streaming mode: if I run Flink in that mode, can I still submit batch jobs? That is what I want to do.


Thanks,

Sebastian


From: [hidden email] <[hidden email]> on behalf of Stephan Ewen <[hidden email]>
Sent: Wednesday, December 9, 2015 11:15
To: [hidden email]
Subject: Re: Taskmanager memory
 

Re: Taskmanager memory

Fabian Hueske-2
Yes, streaming mode supports batch jobs as well.
The difference is that in streaming mode, managed memory is lazily allocated. This is because the streaming runtime does not use managed memory but only heap memory.

2015-12-09 11:55 GMT+01:00 Kruse, Sebastian <[hidden email]>:


Re: Taskmanager memory

Stephan Ewen
BTW, for 1.0, this is consolidated into one single mode...

On Wed, Dec 9, 2015 at 1:45 PM, Fabian Hueske <[hidden email]> wrote: