[DISCUSS] Make Managed Memory always off-heap (Adjustment to FLIP-49)

Stephan Ewen
Hi all!

Yesterday, some of the people involved in FLIP-49 had a long discussion about managed memory in Flink.
In particular, we discussed the fact that managed memory can be either on-heap or off-heap, and that FLIP-49 introduced having both types of memory at the same time.

==> What we want to suggest is a simplification to only have off-heap managed memory.

The rationale is the following:
  - Integrating state backends with managed memory means we need to support "reserving" memory on top of creating MemorySegments.
    Reserving memory isn't really possible on the Java heap, but works well off-heap.

  - All components that will use managed memory will work with off-heap managed memory: MemorySegment-based structures, RocksDB, possibly external processes in the future.

  - A setup where state backends integrate with managed memory, but managed memory is all on-heap by default, breaks the out-of-the-box experience of the RocksDB backend.

  - The only state backend that does not use managed memory is the HeapKeyedStateBackend (used in MemoryStateBackend and FsStateBackend). That means the HeapKeyedStateBackend always works, even when all managed memory is off-heap.

  - Heavier use of the HeapKeyedStateBackend needs a larger JVM heap. The current FLIP-49 way to get this is to "configure managed memory to on-heap", but the managed memory will not be used; it just implicitly grows the heap through the way the heap size is computed. That is a pretty confusing story, especially when we start thinking about scenarios where Flink runs as a library in a pre-existing JVM, about the mini-cluster, etc. It is simpler (and more accurate) to just say that the HeapKeyedStateBackend does not participate in managed memory, and that extensive use of it requires the user to reserve heap memory (FLIP-49 has a new TaskHeapMemory option to request that a larger heap be created).

==> This seems to support all scenarios in a nice way out of the box.

==> This seems easier to understand for users.

==> This simplifies the implementation of resource profiles, configuration, and computation of memory pools.
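
To make the difference between allocating segments and "reserving" memory concrete, here is a minimal Java sketch. It is not Flink's MemoryManager: the class OffHeapBudgetSketch and all of its methods are hypothetical names chosen for illustration. The sketch only tracks a byte budget. A segment is a direct (off-heap) buffer that the manager creates itself, while a reservation merely subtracts from the budget so that the consumer (for example a native library such as RocksDB) can allocate that memory on its own.

```java
import java.nio.ByteBuffer;

/** Hypothetical budget tracker for illustration; not Flink's actual MemoryManager API. */
final class OffHeapBudgetSketch {
    private final long totalBytes;
    private long usedBytes;

    OffHeapBudgetSketch(long totalBytes) {
        this.totalBytes = totalBytes;
    }

    /** Creates a segment-like direct (off-heap) buffer and counts it against the budget. */
    synchronized ByteBuffer allocateSegment(int sizeBytes) {
        claim(sizeBytes);
        return ByteBuffer.allocateDirect(sizeBytes);
    }

    /** Only accounts for the memory; the caller allocates it itself, e.g. in native code. */
    synchronized void reserve(long sizeBytes) {
        claim(sizeBytes);
    }

    /** Returns bytes to the budget once a segment or a reservation is given up. */
    synchronized void release(long sizeBytes) {
        if (sizeBytes < 0 || sizeBytes > usedBytes) {
            throw new IllegalArgumentException("cannot release " + sizeBytes + " bytes");
        }
        usedBytes -= sizeBytes;
    }

    synchronized long availableBytes() {
        return totalBytes - usedBytes;
    }

    /** Shared bookkeeping: fails if the request does not fit into the remaining budget. */
    private void claim(long sizeBytes) {
        if (sizeBytes < 0 || sizeBytes > totalBytes - usedBytes) {
            throw new IllegalStateException("not enough managed memory for " + sizeBytes + " bytes");
        }
        usedBytes += sizeBytes;
    }

    public static void main(String[] args) {
        OffHeapBudgetSketch budget = new OffHeapBudgetSketch(64L * 1024);
        ByteBuffer segment = budget.allocateSegment(32 * 1024); // manager allocates
        budget.reserve(16L * 1024);                             // consumer allocates itself
        System.out.println(segment.isDirect() + " " + budget.availableBytes());
    }
}
```

Both paths draw from the same budget, which is the property that is hard to get on the Java heap: the JVM offers no way to set aside part of the heap for a consumer that allocates outside of it.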


Does anybody have a concern about this? In particular, would any users be impacted if MemorySegment-based jobs (batch) would now always run with off-heap memory?

If no one raises an objection, we would update the FLIP-49 proposal so that the default setup divides the Flink memory into 50% JVM heap and 50% managed memory (or even 60%/40%). All state backends and batch jobs will have a good out-of-the-box experience that way.
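
As a rough illustration of the proposed default split, the following Java sketch divides a given amount of Flink memory into JVM heap and off-heap managed memory. The helper name MemorySplitSketch.split is hypothetical, and the sketch deliberately assumes that heap and managed memory are the only two pools; any other pools a real setup has are ignored here.

```java
/** Hypothetical helper; the 50/50 and 60/40 ratios come from the proposal, the names do not. */
final class MemorySplitSketch {

    /** Returns {heapBytes, managedBytes} for a total size and a managed-memory fraction. */
    static long[] split(long totalFlinkBytes, double managedFraction) {
        if (totalFlinkBytes < 0 || managedFraction < 0.0 || managedFraction > 1.0) {
            throw new IllegalArgumentException("invalid total or fraction");
        }
        long managed = (long) (totalFlinkBytes * managedFraction); // rounded down
        long heap = totalFlinkBytes - managed;                     // heap gets the remainder
        return new long[] {heap, managed};
    }

    public static void main(String[] args) {
        long total = 1024L * 1024 * 1024; // 1 GiB of Flink memory

        long[] half = split(total, 0.5);
        System.out.println("50/50: heap=" + half[0] + " managed=" + half[1]);

        long[] sixtyForty = split(total, 0.4);
        System.out.println("60/40: heap=" + sixtyForty[0] + " managed=" + sixtyForty[1]);
    }
}
```

With 1 GiB of Flink memory, the 50/50 split gives 512 MiB to each pool. A user who relies heavily on the HeapKeyedStateBackend would lower the managed fraction or request more heap explicitly.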

Best,
Stephan

Re: [DISCUSS] Make Managed Memory always off-heap (Adjustment to FLIP-49)

Jingsong Li
Hi Stephan,

+1 to having off-heap managed memory by default.

From the perspective of batch, based on our long-term performance tests and production experience:
- There is no significant difference in performance between heap and off-heap memory. For heap objects, the JVM has many opportunities to optimize in the JIT, so generally speaking heap objects will be faster. But at present, the managed memory in Flink is used as binary data. In this case we access it through the Unsafe API, so there is no obvious performance gap.
- Conversely, too much memory on the heap hurts GC performance and latency.
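
To illustrate what "used as binary" means, here is a small Java sketch. It uses java.nio.ByteBuffer as a stand-in rather than the Unsafe-based access mentioned above, and the class name BinaryAccessSketch is hypothetical. The same read and write code runs against a heap-backed and a direct (off-heap) buffer. It does not measure performance; it only shows that binary access looks identical from the caller's side.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration: identical binary access code for heap and off-heap buffers. */
final class BinaryAccessSketch {

    /** Fills the buffer with consecutive long values, then reads them back and sums them. */
    static long writeAndSum(ByteBuffer buffer) {
        int count = buffer.capacity() / Long.BYTES;
        for (int i = 0; i < count; i++) {
            buffer.putLong(i * Long.BYTES, i); // absolute write at a byte offset
        }
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += buffer.getLong(i * Long.BYTES); // absolute read at a byte offset
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer onHeap = ByteBuffer.allocate(4096);        // backed by a byte[] on the heap
        ByteBuffer offHeap = ByteBuffer.allocateDirect(4096); // allocated outside the heap
        System.out.println(writeAndSum(onHeap) == writeAndSum(offHeap));
    }
}
```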

But I'm not sure whether we should have only off-heap managed memory.
In our previous experience, array and object operations on the JVM heap can be more beneficial. As mentioned above, the JVM/JIT does a lot of optimization.
- For vectorization, arrays are clearly more conducive to computation. The JVM can apply many optimizations to array loops.
- We could consider deep code generation that produces dynamic Java objects to further speed up the operators. SnappyData [1] has done some work in this area.

So I am +0 on having only off-heap managed memory, because we don't rely on heap memory right now and only have a few ideas for the future.


Best,
Jingsong Lee

On Wed, Nov 27, 2019 at 10:14 AM Stephan Ewen <[hidden email]> wrote:
[...]

Re: [DISCUSS] Make Managed Memory always off-heap (Adjustment to FLIP-49)

Xintong Song
Sorry, I just realized that I sent my feedback to Jingsong's email address instead of the dev / user mailing list.

Please find my comments below.


Thank you~

Xintong Song


On Wed, Nov 27, 2019 at 4:32 PM Xintong Song <[hidden email]> wrote:
As a participant in yesterday's discussion, I'm +1 for the proposal of removing on-heap managed memory.

There's one thing I want to add. In order to support "reserving" memory (where memory consumers do not allocate MemorySegments from the MemoryManager but allocate the reserved memory themselves), we no longer support pre-allocation of memory segments in FLIP-49. That means that even if we do not remove on-heap managed memory, a MemorySegment will not be allocated unless requested by the consumer, and will be deallocated immediately when released by the consumer. Thus, it is likely that on-heap memory segments will not always stay in the JVM old generation, and will be affected by GC / swapping just like other Java objects.

@Jingsong, I'm not sure whether this will be related to the performance issue that you mentioned.
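
The change in allocation behaviour described above can be sketched as follows. This is a simplified Java illustration with hypothetical class names, not Flink's MemoryManager. It contrasts a pre-allocated pool, whose heap segments stay reachable for the pool's lifetime, with on-demand allocation, where a segment exists only between request and release and then becomes ordinary garbage.

```java
import java.util.ArrayDeque;

/** Hypothetical contrast between pre-allocated and on-demand heap segments. */
final class SegmentAllocationSketch {
    static final int SEGMENT_SIZE = 32 * 1024;

    /** Pre-allocation: all segments are created up front and are reused, never dropped. */
    static final class PreallocatedPool {
        private final ArrayDeque<byte[]> free = new ArrayDeque<>();

        PreallocatedPool(int segments) {
            for (int i = 0; i < segments; i++) {
                free.push(new byte[SEGMENT_SIZE]);
            }
        }

        byte[] request() {
            return free.pop(); // throws NoSuchElementException when the pool is exhausted
        }

        void release(byte[] segment) {
            free.push(segment); // the segment stays reachable and is reused later
        }

        int retained() {
            return free.size();
        }
    }

    /** On-demand: only a counter is kept; segments are created per request. */
    static final class OnDemandAllocator {
        private final int maxSegments;
        private int inUse;

        OnDemandAllocator(int maxSegments) {
            this.maxSegments = maxSegments;
        }

        byte[] request() {
            if (inUse == maxSegments) {
                throw new IllegalStateException("segment budget exhausted");
            }
            inUse++;
            return new byte[SEGMENT_SIZE]; // a fresh object on every request
        }

        void release(byte[] segment) {
            inUse--; // no reference is kept, so the array is left to the garbage collector
        }

        int inUse() {
            return inUse;
        }
    }
}
```

In the on-demand variant the allocator never holds a reference to a released segment, so such segments are short-lived objects from the garbage collector's point of view.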

Thank you~

Xintong Song



On Wed, Nov 27, 2019 at 12:10 PM Jingsong Li <[hidden email]> wrote:
[...]