(DEPRECATED) Apache Flink User Mailing List archive.

RocksDB CPU resource usage

Padarn Wilson-2

RocksDB CPU resource usage

Hi all,

We have a job that we just enabled RocksDB on (instead of the file backend), and we see that CPU usage is almost 3x greater (we had to increase the number of TaskManagers 3x to get it to run).

I don't really understand this. Is there something we can look at to understand why CPU use is so high? Our state mostly consists of aggregation windows.

Cheers,

Padarn

JING ZHANG

Re: RocksDB CPU resource usage

Hi Padarn,

After switching the state backend from filesystem to RocksDB, all reads from and writes to the backend have to go through de-/serialization to retrieve or store the state objects, which may increase CPU cost.
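To make the point above concrete, here is a minimal Java sketch. It is not Flink code: the class and method names are hypothetical and only the standard library is used. It contrasts a store that keeps live objects on the heap with one that keeps only bytes, as an embedded key-value store does, assuming a simple sum/count aggregate as the state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: none of these classes are Flink APIs.
class StateAccessSketch {

    // Running aggregate per key, similar in spirit to window aggregation state.
    static final class SumCount {
        long sum;
        long count;
    }

    // Heap-style store: holds live objects, so an update mutates them in place.
    static final class HeapStore {
        final Map<String, SumCount> state = new HashMap<>();

        void add(String key, long value) {
            SumCount acc = state.computeIfAbsent(key, k -> new SumCount());
            acc.sum += value;
            acc.count++;
        }
    }

    // Byte-based store: every update is read, decode, modify, encode, write.
    static final class SerializedStore {
        final Map<String, byte[]> state = new HashMap<>();
        long serializations = 0;
        long deserializations = 0;

        void add(String key, long value) {
            byte[] stored = state.get(key);
            SumCount acc = (stored == null) ? new SumCount() : decode(stored);
            acc.sum += value;
            acc.count++;
            state.put(key, encode(acc));
        }

        SumCount get(String key) {
            byte[] stored = state.get(key);
            return (stored == null) ? null : decode(stored);
        }

        private byte[] encode(SumCount acc) {
            try {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream(16);
                DataOutputStream out = new DataOutputStream(buffer);
                out.writeLong(acc.sum);
                out.writeLong(acc.count);
                serializations++;
                return buffer.toByteArray();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        private SumCount decode(byte[] bytes) {
            try {
                DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
                SumCount acc = new SumCount();
                acc.sum = in.readLong();
                acc.count = in.readLong();
                deserializations++;
                return acc;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        HeapStore heap = new HeapStore();
        SerializedStore serialized = new SerializedStore();
        for (long i = 1; i <= 1000; i++) {
            heap.add("window-1", i);
            serialized.add("window-1", i);
        }
        System.out.println("heap sum: " + heap.state.get("window-1").sum);
        System.out.println("encode calls: " + serialized.serializations);
        System.out.println("decode calls: " + serialized.deserializations);
    }
}
```

Both stores end up with the same aggregate, but the byte-based one pays for an encode on every update and a decode on every update after the first. How much CPU that costs in a real job depends on the state types and serializers involved, which this sketch does not model.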

But I'm not sure it is the main reason for the 3x CPU cost in your job.

To find out the reason, we need more detailed CPU profiling, such as Flame Graphs. BTW, starting with Flink 1.13, Flame Graphs are natively supported in Flink [1].

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/debugging/flame_graphs/

Best,

JING ZHANG

In reply to Padarn Wilson (Tue, Jun 15, 2021 at 5:05 PM).

rmetzger0

Re: RocksDB CPU resource usage

Depending on the datatypes you are using, seeing 3x more CPU usage seems realistic.

Serialization can be quite expensive. See also: https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html. It may make sense to optimize there a bit.
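As a rough illustration of why the data type matters, the following Java sketch encodes the same two-field aggregate twice: once with a hand-written compact encoding, and once with Java's built-in object serialization. The built-in serialization is used here only as a stand-in for a generic, reflection-based serializer; it is an assumption for illustration and not how Flink serializes state. All names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical illustration only: not a Flink serializer.
class SerializerCostSketch {

    static final class Aggregate implements Serializable {
        private static final long serialVersionUID = 1L;
        final long sum;
        final long count;

        Aggregate(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Type-specific encoding: writes exactly the two long fields.
    static byte[] compactEncode(Aggregate a) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream(16);
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeLong(a.sum);
            out.writeLong(a.count);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Generic encoding: also writes a stream header and a class description.
    static byte[] genericEncode(Aggregate a) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(a);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Aggregate a = new Aggregate(42L, 7L);
        System.out.println("compact bytes: " + compactEncode(a).length);
        System.out.println("generic bytes: " + genericEncode(a).length);
    }
}
```

The compact form is 16 bytes, while the generic form is larger because it carries type metadata with every value. When state is stored as bytes, that extra work is repeated on each access, so the choice of types and serializers can plausibly shift CPU usage; only profiling the actual job can show by how much.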

In reply to JING ZHANG (Tue, Jun 15, 2021 at 5:23 PM).

Padarn Wilson-2

Re: RocksDB CPU resource usage

Thanks Robert. I think it would be easy enough to test this hypothesis by making the same comparison with some simpler state inside the aggregation window.

In reply to Robert Metzger (Wed, 16 Jun 2021, 7:58 pm).

rmetzger0

Re: RocksDB CPU resource usage

If you are able to execute your job locally as well (with enough data), you can also run it with a profiler and see the CPU cycles spent on serialization (you can use RocksDB locally, too).

In reply to Padarn Wilson (Wed, Jun 16, 2021 at 3:51 PM).

Yun Tang

Re: RocksDB CPU resource usage

Hi Padarn,

In my experience, de-/serialization alone might not account for 3x CPU usage; background compaction could also increase CPU usage. You could use async-profiler [1] to find out what is really consuming your CPU, since it can also capture the stacks of the native RocksDB threads.

[1] https://github.com/jvm-profiling-tools/async-profiler

Best

Yun Tang

In reply to Robert Metzger (Thursday, June 17, 2021 14:11).

Padarn Wilson-2

Re: RocksDB CPU resource usage

Thanks both for the suggestions, all good ideas. I will try some of the profiling suggestions and report back.

In reply to Yun Tang (Thu, Jun 17, 2021 at 4:13 PM).