StatsD metric name prefix change for task manager after upgrading to Flink 1.11

Allen Wang
Hello,

We noticed that after upgrading to Flink 1.11, the StatsD metric prefix changed from the hostname to the IP address of the task manager.

The Flink job runs in a k8s cluster.

Here is an example of a metric reported to StatsD in Flink 1.10:
flink-ingest-cx-home-page-feed-flink-task-manager-7f8c7677l85pl.taskmanager.16c2dbc84eb27f336455615e642c6cdd.flink-ingest-cx-home-page-feed.Source- Custom Source.1.assigned-partitions:3.0|g
Here is an example of a metric reported to StatsD in Flink 1.11:
10.4.155.205.taskmanager.0a900ab762d7d534ea8b20e84438166b.flink-ingest-xp-xp.Source- Custom Source.0.assigned-partitions:3.0|g
This caused a problem for us because StatsD interprets the segment before the first dot as the source, so after upgrading to 1.11, all task manager metrics have "10" as the source.
Is there a configuration option to restore the 1.10 behavior, where the metric prefix is the hostname?
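As a rough illustration of the problem, the Java sketch below mimics a backend that, as described above, takes everything before the first '.' of the metric name as the source. The class StatsDSourceDemo and its sourceOf helper are hypothetical; real StatsD backends differ in how they derive a source, so this is only a model of the behavior reported here, applied to the two sample lines.

```java
public class StatsDSourceDemo {

    // Hypothetical model of the backend behavior described above:
    // the "source" is everything before the first '.' of the metric name.
    static String sourceOf(String statsdLine) {
        // Drop the StatsD value/type suffix (e.g. ":3.0|g") to get the bare name.
        int colon = statsdLine.indexOf(':');
        String name = colon >= 0 ? statsdLine.substring(0, colon) : statsdLine;
        int firstDot = name.indexOf('.');
        return firstDot >= 0 ? name.substring(0, firstDot) : name;
    }

    public static void main(String[] args) {
        String v110 = "flink-ingest-cx-home-page-feed-flink-task-manager-7f8c7677l85pl.taskmanager."
                + "16c2dbc84eb27f336455615e642c6cdd.flink-ingest-cx-home-page-feed."
                + "Source- Custom Source.1.assigned-partitions:3.0|g";
        String v111 = "10.4.155.205.taskmanager.0a900ab762d7d534ea8b20e84438166b."
                + "flink-ingest-xp-xp.Source- Custom Source.0.assigned-partitions:3.0|g";

        // prints "flink-ingest-cx-home-page-feed-flink-task-manager-7f8c7677l85pl"
        System.out.println(sourceOf(v110));
        // prints "10": only the first octet of the IP survives as the source
        System.out.println(sourceOf(v111));
    }
}
```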

Thanks,
Allen

Re: StatsD metric name prefix change for task manager after upgrading to Flink 1.11

Chesnay Schepler
The TaskExecutor host that is exposed is wired directly to the address the RPC system uses, which may have changed due to FLINK-15911 (NAT support).

If the problem is purely about the periods in the IP, then I would suggest creating a custom reporter that extends the StatsDReporter and overrides filterCharacters to also replace periods.
This also reminds me of a past suggestion to automatically replace occurrences of the delimiter; let me open an issue for that.
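A minimal sketch of the suggested filter, kept self-contained so it runs without Flink on the classpath. In an actual deployment the filtering logic would live in a subclass of org.apache.flink.metrics.statsd.StatsDReporter that overrides filterCharacters(String), registered through metrics.reporter.<name>.class. The class name DotReplacingFilterDemo, the metricIdentifier helper, and the choice of '-' as the replacement character are illustrative assumptions; metricIdentifier is a simplified model that assumes each scope component is filtered before the components are joined with the '.' delimiter.

```java
public class DotReplacingFilterDemo {

    /**
     * Hypothetical stand-in for an overridden StatsDReporter#filterCharacters:
     * replaces ':' (as the stock StatsD filter appears to do, judging by
     * "Source- Custom Source") and additionally '.', so a scope component
     * such as an IP address no longer contains the delimiter.
     */
    static String filterCharacters(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            sb.append(c == ':' || c == '.' ? '-' : c);
        }
        return sb.toString();
    }

    // Simplified model of identifier assembly: filter each scope component
    // first, then join with '.', so the delimiters themselves are untouched.
    static String metricIdentifier(String metricName, String... scopeComponents) {
        StringBuilder sb = new StringBuilder();
        for (String component : scopeComponents) {
            sb.append(filterCharacters(component)).append('.');
        }
        return sb.append(filterCharacters(metricName)).toString();
    }

    public static void main(String[] args) {
        // Operator name assumed to be "Source: Custom Source" before filtering.
        String id = metricIdentifier("assigned-partitions",
                "10.4.155.205", "taskmanager", "0a900ab762d7d534ea8b20e84438166b",
                "flink-ingest-xp-xp", "Source: Custom Source", "0");
        // prints "10-4-155-205.taskmanager.0a900ab762d7d534ea8b20e84438166b.flink-ingest-xp-xp.Source- Custom Source.0.assigned-partitions"
        System.out.println(id);
    }
}
```

With a filter like this, the 1.11 example would start with 10-4-155-205, so a backend that splits on the first dot would see the whole IP as the source rather than "10". Periods inside operator or metric names would be replaced as well, which may or may not be desirable.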

(In reply to Allen Wang, 10/14/2020 6:54 PM)

Re: StatsD metric name prefix change for task manager after upgrading to Flink 1.11

Nikola Hrusov
Hi,

I have observed the same behavior after upgrading to Flink 1.11, running in Docker and reporting to Graphite.
Before the upgrade, the task managers used their hostnames; since 1.11 they report their IPs.
Unfortunately, I did not find a resolution to my issue: https://lists.apache.org/thread.html/r620b18d12c08d13375a390f94e0cdff26462c6e26440b31236473793%40%3Cuser.flink.apache.org%3E

Regards,
Nikola Hrusov

(In reply to Chesnay Schepler, Thu, Oct 15, 2020 at 3:49 PM)