Hi,
I have a streaming pipeline running on Flink, and I need to collect metrics to see how my algorithm is performing. The entire pipeline is multi-tenant, and I also need metrics per tenant. Let's say there would be around 20 metrics to capture per tenant. I have the following ideas for implementation, and any suggestions on which one is better would help.

1. Use a Flink metric group and register a group per tenant at the operator level. The disadvantage of this approach for me is that I need the RuntimeContext to register a metric, and I have various subclasses to which I would need to pass this object to keep the metric scope within the operator. Also, too many metrics will be reported if there is a high number of subtasks.
How is everyone accessing Flink state/metrics from other classes where you don't have access to the RuntimeContext?

2. Use a custom singleton metric registry to add these metrics and send them using a custom sink. Instead of using a Flink metric group to collect metrics per operator subtask, collect them per JVM and use an Influx sink to send the metric data. What I'm not sure about in this case is how to collect only once per node/JVM.

Thanks a bunch in advance.
|
Hi,
I’m not sure if I completely understand your issue.

1.
- You don’t have to pass the RuntimeContext. You can always pass just the MetricGroup, or ask your components/subclasses “what metrics do you want to register” and register them at the top level.
- Reporting tens/hundreds/thousands of metrics shouldn’t be an issue for Flink, as long as you have a reasonable reporting interval. However, keep in mind that Flink only reports your metrics; you still need something to read/handle/process/aggregate them.

2. I don’t think that reporting per node/JVM is possible with Flink’s metric system. For that you would need some other solution, such as reporting your metrics using JMX (directly registering MBeans from your code).

Piotrek

> On 10 Dec 2017, at 18:51, Navneeth Krishnan <[hidden email]> wrote:
|
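A minimal sketch of the JMX route mentioned above, using only the JDK's `javax.management` API. It assumes a static holder is an acceptable way to share counters between subtasks in the same JVM. The class name `JvmTenantMetrics`, the `tenant.metrics` domain, and the single `EventCount` attribute are hypothetical choices for illustration, not anything Flink provides.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import javax.management.JMException;
import javax.management.ObjectName;

// Hypothetical helper: one MBean per tenant, registered once per JVM.
final class JvmTenantMetrics {

    /** Standard MBean interface: it must be public and named <ImplClass>MBean. */
    public interface TenantCounterMBean {
        long getEventCount();
    }

    /** One counter per tenant, shared by every caller in this JVM. */
    static final class TenantCounter implements TenantCounterMBean {
        private final LongAdder events = new LongAdder();

        void inc() {
            events.increment();
        }

        @Override
        public long getEventCount() {
            return events.sum();
        }
    }

    private static final ConcurrentHashMap<String, TenantCounter> COUNTERS =
            new ConcurrentHashMap<>();

    private JvmTenantMetrics() {}

    /** Builds the JMX name; the domain and key names are assumptions. */
    static ObjectName nameFor(String tenantId) {
        try {
            return new ObjectName(
                    "tenant.metrics:type=TenantCounter,tenant=" + ObjectName.quote(tenantId));
        } catch (JMException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Returns the tenant's counter, registering its MBean on first use only. */
    static TenantCounter forTenant(String tenantId) {
        return COUNTERS.computeIfAbsent(tenantId, id -> {
            TenantCounter counter = new TenantCounter();
            try {
                ManagementFactory.getPlatformMBeanServer().registerMBean(counter, nameFor(id));
            } catch (JMException e) {
                throw new IllegalStateException("Could not register MBean for " + id, e);
            }
            return counter;
        });
    }

    /** Reads the value back through the MBean server, as a JMX client would. */
    static long readViaJmx(String tenantId) {
        try {
            return (Long) ManagementFactory.getPlatformMBeanServer()
                    .getAttribute(nameFor(tenantId), "EventCount");
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        forTenant("acme").inc();
        forTenant("acme").inc();
        System.out.println("acme events via JMX: " + readViaJmx("acme"));
    }
}
```

Because `computeIfAbsent` runs the registration at most once per key, each tenant gets a single MBean no matter how many subtasks call `forTenant`. One caveat: a static holder is unique per class loader rather than strictly per JVM, so it is worth checking how your deployment loads user code before relying on this.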
Thanks Piotr. Yes, passing the metric group should be sufficient. The subcomponents will not be able to provide the list of metrics to register, since the metrics are created per tenant based on incoming data. Also, I am planning to have the metrics reported every 10 seconds and hope that won't be a problem. We use Influx and Grafana to plot the metrics.

Option 2 that I had in mind was to collect all metrics and use an InfluxDB sink to report them directly from inside the pipeline. But it seems reporting per node might not be possible.

On Mon, Dec 11, 2017 at 3:14 AM, Piotr Nowojski <[hidden email]> wrote:
|
Hi,
Reporting once per 10 seconds shouldn’t create problems. Best to try it out. Let us know if you run into any trouble :)

Piotrek
|
Thanks Piotr. Also, is there a way to specify a custom metrics scope? Basically, I register metrics like below: I add a custom metric group and then add a meter per user. I would like this to be reported as the measurement "Users", with the user ID as a tag. That way I can easily visualize the data in Grafana or any other tool by selecting the measurement and grouping by tag. Is there a way to report like that instead of with host, process_type, tm_id, job_name, task_name & subtask_index?

metricGroup.addGroup("Users")

Thanks a bunch.

On Mon, Dec 11, 2017 at 11:12 PM, Piotr Nowojski <[hidden email]> wrote:
|
Hi,
At this point it is up to either the reporter or the system that the metrics are reported to. You would need to extend an InfluxDB reporter to add some configuration options to ignore some metrics.

Can't you ignore the first couple of groups/scopes in Grafana? I think you can also add more groups in the user scope:

metricGroup.addGroup("Users").addGroup("Foo").addGroup("Bar")

Piotrek
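A rough sketch of how the pieces discussed in this thread could fit together: a helper receives only a metric group (not the RuntimeContext), adds a "Users" group, and lazily registers one metric per tenant the first time that tenant appears in the data. The `MetricGroup` and `Counter` interfaces below are stand-ins that mirror just the `addGroup(String)` and `counter(String)` calls of Flink's interfaces, so the example compiles without Flink. `InMemoryGroup`, `TenantMetrics`, and the `events` metric name are hypothetical, and a counter is used in place of a meter to keep it short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Stand-ins mirroring the two Flink calls used here. In a real job, drop these
// and import org.apache.flink.metrics.MetricGroup and Counter instead.
interface Counter {
    void inc();
    long getCount();
}

interface MetricGroup {
    MetricGroup addGroup(String name);
    Counter counter(String name);
}

/** Toy group that records each metric under its dotted scope (illustration only). */
final class InMemoryGroup implements MetricGroup {
    private final String scope;
    private final Map<String, Counter> registry;

    InMemoryGroup(String scope, Map<String, Counter> registry) {
        this.scope = scope;
        this.registry = registry;
    }

    @Override
    public MetricGroup addGroup(String name) {
        return new InMemoryGroup(scope + "." + name, registry);
    }

    @Override
    public Counter counter(String name) {
        Counter counter = new Counter() {
            private long count;
            @Override public void inc() { count++; }
            @Override public long getCount() { return count; }
        };
        registry.put(scope + "." + name, counter);
        return counter;
    }
}

/** Hypothetical helper: needs only a MetricGroup, never the RuntimeContext. */
final class TenantMetrics {
    private final MetricGroup users;
    // Cache so each tenant's metric is registered once, on first sight.
    // A plain HashMap assumes one instance per subtask, used by a single thread.
    private final Map<String, Counter> eventsByTenant = new HashMap<>();

    TenantMetrics(MetricGroup operatorGroup) {
        this.users = operatorGroup.addGroup("Users");
    }

    void recordEvent(String tenantId) {
        eventsByTenant
                .computeIfAbsent(tenantId, id -> users.addGroup(id).counter("events"))
                .inc();
    }
}

final class TenantMetricsDemo {
    public static void main(String[] args) {
        Map<String, Counter> registry = new TreeMap<>();
        TenantMetrics metrics = new TenantMetrics(new InMemoryGroup("operator", registry));
        metrics.recordEvent("tenant-a");
        metrics.recordEvent("tenant-a");
        metrics.recordEvent("tenant-b");
        registry.forEach((name, c) -> System.out.println(name + " = " + c.getCount()));
    }
}
```

In an actual job the group passed to `TenantMetrics` would typically come from `getRuntimeContext().getMetricGroup()` inside a rich function's `open()`. Whether the tenant ID ends up as an InfluxDB tag or as part of the metric name still depends on the reporter, as noted above. Later Flink releases also document a key/value form, `addGroup(key, value)`, that may suit tagging better, so it is worth checking against the version you run.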
|