[Flink 1.6] How to get current total number of processed events

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

[Flink 1.6] How to get current total number of processed events

Thanh-Nhan Vo

Hello all,

I have a question, please !
I’m using Flink 1.6 to process our data in streaming mode.
I wonder if at a given event, there is a way to get the current total number of processed events (before this event).

If possible, I want to get this total number of processed events as a value state in Keystream.
It means that for a given key in KeyStream, I want to retrieve not only the total number of processed events for this key but also the total number of processed events for all keys.

There is a way to do this in Flink 1.6, please!

Best regard,
Nhan

 

Reply | Threaded
Open this post in threaded view
|

Re: [Flink 1.6] How to get current total number of processed events

Kien Truong

Hi Nhan,

Logically, the total number of processed events before an event cannot be accurately calculated unless events processing are synchronized.

This is not scalable, so naturally I don't think Flink supports it.

Although, I suppose you can get an approximate count by using a non-keyed TumblingWindow, count the item inside the window, then use that value in the next window.


Regards,

Kien


On 1/21/2019 9:34 PM, Thanh-Nhan Vo wrote:

Hello all,

I have a question, please !
I’m using Flink 1.6 to process our data in streaming mode.
I wonder if at a given event, there is a way to get the current total number of processed events (before this event).

If possible, I want to get this total number of processed events as a value state in Keystream.
It means that for a given key in KeyStream, I want to retrieve not only the total number of processed events for this key but also the total number of processed events for all keys.

There is a way to do this in Flink 1.6, please!

Best regard,
Nhan

 

Reply | Threaded
Open this post in threaded view
|

RE: [Flink 1.6] How to get current total number of processed events

Thanh-Nhan Vo

Hi Kien Truong,


Thank you for your answer. I have another question, please !
If I count the number of messages processed for a given key j (denoted c_j), is there a way to retrieve max{c_j}, min{c_j}?

Thanks

 

De : Kien Truong [mailto:[hidden email]]
Envoyé : mercredi 23 janvier 2019 16:04
À : [hidden email]
Objet : Re: [Flink 1.6] How to get current total number of processed events

 

Hi Nhan,

Logically, the total number of processed events before an event cannot be accurately calculated unless events processing are synchronized.

This is not scalable, so naturally I don't think Flink supports it.

Although, I suppose you can get an approximate count by using a non-keyed TumblingWindow, count the item inside the window, then use that value in the next window.

 

Regards,

Kien

 

On 1/21/2019 9:34 PM, Thanh-Nhan Vo wrote:

Hello all,

I have a question, please !
I’m using Flink 1.6 to process our data in streaming mode.
I wonder if at a given event, there is a way to get the current total number of processed events (before this event).

If possible, I want to get this total number of processed events as a value state in Keystream.
It means that for a given key in KeyStream, I want to retrieve not only the total number of processed events for this key but also the total number of processed events for all keys.

There is a way to do this in Flink 1.6, please!

Best regard,
Nhan

 

Reply | Threaded
Open this post in threaded view
|

Re: [Flink 1.6] How to get current total number of processed events

Kien Truong

Hi Nhan,

You can store the max/min value using the value states of a KeyedProcessFunction,

or in the global state of a ProcessWindowFunction.


On processing each item, compare its value to the current max/min and update the stored value as needed.


Regards,

Kien


On 1/24/2019 12:37 AM, Thanh-Nhan Vo wrote:

Hi Kien Truong,


Thank you for your answer. I have another question, please !
If I count the number of messages processed for a given key j (denoted c_j), is there a way to retrieve max{c_j}, min{c_j}?

Thanks

 

De : Kien Truong [[hidden email]]
Envoyé : mercredi 23 janvier 2019 16:04
À : [hidden email]
Objet : Re: [Flink 1.6] How to get current total number of processed events

 

Hi Nhan,

Logically, the total number of processed events before an event cannot be accurately calculated unless events processing are synchronized.

This is not scalable, so naturally I don't think Flink supports it.

Although, I suppose you can get an approximate count by using a non-keyed TumblingWindow, count the item inside the window, then use that value in the next window.

 

Regards,

Kien

 

On 1/21/2019 9:34 PM, Thanh-Nhan Vo wrote:

Hello all,

I have a question, please !
I’m using Flink 1.6 to process our data in streaming mode.
I wonder if at a given event, there is a way to get the current total number of processed events (before this event).

If possible, I want to get this total number of processed events as a value state in Keystream.
It means that for a given key in KeyStream, I want to retrieve not only the total number of processed events for this key but also the total number of processed events for all keys.

There is a way to do this in Flink 1.6, please!

Best regard,
Nhan

 

Reply | Threaded
Open this post in threaded view
|

Re: [Flink 1.6] How to get current total number of processed events

Congxian Qiu
Hi, Nhan
Do you want the total number of the current parallelism or the operator? If you want the total number of the current parallelism, Is the operator state[1] satisfied with your use case?


Kien Truong <[hidden email]> 于2019年1月24日周四 下午7:45写道:

Hi Nhan,

You can store the max/min value using the value states of a KeyedProcessFunction,

or in the global state of a ProcessWindowFunction.


On processing each item, compare its value to the current max/min and update the stored value as needed.


Regards,

Kien


On 1/24/2019 12:37 AM, Thanh-Nhan Vo wrote:

Hi Kien Truong,


Thank you for your answer. I have another question, please !
If I count the number of messages processed for a given key j (denoted c_j), is there a way to retrieve max{c_j}, min{c_j}?

Thanks

 

De : Kien Truong [[hidden email]]
Envoyé : mercredi 23 janvier 2019 16:04
À : [hidden email]
Objet : Re: [Flink 1.6] How to get current total number of processed events

 

Hi Nhan,

Logically, the total number of processed events before an event cannot be accurately calculated unless events processing are synchronized.

This is not scalable, so naturally I don't think Flink supports it.

Although, I suppose you can get an approximate count by using a non-keyed TumblingWindow, count the item inside the window, then use that value in the next window.

 

Regards,

Kien

 

On 1/21/2019 9:34 PM, Thanh-Nhan Vo wrote:

Hello all,

I have a question, please !
I’m using Flink 1.6 to process our data in streaming mode.
I wonder if at a given event, there is a way to get the current total number of processed events (before this event).

If possible, I want to get this total number of processed events as a value state in Keystream.
It means that for a given key in KeyStream, I want to retrieve not only the total number of processed events for this key but also the total number of processed events for all keys.

There is a way to do this in Flink 1.6, please!

Best regard,
Nhan

 



--
Best,
Congxian
Reply | Threaded
Open this post in threaded view
|

RE: [Flink 1.6] How to get current total number of processed events

Thanh-Nhan Vo
In reply to this post by Kien Truong

Hi Kien,

Thank you for your answer.

Please correct me if I’m wrong. If I understand well, if I store the max/min value using the value states of a KeyedProcessFunction, this max/min value is calculated  per key?

Note that in my case, I expect that at every instant,  I can obtain the maximum/minimum number of processed messages for all keys. For example:

-        Input datastream : [ message1(k1, v1)  messages2(k2,v2)  message3(k1, v3)  message4(k4, v4)  message5(k1, v5) message6(k2, v6)  message7(k7, v7)]

 

-        When processing message7(k7, v7), I expect to obtain:

 

o   Maximum number of processed messages: 3 (corresponding to key k1)

o   Minimum number of processed messages: 1 (corresponding to keys 4 and 7)

 

Do you have any idea to obtain this, please?

Thank you so much !

 

Nhan

De : Kien Truong [mailto:[hidden email]]
Envoyé : jeudi 24 janvier 2019 12:45
À : Thanh-Nhan Vo <[hidden email]>; [hidden email]
Objet : Re: [Flink 1.6] How to get current total number of processed events

 

Hi Nhan,

You can store the max/min value using the value states of a KeyedProcessFunction,

or in the global state of a ProcessWindowFunction.

 

On processing each item, compare its value to the current max/min and update the stored value as needed.

 

Regards,

Kien

 

On 1/24/2019 12:37 AM, Thanh-Nhan Vo wrote:

Hi Kien Truong,


Thank you for your answer. I have another question, please !
If I count the number of messages processed for a given key j (denoted c_j), is there a way to retrieve max{c_j}, min{c_j}?

Thanks

 

De : Kien Truong [[hidden email]]
Envoyé : mercredi 23 janvier 2019 16:04
À : [hidden email]
Objet : Re: [Flink 1.6] How to get current total number of processed events

 

Hi Nhan,

Logically, the total number of processed events before an event cannot be accurately calculated unless events processing are synchronized.

This is not scalable, so naturally I don't think Flink supports it.

Although, I suppose you can get an approximate count by using a non-keyed TumblingWindow, count the item inside the window, then use that value in the next window.

 

Regards,

Kien

 

On 1/21/2019 9:34 PM, Thanh-Nhan Vo wrote:

Hello all,

I have a question, please !
I’m using Flink 1.6 to process our data in streaming mode.
I wonder if at a given event, there is a way to get the current total number of processed events (before this event).

If possible, I want to get this total number of processed events as a value state in Keystream.
It means that for a given key in KeyStream, I want to retrieve not only the total number of processed events for this key but also the total number of processed events for all keys.

There is a way to do this in Flink 1.6, please!

Best regard,
Nhan

 

Reply | Threaded
Open this post in threaded view
|

RE: [Flink 1.6] How to get current total number of processed events

Thanh-Nhan Vo
In reply to this post by Congxian Qiu

Hi Congixan Wiu,

Thank you for your answer.

If I understand well, each operator state is bound to one parallel operator instance.
Indeed, I expect to get the total number of all parallel operator instances.

Is there a way to sum up all these operator states , please?

Best regard,
Nhan

 

De : Congxian Qiu [mailto:[hidden email]]
Envoyé : vendredi 25 janvier 2019 07:30
À : Kien Truong <[hidden email]>
Cc : Thanh-Nhan Vo <[hidden email]>; [hidden email]
Objet : Re: [Flink 1.6] How to get current total number of processed events

 

Hi, Nhan

Do you want the total number of the current parallelism or the operator? If you want the total number of the current parallelism, Is the operator state[1] satisfied with your use case?

 

 

Kien Truong <[hidden email]> 2019124日周四 下午7:45写道:

Hi Nhan,

You can store the max/min value using the value states of a KeyedProcessFunction,

or in the global state of a ProcessWindowFunction.

 

On processing each item, compare its value to the current max/min and update the stored value as needed.

 

Regards,

Kien

 

On 1/24/2019 12:37 AM, Thanh-Nhan Vo wrote:

Hi Kien Truong,


Thank you for your answer. I have another question, please !
If I count the number of messages processed for a given key j (denoted c_j), is there a way to retrieve max{c_j}, min{c_j}?

Thanks

 

De : Kien Truong [[hidden email]]
Envoyé : mercredi 23 janvier 2019 16:04
À : [hidden email]
Objet : Re: [Flink 1.6] How to get current total number of processed events

 

Hi Nhan,

Logically, the total number of processed events before an event cannot be accurately calculated unless events processing are synchronized.

This is not scalable, so naturally I don't think Flink supports it.

Although, I suppose you can get an approximate count by using a non-keyed TumblingWindow, count the item inside the window, then use that value in the next window.

 

Regards,

Kien

 

On 1/21/2019 9:34 PM, Thanh-Nhan Vo wrote:

Hello all,

I have a question, please !
I’m using Flink 1.6 to process our data in streaming mode.
I wonder if at a given event, there is a way to get the current total number of processed events (before this event).

If possible, I want to get this total number of processed events as a value state in Keystream.
It means that for a given key in KeyStream, I want to retrieve not only the total number of processed events for this key but also the total number of processed events for all keys.

There is a way to do this in Flink 1.6, please!

Best regard,
Nhan

 


 

--

Best,

Congxian

Reply | Threaded
Open this post in threaded view
|

Re: [Flink 1.6] How to get current total number of processed events

Congxian Qiu
Hi, Nhan
There is only one way I know to sum up all the parallel operator instances: set parallel to 1.

Best,
Congxian 

Thanh-Nhan Vo <[hidden email]> 于2019年1月25日周五 下午4:38写道:

Hi Congixan Wiu,

Thank you for your answer.

If I understand well, each operator state is bound to one parallel operator instance.
Indeed, I expect to get the total number of all parallel operator instances.

Is there a way to sum up all these operator states , please?

Best regard,
Nhan

 

De : Congxian Qiu [mailto:[hidden email]]
Envoyé : vendredi 25 janvier 2019 07:30
À : Kien Truong <[hidden email]>
Cc : Thanh-Nhan Vo <[hidden email]>; [hidden email]
Objet : Re: [Flink 1.6] How to get current total number of processed events

 

Hi, Nhan

Do you want the total number of the current parallelism or the operator? If you want the total number of the current parallelism, Is the operator state[1] satisfied with your use case?

 

 

Kien Truong <[hidden email]> 2019124日周四 下午7:45写道:

Hi Nhan,

You can store the max/min value using the value states of a KeyedProcessFunction,

or in the global state of a ProcessWindowFunction.

 

On processing each item, compare its value to the current max/min and update the stored value as needed.

 

Regards,

Kien

 

On 1/24/2019 12:37 AM, Thanh-Nhan Vo wrote:

Hi Kien Truong,


Thank you for your answer. I have another question, please !
If I count the number of messages processed for a given key j (denoted c_j), is there a way to retrieve max{c_j}, min{c_j}?

Thanks

 

De : Kien Truong [[hidden email]]
Envoyé : mercredi 23 janvier 2019 16:04
À : [hidden email]
Objet : Re: [Flink 1.6] How to get current total number of processed events

 

Hi Nhan,

Logically, the total number of processed events before an event cannot be accurately calculated unless events processing are synchronized.

This is not scalable, so naturally I don't think Flink supports it.

Although, I suppose you can get an approximate count by using a non-keyed TumblingWindow, count the item inside the window, then use that value in the next window.

 

Regards,

Kien

 

On 1/21/2019 9:34 PM, Thanh-Nhan Vo wrote:

Hello all,

I have a question, please !
I’m using Flink 1.6 to process our data in streaming mode.
I wonder if at a given event, there is a way to get the current total number of processed events (before this event).

If possible, I want to get this total number of processed events as a value state in Keystream.
It means that for a given key in KeyStream, I want to retrieve not only the total number of processed events for this key but also the total number of processed events for all keys.

There is a way to do this in Flink 1.6, please!

Best regard,
Nhan

 


 

--

Best,

Congxian



--
Best,
Congxian
Reply | Threaded
Open this post in threaded view
|

Re: [Flink 1.6] How to get current total number of processed events

Kien Truong
In reply to this post by Thanh-Nhan Vo

Hi Nhan,

To get a global view over all events, you can use a non-keyed TumblingWindow and a ProcessAllWindowFunction.

Inside the ProcessAllWindowFunction, you calculate the min/max/count of the elements of that window,

compared them to the existing values in the global state, then update the new min/max/count to global state to use in the next window.

You can also get the min/max/count downstream by emitting it together with the window's item.


Do note that non-keyed Window always run with a parallelism of 1, so it might create a hotspot/bottleneck in your stream.


Regards,

Kien


On 1/25/2019 3:17 PM, Thanh-Nhan Vo wrote:

Hi Kien,

Thank you for your answer.

Please correct me if I’m wrong. If I understand well, if I store the max/min value using the value states of a KeyedProcessFunction, this max/min value is calculated  per key?

Note that in my case, I expect that at every instant,  I can obtain the maximum/minimum number of processed messages for all keys. For example:

-        Input datastream : [ message1(k1, v1)  messages2(k2,v2)  message3(k1, v3)  message4(k4, v4)  message5(k1, v5) message6(k2, v6)  message7(k7, v7)]

 

-        When processing message7(k7, v7), I expect to obtain:

 

o   Maximum number of processed messages: 3 (corresponding to key k1)

o   Minimum number of processed messages: 1 (corresponding to keys 4 and 7)

 

Do you have any idea to obtain this, please?

Thank you so much !

 

Nhan

De : Kien Truong [[hidden email]]
Envoyé : jeudi 24 janvier 2019 12:45
À : Thanh-Nhan Vo [hidden email]; [hidden email]
Objet : Re: [Flink 1.6] How to get current total number of processed events

 

Hi Nhan,

You can store the max/min value using the value states of a KeyedProcessFunction,

or in the global state of a ProcessWindowFunction.

 

On processing each item, compare its value to the current max/min and update the stored value as needed.

 

Regards,

Kien

 

On 1/24/2019 12:37 AM, Thanh-Nhan Vo wrote:

Hi Kien Truong,


Thank you for your answer. I have another question, please !
If I count the number of messages processed for a given key j (denoted c_j), is there a way to retrieve max{c_j}, min{c_j}?

Thanks

 

De : Kien Truong [[hidden email]]
Envoyé : mercredi 23 janvier 2019 16:04
À : [hidden email]
Objet : Re: [Flink 1.6] How to get current total number of processed events

 

Hi Nhan,

Logically, the total number of processed events before an event cannot be accurately calculated unless events processing are synchronized.

This is not scalable, so naturally I don't think Flink supports it.

Although, I suppose you can get an approximate count by using a non-keyed TumblingWindow, count the item inside the window, then use that value in the next window.

 

Regards,

Kien

 

On 1/21/2019 9:34 PM, Thanh-Nhan Vo wrote:

Hello all,

I have a question, please !
I’m using Flink 1.6 to process our data in streaming mode.
I wonder if at a given event, there is a way to get the current total number of processed events (before this event).

If possible, I want to get this total number of processed events as a value state in Keystream.
It means that for a given key in KeyStream, I want to retrieve not only the total number of processed events for this key but also the total number of processed events for all keys.

There is a way to do this in Flink 1.6, please!

Best regard,
Nhan

 

Reply | Threaded
Open this post in threaded view
|

RE: [Flink 1.6] How to get current total number of processed events

Thanh-Nhan Vo

Hi Kien,

Thanks you so much for you answer !

Regards,

Nhan

 

De : Kien Truong [mailto:[hidden email]]
Envoyé : vendredi 25 janvier 2019 13:47
À : Thanh-Nhan Vo <[hidden email]>; [hidden email]
Objet : Re: [Flink 1.6] How to get current total number of processed events

 

Hi Nhan,

To get a global view over all events, you can use a non-keyed TumblingWindow and a ProcessAllWindowFunction.

Inside the ProcessAllWindowFunction, you calculate the min/max/count of the elements of that window,

compared them to the existing values in the global state, then update the new min/max/count to global state to use in the next window.

You can also get the min/max/count downstream by emitting it together with the window's item.

 

Do note that non-keyed Window always run with a parallelism of 1, so it might create a hotspot/bottleneck in your stream.

 

Regards,

Kien

 

On 1/25/2019 3:17 PM, Thanh-Nhan Vo wrote:

Hi Kien,

Thank you for your answer.

Please correct me if I’m wrong. If I understand well, if I store the max/min value using the value states of a KeyedProcessFunction, this max/min value is calculated  per key?

Note that in my case, I expect that at every instant,  I can obtain the maximum/minimum number of processed messages for all keys. For example:


-        Input datastream : [ message1(k1, v1)  messages2(k2,v2)  message3(k1, v3)  message4(k4, v4)  message5(k1, v5) message6(k2, v6)  message7(k7, v7)]

 

-        When processing message7(k7, v7), I expect to obtain:

 

o   Maximum number of processed messages: 3 (corresponding to key k1)

o   Minimum number of processed messages: 1 (corresponding to keys 4 and 7)

 

Do you have any idea to obtain this, please?

Thank you so much !

 

Nhan

De : Kien Truong [[hidden email]]
Envoyé : jeudi 24 janvier 2019 12:45
À : Thanh-Nhan Vo [hidden email]; [hidden email]
Objet : Re: [Flink 1.6] How to get current total number of processed events

 

Hi Nhan,

You can store the max/min value using the value states of a KeyedProcessFunction,

or in the global state of a ProcessWindowFunction.

 

On processing each item, compare its value to the current max/min and update the stored value as needed.

 

Regards,

Kien

 

On 1/24/2019 12:37 AM, Thanh-Nhan Vo wrote:

Hi Kien Truong,


Thank you for your answer. I have another question, please !
If I count the number of messages processed for a given key j (denoted c_j), is there a way to retrieve max{c_j}, min{c_j}?

Thanks

 

De : Kien Truong [[hidden email]]
Envoyé : mercredi 23 janvier 2019 16:04
À : [hidden email]
Objet : Re: [Flink 1.6] How to get current total number of processed events

 

Hi Nhan,

Logically, the total number of processed events before an event cannot be accurately calculated unless events processing are synchronized.

This is not scalable, so naturally I don't think Flink supports it.

Although, I suppose you can get an approximate count by using a non-keyed TumblingWindow, count the item inside the window, then use that value in the next window.

 

Regards,

Kien

 

On 1/21/2019 9:34 PM, Thanh-Nhan Vo wrote:

Hello all,

I have a question, please !
I’m using Flink 1.6 to process our data in streaming mode.
I wonder if at a given event, there is a way to get the current total number of processed events (before this event).

If possible, I want to get this total number of processed events as a value state in Keystream.
It means that for a given key in KeyStream, I want to retrieve not only the total number of processed events for this key but also the total number of processed events for all keys.

There is a way to do this in Flink 1.6, please!

Best regard,
Nhan