Applying multiple calculation on data aggregated on window

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Applying multiple calculation on data aggregated on window

Soheil Pourbafrani
Hi,

Im my environment I need to collect stream of messages into windows based on some fields as key and then I need to do multiple calculations that will apply on specaified messages. for example if i had the following messages on the window:
{ts: 1, key: a, value: 10}
{ts: 1, key: b, value: 0}
{ts: 1, key: c, value: 2}
{ts: 1, key: d, value: 5}
{ts: 1, key: e, value: 6}
{ts: 1, key: f, value: 7}
{ts: 1, key: g, value: 9}

- for the keys a, b and c I need to calculate the average of the values (12/3=4) and generate another message like {ts: 1, key: abc, value: 4}

- for the key f and d I need to get the sum (5 + 7 = 12) and generate {ts: 1, key: fd, value: 12}

and I don't need the messages with the key e and g

So I did the following:
raw
.keyBy(4, 5)
.timeWindow(Time.seconds(5))
but I don't know how flink can help me to apply the logic to the data. I think I need to use some method other than reduce or aggregate.

Any help will be appreciated.

thanks
Reply | Threaded
Open this post in threaded view
|

Re: Applying multiple calculation on data aggregated on window

Yun Gao
Hi Soheil,

    Is it possible to first add an operator to preprocess the records to filter out unused records and add a special operation id ? It may looks like
  
     raw..filter()  // Filter out e and g
        .map()  // Transform {ts: 1, key: a, value: 10} to {ts: 1, key: a, value: 10, op-id: "1-avg"}
        .keyBy() // Key by the op-id
        .timeWindow(Time.seconds(5))
        .process()  // Process the window. The operation is able to be deduced from the operation id.
           

Best,
Yun Gao

------------------------------------------------------------------
From:Soheil Pourbafrani <[hidden email]>
Send Time:2019 May 16 (Thu.) 06:47
To:user <[hidden email]>
Subject:Applying multiple calculation on data aggregated on window

Hi,

Im my environment I need to collect stream of messages into windows based on some fields as key and then I need to do multiple calculations that will apply on specaified messages. for example if i had the following messages on the window:
{ts: 1, key: a, value: 10}
{ts: 1, key: b, value: 0}
{ts: 1, key: c, value: 2}
{ts: 1, key: d, value: 5}
{ts: 1, key: e, value: 6}
{ts: 1, key: f, value: 7}
{ts: 1, key: g, value: 9}

- for the keys a, b and c I need to calculate the average of the values (12/3=4) and generate another message like {ts: 1, key: abc, value: 4}

- for the key f and d I need to get the sum (5 + 7 = 12) and generate {ts: 1, key: fd, value: 12}

and I don't need the messages with the key e and g

So I did the following:
raw
.keyBy(4, 5)
.timeWindow(Time.seconds(5))
but I don't know how flink can help me to apply the logic to the data. I think I need to use some method other than reduce or aggregate.

Any help will be appreciated.

thanks