(DEPRECATED) Apache Flink User Mailing List archive.

Optimizing multiple aggregate queries on a CEP using Flink


Sahil Arora

Optimizing multiple aggregate queries on a CEP using Flink

Hi there,

We have been working on a project titled "Optimizing Multiple Aggregate Queries over a Complex Event Processing Engine". The aim is to optimize a group of queries together. Take, for example, two queries: "how many cars passed the post in the past 1 minute" and "how many cars passed the post in the past 2 minutes". The naive and inefficient approach is to evaluate each query independently. A better approach would minimize computation by reusing the answer to query 1 when computing query 2. That is our aim: to minimize computation cost when there are multiple aggregate queries in a CEP engine.
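To make the idea of reusing one query's answer in another concrete, here is a minimal sketch in plain Python (not Flink code). It assumes results are only needed at one-minute boundaries, that timestamps are in seconds, and that every window length is a whole number of one-minute "panes"; the function name `shared_window_counts` and its parameters are hypothetical. Each event is counted once into its pane, and both the 1-minute and the 2-minute answers are assembled from those pane counts.

```python
from collections import Counter, deque


def shared_window_counts(event_times, pane_seconds=60, window_panes=(1, 2)):
    """Return [(pane_end_time, {window_length_in_panes: count}), ...].

    Every event is counted exactly once, into its pane. Each window result
    is then the sum of the most recent pane counts, so the 2-minute query
    reuses the work already done for the 1-minute query.
    """
    # One pass over the events: count per pane (pane 0 covers [0, 60), etc.).
    pane_counts = Counter(int(t // pane_seconds) for t in event_times)
    if not pane_counts:
        return []

    results = []
    # Keep only as many pane counts as the longest window needs.
    recent = deque(maxlen=max(window_panes))
    for pane in range(min(pane_counts), max(pane_counts) + 1):
        recent.append(pane_counts[pane])  # a missing pane counts as 0
        latest = list(recent)
        pane_end = (pane + 1) * pane_seconds
        results.append((pane_end, {k: sum(latest[-k:]) for k in window_panes}))
    return results


# Seven cars pass the post at these times (seconds).
for pane_end, counts in shared_window_counts([5, 20, 59, 61, 130, 150, 170]):
    print(pane_end, counts)
# 60 {1: 3, 2: 3}
# 120 {1: 1, 2: 4}
# 180 {1: 3, 2: 4}
```

With more queries, the same pane counts could serve any window whose length is a multiple of the pane size, which is where the saving over evaluating each query independently would come from.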

We have been searching for some platform which supports CEP, and Flink is probably one of them. Hence, it would be very helpful if we could get some answers to the following questions:

1. Does Flink already have some method of optimizing multiple aggregate queries?

2. Is it possible for us to implement and test such an algorithm in Flink, one that considers multiple queries in a CEP setting, for example by having a database of SQL queries and testing an algorithm of our choice?

Any other inputs which may help us with solving the problem would be highly welcome.

Thanks a lot.

Sahil Arora

Final year B.Tech Undergrad | Indian Institute of Technology Mandi

Web: https://sahilarora535.github.io
LinkedIn: sahilarora535
Ph: +91-8130506047

Timo Walther

Re: Optimizing multiple aggregate queries on a CEP using Flink

Hi Sahil,

I'm not a CEP expert, but I will loop in Kostas (in CC). In general, the example you described can easily be implemented with a ProcessFunction [1]. A process function not only lets you keep state (like a count) but also lets you set timers flexibly for specific use cases, so that aggregations can be triggered and reused. So in general I would say that implementing and testing such an algorithm is possible. How easily it can be integrated into the CEP API, I don't know.
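The state-plus-timer pattern described above can be illustrated with a small plain-Python stand-in. This is not Flink's ProcessFunction API: the class `SharedCountProcessor` and its methods are hypothetical, timestamps are in milliseconds, and events are assumed to arrive in timestamp order. The sketch keeps two counts as state and uses a timer at the end of each minute to emit both the 1-minute and the 2-minute result from that shared state.

```python
class SharedCountProcessor:
    """Plain-Python imitation of a stateful, timer-driven operator."""

    def __init__(self, pane_ms=60_000):
        self.pane_ms = pane_ms
        self.current_count = 0   # state: events in the minute still open
        self.previous_count = 0  # state: events in the last closed minute
        self.timer = None        # pending timer timestamp, if any
        self.output = []         # (minute_end, count_1min, count_2min)

    def process_element(self, timestamp_ms):
        # Fire timers that are due before this event is counted.
        self.advance_to(timestamp_ms)
        self.current_count += 1
        # Ask to be called back at the end of this event's minute.
        self.timer = (timestamp_ms // self.pane_ms + 1) * self.pane_ms

    def advance_to(self, time_ms):
        # Stand-in for time progressing: fire every timer that is due.
        while self.timer is not None and self.timer <= time_ms:
            due, self.timer = self.timer, None
            self.on_timer(due)

    def on_timer(self, minute_end_ms):
        # Both answers come from the same state; nothing is recounted.
        one_min = self.current_count
        two_min = self.current_count + self.previous_count
        self.output.append((minute_end_ms, one_min, two_min))
        # Rotate state for the next minute.
        self.previous_count = self.current_count
        self.current_count = 0
        # The 2-minute window still holds events, so fire once more.
        if self.previous_count > 0:
            self.timer = minute_end_ms + self.pane_ms


processor = SharedCountProcessor()
for ts in [5_000, 20_000, 59_000, 61_000, 130_000, 150_000, 170_000]:
    processor.process_element(ts)
processor.advance_to(240_000)  # let the remaining timers fire
print(processor.output)
# [(60000, 3, 3), (120000, 1, 4), (180000, 3, 4), (240000, 0, 3)]
```

In an actual Flink job, the counts would live in managed keyed state and the callbacks would be registered through the timer service described in [1]; the sketch only shows how one piece of state can serve several aggregations.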

Regards,
Timo

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/process_function.html

On 2/9/18 at 11:28 PM, Sahil Arora wrote:


Sahil Arora

Re: Optimizing multiple aggregate queries on a CEP using Flink

Hi Timo,

Thanks a lot for the help. I look forward to a reply from Kostas for more clarity on this.

On Mon, 12 Feb 2018, 10:01 pm Timo Walther, <[hidden email]> wrote:


Kostas Kloudas

Re: Optimizing multiple aggregate queries on a CEP using Flink

Hi Sahil,

Currently CEP does not support multi-query optimizations out-of-the-box. In some cases you can do manual optimizations to your code, but there is no optimizer involved.

Cheers,

Kostas

On Feb 15, 2018, at 11:12 AM, Sahil Arora <[hidden email]> wrote:


Sahil Arora

Re: Optimizing multiple aggregate queries on a CEP using Flink

Thank you, Kostas, for your input. We will try to integrate an optimizer into Flink and will get back to you if we get stuck.

Regards.

On Thu, 15 Feb 2018 at 19:11 Kostas Kloudas <[hidden email]> wrote:
