Optimizing multiple aggregate queries on a CEP using Flink

Sahil Arora
Hi there,
We have been working on a project titled "Optimizing Multiple Aggregate Queries over a Complex Event Processing Engine". The aim is to optimize a group of queries together. Take, for example, the two queries "how many cars passed the post in the past 1 minute" and "how many cars passed the post in the past 2 minutes". The naive and inefficient method is to evaluate each query independently, one by one. A better approach would minimize computation by reusing the answer to query 1 when answering query 2. That is our aim: to minimize computation cost when there are multiple aggregate queries in a CEP engine.
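
To make the idea of reuse concrete, here is a minimal sketch in plain Python (not Flink code). It assumes events arrive in timestamp order and that both windows are aligned to 60-second slices; the class name `SharedWindowCounter` and its methods are hypothetical and chosen only for illustration. Each event is counted once into a slice, and both the 1-minute and the 2-minute query are answered from the same slice counts.

```python
from collections import deque


class SharedWindowCounter:
    """Counts each event once per slice and serves several window lengths.

    Hypothetical helper for illustration; assumes in-order timestamps and
    windows whose length is a multiple of the slice length.
    """

    def __init__(self, slice_seconds, max_slices):
        self.slice_seconds = slice_seconds
        # (slice_index, count) pairs, oldest first; old slices fall off.
        self.slices = deque(maxlen=max_slices)

    def add(self, timestamp):
        idx = int(timestamp // self.slice_seconds)
        if self.slices and self.slices[-1][0] == idx:
            # Same slice as the previous event: bump its count.
            self.slices[-1] = (idx, self.slices[-1][1] + 1)
        else:
            self.slices.append((idx, 1))

    def count(self, window_seconds, now):
        # Sum the slices covering [now - window_seconds, now).
        end = int(now // self.slice_seconds)
        start = end - window_seconds // self.slice_seconds
        return sum(c for i, c in self.slices if start <= i < end)


counter = SharedWindowCounter(slice_seconds=60, max_slices=2)
for t in [5, 20, 70, 100, 110]:  # car sightings, in seconds
    counter.add(t)

print(counter.count(60, now=120))   # cars in the past 1 minute
print(counter.count(120, now=120))  # cars in the past 2 minutes
```

The 2-minute answer costs one extra addition on top of the 1-minute answer, instead of a second pass over the events.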

We have been searching for some platform which supports CEP, and Flink is probably one of them. Hence, it would be very helpful if we could get some answers to the following questions:

1. Does Flink already have some method of optimizing multiple aggregate queries?
2. Is it possible for us to implement and test such an algorithm in Flink, one that considers multiple queries in a CEP engine, such as having a database of SQL queries and testing an algorithm of our choice on them?

Any other inputs which may help us with solving the problem would be highly welcome.

Thanks a lot.
--
Sahil Arora
Final year B.Tech Undergrad | Indian Institute of Technology Mandi
Web: https://sahilarora535.github.io
LinkedIn: sahilarora535
Ph: +91-8130506047

Re: Optimizing multiple aggregate queries on a CEP using Flink

Timo Walther
Hi Sahil,

I'm not a CEP expert, but I will loop in Kostas (in CC). In general, the example that you described can easily be done with a ProcessFunction [1]. A process function not only allows you to keep state (like a count) but also lets you set timers flexibly for specific use cases, so that aggregations can be triggered and reused. So in general I would say that implementing and testing such an algorithm is possible. How easily it can be integrated into the CEP API, I don't know.
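
As a rough illustration of the state-plus-timer pattern described here, the following plain-Python sketch mimics the shape of such a function without using any Flink API. The class `TwoWindowCounter`, its method names, and the `advance_watermark` driver are all hypothetical; it assumes in-order timestamps and one timer per 60-second boundary. The state holds the count of the open minute and of the last closed minute, and each timer emits both the 1-minute and the 2-minute result from that state.

```python
class TwoWindowCounter:
    """Plain-Python imitation of a stateful function with timers."""

    SLICE = 60  # seconds per timer interval

    def __init__(self):
        self.current = 0        # state: events in the open minute
        self.previous = 0       # state: events in the last closed minute
        self.next_timer = None  # time at which the open minute closes
        self.results = []       # (fire_time, count_1min, count_2min)

    def process_element(self, timestamp):
        # Fire any timers that are due before counting the new event.
        self.advance_watermark(timestamp)
        if self.next_timer is None:
            # First event: register a timer for the end of its minute.
            self.next_timer = (timestamp // self.SLICE + 1) * self.SLICE
        self.current += 1

    def advance_watermark(self, time):
        while self.next_timer is not None and time >= self.next_timer:
            self.on_timer(self.next_timer)

    def on_timer(self, fire_time):
        # The 2-minute count reuses the stored 1-minute counts.
        self.results.append(
            (fire_time, self.current, self.current + self.previous)
        )
        self.previous, self.current = self.current, 0
        self.next_timer = fire_time + self.SLICE


fn = TwoWindowCounter()
for t in [5, 20, 70, 100, 110]:
    fn.process_element(t)
fn.advance_watermark(120)
print(fn.results)
```

In a real job the counts would live in keyed state and the timers would be registered with the framework's timer service; this sketch only shows how one piece of state can answer both queries.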

Regards,
Timo



[1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/process_function.html

Re: Optimizing multiple aggregate queries on a CEP using Flink

Sahil Arora
Hi Timo,
Thanks a lot for the help. I look forward to a reply from Kostas to get more clarity on this.
 

Re: Optimizing multiple aggregate queries on a CEP using Flink

Kostas Kloudas
Hi Sahil,

Currently CEP does not support multi-query optimizations out-of-the-box.
In some cases you can do manual optimizations to your code, but there is 
no optimizer involved.

Cheers,
Kostas

Re: Optimizing multiple aggregate queries on a CEP using Flink

Sahil Arora
Thank you, Kostas, for your input. We will try to integrate an optimizer into Flink and will get back to you in case we get stuck.

Regards.
