Anomaly detection Apache Flink


Anomaly detection Apache Flink

Salvador Vigo
Hi there,
I am working on an approach for running some experiments on real-time anomaly detection with Apache Flink. I would like to know whether there are already any open issues on this in the community.
The only example I found was the one from Scott Kidder and the Mux platform (2017). If anyone is already working on this topic or knows of related work or publications, I would be grateful to hear about it.
Best,

Re: Anomaly detection Apache Flink

Marta Paes Moreira
Hi, Salvador.

You can find more examples of real-time anomaly detection with Flink in these Flink Forward presentations from Microsoft [1] and Salesforce [2]. This blog post [3] also describes how to build that kind of application using Kinesis Data Analytics (based on Flink).

Let me know if these resources help!

[1] https://www.youtube.com/watch?v=NhOZ9Q9_wwI
[2] https://www.youtube.com/watch?v=D4kk1JM8Kcg
[3] https://towardsdatascience.com/real-time-anomaly-detection-with-aws-c237db9eaa3f


RE: Anomaly detection Apache Flink

Nienhuis, Ryan

I would also have a look at the random cut forest algorithm. It is the base algorithm used for anomaly detection in several AWS services (QuickSight, Kinesis Data Analytics, etc.). It doesn’t help with getting it working with Flink, but it may be a good starting point for an algorithm.

 

https://github.com/aws/random-cut-forest-by-aws

 

Ryan
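
To give a feel for the random-cut idea mentioned above, here is a minimal sketch in plain Python. It is not the code from the linked repository and it is deliberately simplified: it builds each tree on the full batch and scores a point by how few random cuts are needed to isolate it (average depth), whereas the real library differs both in how it scores points and in how it maintains its samples over a stream. All function names here (`build_tree`, `depth_of`, `mean_depth`) are made up for illustration.

```python
import random

def build_tree(points, rng, depth=0, max_depth=16):
    """Recursively split points with random axis-aligned cuts.

    Returns None for a leaf, else (dim, cut, left_subtree, right_subtree).
    """
    if len(points) <= 1 or depth >= max_depth:
        return None
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    spans = [max(p[d] for p in points) - lows[d] for d in range(dims)]
    if sum(spans) == 0:  # all points identical, nothing to cut
        return None
    # Pick the cut dimension with probability proportional to its span.
    dim = rng.choices(range(dims), weights=spans)[0]
    cut = rng.uniform(lows[dim], lows[dim] + spans[dim])
    left = [p for p in points if p[dim] <= cut]
    right = [p for p in points if p[dim] > cut]
    if not left or not right:  # degenerate cut on the boundary
        return None
    return (dim, cut,
            build_tree(left, rng, depth + 1, max_depth),
            build_tree(right, rng, depth + 1, max_depth))

def depth_of(tree, point):
    """Number of cuts passed before the point reaches a leaf."""
    depth = 0
    while tree is not None:
        dim, cut, left, right = tree
        tree = left if point[dim] <= cut else right
        depth += 1
    return depth

def mean_depth(trees, point):
    """Average depth over all trees; a lower value means easier to isolate."""
    return sum(depth_of(t, point) for t in trees) / len(trees)

# Toy data: a dense cluster around the origin plus one far-away point.
rng = random.Random(42)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)] + [(10.0, 10.0)]
trees = [build_tree(data, rng) for _ in range(50)]

outlier_depth = mean_depth(trees, (10.0, 10.0))
inlier_depth = mean_depth(trees, (0.0, 0.0))
```

The far-away point is separated from the cluster after very few cuts, so its average depth is much lower than that of a point in the middle of the cluster. No pattern has to be defined in advance; the score comes from the data itself.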

 


Re: Anomaly detection Apache Flink

Marta Paes Moreira
Forgot to mention that you might also want to have a look at Flink CEP [1], Flink's library for Complex Event Processing.

It allows you to define and detect event patterns over streams, which can come in pretty handy for anomaly detection.
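
As a rough illustration of what "detecting an event pattern over a stream" means, here is a small sketch in plain Python. It does not use the Flink CEP API. The pattern, the function name `detect_bursts` and its parameters are assumptions chosen for the example: report a match whenever `count` consecutive events exceed `threshold` and all of them fall within `within` seconds.

```python
from collections import deque

def detect_bursts(events, threshold, count, within):
    """Yield a match for each run of `count` consecutive events above
    `threshold` that spans at most `within` seconds.

    `events` is an iterable of (timestamp_seconds, value) in time order.
    """
    run = deque()
    for ts, value in events:
        if value <= threshold:
            run.clear()  # contiguity broken, start over
            continue
        run.append((ts, value))
        # Drop events that are too old to share a window with this one.
        while ts - run[0][0] > within:
            run.popleft()
        if len(run) == count:
            yield list(run)
            run.clear()  # do not reuse events of a reported match

events = [(0, 1.0), (1, 9.0), (2, 9.5), (3, 9.1), (4, 1.0),
          (10, 9.0), (30, 9.0), (50, 9.0)]
matches = list(detect_bursts(events, threshold=5.0, count=3, within=10))
```

Here only the three high readings at seconds 1 to 3 form a match; the later high readings are too far apart in time. A CEP library lets you declare this kind of condition instead of hand-coding the state handling.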



Re: Anomaly detection Apache Flink

Salvador Vigo
Thanks for the answers.

@Marta, regarding the videos [1] and [2] from your first answer: it was interesting to see these two different approaches, although I was looking for a more specific implementation. As for link [3], I didn't know Kinesis existed, so it could be useful for benchmarking and comparing my results with the Kinesis results. As for the CEP approach, I am very familiar with this topic, since my current work is based on implementing a CEP pipeline for monitoring. The only problem I see is that you need a predefined pattern in advance. But it is worth a try.

@Ryan, the random cut forest algorithm is closer to what I am looking for. What do you mean when you say that it doesn't help with getting it working with Flink?

Best,


RE: Anomaly detection Apache Flink

Nienhuis, Ryan

Vigo,

 

I mean that the algorithm is a standalone piece of code. There are no examples that I am aware of for running it using Flink.

 

Ryan
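
One way to picture running a standalone algorithm inside a streaming job is to keep one detector instance per key and feed each record to the detector for its key. The sketch below is plain Python and only mimics that idea; it is not Flink code. `KeyedScorer` is a hypothetical wrapper, and `RunningZScore` is a simple stand-in detector (online mean and variance via Welford's algorithm), not the random cut forest.

```python
import math

class RunningZScore:
    """Stand-in detector: online mean/variance via Welford's algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def score(self, x):
        """Return |z| of x against the history so far, then add x to it.

        Returns 0.0 until there are two earlier values, or when the
        earlier values have zero variance.
        """
        z = 0.0
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0:
                z = abs(x - self.mean) / std
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return z

class KeyedScorer:
    """Keeps one independent detector per key, like per-key state."""

    def __init__(self, make_detector):
        self.make_detector = make_detector
        self.state = {}

    def process(self, key, value):
        if key not in self.state:
            self.state[key] = self.make_detector()
        return self.state[key].score(value)

scorer = KeyedScorer(RunningZScore)
scores_a = [scorer.process("a", v) for v in [10, 11, 9, 10, 11, 9, 10, 50]]
first_b = scorer.process("b", 50)
```

The last reading for key "a" gets a very high score because it is far from that key's history, while the same value arriving as the first reading for key "b" scores 0.0, since each key has its own state.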

 


Re: Anomaly detection Apache Flink

Salvador Vigo
Ok, thanks for the clarification. 

