Flink ML with DataStream

Flink ML with DataStream

Branham, Jeremy [IT]

Hello –

I’ve been successful working with Flink in Java, but I’m having trouble using the ML library, specifically KNN.

From my understanding, this is easier in Scala [1], so I’ve been converting my code.

One issue I’ve encountered: how do I get a DataSet[Vector] from a DataStream[MyClass]?

I’ve attempted to use windowing, but Scala is completely new to me and I may need a push in the right direction.

The code below executes properly; I’m just unsure of the next step.

I’ve also seen an example [2] that looks like something I need to implement – especially the PartialModelBuilder.

Am I on the right track?

Thoughts?

Thanks!

[1] - https://stackoverflow.com/questions/44039857/is-there-a-apache-flink-machine-learning-tutorial-in-java-language/44040819#44040819

[2] - https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples/ml/IncrementalLearningSkeleton.scala

Jeremy D. Branham

Technology Architect - Sprint
O: +1 (972) 405-2970 | M: +1 (817) 791-1627

[hidden email]

#gettingbettereveryday

This e-mail may contain Sprint proprietary information intended for the sole use of the recipient(s). Any use by others is prohibited. If you are not the intended recipient, please contact the sender and delete all copies of the message.

Re: Flink ML with DataStream

Fabian Hueske-2
Hi,

Unfortunately, it is not possible to convert a DataStream into a DataSet.
Flink's DataSet and DataStream APIs are distinct and cannot be used together.

The FlinkML library is only available for the DataSet API.
There is ongoing work on a machine learning library for streaming use cases as well, but it is still at an early stage and mostly focuses on model serving on streams, i.e., applying an externally trained model to streaming data.

Best, Fabian
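
To make the "model serving" idea above concrete, here is a minimal sketch in plain Java with no Flink dependency. It is an illustration under stated assumptions, not FlinkML code and not the design of the streaming ML effort mentioned in the reply: the class `KnnModelServing`, its `predict` method, and the sample data are all hypothetical. The model is "trained" elsewhere (for kNN that just means having a set of labeled points) and is then applied to one record at a time. In a Flink job, a call like `predict` could presumably sit inside a per-record function such as a `MapFunction` on the DataStream.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.IntStream;

/** Hypothetical illustration only; none of these names come from Flink or FlinkML. */
public class KnnModelServing {

    // The "externally trained" model. For kNN, training amounts to storing labeled points.
    private final double[][] points;
    private final int[] labels;
    private final int k;

    public KnnModelServing(double[][] points, int[] labels, int k) {
        if (points.length != labels.length || k < 1 || k > points.length) {
            throw new IllegalArgumentException("need one label per point and 1 <= k <= number of points");
        }
        this.points = points;
        this.labels = labels;
        this.k = k;
    }

    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Classifies one incoming record by majority vote among its k nearest stored points. */
    public int predict(double[] record) {
        // Sort the indices of the stored points by distance to the record.
        Integer[] order = IntStream.range(0, points.length).boxed().toArray(Integer[]::new);
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> squaredDistance(points[i], record)));

        // Count the labels of the k closest points.
        Map<Integer, Integer> votes = new HashMap<>();
        for (int i = 0; i < k; i++) {
            votes.merge(labels[order[i]], 1, Integer::sum);
        }

        // Highest vote count wins; ties go to the smaller label so the result is deterministic.
        int bestLabel = 0;
        int bestCount = -1;
        for (Map.Entry<Integer, Integer> e : votes.entrySet()) {
            int label = e.getKey();
            int count = e.getValue();
            if (count > bestCount || (count == bestCount && label < bestLabel)) {
                bestLabel = label;
                bestCount = count;
            }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        // Made-up training data: two well-separated clusters with labels 0 and 1.
        double[][] trainedPoints = {{0, 0}, {0, 1}, {1, 0}, {5, 5}, {5, 6}, {6, 5}};
        int[] trainedLabels = {0, 0, 0, 1, 1, 1};
        KnnModelServing model = new KnnModelServing(trainedPoints, trainedLabels, 3);

        // Stand-in for an unbounded stream: each record is scored as it arrives.
        double[][] incoming = {{0.5, 0.5}, {5.5, 5.5}};
        for (double[] record : incoming) {
            System.out.println(Arrays.toString(record) + " -> " + model.predict(record));
        }
    }
}
```

The point of the sketch is the split of responsibilities: nothing is learned from the stream itself, so no DataStream-to-DataSet conversion is needed. Updating the model while the stream is running is a separate problem that this example does not address.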


In reply to: Branham, Jeremy [IT], 2017-07-19 19:07 GMT+02:00

RE: Flink ML with DataStream

Branham, Jeremy [IT]

Thanks Fabian –

I’m interested in the early development of ML on streams.

Harshith and I plan on doing some prototyping for near-real-time (NRT) anomaly detection using the stream API.

It would be great if we could produce something reusable for the community.

In reply to: Fabian Hueske, Wednesday, July 19, 2017 2:12 PM

Re: Flink ML with DataStream

Fabian Hueske-2
Hi Jeremy,

Here are a few links about the recent efforts on ML on streams with Flink:

- Discussion on the dev mailing list [1]
- Announcement of a Slack channel [2]
- GDocs Design Doc [3]

IMO, anomaly detection is a great use case for ML on streams.

Cheers, Fabian

[1] https://lists.apache.org/thread.html/638fdee0c361a7fb362e050e8cc79ba1e8b4162b044bcbcca31d31ed@%3Cdev.flink.apache.org%3E
[2] https://lists.apache.org/thread.html/e2a1f974300bf1f1b3ff19317a6b7fc941ebedd013950307959cf830@%3Cdev.flink.apache.org%3E
[3] https://docs.google.com/document/d/1afQbvZBTV15qF3vobVWUjxQc49h3Ud06MIRhahtJ6dw

In reply to: Branham, Jeremy [IT], 2017-07-21 21:57 GMT+02:00