Dear
My name is Cederic Bosmans and I am a masters student at the Ghent University (Belgium). I am currently working on my masters dissertation which involves Apache Flink. I want to make predictions in the streaming environment based on a model trained in the batch environment. I trained my SVM-model this way: and saved the model: Then I have an incoming datastream in the streaming environment: Since the predict function does only support DataSet and not DataStream, on stackoverflow a flink contributor mentioned that this should be done using a map/flatMap function. It would be incredible for me if you could help me a little bit further! Kind regards and thanks in advance Cederic Bosmans |
Hi Cederic, I am not familiar with SVM or machine learning but I think we can work it out together. What problem have you met when you try to implement this function? From my point of view, we can rebuild the model in the flatMap function and use it to predict the input data. There are some flatMap documents here[1]. Best, Hequn On Sun, Jul 22, 2018 at 4:12 PM, Cederic Bosmans <[hidden email]> wrote:
|
Hi Cederic,
If the model is a simple function, you can just load it and make predictions using the map/flatMap function in the StreamEnvironment. But I’m afraid the model trained by Flink-ML should be a “batch job", whose predict method takes a Dataset as the parameter and outputs another Dataset as the result. That means you cannot easily apply the model on streams, at least for now. There are two options to solve this. (1) Train the dataset using another framework to produce a simple function. (2) Adjust your model serving as a series of batch jobs. Hope that helps, Xingcan
|
One option (which I haven't tried myself) would be to somehow get the model into PMML format, and then use https://github.com/FlinkML/flink-jpmml to score the model. You could either use another machine learning framework to train the model (i.e., a framework that directly supports PMML export), or convert the Flink model into PMML. Since SVMs are fairly simple to describe, that might not be terribly difficult. On Mon, Jul 23, 2018 at 4:18 AM Xingcan Cui <[hidden email]> wrote:
David Anderson | Training Coordinator | data Artisans -- Join Flink Forward - The Apache Flink Conference Stream Processing | Event Driven | Real Time |
Dear Cederic, I did something similar as yours a while ago along this work [1] but I've always been working within the batch context. I'm also the co-author of flink-jpmml and, since a flink2pmml model saver library doesn't exist currently, I'd suggest you a twofold strategy to tackle this problem: - if your model is relatively simple, take the batch evaluate method (it belongs to your SVM classifier) and attempt to translate it in a flatMap function (hopefully you can reuse some internal utilities, Flink exploits breeze vector library under the hoods [3]). - if your model is a complex one, you should export the model into PMML and employ then [2]. For a first overview, this [4] is the library you should adopt as to export your model and this [5] can help you with the related implementation. Hope it can help and good luck! Andrea [1] https://dl.acm.org/citation.cfm?id=3070612 [2] https://github.com/ [3] https://github.com/apache/flink/blob/7034e9cfcb051ef90c5bf0960bfb50a79b3723f0/flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala#L73 [4] https://github.com/jpmml/jpmml-model [5] https://github.com/jpmml/jpmml-sparkml 2018-07-24 13:29 GMT+02:00 David Anderson <[hidden email]>:
Andrea Spina Software Engineer @ Radicalbit Srl Via Borsieri 41, 20159, Milano - IT |
Free forum by Nabble | Edit this page |