Flink Scheduling and FlinkML

Fábio Dias
Hi all,

I'm building a recommendation system for my application.
I have a set of logs (containing the user info, the hour, the button that was clicked, etc.) that arrive in Flink via Kafka. I save every log to HDFS (Hadoop), but now I have a problem: I want to apply ML to all of my data.

I can think of 2 scenarios:
First: Transform my DataStream into a DataSet and perform the ML task. Is that possible?
Second: Run a Flink job that reads the data from Hadoop and performs the ML task.

What is the best way to do it?

I already checked the IncrementalLearningSkeleton, but I didn't understand how to apply it to a real use case. Is there a more complete example that I could look at?

Another thing I would like to ask is how to implement the second scenario when I need to run the task every hour. What is the best way to do that?

Thanks,
Fábio Dias.
Re: Flink Scheduling and FlinkML

Theodore Vasiloudis
Hello Fabio,

What you describe sounds very possible. The easiest way to do it would be to keep saving your incoming data to HDFS, as you already do if I understand correctly,
and then use the batch ALS algorithm [1] to create your recommendations from the static data. You could run that batch job at regular intervals.
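
As a rough illustration of the "regular intervals" part, here is a minimal sketch in plain Java (standard library only, no Flink dependency). The class name PeriodicBatchTrigger and the placeholder job body are hypothetical; in a real setup the job would submit the batch ALS program that reads from HDFS, and an external scheduler such as cron could serve the same purpose.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: re-runs a batch job with a fixed pause between runs. */
public class PeriodicBatchTrigger {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /**
     * Runs the job once immediately, then again each time `period`
     * has elapsed since the previous run finished (runs never overlap).
     */
    public ScheduledFuture<?> start(Runnable job, Duration period) {
        Runnable guarded = () -> {
            try {
                job.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs,
                // so log the failure and wait for the next interval.
                System.err.println("Batch run failed: " + e.getMessage());
            }
        };
        return scheduler.scheduleWithFixedDelay(
                guarded, 0, period.toMillis(), TimeUnit.MILLISECONDS);
    }

    /** Stops scheduling and interrupts a run that is in progress. */
    public void stop() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        PeriodicBatchTrigger trigger = new PeriodicBatchTrigger();
        // Short period for demonstration only; for the hourly case
        // described above, pass Duration.ofHours(1) instead.
        trigger.start(
                // Assumption: replace this placeholder with the code that
                // submits the batch recommendation job.
                () -> System.out.println("submit batch job here"),
                Duration.ofMillis(100));
        Thread.sleep(350);
        trigger.stop();
    }
}
```

A fixed delay is used rather than a fixed rate so that a training run that takes longer than expected does not cause runs to pile up.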

Regards,
Theodore

On Fri, Mar 31, 2017 at 4:10 PM, Fábio Dias wrote:
> (original message quoted in full; see above)