Fwd: Information on Flink


Fwd: Information on Flink

Kostas Tzoumas
I am forwarding this here in case someone with better knowledge of genetic algorithms picks it up.

Kostas

---------- Forwarded message ----------
From: Andrea Ferranti <[hidden email]>
Date: Thu, Nov 13, 2014 at 4:46 PM
Subject: Re: Information on Flink
To: Kostas Tzoumas <[hidden email]>


Thanks very much for your reply.

First, can I forward this to the Flink user mailing list? Perhaps someone over there has a better answer.

Yes, of course.

Can you describe very briefly how fitness evaluation is computed in your algorithm?

My fitness evaluation is basically an evaluation of accuracy in a classification problem, so I must read every line of a file (each line contains some attributes and a class) and verify whether my classification works well.
So in each iteration of the genetic algorithm I change some chromosomes and then evaluate the solution.

At the moment the entire program is written in C++, but I would like to port it to Java using jMetal.

Best regards, 
Andrea

On 13 Nov 2014, at 16:25, Kostas Tzoumas <[hidden email]> wrote:

Hey,

First, can I forward this to the Flink user mailing list? Perhaps someone over there has a better answer.

Can you describe very briefly how fitness evaluation is computed in your algorithm?

Kostas

On Thu, Nov 13, 2014 at 4:08 PM, Andrea Ferranti <[hidden email]> wrote:
Dear Kostas Tzoumas,
I'm Andrea Ferranti, a student of Computer Engineering at the University of Pisa.
In my thesis I would like to use Flink to parallelize an evolutionary algorithm, in particular the fitness evaluation. My problem and algorithm are written in Java (jMetal).

Do you think that Flink can be a good tool for parallelizing the fitness evaluation? In my problem the fitness is evaluated on very large datasets.

Best regards, 
Andrea




Re: Information on Flink

Fabian Hueske-2
Hi Andrea,

Please correct me if I got something wrong.
You have a population of several classifiers that you want to evolve and improve.
In each iteration, you test each classifier with all your test data and evaluate the fitness of each classifier. Then, you create a new population of classifiers by removing bad classifiers and mutating (and crossing) the better ones.

I think you can do that with Flink as follows:
you use a Map over the test data, with the classifiers passed in as a broadcast set, to check how each classifier classifies a single test record. With a subsequent Reduce, you aggregate the fitness of each classifier, and then you use a ReduceAll to build a new population of classifiers.
In the next iteration, the new population is broadcast to the Map over the test data.

This is quite similar to what our KMeans example does. You should have a look at it.
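
To make the structure of the steps above concrete, here is a minimal sketch in plain Java. It deliberately does not use the Flink API: a parallel stream stands in for the Map and Reduce operators, and the population list passed into the map step plays the role of the broadcast set. The names `Sample`, `Classifier`, `mapSample`, `sumHits`, `evaluate` and `nextGeneration`, the single-attribute threshold classifier and the "keep the better half, mutate the rest" rule are all assumptions made up for illustration, not something taken from jMetal or from this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Plain-Java stand-in for the Map / Reduce / ReduceAll steps; not Flink API code.
class BroadcastFitnessSketch {

    /** One labelled test record (assumed shape: one numeric attribute plus a class). */
    static final class Sample {
        final double attribute;
        final int label;
        Sample(double attribute, int label) { this.attribute = attribute; this.label = label; }
    }

    /** Toy classifier (hypothetical): predicts class 1 when the attribute exceeds the threshold. */
    static final class Classifier {
        final double threshold;
        Classifier(double threshold) { this.threshold = threshold; }
        int predict(Sample s) { return s.attribute > threshold ? 1 : 0; }
    }

    /** "Map": for one record, emit 1 for every broadcast classifier that got it right. */
    static int[] mapSample(Sample s, List<Classifier> broadcast) {
        int[] hits = new int[broadcast.size()];
        for (int i = 0; i < hits.length; i++) {
            hits[i] = broadcast.get(i).predict(s) == s.label ? 1 : 0;
        }
        return hits;
    }

    /** "Reduce": element-wise sum of two partial hit-count vectors. */
    static int[] sumHits(int[] a, int[] b) {
        int[] out = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    /** Map plus Reduce over the test data: correct classifications per classifier. */
    static int[] evaluate(List<Sample> data, List<Classifier> population) {
        return data.parallelStream()
                .map(s -> mapSample(s, population))
                .reduce(new int[population.size()], BroadcastFitnessSketch::sumHits);
    }

    /** "ReduceAll": keep the better half and refill with mutated copies (non-empty population assumed). */
    static List<Classifier> nextGeneration(List<Classifier> population, int[] hits, Random rng) {
        Integer[] order = new Integer[population.size()];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Integer.compare(hits[b], hits[a])); // best first
        int survivors = Math.max(1, population.size() / 2);
        List<Classifier> next = new ArrayList<>();
        for (int i = 0; i < survivors; i++) {
            next.add(population.get(order[i]));
        }
        for (int i = 0; next.size() < population.size(); i++) {
            Classifier parent = next.get(i % survivors);
            next.add(new Classifier(parent.threshold + 0.1 * rng.nextGaussian()));
        }
        return next;
    }

    public static void main(String[] args) {
        List<Sample> data = Arrays.asList(
                new Sample(0.2, 0), new Sample(0.4, 0), new Sample(0.6, 1), new Sample(0.8, 1));
        List<Classifier> population = Arrays.asList(
                new Classifier(0.5), new Classifier(0.1), new Classifier(0.9));
        Random rng = new Random(42);
        for (int iteration = 0; iteration < 3; iteration++) {
            int[] hits = evaluate(data, population); // population is "broadcast" to every record
            System.out.println("iteration " + iteration + ": hits = " + Arrays.toString(hits));
            population = nextGeneration(population, hits, rng);
        }
    }
}
```

The point of the sketch is only the data flow: the large test data is what gets scanned in parallel, while the small population is handed to every map call and rebuilt once per iteration. In a real Flink job the loop in `main` would presumably become an iteration operator, as in the KMeans example.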

Best, Fabian


Re: Information on Flink

Andrea Ferranti
More or less as you described.

I use jMetal to build the problem and the algorithm. Is it really possible to integrate Flink into jMetal?

Best, 
Andrea



Re: Information on Flink

Fabian Hueske-2
Unfortunately, I am not familiar with jMetal.
I had a brief look at its website but could not figure out how it actually works.

What you certainly can do is call jMetal from Flink code (e.g., within a Map or Reduce function). This would mean that you use Flink for parallelization and call jMetal on subsets of the data. You would then need to combine the results of the distributed jMetal calls, which may or may not be possible, depending on the algorithm. Whether such a "naive" way of parallelization works is really up to the algorithm. For evolutionary algorithms it might actually work well (they are randomized anyway), if you find a way to combine the partial results.
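
As a rough illustration of this "naive" scheme, the following plain-Java sketch splits the data into chunks, runs a local search on each chunk in parallel, and then combines the partial results. Nothing here is Flink or jMetal API: `localSearch` is a hypothetical stand-in for a jMetal run on one subset (it just tries thresholds from a fixed grid), `partition` stands in for the distribution of the data, and the combine rule (re-score every partial winner on the full data and keep the best) is only one assumed way of merging results. Records are assumed to be `double[]{attribute, label}` pairs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java stand-in for "run the algorithm on subsets, then combine"; not Flink or jMetal code.
class PartitionedSearchSketch {

    /** Splits the data into at most `parts` contiguous chunks (parts must be at least 1). */
    static <T> List<List<T>> partition(List<T> data, int parts) {
        List<List<T>> chunks = new ArrayList<>();
        int size = (data.size() + parts - 1) / parts; // ceiling division
        for (int i = 0; i < data.size(); i += size) {
            chunks.add(data.subList(i, Math.min(data.size(), i + size)));
        }
        return chunks;
    }

    /** Number of records {attribute, label} that a threshold classifier gets right. */
    static long hits(double threshold, List<double[]> samples) {
        return samples.stream()
                .filter(s -> (s[0] > threshold ? 1 : 0) == (int) s[1])
                .count();
    }

    /** Hypothetical stand-in for one jMetal run: best threshold on this chunk from a fixed grid. */
    static double localSearch(List<double[]> chunk) {
        double best = 0.0;
        long bestHits = -1;
        for (int i = 0; i <= 10; i++) {
            double t = i / 10.0;
            long h = hits(t, chunk);
            if (h > bestHits) { // strict comparison keeps the first of several equal candidates
                bestHits = h;
                best = t;
            }
        }
        return best;
    }

    /** Runs the local search per chunk in parallel, then combines the partial winners. */
    static double search(List<double[]> data, int parts) {
        List<Double> candidates = partition(data, parts).parallelStream()
                .map(PartitionedSearchSketch::localSearch)
                .collect(Collectors.toList());
        // Combine step (assumption): re-score each partial winner on all data, keep the best.
        return candidates.stream()
                .max(Comparator.comparingLong((Double t) -> hits(t, data)))
                .orElseThrow(() -> new IllegalArgumentException("no data"));
    }

    public static void main(String[] args) {
        List<double[]> data = Arrays.asList(
                new double[]{0.1, 0}, new double[]{0.2, 0}, new double[]{0.3, 0}, new double[]{0.4, 0},
                new double[]{0.6, 1}, new double[]{0.7, 1}, new double[]{0.8, 1}, new double[]{0.9, 1});
        System.out.println("combined threshold: " + search(data, 2));
    }
}
```

With the toy data in `main`, one chunk contains only class 0 and the other only class 1, so each local result reflects just its own subset. This is exactly why the combine step matters, and why an approach like this depends on the algorithm.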

I am not sure whether it is possible to call Flink from jMetal.

Best, Fabian
