Broadcast Variable not updated in an IterativeDataSet



Ventura Del Monte
Hello,

I am writing a Flink application that trains neural networks. I have implemented a mini-batch version of SGD (stochastic gradient descent) using the IterativeDataSet class. In a nutshell, my code groups the input features into mini batches (one per degree of parallelism), computes a gradient over each mini batch in a map task, and then merges those gradients in a reduce task.
Unfortunately, the trained parameter does not seem to be re-broadcast at the end of each epoch; the next epoch appears to start from the initial parameter instead. It is probably a programming error I cannot see at the moment, but I was wondering whether there are guidelines for broadcasting an object correctly.
Would you mind having a look at my code? Thank you in advance.

Regards,
Ventura
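The "gradient per mini batch, then merge in a reduce" step described above can be sketched in plain Java (a toy 1-D linear model with squared loss; this is independent of Flink and not the poster's actual code — all names and data here are invented for illustration):

```java
import java.util.Arrays;

// Toy sketch of "compute a gradient per mini batch, then merge in a reduce":
// a 1-D linear model y = w * x trained with squared loss.
public class MiniBatchGradients {
    // Gradient of the mean squared error over one mini batch, at parameter w.
    static double batchGradient(double[] xs, double[] ys, double w) {
        double g = 0.0;
        for (int i = 0; i < xs.length; i++) {
            g += 2.0 * xs[i] * (w * xs[i] - ys[i]);
        }
        return g / xs.length;
    }

    // "Reduce" step: merge the per-batch gradients by averaging them.
    static double mergeGradients(double[] gradients) {
        return Arrays.stream(gradients).average().orElse(0.0);
    }

    public static void main(String[] args) {
        double[][] xs = { {1, 2}, {3, 4} };   // two mini batches
        double[][] ys = { {3, 6}, {9, 12} };  // labels generated by y = 3x
        double w = 0.0;                       // current parameter
        double[] grads = new double[xs.length];
        for (int b = 0; b < xs.length; b++) { // "map" task per mini batch
            grads[b] = batchGradient(xs[b], ys[b], w);
        }
        System.out.println(mergeGradients(grads)); // merged gradient at w = 0
    }
}
```

In Flink, the per-batch computation would live in a map operator and the merge in a reduce; the sketch only shows the arithmetic those operators would perform.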

Re: Broadcast Variable not updated in an IterativeDataSet

Stephan Ewen
Can you share the respective part of the code, then we can have a look?

In general, a DataSet is re-broadcast in each iteration if it depends on the IterativeDataSet.
You have to re-grab it from the RuntimeContext in each superstep. If you call getBroadcastSet() in the open() method, it will be executed in each superstep (open() is called at the beginning of each superstep).

Greetings,
Stephan
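The behaviour Stephan describes can be mimicked without Flink. In the plain-Java simulation below (all names and data invented for illustration), the parameter is re-read at the start of every superstep — the analogue of reading the broadcast variable inside open() — so each epoch's gradients are computed against the latest value rather than the initial one:

```java
// Plain-Java simulation (not Flink itself) of re-reading a "broadcast"
// parameter at the start of each superstep; trains y = w * x toward w = 3.
public class SuperstepBroadcastDemo {
    // Gradient of the mean squared error over one mini batch, at parameter w.
    static double batchGradient(double[] xs, double[] ys, double w) {
        double g = 0.0;
        for (int i = 0; i < xs.length; i++) {
            g += 2.0 * xs[i] * (w * xs[i] - ys[i]);
        }
        return g / xs.length;
    }

    static double train(int epochs, double learningRate) {
        double[][] xs = { {1, 2}, {3, 4} };   // two mini batches
        double[][] ys = { {3, 6}, {9, 12} };  // labels generated by y = 3x
        double broadcastW = 0.0;              // the "broadcast" parameter
        for (int epoch = 0; epoch < epochs; epoch++) {
            // Analogue of open(): each superstep re-reads the current
            // parameter. Caching the value from the first superstep instead
            // would reproduce the reported bug (gradients always computed
            // against the initial parameter).
            double w = broadcastW;
            double sum = 0.0;
            for (int b = 0; b < xs.length; b++) {
                sum += batchGradient(xs[b], ys[b], w); // "map" per batch
            }
            // "Reduce": average the gradients, then update the parameter
            // that feeds into the next superstep.
            broadcastW -= learningRate * (sum / xs.length);
        }
        return broadcastW;
    }

    public static void main(String[] args) {
        System.out.println(train(200, 0.02)); // approaches 3.0
    }
}
```

In actual Flink code, `broadcastW` would be the DataSet attached via withBroadcastSet and re-read from the RuntimeContext in open(), exactly as described above.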




Re: Broadcast Variable not updated in an IterativeDataSet

Ventura Del Monte
Thank you for the clarification. In the end I figured it out: it was a silly programming error, and now it works fine. Thank you again for your explanation.





Re: Broadcast Variable not updated in an IterativeDataSet

Stephan Ewen
Happy to hear that it works!
