python - Clarification in the Theano tutorial


I am reading the tutorial provided on the home page of the Theano documentation.

I am not sure about the code given under the gradient descent section.


I have doubts about the loop, reconstructed below.
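For reference, this is my rough reconstruction of the tutorial's gradient_updates_momentum function (the tutorial imports theano.tensor as T; the exact code there may differ slightly):

import theano
import theano.tensor as T

def gradient_updates_momentum(cost, params, learning_rate, momentum):
    # Build a list of (shared variable, new expression) update pairs.
    updates = []
    for param in params:
        # One extra shared variable per parameter, initialized to zeros.
        param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable)
        # Step the parameter along the stored update direction.
        updates.append((param, param - learning_rate*param_update))
        # Blend the new gradient into the stored direction (momentum rule).
        updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
    return updates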

We initialize the 'param_update' variable to zero:

param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable) 

and then update its value in the remaining two lines:

updates.append((param, param - learning_rate*param_update))
updates.append((param_update, momentum*param_update + (1. - momentum)*t.grad(cost, param)))

Why do we need to initialize it at all?

I guess I am getting something wrong here. Can you guys help me out?

The initialization of 'param_update' using theano.shared() tells Theano to reserve a variable that will be used by Theano functions. This initialization code is called only once; it is not used later on to reset the value of 'param_update' to 0.
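To see that a shared variable keeps its state across calls, here is a minimal, self-contained sketch (not from the tutorial): an accumulator whose shared value is set to 0 exactly once, at creation time, and afterwards is only changed through the function's updates.

import numpy as np
import theano
import theano.tensor as T

# Created and initialized to 0 exactly once; never reset by the calls below.
counter = theano.shared(np.float64(0.), name='counter')
step = T.dscalar('step')

# Each call returns the old value and then adds 'step' to the shared state.
accumulate = theano.function([step], counter, updates=[(counter, counter + step)])

accumulate(1.0)             # returns 0.0, counter becomes 1.0
accumulate(2.0)             # returns 1.0, counter becomes 3.0
print(counter.get_value())  # 3.0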

The actual value of 'param_update' is updated according to the last line:

updates.append((param_update, momentum*param_update + (1. - momentum)*t.grad(cost, param))) 

when the 'train' function is constructed with the updates argument ([23] in the tutorial):

train = theano.function([mlp_input, mlp_target], cost,
                        updates=gradient_updates_momentum(cost, mlp.params, learning_rate, momentum))

Each time 'train' is called, Theano computes the gradient of 'cost' w.r.t. 'param' and updates 'param_update' to a new update direction according to the momentum rule. Then, 'param' is updated by following the update direction saved in 'param_update', scaled by the appropriate 'learning_rate'.
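The same simultaneous-update behaviour can be written out in plain NumPy. This is only an illustrative sketch on a toy cost 0.5*param**2 (whose gradient is just param), not the tutorial's MLP:

import numpy as np

learning_rate, momentum = 0.1, 0.9
param = np.array(5.0)
param_update = np.zeros_like(param)  # plays the role of the shared 'param_update', starts at 0

for step in range(200):
    grad = param  # gradient of 0.5 * param**2 w.r.t. param
    # Like Theano's update list, both right-hand sides are evaluated with the
    # old values and then applied together.
    new_param = param - learning_rate * param_update
    new_param_update = momentum * param_update + (1. - momentum) * grad
    param, param_update = new_param, new_param_update

print(param)  # close to the minimum at 0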

