python - Clarification in the Theano tutorial
I am reading the tutorial provided on the home page of the Theano documentation, and I am not sure about the code given under the gradient descent section. In particular, I have doubts about the for loop.

If we initialize the 'param_update' variable to zero,
param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable)
and then update its value in the remaining two lines,

updates.append((param, param - learning_rate*param_update))
updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
why do we need it at all?

I guess I am getting something wrong here. Can you guys help me?
The initialization of 'param_update' using theano.shared() tells Theano to reserve a variable that will be used by Theano functions. This initialization code is called only once, and is not used later on to reset the value of 'param_update' to 0.
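To see why the initialization runs only once, here is a minimal sketch (the names 'acc' and 'accumulate' are illustrative, not from the tutorial) showing that a shared variable keeps its state across calls to a compiled function:

import theano
import theano.tensor as T

acc = theano.shared(0.0)  # initialized once, right here
x = T.dscalar('x')
# each call adds x to the stored value of acc
accumulate = theano.function([x], [], updates=[(acc, acc + x)])

accumulate(1.0)
accumulate(2.0)
print(acc.get_value())  # prints 3.0; the initialization line above never reran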
The actual value of 'param_update' is updated according to the last line:
updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
When the 'train' function is constructed with the update dictionary as an argument ([23] in the tutorial):
train = theano.function([mlp_input, mlp_target], cost, updates=gradient_updates_momentum(cost, mlp.params, learning_rate, momentum))
Each time 'train' is called, Theano computes the gradient of 'cost' w.r.t. 'param' and updates 'param_update' to a new update direction according to the momentum rule. Then, 'param' is updated by following the update direction saved in 'param_update', scaled by the appropriate 'learning_rate'. (Note that Theano applies all updates using the values the shared variables held before the call, so 'param' actually moves along the previously stored direction.)
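For reference, the full update-building function can be reconstructed from the fragments quoted in the question; the following is a sketch of what it looks like, with the loop over the parameters made explicit:

import theano
import theano.tensor as T

def gradient_updates_momentum(cost, params, learning_rate, momentum):
    updates = []
    for param in params:
        # One persistent accumulator per parameter, created once and
        # initialized to zeros with the same shape as the parameter.
        param_update = theano.shared(param.get_value()*0.,
                                     broadcastable=param.broadcastable)
        # Move the parameter along the stored update direction.
        updates.append((param, param - learning_rate*param_update))
        # Blend the new gradient into the stored direction (momentum rule).
        updates.append((param_update,
                        momentum*param_update
                        + (1. - momentum)*T.grad(cost, param)))
    return updates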