python - Clarification in the Theano tutorial
I am reading the tutorial provided on the home page of the Theano documentation, and I am not sure about the code given under the gradient descent section. In particular, I have doubts about the for loop.

If we initialize the 'param_update' variable to zero,
param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable)
and then update its value in the remaining two lines,

updates.append((param, param - learning_rate*param_update))
updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
why do we need it at all?

I guess I am getting something wrong here. Can you guys help me?
The initialization of 'param_update' using theano.shared() tells Theano to reserve a variable that will be used by Theano functions. This initialization code is called only once, and is not used later on to reset the value of 'param_update' to 0.
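To see why the initialization runs only once, here is a minimal sketch (the names 'acc' and 'accumulate' are illustrative, not from the tutorial) showing that a shared variable keeps its state across calls to a compiled function:

import theano
import theano.tensor as T

acc = theano.shared(0.0)  # initialized once, right here
x = T.dscalar('x')
# each call adds x to the stored value of acc
accumulate = theano.function([x], [], updates=[(acc, acc + x)])

accumulate(1.0)
accumulate(2.0)
print(acc.get_value())  # prints 3.0; the initialization line above never reran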
The actual value of 'param_update' is updated according to the last line:
updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
When the 'train' function is constructed with the update dictionary as an argument ([23] in the tutorial):
train = theano.function([mlp_input, mlp_target], cost, updates=gradient_updates_momentum(cost, mlp.params, learning_rate, momentum))
Each time 'train' is called, Theano computes the gradient of 'cost' w.r.t. 'param' and updates 'param_update' to a new update direction according to the momentum rule. Then, 'param' is updated by following the update direction saved in 'param_update', scaled by the appropriate 'learning_rate'. (Note that Theano applies all updates using the values the shared variables held before the call, so 'param' actually moves along the previously stored direction.)
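For reference, the full update-building function can be reconstructed from the fragments quoted in the question; the following is a sketch of what it looks like, with the loop over the parameters made explicit:

import theano
import theano.tensor as T

def gradient_updates_momentum(cost, params, learning_rate, momentum):
    updates = []
    for param in params:
        # One persistent accumulator per parameter, created once and
        # initialized to zeros with the same shape as the parameter.
        param_update = theano.shared(param.get_value()*0.,
                                     broadcastable=param.broadcastable)
        # Move the parameter along the stored update direction.
        updates.append((param, param - learning_rate*param_update))
        # Blend the new gradient into the stored direction (momentum rule).
        updates.append((param_update,
                        momentum*param_update
                        + (1. - momentum)*T.grad(cost, param)))
    return updates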