
Is there a momentum option for the Adam optimizer in Keras? - optimization, machine-learning, neural-network, deep-learning, Keras

The question says it all. Since Adam performs well on most datasets, I want to try tuning the Adam optimizer. So far, I only see a momentum option for SGD in Keras.


Answer № 1 (3 votes)

Short answer: No, neither in Keras nor in TensorFlow [EDIT: see UPDATE at the end].

Long answer: As mentioned in the comments, Adam already incorporates something like momentum. Here is some relevant corroboration:

From the highly recommended An overview of gradient descent optimization algorithms (also available as a paper):

In addition to storing an exponentially decaying average of past squared gradients v[t] like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients m[t], similar to momentum.
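The two averages in that quote can be sketched in a few lines. Below is a minimal, illustrative single Adam update step in plain Python (names m, v, beta1, beta2 follow the Adam paper); it is a sketch of the algorithm, not the Keras implementation:

```python
import math

# Illustrative single Adam update step: m is the momentum-like first moment,
# v is the RMSprop-like second moment (names follow Kingma & Ba, 2015).
def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # decaying average of past gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # decaying average of past squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction (moments start at 0)
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

# One step from theta = 1.0 with gradient 0.5
theta, m, v = adam_step(1.0, 0.5, 0.0, 0.0, t=1)
```

The m update is exactly the momentum-style accumulation the quote refers to, controlled by beta1.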

From Stanford's CS231n: Convolutional Neural Networks for Visual Recognition:

Adam is a recently proposed update that looks a bit like RMSProp with momentum.

Note that some frameworks do expose a momentum parameter for Adam, but it is actually the beta1 parameter; here is CNTK:

momentum (float, list, output of) - momentum schedule. Note that this is the beta1 parameter in the Adam paper. For more information, please refer to this CNTK Wiki article.
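To see why beta1 deserves the name "momentum": with beta1 = 0, the first-moment estimate collapses to the raw gradient, i.e. no smoothing at all. A small illustrative sketch (not any framework's actual code):

```python
# Illustrative: beta1 controls the momentum-like smoothing of Adam's
# first-moment estimate m; beta1 = 0 leaves just the raw gradient.
def first_moments(grads, beta1):
    m, history = 0.0, []
    for g in grads:
        m = beta1 * m + (1 - beta1) * g
        history.append(m)
    return history

smoothed = first_moments([1.0, 1.0, 1.0], beta1=0.9)  # gradual ramp-up
raw = first_moments([1.0, 1.0, 1.0], beta1=0.0)       # no smoothing at all
```

With beta1 = 0.9 the estimate ramps up gradually (0.1, 0.19, 0.271, ...) just as a momentum term would, whereas beta1 = 0 tracks each gradient exactly.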

That said, there is an ICLR 2016 paper titled Incorporating Nesterov Momentum into Adam, along with an implementation skeleton in TensorFlow by the author - but I cannot offer an opinion on it.

UPDATE (following a comment by Yu-Yang below): Keras now includes an optimizer called Nadam, based on the ICLR 2016 paper above; from the docs:

Much like Adam is essentially RMSprop with momentum, Nadam is Adam with Nesterov momentum.

It is also included as a module in TensorFlow.
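The difference Nadam introduces is where the gradient is evaluated: plain momentum uses the current point, while Nesterov momentum uses a "lookahead" point. A minimal sketch of the two update rules on f(x) = x² (illustrative only, not the Nadam implementation):

```python
# Illustrative: plain momentum vs. Nesterov momentum on f(x) = x^2 (grad = 2x).
def momentum_step(theta, grad_fn, v, lr=0.1, mu=0.9):
    v = mu * v - lr * grad_fn(theta)           # gradient at the current point
    return theta + v, v

def nesterov_step(theta, grad_fn, v, lr=0.1, mu=0.9):
    v = mu * v - lr * grad_fn(theta + mu * v)  # gradient at the lookahead point
    return theta + v, v

grad = lambda x: 2 * x
tm, vm = 1.0, 0.0  # momentum trajectory
tn, vn = 1.0, 0.0  # Nesterov trajectory
for _ in range(2):
    tm, vm = momentum_step(tm, grad, vm)
    tn, vn = nesterov_step(tn, grad, vn)
```

The first step is identical for both; from the second step on, the lookahead lets Nesterov react to the upcoming landscape earlier, which tends to reduce overshoot near the minimum.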