Natural gradient descent (NGD) can be seen as a preconditioned update in which parameter changes are driven by a functional perspective. In a spirit similar to Newton's method, the NGD update replaces the Hessian with the Gram matrix, with respect to a suitable metric, of a generating system of the tangent space to the approximation manifold at the current iterate. Although assembling and inverting this matrix is prohibitively expensive for large machine learning models, it becomes not only feasible but necessary for scientific machine learning problems, whose models typically require far fewer parameters. Still, both gradient descent and natural gradient descent can get stuck at local minima. Furthermore, when the loss function is not the L^2 distance, even the natural gradient may yield suboptimal directions at each step. The talk will focus on how we can tackle these situations by introducing a natural version of classical inertial dynamic methods such as Nesterov's accelerated scheme.
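
For concreteness, a minimal sketch of the update described above, with notation introduced here for illustration (u_theta denotes the parametrized approximant, L the loss, eta the step size, and angle brackets the chosen metric):

\[
\theta_{k+1} = \theta_k - \eta\, G(\theta_k)^{+}\, \nabla_\theta L(\theta_k),
\qquad
G(\theta_k)_{ij} = \big\langle \partial_{\theta_i} u_{\theta_k},\, \partial_{\theta_j} u_{\theta_k} \big\rangle,
\]

where G(\theta_k)^{+} is a (pseudo-)inverse of the Gram matrix built from the generating system of the tangent space. For reference only (not necessarily the variant presented in the talk), a classical Nesterov-type inertial recursion reads

\[
y_k = \theta_k + \beta_k\,(\theta_k - \theta_{k-1}),
\qquad
\theta_{k+1} = y_k - \eta\, \nabla_\theta L(y_k),
\]

and a natural-gradient analogue would precondition the gradient at the extrapolated point by G(y_k)^{+}.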