Natural gradient descent (NGD) can be seen as a preconditioned update in which parameter changes are driven by a functional perspective. In a spirit similar to Newton's method, the NGD update replaces the Hessian with the Gram matrix, with respect to a suitable metric, of a generating system of the tangent space to the approximation manifold at the current iterate. Although assembling and inverting this matrix is prohibitively expensive for large machine learning models, it becomes not only feasible but necessary for scientific machine learning problems, whose models typically require far fewer parameters. Still, both gradient descent and natural gradient descent can get stuck at local minima. Furthermore, when the loss function is not the L^2 distance, even the natural gradient may yield suboptimal directions at each step. The talk will focus on how we can tackle these situations by introducing a natural version of classical inertial dynamic methods such as Nesterov's accelerated scheme.
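
For concreteness, a minimal sketch of the update described above, with notation introduced here for illustration (u_theta denotes the parametrized approximant, L the loss, eta the step size, and angle brackets the chosen metric):

\[
\theta_{k+1} = \theta_k - \eta\, G(\theta_k)^{+}\, \nabla_\theta L(\theta_k),
\qquad
G(\theta_k)_{ij} = \big\langle \partial_{\theta_i} u_{\theta_k},\, \partial_{\theta_j} u_{\theta_k} \big\rangle,
\]

where G(\theta_k)^{+} is a (pseudo-)inverse of the Gram matrix built from the generating system of the tangent space. For reference only (not necessarily the variant presented in the talk), a classical Nesterov-type inertial recursion reads

\[
y_k = \theta_k + \beta_k\,(\theta_k - \theta_{k-1}),
\qquad
\theta_{k+1} = y_k - \eta\, \nabla_\theta L(y_k),
\]

and a natural-gradient analogue would precondition the gradient at the extrapolated point by G(y_k)^{+}.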