Abstract
Modern machine learning and generative AI applications often require generating samples from a given probability density function (PDF). One particular sampling scheme, the Langevin Monte Carlo scheme, is obtained by discretizing the Langevin stochastic differential equation (SDE), which is driven by Brownian motion. For many matrix manifolds without explicit coordinates, efficient Riemannian optimization schemes have been well studied, yet it is quite difficult to design efficient Langevin Monte Carlo schemes by approximating Brownian motion on manifolds, because the Langevin SDE on a manifold is in Stratonovich form. An efficient scheme can be obtained once the SDE is converted from Stratonovich form to Ito form. This conversion has rarely been discussed or used for defining Brownian motion in the probability literature, since the Ito stochastic terms on manifolds are only local martingales. In the 1980s, J. Lewis showed that the Stratonovich-Ito conversion on surfaces gives precisely the mean curvature. We extend this result to parallelizable submanifolds: the Stratonovich-Ito conversion term on a submanifold is equal to the mean curvature normal vector, a concept not often used in the geometry literature. The parallelizability assumption is quite restrictive, but it can be relaxed via a partition of unity in which each piece is parallelizable. In the 2000s, Stroock also mentioned the mean curvature normal vector when giving an extrinsic definition of Brownian motion on submanifolds, but Stroock’s results are not easy to use in this context.
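As a rough illustration of why the Ito correction matters numerically, the sketch below uses the unit sphere in R^3, a textbook case and not one of the schemes from this work. It assumes the standard extrinsic Ito form dX = P(X) dW - ((n-1)/2) X dt for Brownian motion on the unit sphere in R^n, where P(X) is the orthogonal projection onto the tangent plane and the drift is one half of the mean curvature normal vector (trace convention). The function names `tangent_noise` and `simulate_radius` are hypothetical, and no retraction back to the sphere is applied, so that the effect of the drift term on the radius is visible.

```python
import math
import random


def tangent_noise(x, rng):
    """Project a standard Gaussian vector onto the plane orthogonal to x."""
    xi = [rng.gauss(0.0, 1.0) for _ in x]
    coeff = sum(a * b for a, b in zip(x, xi)) / sum(a * a for a in x)
    return [b - coeff * a for a, b in zip(x, xi)]


def simulate_radius(n_steps, h, ito_correction, seed=0):
    """Euler-Maruyama in R^3 without retraction; returns the final |x|.

    ito_correction=True adds the drift -((n-1)/2) x dt (assumed Ito form
    of Brownian motion on the unit sphere); False uses tangent noise only.
    """
    rng = random.Random(seed)
    x = [0.0, 0.0, 1.0]  # start at the north pole
    n = len(x)
    drift = -0.5 * (n - 1) if ito_correction else 0.0
    for _ in range(n_steps):
        noise = tangent_noise(x, rng)
        x = [a + drift * h * a + math.sqrt(h) * b for a, b in zip(x, noise)]
    return math.sqrt(sum(a * a for a in x))


if __name__ == "__main__":
    print("with Ito drift   :", simulate_radius(1000, 1e-3, True))
    print("without Ito drift:", simulate_radius(1000, 1e-3, False))
```

With the drift term, the squared radius changes by only O(h^2) per step in expectation, so the iterates stay close to the sphere. Without it, each tangent step pushes the iterate outward by O(h), and the radius grows steadily. A practical scheme would presumably also apply a retraction or projection after each step.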
For Riemannian submersions and quotient manifolds, we show a similar result: Brownian motion on a quotient manifold is related to Brownian motion on its total space via the mean curvature normal vector of each fiber (orbit, or equivalence class), which for compact manifolds is also equal to the gradient of the logarithm of the volume of each fiber. We apply these two main results to the manifold of positive semi-definite matrices of fixed rank to obtain two efficient Riemannian Langevin Monte Carlo schemes: one for its embedded geometry, and the other for its quotient geometry with the Bures-Wasserstein metric. For studying the convergence rate to equilibrium, it is well known that the SDE solution is related to the Fokker-Planck equation, for which the exponential convergence rate to equilibrium is related to the Bakry-Emery-Ricci tensor. For a special PDF, we also give some preliminary estimates of the Bakry-Emery-Ricci tensor, which suggest a difference in convergence rate between the two schemes. Numerical examples (arXiv:2309.04072), including Monte Carlo numerical integration on manifolds, will be shown. This is based on joint work with Tianmin Yu (student) and Govind Menon at Brown University, Jianfeng Lu at Duke University, and Shixin Zheng (student) at Purdue University.