Random Gradient-Free Optimization in Infinite Dimensional Spaces
Episode


Dec 23, 2025 · 12:52
Optimization and Control · Machine Learning

Abstract

In this paper, we propose a random gradient-free method for optimization in infinite dimensional Hilbert spaces, applicable to functional optimization in diverse settings. Though such problems are often solved through finite-dimensional gradient descent over a parametrization of the functions, such as neural networks, an interesting alternative is to instead perform gradient descent directly in the function space by leveraging its Hilbert space structure, thus enabling provable guarantees and fast convergence. However, infinite-dimensional gradients are often hard to compute in practice, hindering the applicability of such methods. To overcome this limitation, our framework requires only the computation of directional derivatives and a pre-basis for the Hilbert space domain, i.e., a linearly-independent set whose span is dense in the Hilbert space. This fully resolves the tractability issue, as pre-bases are much more easily obtained than full orthonormal bases or reproducing kernels -- which may not even exist -- and individual directional derivatives can be easily computed using forward-mode scalar automatic differentiation. We showcase the use of our method to solve partial differential equations à la physics informed neural networks (PINNs), where it effectively enables provable convergence.

Summary

This paper addresses the challenge of functional optimization in infinite-dimensional Hilbert spaces, a common task in various fields like machine learning and PDE solving. While traditional approaches rely on finite-dimensional parameterizations (e.g., neural networks), these can suffer from limited representation capacity and non-convex optimization landscapes. Direct gradient descent in the function space is desirable but hindered by the difficulty of computing infinite-dimensional gradients. To overcome this, the authors propose a novel random gradient-free method inspired by techniques used in finite dimensions. Their framework only requires the computation of directional derivatives and a pre-basis for the Hilbert space. This resolves the tractability issue, as pre-bases are easier to obtain than orthonormal bases. The method involves sampling random finite-dimensional subspaces and estimating the gradient within these subspaces using directional derivatives. A key contribution is a preconditioning strategy that ensures convergence guarantees in the convex case. The authors demonstrate the effectiveness of their method by solving partial differential equations à la physics-informed neural networks (PINNs), showcasing its ability to achieve provable convergence where standard gradient descent may fail.
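
To make the mechanism concrete, here is a minimal sketch of this random-subspace scheme, written in Python/JAX for concreteness. It is not the authors' implementation: it assumes a toy least-squares risk on L²([0,1]), uses the monomial pre-basis {1, x, x², ...}, truncates the iterate to a fixed number of pre-basis coefficients, and omits the paper's preconditioning and its specific sampling distribution; the step size and subspace size are arbitrary illustrative choices.

```python
# Hedged sketch: random gradient-free descent over a pre-basis (toy setting).
import jax
import jax.numpy as jnp

DIM = 20                                          # truncation for this illustration only
xs = jnp.linspace(0.0, 1.0, 256)                  # quadrature grid on [0, 1]
basis = jnp.stack([xs**k for k in range(DIM)])    # monomial pre-basis b_k(x) = x^k
f_target = jnp.sin(2 * jnp.pi * xs)               # target function to approximate

def risk(coeffs):
    """Toy convex risk J(h) ~ integral of (h - f)^2, with h = sum_k coeffs[k] * b_k."""
    h = coeffs @ basis
    return jnp.mean((h - f_target) ** 2)

def directional_derivative(coeffs, v_coeffs):
    """Scalar forward-mode directional derivative D J(h)[v] via jax.jvp."""
    _, dd = jax.jvp(risk, (coeffs,), (v_coeffs,))
    return dd

def rgf_step(coeffs, key, subspace_size=3, lr=0.1):
    """One step: sample a few pre-basis directions, estimate the gradient
    inside their span from directional derivatives, and descend along it.
    (The paper additionally preconditions this estimate; omitted here.)"""
    idx = jax.random.choice(key, DIM, shape=(subspace_size,), replace=False)
    grad_est = jnp.zeros(DIM)
    for k in idx:
        v = jnp.zeros(DIM).at[k].set(1.0)         # direction b_k
        grad_est = grad_est + directional_derivative(coeffs, v) * v
    return coeffs - lr * grad_est

key = jax.random.PRNGKey(0)
coeffs = jnp.zeros(DIM)
print("initial risk:", risk(coeffs))
for _ in range(200):
    key, sub = jax.random.split(key)
    coeffs = rgf_step(coeffs, sub)
print("final risk:", risk(coeffs))
```

Only scalar directional derivatives of the risk are ever evaluated (here via forward-mode jax.jvp), so no infinite-dimensional gradient needs to be formed explicitly.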

Key Insights

  • Novel Random Gradient-Free Method: The core contribution is a new algorithm for gradient-free optimization in infinite-dimensional Hilbert spaces, suitable for minimizing Fréchet differentiable risk functionals.
  • Pre-basis Requirement: The method cleverly requires only a pre-basis, rather than a full orthonormal basis or reproducing kernel, significantly expanding its applicability.
  • Preconditioning Strategy: A crucial aspect is the preconditioning of the gradient descent process, using a diagonal operator C, to control the variance of the gradient estimates. This is essential for achieving a finite second moment, a challenge in infinite-dimensional spaces.
  • Distribution Design: The authors designed a specific conditional distribution P(v|k) for sampling random directions within the finite-dimensional subspace B_k, ensuring that the marginal distribution P(v) has identity covariance. This guarantees preconditioned unbiasedness (a toy numerical check of this property follows the list).
  • Convergence Bound: The paper provides a convergence bound (Theorem 3) that depends on the initial error (||h − h_1||_C), the Lipschitz constant of the gradient (G), and the learning rate.
  • Counterintuitive Result: Without preconditioning (λ_k = 1), a necessary condition for a finite second moment is ∑_i ⟨g_n, e_i⟩^2 / t_i < +∞ (Corollary 1), which is more restrictive than simply requiring g_n to be in the Hilbert space.
  • Performance: With λ_k = t_k and M_k = ⌈k/c⌉, the method achieves the second-moment bound E[||ḡ_n||_C^2] ≤ 2(1 + 2c)||g_n||^2, where ḡ_n denotes the gradient estimate.
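
The "identity covariance implies unbiasedness" fact behind the distribution design above can be checked numerically in a finite-dimensional toy. The snippet below only illustrates that general fact: it draws plain Gaussian directions rather than the paper's subspace-based P(v|k), and it ignores the preconditioner C.

```python
# Toy check: if E[v v^T] = I, then <g, v> v is an unbiased estimate of g.
import jax
import jax.numpy as jnp

d = 8
g = jax.random.normal(jax.random.PRNGKey(1), (d,))   # stand-in for a true gradient

def sample_direction(key):
    # Simplest distribution with identity covariance: a standard Gaussian.
    # The paper instead samples inside random pre-basis subspaces B_k and
    # shapes P(v | k) so that the *marginal* covariance is the identity.
    return jax.random.normal(key, (d,))

keys = jax.random.split(jax.random.PRNGKey(2), 100_000)
vs = jax.vmap(sample_direction)(keys)                # (100000, d) directions
estimates = (vs @ g)[:, None] * vs                   # one-sample estimates <g, v> v
print("max |mean estimate - g|:", jnp.max(jnp.abs(estimates.mean(axis=0) - g)))
```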

Practical Implications

  • Solving PDEs: The method provides a practical alternative to PINNs for solving PDEs, particularly in scenarios where preserving the convexity of the risk functional is desired.
  • Functional Optimization: Researchers and practitioners working on functional optimization problems in areas like machine learning, inverse problems, and control theory can benefit from this framework.
  • Implementation: The algorithm is relatively straightforward to implement, requiring only directional derivative computations (easily done with automatic differentiation) and a pre-basis (see the sketch after this list).
  • Future Research: The paper opens up avenues for future research, including exploring constrained, mirror, and accelerated versions of the algorithm, as well as extending it to non-convex settings and different types of boundary conditions in PDE solving.
  • Broader Applicability: The method can be applied to various scenarios where functional optimization is needed, such as statistical inverse problems and nonparametric instrumental variable regression.
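
As an illustration of the implementation point above, the snippet below shows the only derivative primitive the framework needs in the PDE setting: a scalar directional derivative of a PINN-style residual risk, computed by forward-mode automatic differentiation. The 1D Poisson problem, collocation grid, boundary penalty, and the chosen direction are toy choices for this sketch, not details from the paper's experiments.

```python
# Hedged sketch: directional derivative of a PINN-style risk via forward-mode AD.
import jax
import jax.numpy as jnp

xs = jnp.linspace(0.0, 1.0, 128)                              # collocation points
f = lambda x: -(2 * jnp.pi) ** 2 * jnp.sin(2 * jnp.pi * x)    # source term of u'' = f

def second_derivative(u, x):
    """u''(x) for a scalar function u, via nested grads."""
    return jax.grad(jax.grad(u))(x)

def risk(u):
    """Toy PINN-style risk: squared PDE residual plus boundary penalties."""
    res = jax.vmap(lambda x: second_derivative(u, x) - f(x))(xs)
    return jnp.mean(res ** 2) + u(0.0) ** 2 + u(1.0) ** 2

def directional_derivative(u, v):
    """D J(u)[v] = d/dt J(u + t v) at t = 0, via scalar forward-mode AD."""
    phi = lambda t: risk(lambda x: u(x) + t * v(x))
    _, dd = jax.jvp(phi, (0.0,), (1.0,))
    return dd

u = lambda x: 0.1 * x * (1.0 - x)          # current iterate (arbitrary choice)
v = lambda x: jnp.sin(3 * jnp.pi * x)      # a pre-basis direction (e.g., a sine mode)
print("D J(u)[v] =", directional_derivative(u, v))
```

Plugged into the random-subspace loop sketched after the summary, one such directional derivative per sampled pre-basis element, this recovers the overall structure of the method.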
