Papers
arxiv:2310.01769

How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization

Published on Oct 3, 2023
Authors:
,
,

Abstract

This paper rigorously shows how over-parameterization changes the convergence behaviors of gradient descent (GD) for the matrix sensing problem, where the goal is to recover an unknown low-rank ground-truth matrix from near-isotropic linear measurements. First, we consider the symmetric setting with the symmetric parameterization where M^* in R^{n times n} is a positive semi-definite unknown matrix of rank r ll n, and one uses a symmetric parameterization XX^top to learn M^*. Here X in R^{n times k} with k > r is the factor matrix. We give a novel Omega (1/T^2) lower bound of randomly initialized GD for the over-parameterized case (k >r) where T is the number of iterations. This is in stark contrast to the exact-parameterization scenario (k=r) where the convergence rate is exp (-Omega (T)). Next, we study a<PRE_TAG>symmetric setting</POST_TAG> where M^* in R^{n_1 times n_2} is the unknown matrix of rank r ll min{n_1,n_2}, and one uses an a<PRE_TAG>symmetric parameterization</POST_TAG> FG^top to learn M^* where F in R^{n_1 times k} and G in R^{n_2 times k}. Building on prior work, we give a global exact convergence result of randomly initialized GD for the exact-parameterization case (k=r) with an exp (-Omega(T)) rate. Furthermore, we give the first global exact convergence result for the over-parameterization case (k>r) with an exp(-Omega(alpha^2 T)) rate where alpha is the initialization scale. This linear convergence result in the over-parameterization case is especially significant because one can apply the a<PRE_TAG>symmetric parameterization</POST_TAG> to the symmetric setting to speed up from Omega (1/T^2) to linear convergence. On the other hand, we propose a novel method that only modifies one step of GD and obtains a convergence rate independent of alpha, recovering the rate in the exact-parameterization case.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2310.01769 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2310.01769 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2310.01769 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.