Previously we looked at dot{x}=f(x)

, $f(x)=big(A-(x^{top}Ax)I_n big)x$ - and its resulting flow - as the result from mapping dot{x}=Ax

to the sphere. However, we saw that for $Ain mathcal{S}^n_{++}$ this flow convergences to pm v_1

, with $Av_1=lambda_{mathrm{max}}(A)v_1$ such that for g(x)=|Ax|_2

we have $lim_{tto infty}gbig(x(t,x_0) big)=|A|_2$ $forall;x_0in mathbf{S}^{n-1}setminus{v_2,dots,v_n}$ . Hence, it is interesting to look at the flow from an optimization point of view: $|A|_2=sup_{xin mathbf{S}^{n-1}}|Ax|_2$ .

A fair question would be, is our flow not simply Riemannian gradient ascent for this problem? As common with these kind of problems, such a hypothesis is something you just feel.

Now, using the tools from (CU1994, p.311) we can compute the gradient of g(x)

(on the sphere) via $mathrm{grad}_{mathbf{S}^2}(g|_{mathbf{S}^2})=mathrm{grad}_{mathbf{R}^3}(g)-mathrm{d}g(xi)xi$ , where

is a vector field normal to $mathbf{S}^2$ , e.g., xi=(x,y,z)

. From there we obtain

$mathrm{grad}_{mathbf{S}^2}(g|_{mathbf{S}^2}) = left[ begin{array}{lll} (1-x^2) & -xy & -xz -xy & (1-y^2) & -yz -xz & -yz & (1-z^2) end{array}right] left[ begin{array}{l}partial_x g partial_y g partial_z g end{array}right]=G(x,y,z)nabla g.$

To make our life a bit easier, use instead of

the map

. Moreover, set $s=(x,y,z)in mathbf{R}^3$ . Then it follows that

$begin{array}{ll} mathrm{grad}_{mathbf{S}^2}(h|_{mathbf{S}^2}) &= left(I_3-mathrm{diag}(s)left[begin{array}{l} s^{top}s^{top}s^{top} end{array}right]right) 2A^{top}As &= 2left(A^{top}A-(s^{top}A^{top}Asright)I_3)s. end{array}$

Of course, to make the computation cleaner, we changed

, but the relation between $mathrm{grad}_{mathbf{S}^2}(h|_{mathbf{S}^2})$ and

is beautiful. Somehow, mapping trajectories of dot{x}=Ax

, for some $Ain mathcal{S}^n_{++}$ , to the sphere corresponds to (Riemannian) gradient ascent applied to the problem $sup_{xin S^{2}}|Ax|_2$ .

Now, we formalize the previous analysis a bit and show how fast we converge. Assume that the eigenvectors are ordered such that eigenvector q_1

corresponds to the largest eigenvalue of $Ain mathcal{S}^n_{++}$ . Then, the solution to $dot{x}=big(A-(x^{top}Ax)I_nbig)x$ is given by

$x(t) = frac{exp({At})x_0}{|exp({At})x_0|_2}.$

Let $x_0=sum^{n}_{i=1}c_iq_i$ be x_0

expressed in eigenvector coordinates, with all $q_iin mathbf{S}^{n-1}$ (normalized). Moreover, assume all eigenvalues are distinct. Then, to measure if x(t)

is near

, we compute 1-|langle x(t),q_1 rangle|

, which is

if and only if q_1

is parallel to x(t)

. To simplify the analysis a bit, we look at 1-langle x(t),q_1 rangle ^2=varepsilon

, for some perturbation $varepsilonin mathbf{R}_{geq 0}$ , this yields

$frac{c_1^2 exp({2lambda_1 t})}{sum^{n}_{i=1}c_i^2 exp({2lambda_i t})} = 1-varepsilon.$

$2log (|c_1|)+2lambda_1 t - log left( sum^{n}_{i=1}expleft{2lambda_it + 2log(|c_i|)right}right) = log(1-varepsilon).$

This log-sum-exp terms are hard to deal with, but we can apply the so-called ‘‘log-sum-exp trick’’:

$log sum_{iin I}exp (x_i) = x^{star}+log sum_{iin I}exp(x_i-x^{star}), quad forall x^{star}in mathbf{R}^n.$

$-log left(sum^{n}_{i=1}exp left{2 left[(lambda_i-lambda_1)t + logleft(frac{|c_i|}{|c_1|} right) right] right}+1 right) = log(1-varepsilon).$

We clearly observe that for tto infty

the LHS approaches

from below, which means that varepsilonto 0

from above, like intended. Of course, we also observe that the mentioned method is not completely general, we already assume distinct eigenvalues, but there is more. We do also not convergence when $x_0 in mathrm{span}{q_2,dots,q_n}$ , which is however a set of measure

on the sphere $mathbf{S}^{n-1}$ .

More interestingly, we see that the convergence rate is largely dictated by the ‘‘spectral gap/eigengap’’ lambda_2-lambda_1

. Specifically, to have a particular projection error varepsilon

, such that 1gg varepsilon>0

, we need

$t approx log left( frac{varepsilon}{1-varepsilon}right)frac{1}{lambda_2-lambda_1}.$

Comparing this to the resulting flow from $dot{s}=2left(A^{top}A-(s^{top}A^{top}Asright)I_3)s$ , $s(0)in mathbf{S}^2$ , we see that we have the same flow, but with Amapsto 2 A^2

. This is interesting, since

and

have the same eigenvectors, yet a different (scaled) spectrum. With respect to the convergence rate, we have to compare (lambda_2-lambda_1)

and

for any $lambda_1,lambda_2in mathbf{R}_{> 0}$ with lambda_1>lambda_2

(the additional

is not so interesting).

It is obvious what we will happen, the crux is, is lambda

larger or smaller than

? Can we immediately extend this to a Newton-type algorithm? Well, this fails (globally) since we work in $mathbf{R}^3$ instead of purely with $mathbf{S}^2$ . To be concrete, $mathrm{det}(G(x,y,z))|_{mathbf{S}^2}=0$ , we never have

degrees of freedom.

Of course, these observations turned out to be far from new, see for example (AMS2008, sec. 4.6).

(AMS2008) P.A. Absil, R. Mahony and R. Sepulchre: ‘‘Optimization Algorithms on Matrix Manifolds’’, 2008 Princeton University Press.
(CU1994) Constantin Udriste: ‘‘Convex Functions and Optimization Methods on Riemannian Manifolds’’, 1994 Kluwer Academic Publishers.

Riemannian Gradient Flow |5 Nov. 2019|tags: math.DG, math.DS

Riemannian Gradient Flow |5 Nov. 2019|
`tags: math.DG, math.DS`