Spectral Radius vs Spectral Norm |28 Febr. 2020|
`tags: math.LA, math.DS`

Lately, linear discrete-time control theory has started to appear in areas far beyond the ordinary. As a byproduct, papers have started to surface which claim that stability of linear discrete-time systems

$x_{k+1}=Ax_k$

is characterized by $\|A\|_2<1$. The confusion is made complete by calling this object the spectral norm of a matrix $A\in \mathbf{R}^{n\times n}$. Indeed, for fixed coordinates, stability of $A$ is not characterized by $\|A\|_2:=\sup_{x\in \mathbf{S}^{n-1}}\|Ax\|_2$, but by $\rho(A):=\max_{i\in [n]}|\lambda_i(A)|$. If $\rho(A)<1$, then all eigenvalues of $A$ are strictly smaller than one in absolute value, and hence for $x_k = A^k x_0$, $\lim_{k\to \infty}x_k = 0$. This follows from the Jordan form of $A$ in combination with the simple observation that for $\lambda \in \mathbf{C}$, $\lim_{k\to \infty} \lambda^k = 0 \iff |\lambda|<1$.
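
As a quick numerical illustration of the difference between the two quantities, here is a minimal NumPy sketch. The matrix `A` and the helper names `spectral_radius` and `spectral_norm` are made up for this example and do not come from any of the papers alluded to above.

```python
import numpy as np

def spectral_radius(A):
    # rho(A): largest absolute value among the eigenvalues of A
    return float(np.max(np.abs(np.linalg.eigvals(A))))

def spectral_norm(A):
    # ||A||_2: largest singular value of A
    return float(np.linalg.norm(A, 2))

# Hypothetical example: a Jordan-like block with a large off-diagonal entry
A = np.array([[0.5, 10.0],
              [0.0, 0.5]])

rho = spectral_radius(A)
nrm = spectral_norm(A)

# Iterate x_{k+1} = A x_k and record ||x_k||_2
x = np.array([0.0, 1.0])
norms = []
for _ in range(200):
    norms.append(float(np.linalg.norm(x)))
    x = A @ x

print("rho(A)      =", rho)
print("||A||_2     =", nrm)
print("max ||x_k|| =", max(norms))
print("last ||x_k||=", norms[-1])
```

Here $\rho(A)<1$ while $\|A\|_2>1$: the iterates first grow and only then decay to $0$, so the system is stable even though the spectral norm test fails.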

To continue, define two sets: $\mathcal{A}_{\rho}:=\{A\in \mathbf{R}^{n\times n}\;:\;\rho(A)<1\}$ and $\mathcal{A}_{\ell_2}:=\{A\in \mathbf{R}^{n\times n}\;:\;\|A\|_2<1\}$. Since $^{[1]}$ $\rho(A)\leq \|A\|_p$ we have $\mathcal{A}_{\ell_2}\subset \mathcal{A}_{\rho}$. Now, the main question is: how much do we lose by approximating $\mathcal{A}_{\rho}$ with $\mathcal{A}_{\ell_2}$? The motivation to do so is given by the fact that $\|\cdot\|_2$ is a norm and hence a convex function $^{[2]}$, i.e., when given a convex polytope $\mathcal{P}$ with vertices $A_i$, if $\{A_i\}_i\subset \mathcal{A}_{\ell_2}$, then $\mathcal{P}\subset \mathcal{A}_{\ell_2}$. Note that $\rho(A)$ is not a norm: $\rho(A)$ can be $0$ without $A=0$ (consider an upper-triangular matrix with zeros on the main diagonal), and the triangle inequality can easily fail as well. For example, let

$A_1 = \left[\begin{array}{ll} 0 & 1 \\ 0 & 0 \end{array}\right], \quad A_2 = \left[\begin{array}{ll} 0 & 0 \\ 1 & 0 \end{array}\right].$

Then $\rho(A_1)=\rho(A_2)=0$, but $\rho(A_1+A_2)=1$, hence $\rho(A_1+A_2)\leq \rho(A_1)+\rho(A_2)$ fails to hold. A setting where the aforementioned property of $\|\cdot\|_2$ might help is Robust Control. Say we want to assess whether some algorithm rendered a compact convex set $\mathcal{K}$, filled with $A$'s, stable. As highlighted before, we could just check whether all extreme points of $\mathcal{K}$ are members of $\mathcal{A}_{\ell_2}$, and the set of extreme points might be small and finite. Thus, computationally, it appears to be attractive to consider $\mathcal{A}_{\ell_2}$ over the generic $\mathcal{A}_{\rho}$.
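
Both observations can be checked numerically. The sketch below first evaluates the triangle inequality for $\rho$ on the two nilpotent matrices, and then runs a vertex check on a toy polytope. The segment between $0.9A_1$ and $0.9A_2$ is an assumption chosen purely for illustration, as are the helper names `rho` and `norm2`.

```python
import numpy as np

def rho(M):
    # spectral radius: largest absolute eigenvalue
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def norm2(M):
    # spectral norm: largest singular value
    return float(np.linalg.norm(M, 2))

A1 = np.array([[0.0, 1.0],
               [0.0, 0.0]])
A2 = np.array([[0.0, 0.0],
               [1.0, 0.0]])

# Triangle inequality for rho (small tolerance for floating point)
triangle_holds = rho(A1 + A2) <= rho(A1) + rho(A2) + 1e-9
print("triangle inequality holds for rho:", triangle_holds)

# Toy polytope (hypothetical): the segment between 0.9*A1 and 0.9*A2
vertices = [0.9 * A1, 0.9 * A2]
vertex_bound = max(norm2(V) for V in vertices)
vertex_rho = max(rho(V) for V in vertices)

thetas = np.linspace(0.0, 1.0, 11)
combos = [t * vertices[0] + (1.0 - t) * vertices[1] for t in thetas]
combo_norms = [norm2(C) for C in combos]
combo_rhos = [rho(C) for C in combos]

# The norm over the segment is bounded by the norm at the vertices ...
print("max norm on segment:", max(combo_norms), "vertex bound:", vertex_bound)
# ... while rho over the segment is NOT bounded by rho at the vertices.
print("max rho on segment: ", max(combo_rhos), "vertex rho:  ", vertex_rho)
```

In this toy case the vertex values of $\|\cdot\|_2$ bound the whole segment, whereas the vertex values of $\rho$ (both zero) say nothing about the interior.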

As a form of critique, one might say that $\mathcal{A}_{\rho}$ is a lot larger than $\mathcal{A}_{\ell_2}$. Towards such an argument, one might recall that $\|A\|_2=\sqrt{\lambda_{\mathrm{max}}(A^{\top}A)}$. Indeed, $\|A\|_2=\rho(A)$ if $^{[3]}$ $A\in \mathsf{Sym}_n$, but $\mathsf{Sym}_n \simeq \mathbf{R}^{n(n+1)/2}$. Therefore, it seems like the set of $A$ for which considering $\|\cdot\|_2$ over $\rho(\cdot)$ is reasonable is negligibly small. To say a bit more, since $^{[4]}$ $\|A\|_2\leq \|A\|_F$ we see that we can always find a ball with non-zero volume fully contained in $\mathcal{A}_{\rho}$. Hence, $\mathcal{A}_{\rho}$ is at least locally dense in $\mathbf{R}^{n\times n}$. So in principle we could try to investigate $\mathrm{vol}(\mathcal{A}_{\ell_2})/\mathrm{vol}(\mathcal{A}_{\rho})$. For $n=1$ the sets agree, and the ratio degrades asymptotically as $n$ grows. However, is this the way to go? Let's say we consider the set $\widehat{\mathcal{A}}_{\rho,N}:=\{A\in \mathbf{R}^{n\times n}\;:\;\rho(A)<N/(N+1)\}$. Clearly, the sets $\widehat{\mathcal{A}}_{\rho,N}$ and $\mathcal{A}_{\rho}$ are different, even in volume, but for sufficiently large $N\in \mathbf{N}$, should we care? The behaviour they parametrize is effectively the same.

We will stress that by approximating $\mathcal{A}_{\rho}$ with $\mathcal{A}_{\ell_2}$, regardless of their volumetric difference, we are ignoring a full class of systems and miss out on a non-negligible set of behaviours. To see this, note that any system described by $\mathcal{A}_{\ell_2}$ is contractive in the sense that $\|Ax_k\|_2\leq \|x_k\|_2$, while systems in $\mathcal{A}_{\rho}$ are merely asymptotically stable. They might wander off before they return, i.e., there is no reason why for all $k\in \mathbf{N}$ we must have $\|x_{k+1}\|_p\leq \|x_{k}\|_p$. We can do a quick example. Consider the matrices

$A_2 = \left[\begin{array}{lll} 0 & -0.9 & 0.1 \\ 0.9 & 0 & 0 \\ 0 & 0 & -0.9 \end{array}\right], \quad A_{\rho} = \left[\begin{array}{lll} 0.1 & -0.9 & 1 \\ 0.9 & 0 & 0 \\ 0 & 0 & -0.9 \end{array}\right].$

Then $A_2\in \mathcal{A}_{\ell_2}$, $A_{\rho}\in \mathcal{A}_{\rho}\setminus{\mathcal{A}_{\ell_2}}$ and both $A_2$ and $A_{\rho}$ have $|\lambda_i|=0.9\ \forall i$. We observe that indeed $A_2$ is contractive: for any initial condition on $\mathbf{S}^2$, we move strictly inside the sphere, whereas for $A_{\rho}$, when starting from the same initial condition, we take a detour outside of the sphere before converging to $0$. So although the eigenvalues of $A_2$ and $A_{\rho}$ have the same moduli, they parametrize different systems.
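
The two behaviours can be reproduced with a few lines of NumPy. This is a minimal sketch: the initial condition `x0` on the unit sphere, the horizon of 200 steps, and the helper `trajectory_norms` are arbitrary choices made for this illustration.

```python
import numpy as np

A_2 = np.array([[0.0, -0.9, 0.1],
                [0.9,  0.0, 0.0],
                [0.0,  0.0, -0.9]])
A_rho = np.array([[0.1, -0.9, 1.0],
                  [0.9,  0.0, 0.0],
                  [0.0,  0.0, -0.9]])

def spectral_radius(A):
    return float(np.max(np.abs(np.linalg.eigvals(A))))

def trajectory_norms(A, x0, steps):
    # Return [||x_0||_2, ||x_1||_2, ..., ||x_steps||_2] for x_{k+1} = A x_k
    x = np.asarray(x0, dtype=float)
    out = [np.linalg.norm(x)]
    for _ in range(steps):
        x = A @ x
        out.append(np.linalg.norm(x))
    return np.array(out)

x0 = np.array([0.0, 0.0, 1.0])  # assumed initial condition on the unit sphere
n2 = trajectory_norms(A_2, x0, 200)
nr = trajectory_norms(A_rho, x0, 200)

print("rho:     ", spectral_radius(A_2), spectral_radius(A_rho))
print("||.||_2: ", np.linalg.norm(A_2, 2), np.linalg.norm(A_rho, 2))
print("A_2   norms monotonically decreasing:", bool(np.all(np.diff(n2) < 0)))
print("A_rho peak norm along the trajectory:", nr.max())
```

For this initial condition the $A_2$ trajectory shrinks at every step, while the $A_{\rho}$ trajectory leaves the unit sphere after one step and only then decays.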

In our deterministic setting, restricting ourselves to $\mathcal{A}_{\ell_2}$ would mean that we confine our state space to a (solid) sphere with radius $\|x_0\|_2$, instead of $\mathbf{R}^n$. Moreover, in linear optimal control, the resulting closed-loop system is usually not contractive. Think of the infamous pendulum on a cart. Being energy efficient usually has nothing to do with taking the shortest path in the Euclidean sense.

(update June 06): As suggested by Pedro Zattoni, we would like to add some nuance here. As highlighted in the first 12 minutes of this lecture by Pablo Parrilo, if $A$ is stable, then one can always find a (linear) change of coordinates such that the transformed system matrix is contractive. Although a very neat observation, you have to be careful, since your measurements might not come from this transformed system. So either you assume merely that $A$ is stable, or you assume that $A$ is contractive, but then you might need to find an additional map, mapping your states to your measurements.
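
One standard way to construct such a change of coordinates (not necessarily the construction used in the lecture) is via a discrete-time Lyapunov equation: solve $P = A^{\top}PA + I$ for $P\succ 0$, factor $P=T^{\top}T$, and set $z=Tx$. The sketch below does this for the stable but non-contractive $A_{\rho}$ from the example above. Solving the Lyapunov equation through a Kronecker-product linear system is a choice made here to keep the example NumPy-only, and it is only sensible for small $n$.

```python
import numpy as np

# The stable but non-contractive matrix from the example above
A = np.array([[0.1, -0.9, 1.0],
              [0.9,  0.0, 0.0],
              [0.0,  0.0, -0.9]])
n = A.shape[0]

# Solve P = A^T P A + I by vectorization:
# (I - kron(A^T, A^T)) vec(P) = vec(I), using row-major flattening
K = np.eye(n * n) - np.kron(A.T, A.T)
P = np.linalg.solve(K, np.eye(n).reshape(-1)).reshape(n, n)
P = 0.5 * (P + P.T)  # remove round-off asymmetry

# Factor P = T^T T (Cholesky gives P = L L^T, so T = L^T)
T = np.linalg.cholesky(P).T

# System matrix in the new coordinates z = T x
A_hat = T @ A @ np.linalg.inv(T)

lyap_residual = float(np.linalg.norm(P - A.T @ P @ A - np.eye(n)))
print("||A||_2     =", np.linalg.norm(A, 2))
print("||A_hat||_2 =", np.linalg.norm(A_hat, 2))
print("Lyapunov residual =", lyap_residual)
```

The similarity transform leaves the eigenvalues untouched, yet $\|TAT^{-1}\|_2<1$, since $x^{\top}A^{\top}PAx = x^{\top}Px - x^{\top}x < x^{\top}Px$ for all $x\neq 0$. The caveat about measurements still applies: the contraction holds for $z=Tx$, not for the original state $x$.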

[1] : Recall, $\|A\|_p:=\sup_{x\in \mathbf{S}^{n-1}}\|Ax\|_p$. Then, let $x\in \mathbf{S}^{n-1}$ be some eigenvector of $A$ with eigenvalue $\lambda$. Now we have $\|A\|_p\geq \|Ax\|_p = \|\lambda x\|_p = |\lambda|$. Since this eigenvalue is arbitrary it follows that $\|A\|_p\geq \rho(A)$.
[2] : Let $A_1,A_2\in \mathcal{A}_{\ell_2}$ and $\theta\in[0,1]$, then $\|\theta A_1 + (1-\theta)A_2\|_2 \leq |\theta|\|A_1\|_2 + |1-\theta|\|A_2\|_2 < 1$. This follows from $\|\cdot\|_2$ being a norm.
[3] : Clearly, if $A\in \mathsf{Sym}_n$, we have $\sqrt{\lambda_{\mathrm{max}}(A^{\top}A)}=\sqrt{\lambda_{\mathrm{max}}(A^2)}=\rho(A)$. Now, when $\rho(A)=\|A\|_2$, does this imply that $A\in \mathsf{Sym}_n$? The answer is no, consider

$A' = \left[\begin{array}{lll} 0.9 & 0 & 0 \\ 0 & 0.1 & 0 \\ 0 & 0.1 & 0.1 \end{array}\right].$

Then, $A'\notin \mathsf{Sym}_n$, yet $\rho(A')=\|A'\|_2=0.9$. For the full set of conditions on $A$ such that $\|A\|_2=\rho(A)$ see this paper by Goldberg and Zwas.
[4] : Recall that $\|A\|_F = \sqrt{\mathrm{Tr}(A^{\top}A)}=\sqrt{\sum_{i=1}^n \lambda_i(A^{\top}A)}$. This expression is clearly larger than or equal to $\sqrt{\lambda_{\mathrm{max}}(A^{\top}A)}=\|A\|_2$.
