Spectral Radius vs Spectral Norm |28 Febr. 2020|
tags: math.LA, math.DS

Lately, linear discrete-time control theory has start to appear in areas far beyond the ordinary. As a byproduct, papers start to surface which claim that stability of linear discrete-time systems


is characterized by |A|_2<1. The confunsion is complete by calling this object the spectral norm of a matrix Ain mathbf{R}^{ntimes n}. Indeed, for fixed coordinates, stability of A is not characterized by |A|_2:=sup_{xin mathbf{S}^{n-1}}|Ax|_2, but by rho(A):=max_{iin [n]}|lambda_i(A)|. If rho(A)<1, then, all absolute eigenvalues of A are strictly smaller than one, and hence for x_k = A^k x_0, lim_{kto infty}x_k = 0. This follows from the Jordan form of A in combination with the simple observation that for lambda in mathbf{C}, lim_{kto infty} lambda^k = 0 iff |lambda|<1.

To continue, define two sets; mathcal{A}_{rho}:={Ain mathbf{R}^{ntimes n};:;rho(A)<1} and mathcal{A}_{ell_2}:={Ain mathbf{R}^{ntimes n};:;|A|_2<1}. Since^{[1]} rho(A)leq |A|_p we have mathcal{A}_{ell_2}subset mathcal{A}_{rho}. Now, the main question is, how much do we lose by approximating mathcal{A}_{rho} with mathcal{A}_{ell_2}? Motivation to do so is given by the fact that |cdot|_2 is a norm and hence a convex function^{[2]}, i.e., when given a convex polytope mathcal{P} with vertices A_i, if {A_i}_isubset mathcal{A}_{ell_2}, then mathcal{P}subset mathcal{A}_{ell_2}. Note that rho(A) is not a norm, rho(A) can be 0 without A=0 (consider an upper-triangular matrix with zeros on the main diagonal), the triangle inequality can easily fail as well. For example, let

 A_1 = left[begin{array}{ll} 0 & 1 0 & 0  end{array}right], quad A_2 = left[begin{array}{ll} 0 & 0 1 & 0  end{array}right],

Then rho(A_1)=rho(A_2)=0, but rho(A_1+A_2)=1, hence rho(A_1+A_2)leq rho(A_1)+rho(A_2) fails to hold. A setting where the aforementioned property of |cdot|_2 might help is Robust Control, say we want to assess if some algorithm rendered a compact convex set mathcal{K}, filled with A's, stable. As highlighted before, we could just check if all extreme points of mathcal{K} are members of A_{ell_2}, which might be a small and finite set. Thus, computationally, it appears to be attractive to consider mathcal{A}_{ell_2} over the generic mathcal{A}_{rho}.

As a form of critique, one might say that mathcal{A}_{rho} is a lot larger than mathcal{A}_{ell_2}. Towards such an argument, one might recall that |A|_2=sqrt{lambda_{mathrm{max}}(A^{top}A)}. Indeed, |A|_2=rho(A) if^{[3]} Ain mathsf{Sym}_n, but mathsf{Sym}_n simeq mathbf{R}^{n(n+1)/2}. Therefore, it seems like the set of A for which considering |cdot|_2 over rho(cdot) is reasonable, is negligibly small. To say a bit more, since^{[4]} |A|_2leq |A|_F we see that we can always find a ball with non-zero volume fully contained in mathcal{A}_{rho}. Hence, mathcal{A}_{rho} is at least locally dense in mathbf{R}^{ntimes n}. So in principle we could try to investigate mathrm{vol}(mathcal{A}_{ell_2})/mathrm{vol}(mathcal{A}_{rho}). For n=1, the sets agree, which degrades asymptotically. However, is this the way to go? Lets say we consider the set widehat{mathcal{A}}_{rho,N}:={Ain mathbf{R}^{ntimes n};:;rho(A)<N/(N+1)}. Clearly, the sets widehat{mathcal{A}}_{rho,N} and mathcal{A}_{rho} are different, even in volume, but for sufficiently large Nin mathbf{N}, should we care? The behaviour they parametrize is effectively the same.

We will stress that by approximating mathcal{A}_{rho} with mathcal{A}_{ell_2}, regardless of their volumetric difference, we are ignoring a full class of systems and miss out on a non-neglible set of behaviours. To see this, any system described by mathcal{A}_{ell_2} is contractive in the sense that |Ax_k|_2leq |x_k|_2, while systems in mathcal{A}_{rho} are merely asymptotically stable. They might wander of before they return, i.e., there is no reason why for all kin mathbf{N} we must have |x_{k+1}|_pleq |x_{k}|_p. We can do a quick example, consider the matrices

 A_2 = left[begin{array}{lll} 0 & -0.9 & 0.1 0.9 & 0 & 0  0 & 0 & -0.9 end{array}right], quad A_{rho} = left[begin{array}{lll} 0.1 & -0.9 & 1 0.9 & 0 & 0  0 & 0 & -0.9 end{array}right].

Then A_2in mathcal{A}_{ell_2}, A_{rho}in mathcal{A}_{rho}setminus{mathcal{A}_{ell_2}} and both A_2 and A_{rho} have |lambda_i|=0.9 forall i. We observe that indeed A_2 is contractive, for any initial condition on mathbf{S}^2, we move strictly inside the sphere, whereas for A_{rho}, when starting from the same initial condition, we take a detour outside of the sphere before converging to 0. So although A_2 and A_{rho} have the same spectrum, they parametrize different systems.

Simulated link 

In our deterministic setting this would mean that we would confine our statespace to a (solid) sphere with radius |x_0|_2, instead of mathbf{R}^n. Moreover, in linear optimal control, the resulting closed-loop system is usually not contractive. Think of the infamous pendulum on a cart. Being energy efficient has usually nothing to do with taking the shortest, in the Euclidean sense, path.

(update June 06): As suggested by Pedro Zattoni, we would like to add some nuance here. As highlighted in the first 12 minutes of this lecture by Pablo Parrilo, if A is stable, then, one can always find a (linear) change of coordinates such that the transformed system matrix is contractive. Although a very neat observation, you have to be careful, since your measurements might not come from this transformed system. So either you assume merely that A is stable, or you assume that A is contractive, but then you might need to find an additional map, mapping your states to your measurements.

[1] : Recall, |A|_p:=sup_{xin mathbf{S}^{n-1}}|Ax|_p. Then, let xin mathbf{S}^{n-1} be some eigenvector of A. Now we have |A|_pgeq |Ax|_p = |lambda x|_p = |lambda |. Since this eigenvalue is arbitrary it follows that |A|_pgeq rho(A).
[2] : Let (A_1,A_2)in mathcal{A}_{ell_2} then |theta A_1 + (1-theta)A_2|_2 leq |theta||A_1|_2 + |1-theta||A_2|_2 < 1. This follows from the |cdot|_2 being a norm.
[3] : Clearly, if Ain mathsf{Sym}_n, we have sqrt{lambda_{mathrm{max}}(A^{top}A)}=sqrt{lambda_{mathrm{max}}(A^2)}=rho(A). Now, when rho(A)=|A|_2, does this imply that Ain mathsf{Sym}_n? The answer is no, consider

 A' = left[begin{array}{lll} 0.9 & 0 & 0 0 & 0.1 & 0  0 & 0.1 & 0.1 end{array}right].

Then, A'notin mathsf{Sym}_n, yet, rho(A)=|A|_2=0.9. For the full set of conditions on A such that |A|_2=rho(A) see this paper by Goldberg and Zwas.
[4] : Recall that |A|_F = sqrt{mathrm{Tr}(A^{top}A)}=sqrt{sum_{i=1}^n lambda_i(A^{top}A)}. This expression is clearly larger or equal to sqrt{lambda_{mathrm{max}}(A^{top}A)}=|A|_2.