Recently, DeepMind published new work on ‘‘discovering’’ numerical linear algebra routines. This is remarkable, but I leave the assesment of this contribution to others. Rather, I like to highlight the earlier discovery of Strassen. Namely, how to improve upon the obvious matrix multiplication algorithm.

To get some intuition, consider multiplying two complex numbers $(a_1+ib_1),(a_2+ib_2)in mathbf{C}times mathbf{C}$ , that is, construct a_1a_2-b_1b_2+i(b_1a_2+a_1b_2)

. In this case, the obvious thing to do would lead to 4 multiplications, namely a_1cdot a_2

and

. However, consider using only (a_1+b_1)cdot (a_2+b_2)

and

. These multiplications suffice to construct (a_1+ib_1)cdot (a_2+ib_2)

(subtract the latter two from the first one). The emphasis is on multiplications as it seems to be folklore knowledge that multiplications (and especially divisions) are the most costly. However, note that we do use more additive operations. Are these operations not equally ‘‘costly’’? The answer to this question is delicate and still changing!

When working with integers, additions are indeed generally ‘‘cheaper’’ than multiplications. However, oftentimes we work with floating points. Roughly speaking, these are numbers fin mathbf{Q}

characterized by some (signed) mantissa

, a base

and some (signed) exponent

, specifically, f=mb^e

. For instance, consider the addition $1.2345cdot 10^{-1}+1.2345cdot 10^{-4}$ . To perform the addition, one normalizes the two numbers, e.g., one writes $1.2345cdot 10^{-1}+0.0012345cdot 10^{-1}$ . Then, to perform the addition one needs to take the appropriate precision into account, indeed, the result is $1.2357cdot 10^{-1}$ . All this example should convey is that we have sequential steps. Now consider multiplying the same numbers, that is, $(1.2345cdot 10^{-1})cdot(1.2345cdot 10^{-4})$ . In this case, one can proceed with two steps in parallel, on the one hand (1.2345)^2

and on the other hand $10^{-1-4}$ . Again, one needs to take the correct precision into account in the end, but we see that by no means multiplication must be significantly slower in general!

$A = left[begin{array}{ll} A_{11} & A_{12} A_{21} & A_{22} end{array}right],quad B = left[begin{array}{ll} B_{11} & B_{12} B_{21} & B_{22} end{array}right],$

$C = left[begin{array}{ll} C_{11} & C_{12} C_{21} & C_{22} end{array}right]= left[begin{array}{ll} A_{11}B_{11}+A_{12}B_{21} & A_{11}B_{12}+A_{12}B_{22} A_{21}B_{11}+A_{22}B_{21} & A_{21}B_{12}+A_{22}B_{22} end{array}right].$

Indeed, the most straightforward matrix multiplication algorithm for $Ain mathbf{R}^{ntimes n}$ and $Bin mathbf{R}^{ntimes n}$ costs you n^3

multiplications and n^2(n-1)

additions. Using the intuition from the complex multiplication we might expect we can do with fewer multiplications. In 1969 (Numer. Math. 13 354–356) Volker Strassen showed that this is indeed true.

Following his short, but highly influential, paper, let $mathrm{I}=(A_{11}+A_{22})(B_{11}+B_{22})$ , $mathrm{II}=(A_{21}+A_{22})B_{11}$ , $mathrm{III}=A_{11}(B_{12}-B_{22})$ , $mathrm{IV}=A_{22}(-B_{11}+B_{21})$ , $mathrm{V}=(A_{11}+A_{12})B_{22}$ , $mathrm{VI}=(-A_{11}+A_{21})(B_{11}+B_{12})$ and $mathrm{VII}=(A_{12}-A_{22})(B_{21}+B_{22})$ . Then, $C_{11}=mathrm{I}+mathrm{IV}-mathrm{V}+mathrm{VII}$ , $C_{12}=mathrm{II}+mathrm{IV}$ , $C_{12}=mathrm{III}+mathrm{V}$ and $C_{22}=mathrm{I}+mathrm{III}-mathrm{II}+mathrm{VI}$ . See that we need

multiplications and

additions to construct

for both

and

being

-dimensional. The ‘‘obvious’’ algorithm would cost

multiplications and

additions. Indeed, one can show now that for matrices of size 2^k

one would need 7^k

multiplications (via induction). Differently put, one needs $n^{log_2(7)}approx n^{2.8}$ multiplications. This bound prevails when including all operations, but only asymptotically, in the sense that $O(n^{2.8})<O(n^3)$ . In practical algorithms, not only multiplications, but also memory and indeed additions play a big role. I am particularly interested how the upcoming DeepMind algorithms will take numerical stability into account.

Endomorphisms, Tensors in Disguise |18 August 2020|
tags: math.LA

Unfortunately the radio silence remained, the good news is that even more cool stuff is on its way*!

For now, we briefly mention a very beautiful interpretation of (1,1)

-tensors. Given some vectorspace mathcal{V}

, let

be the set of linear endomorphisms from mathcal{V}

. Indeed, for finite-dimensional vector spaces one can just think of these maps as being parametrized by matrices. Now, we recall that (1,1)

-tensors are multilinear maps which take an element of mathcal{V}

and its dual space $mathcal{V}^{star}$ and map it some real number, that is, $T^1_1:mathcal{V}times mathcal{V}^{star}to mathbf{R}$ .

The claim is that for finite-dimensional mathcal{V}

the set

is isomorphic to the set of (1,1)

-tensors $mathcal{T}_1^1(mathcal{V})$ . This is rather intuitive to see, let vin mathcal{V}

and $v^{star}in mathcal{V}^{star}$ , then, we can write $langle v^{star},T_1^1 vrangle$ and think of $T_1^1in mathcal{T}^1_1(mathcal{V})$ as a matrix. This can be formalized, as is done here by showing there is a bijective map $phi:mathrm{End}(V)to mathcal{T}^1_1(mathcal{V})$ . So, we can identify any T_1^1

with some $Ain mathbf{R}^{ntimes n}$ , that is $T_1^1(v,v^{star})=v^{star}(Av)=langle v^{star},Av rangle$ . Similarly, we can fix $v^{star}$ and identify T_1^1

with a linear map in the dual space, that is $T_1^1(v^{star})(v)=(A^{star}v^{star})(v)=langle A^{star}v^{star},vrangle$ . Regardless, both represent the same tensor evaluation, hence

$langle v^{star},Av rangle = langle A^{star}v^{star},v rangle,$

Now we use this relation, recall that discrete-time linear systems are nothing more then maps vmapsto Av

, that is, linear endomorphims, usually on $mathbf{R}^n$ . Using the tensor interpretation we recover what is sometimes called the ‘‘dual system’’ to vmapsto Av

, namely $v^{star}mapsto A^{top}v^{star}$ (suggestive notation indeed). Using more traditional notation for the primal: $x_{k+1}=Ax_k$ and dual: $z_{k+1}=A^{top}z_k$ system, we recover that $langle z_{k+1},x_k rangle = langle z_k, x_{k+1}rangle forall kin mathbf{Z}$ . Then, let z_0=w

and

for

and

being left- and right-eigenvector of

, respectively. We see that this implies that mu_j^k langle w_j, v_i rangle = lambda_i^k langle w_j, v_i rangle

, for

the eigenvalue corresponding to w_j

and

the eigenvalue corresponding to v_i

. Hence, in general, the eigenspaces of

and $A^{top}$ are generically orthogonal.

In other words, we recover the geometric interpretation of mapping

-volumes under

versus $A^{top}$

Optimal coordinates |24 May 2020|
tags: math.LA, math.OC, math.DS

There was a bit of radio silence, but as indicated here, some interesting stuff is on its way.

In this post we highlight this 1976 paper by Mullis and Roberts on ’Synthesis of Minimum Roundoff Noise Fixed Point Digital Filters’.

It is known that the input-output behaviour of any sigma

, that is, the map from u(t)

is invariant under a similarity transform. To be explicit, the tuples (A,b,c,d)

and $(TAT^{-1},Tb,cT^{-1},d)$ , which correspond to the change of coordinates z(t)=Tx(t)

for some $Tin mathsf{GL}_n$ , give rise to the same input-output behaviour. Hence, one can define the equivalence relation sigma sim sigma'

by imposing that the input-output maps of sigma

and

are equivalent. By doing so we can consider the quotient $Sigma / mathsf{GL}_n$ . However, in practice, given a sigma

, is any

such that

equally useful? For example, the following

and

are similar, but clearly,

is preferred from a numerical point of view:

$A = left[begin{array}{ll} 0.5 & 10^9 0 & 0.5 end{array}right],quad A' = left[begin{array}{ll} 0.5 & 1 0 & 0.5 end{array}right].$

In what follows, we highlight the approach from Mullis and Roberts and conclude by how to optimally select $Tin mathsf{GL}_n$ . Norms will be interpreted in the ell_2

sense, that is, for ${x(t)}_{tin mathbf{N}}$ , $|x|^2=sum^{infty}_{t=0}|x(t)|_2$ . Also, in what follows we assume that x(0)=0

plus that

is stable, which will mean rho(A)<1

and that

corresponds to a minimal realization.

The first step is to quantify the error. Assume we have allocated m_i

bits at our disposal to present the $i^{mathrm{th}}$ component of x(t)

. These values for m_i

can differ, but we constrain the average by $sum^n_{i=1}m_i = nm$ for some

. Let

be a 'step-size’, such that our dynamic range of x_i(t)

is bounded by $pm mu 2^{m_i-1}$ (these are our possible representations). Next we use the modelling choices from Mullis and Roberts, of course, they are debatable, but still, we will infer some nice insights.

First, to bound the dynamic range, consider solely the effect of an input, that is, define f_i(t)

by $x(t)=sum^{infty}_{k=1}A^{k-1}b u(t-k)$ , $x_i(t)=sum^{infty}_{k=1}f_i(t)u(t-k)$ . Then we will impose the bound pm delta |f_i|

. In combination with the step size (quantization), this results in $delta^2 |f_i|^2 = (mu 2^{m_i-1})^2$ . Let

be a sequence of i.i.d. sampled random variables from mathcal{N}(0,1)

. Then we see that $mathrm{var}big(x_i(t)big)= |f_i|^2$ . Hence, one can think of delta

as scaling parameter related to the probability with which this dynamic range bound is true.

Next, assume that all the round-off errors are independent and have a variance of mu^2

. Hence, the worst-case variance of computating x_i(t+1)

. To evaluate the effect of this error on the output, assume for the moment that u(t)

is identically

. Then,

. Similar to f_i(t)

, define

as the $i^{mathrm{th}}$ component of cA^t

. As before we can compute the variance, this time of the full output signal, which yields $sigma_y^2:=mathrm{var}(y) = mu^2(n+1)sum^n_{i=1}|g_i|^2$ . Note, these expressions hinge on the indepedence assumption.

$begin{array}{lll} W^{(c)} &= AW^{(c)}A^{top} + bb^{top} &= sum^{infty}_{k=0} A^k bb^{top} (A^k)^{top}, W^{(o)} &= A^{top}W^{(o)} A + c^{top}c &=sum^{infty}_{k=0} (A^k)^{top}c^{top}c A. end{array}$

If we assume that the realization is not just minimal, but that (A,b)

is a controllable pair and that (A,c)

is an observable pair, then, $W^{(c)}succ 0$ and $W^{(o)}succ 0$ . Now the key observation is that $W^{(c)}_{ij} = langle f_i, f_j rangle$ and similarly $W^{(o)}_{ij} = langle g_i, g_j rangle$ . Hence, we can write $delta^2 |f_i|^2 = (mu 2^{m-1})^2$ as $delta^2 K_ii = (mu 2^{m-1})^2$ and indeed $sigma_y^2 = mu^2(n+1)sum^n_{i=1}W_{ii}$ . Using these Lyapunov equations we can say goodbye to the infinite-dimensional objects.

To combine these error terms, let say we have apply a coordinate transformation x'=Tx

, for some $Tin mathsf{GL}_n$ . Specifically, let

be diagonal and defined by $T_{ii}=2delta sqrt{K_{ii}}/(mu 2^m)$ . Then one can find that $K'_{ii}=K_{ii}/T_{ii}^2$ , $W'_{ii}=W_{ii}T_{ii}^2$ . Where the latter expressions allows to express the output error (variance) by

$sigma_y^2 = (n+1)delta^2 sum^n_{i=1}frac{W^{(c)}_{ii}W^{(o)}_{ii}}{(2^{m_i})^2}.$

Now we want to minimize sigma_y^2

over all

plus some optimal selection of m_i

. At this point it looks rather painful. To make it work we first reformulate the problem using the well-known arithmetic-geometric mean inequality $^{[1]}$ for non-negative sequences ${a_i}_{iin [n]}$ :

$frac{1}{n} sum^n_{i=1}a_i geq left( prod^n_{i=1}a_i right)^{1/n}.$

$sigma_y^2 = n(n+1)delta^2 frac{1}{n}sum^n_{i=1}frac{W^{(c)}_{ii}W^{(o)}_{ii}}{(2^{m_i})^2}geq n(n+1)left(frac{delta}{2^m}right)^2left(prod^n_{i=1}{W^{(c)}_{ii}W^{(o)}_{ii}}right)^{1/n}qquad (1).$

See that the right term is independent of m_i

, hence this is a lower-bound with respect to minimization over m_i

. To achieve this (to make the inequality from (1)

an equality), we can select

$m_i = m + frac{1}{2} left(log_2 W_{ii}^{(c)}W_{ii}^{(o)} - frac{1}{2}sum^n_{j=1}log_2 W_{ii}^{(c)}W_{ii}^{(o)} right).$

Indeed, as remarked in the paper, m_i

is not per se an integer. Nevertheless, by this selection we find the clean expression from (1)

to minimize over systems equivalent to sigma

, that is, over some transformation matrix

. Define a map $f:mathcal{S}^n_{succ 0}to (0,1]$ by

$f(P) = left( frac{mathrm{det}(P)}{prod^n_{i=1}P_{ii}}right)^{1/2}.$

It turns out that f(P)=1

if and only if

is diagonal. This follows $^{[2]}$ from Hadamard's inequality. We can use this map to write

$sigma_y^2 = n(n+1) left(frac{delta}{2^m} right)^2 frac{mathrm{det}(W^{(c)}W^{(o)})^{1/n}}{big(f(W^{(c)})f(W^{(o)})big)^{2/n}} geq n(n+1) left(frac{delta}{2^m}right)^2 mathrm{det}(W^{(c)}W^{(o)})^{1/n}.$

Since the term $mathrm{det}(W^{(c)}W^{(o)})$ is invariant under a transformation

, we can only optimize sigma_y^2

over a structural change in realization tuple (A,b,c,d)

, that is, we need to make $W^{(c)}$ and $W^{(o)}$ simultaneous diagonal! It turns out that this numerically 'optimal’ realization, denoted $sigma^{star}$ , is what is called a principal axis realization.

To compute it, diagonalize $W^{(o)}$ , $W^{(o)}=Q_o Lambda_o Q_o^{top}$ and define $T_1^{-1}:=Lambda_o^{1/2}Q_o^{top}$ . Next, construct the diagonalization $T_1^{-1}W^{(c)}T_1^{-top}=Q_{1,c}Lambda_{1,c}Q_{1,c}^{top}$ . Then our desired transformation is $T:=T_1 Q_{1,c}$ . First, recall that under any $Tin mathsf{GL}_n$ the pair $(W^{(c)},W^{(o)})$ becomes $(T^{-1}W^{(c)}T^{-top},T^{top}W^{(o)}T)$ . Plugging in our map

yields the transformed matrices $(Lambda_{1,c},I_n)$ , which are indeed diagonal.

: To show this AM-GM inequality, we can use Jensen's inequality for concave functions, that is, mathbf{E}[g(x)]leq g(mathbf{E}[x])

. We can use the logarithm as our prototypical concave function on $mathbf{R}_{>0}$ and find $logbig(frac{1}{n}sum^n_{i=1}x_ibig)geq frac{1}{n}sum^n_{i=1}log(x_i) = logbig((prod^n_{i=1}x_i)^{1/n} big)$ . Then, the result follows.
[2]

: The inequality attributed to Hadamard is slightly more difficult to show. In its general form the statement is that $|mathrm{det}(A)|leq prod^n_{i=1}|a_i|_2$ , for a_i

the $i^{mathrm{th}}$ column of

. The inequality becomes an equality when the columns are mutually orthogonal. The intuition is clear if one interprets the determinant as the signed volume spanned by the columns of

, which. In case $Ain mathcal{S}^n_{succ 0}$ , we know that there is

such that $A=L^{top}L$ , hence, by this simple observation it follows that

$mathrm{det}(A)=mathrm{det}(L)^2 leq prod^n_{i=1}|ell_i|_2^2 = prod^n_{i=1}a_{ii}.$

Equality holds when the columns are orthogonal, so $AA^{top}$ must be diagonal, but $A=A^{top}$ must also hold, hence, A^2

must be diagonal, and thus

must be diagonal, which is the result we use.

A Peculiar Lower Bound to the Spectral Radius |7 March 2020|
tags: math.LA

As a follow up to the previous post, we discuss a lower bound, given as exercise P7.1.14 in (GvL13). I have never seen this bound before, especially not with the corresponding proof, so here we go. Given any $Ain mathbf{R}^{ntimes n}$ , we have the following lower bound:

$rho(A)geq (sigma_1cdots sigma_n)^{1/n},$

which can be thought of as a sharpening of the bound $|lambda_i(A)|geq sigma_{mathrm{min}}(A)$ . In the case of a singular

, the bound is trivial (rho(A)geq 0)

, but in the general setting it is less obvious. To prove that this holds we need a few ingredients. First, we need that for any $X,Yin mathbf{R}^{ntimes n}$ we have mathrm{det}(XY)=mathrm{det}(X)mathrm{det}(Y)

mathrm{det}(XY)=mathrm{det}(X)mathrm{det}(Y)

. Then, we construct two decompositions of

, (1) the Jordan Normal form $A=TJT^{-1}$ , and (2) the Singular Value Decomposition $A=USigma V^{top}$ . Using the multiplicative property of the determinant, we find that

$|mathrm{det}(A)| = left|prod^n_{i=1}lambda_i(A) right| = left| prod^n_{i=1}sigma_i(A) right|.$

Hence, since $rho(A)=max_i{|lambda_i(A)|}$ it follows that $rho(A)^n geq |prod^n_{i=1}sigma_i(A)|$ and thus we have the desired result.

(GvL13) G.H. Golub and C.F. Van Loan: ‘‘Matrix Computations’’, 2013 John Hopkins University Press.

Spectral Radius vs Spectral Norm |28 Febr. 2020|
tags: math.LA, math.DS

Lately, linear discrete-time control theory has start to appear in areas far beyond the ordinary. As a byproduct, papers start to surface which claim that stability of linear discrete-time systems

is characterized by |A|_2<1

. The confunsion is complete by calling this object the spectral norm of a matrix $Ain mathbf{R}^{ntimes n}$ . Indeed, stability of

is not characterized by $|A|_2:=sup_{xin mathbf{S}^{n-1}}|Ax|_2$ , but by $rho(A):=max_{iin [n]}|lambda_i(A)|$ . If rho(A)<1

, then all absolute eigenvalues

of are strictly smaller than one, and hence for x_k = A^k x_0

, $lim_{kto infty}x_k = 0$ . This follows from the Jordan form of

in combination with the simple observation that for lambda in mathbf{C}

, $lim_{kto infty} lambda^k = 0 iff |lambda|<1$ .

To continue, define two sets; $mathcal{A}_{rho}:={Ain mathbf{R}^{ntimes n};:;rho(A)<1}$ and $mathcal{A}_{ell_2}:={Ain mathbf{R}^{ntimes n};:;|A|_2<1}$ . Since $^{[1]}$ rho(A)leq |A|_p

we have $mathcal{A}_{ell_2}subset mathcal{A}_{rho}$ . Now, the main question is, how much do we lose by approximating $mathcal{A}_{rho}$ with $mathcal{A}_{ell_2}$ ? Motivation to do so is given by the fact that |cdot|_2

is a norm and hence a convex function $^{[2]}$ , i.e., when given a convex polytope mathcal{P}

with vertices A_i

, if ${A_i}_isubset mathcal{A}_{ell_2}$ , then $mathcal{P}subset mathcal{A}_{ell_2}$ . Note that rho(A)

is not a norm, rho(A)

can be

without

(consider an upper-triangular matrix with zeros on the main diagonal), the triangle inequality can easily fail as well. For example, let

$A_1 = left[begin{array}{ll} 0 & 1 0 & 0 end{array}right], quad A_2 = left[begin{array}{ll} 0 & 0 1 & 0 end{array}right],$

Then

, but

, hence

fails to hold. A setting where the aforementioned property of |cdot|_2

might help is Robust Control, say we want to assess if some algorithm rendered a compact convex set mathcal{K}

, filled with

's, stable. As highlighted before, we could just check if all extreme points of mathcal{K}

are members of $A_{ell_2}$ , which might be a small and finite set. Thus, computationally, it appears to be attractive to consider $mathcal{A}_{ell_2}$ over the generic $mathcal{A}_{rho}$ .

As a form of critique, one might say that $mathcal{A}_{rho}$ is a lot larger than $mathcal{A}_{ell_2}$ . Towards such an argument, one might recall that $|A|_2=sqrt{lambda_{mathrm{max}}(A^{top}A)}$ . Indeed, |A|_2=rho(A)

if $^{[3]}$ $Ain mathsf{Sym}_n$ , but $mathsf{Sym}_n simeq mathbf{R}^{n(n+1)/2}$ . Therefore, it seems like the set of

for which considering |cdot|_2

over

is reasonable, is negligibly small. To say a bit more, since $^{[4]}$ |A|_2leq |A|_F

we see that we can always find a ball with non-zero volume fully contained in $mathcal{A}_{rho}$ . Hence, $mathcal{A}_{rho}$ is at least locally dense in $mathbf{R}^{ntimes n}$ . The same holds for $mathcal{A}_2$ (which is more obvious). So in principle we could try to investigate $mathrm{vol}(mathcal{A}_{ell_2})/mathrm{vol}(mathcal{A}_{rho})$ . For n=1

, the sets agree, which degrades asymptotically. However, is this the way to go? Lets say we consider the set $widehat{mathcal{A}}_{rho,N}:={Ain mathbf{R}^{ntimes n};:;rho(A)<N/(N+1)}$ . Clearly, the sets $widehat{mathcal{A}}_{rho,N}$ and $mathcal{A}_{rho}$ are different, even in volume, but for sufficiently large Nin mathbf{N}

, should we care? The behaviour they parametrize is effectively the same.

We will stress that by approximating $mathcal{A}_{rho}$ with $mathcal{A}_{ell_2}$ , regardless of their volumetric difference, we are ignoring a full class of systems and miss out on a non-neglible set of behaviours. To see this, any system described by $mathcal{A}_{ell_2}$ is contractive in the sense that |Ax_k|_2leq |x_k|_2

, while systems in $mathcal{A}_{rho}$ are merely asymptotically stable. They might wander of before they return, i.e., there is no reason why for all kin mathbf{N}

we must have $|x_{k+1}|_pleq |x_{k}|_p$ . We can do a quick example, consider the matrices

$A_2 = left[begin{array}{lll} 0 & -0.9 & 0.1 0.9 & 0 & 0 0 & 0 & -0.9 end{array}right], quad A_{rho} = left[begin{array}{lll} 0.1 & -0.9 & 1 0.9 & 0 & 0 0 & 0 & -0.9 end{array}right].$

Then $A_2in mathcal{A}_{ell_2}$ , $A_{rho}in mathcal{A}_{rho}setminus{mathcal{A}_{ell_2}}$ and both A_2

and $A_{rho}$ have |lambda_i|=0.9 forall i

. We observe that indeed A_2

is contractive, for any initial condition on $mathbf{S}^2$ , we move strictly inside the sphere, whereas for $A_{rho}$ , when starting from the same initial condition, we take a detour outside of the sphere before converging to

. So although A_2

and $A_{rho}$ have the same spectrum, they parametrize different systems.

In our deterministic setting this would mean that we would confine our statespace to a (solid) sphere with radius |x_0|_2

, instead of $mathbf{R}^n$ . Moreover, in linear optimal control, the resulting closed-loop system is usually not contractive. Think of the infamous pendulum on a cart. Being energy efficient has usually nothing to do with taking the shortest, in the Euclidean sense, path.

: Recall, $|A|_p:=sup_{xin mathbf{S}^{n-1}}|Ax|_p$ . Then, let $xin mathbf{S}^{n-1}$ be some eigenvector of

. Now we have |A|_pgeq |Ax|_p = |lambda x|_p = |lambda |

. Since this eigenvalue is arbitrary it follows that |A|_pgeq rho(A)

: Let $(A_1,A_2)in mathcal{A}_{ell_2}$ then |theta A_1 + (1-theta)A_2|_2 leq |theta||A_1|_2 + |1-theta||A_2|_2 < 1

. This follows from the |cdot|_2

being a norm.
[3]

: Clearly, if $Ain mathsf{Sym}_n$ , we have $sqrt{lambda_{mathrm{max}}(A^{top}A)}=sqrt{lambda_{mathrm{max}}(A^2)}=rho(A)$ . Now, when rho(A)=|A|_2

, does this imply that $Ain mathsf{Sym}_n$ ? The answer is no, consider

Then, $A'notin mathsf{Sym}_n$ , yet, rho(A)=|A|_2=0.9

. For the full set of conditions on

such that

see this paper by Goldberg and Zwas.
[4]

: Recall that $|A|_F = sqrt{mathrm{Tr}(A^{top}A)}=sqrt{sum_{i=1}^n lambda_i(A^{top}A)}$ . This expression is clearly larger or equal to $sqrt{lambda_{mathrm{max}}(A^{top}A)}=|A|_2$ .

A Special Group, Volume Preserving Feedback |16 Nov. 2019|
tags: math.LA, math.DS

This short note is inspired by the beautiful visualization techniques from Duistermaat and Kolk (DK1999). Let's say we have a

-dimensional linear discrete-time dynamical system $x_{k+1}=Ax_k$ , which preserves the volume of the cube [-1,1]^2

under the dynamics, i.e. $mathrm{Vol}([-1,1]^2)=mathrm{Vol}(A^k[-1,1]^2)$ for any kin mathbf{Z}

Formally put, this means that

is part of a certain group, specifically, consider some field

and define the Special Linear group by

Now, assume that we are only interested in matrices Ain mathsf{SL}(2,mathbf{R})

such that the cube [-1,1]^2

remains bounded under the dynamics, i.e., $lim_{kto infty}A^k[-1,1]^2$ is bounded. In this scenario we restrict our attention to mathsf{SO}(2,mathbf{R})subset mathsf{SL}(2,mathbf{R})

mathsf{SO}(2,mathbf{R})subset mathsf{SL}(2,mathbf{R})

. To see why this is needed, consider A_1

and

both with determinant

$A_1 = left[ begin{array}{ll} 2 & 0 0 & frac{1}{2} end{array}right], quad A_2 = left[ begin{array}{ll} 1 & 2 0 & 1 end{array}right],$

If we look at the image of [-1,1]^2

under both these maps (for several iterations), we see that volume is preserved, but also clearly that the set is extending indefinitely in the horizontal direction.

To have a somewhat interesting problem, assume we are given $x_{k+1}=(A+BK)x_k$ with Ain mathsf{SL}

while it is our task to find a

such that

, hence, find feedback which not only preserves the volume, but keeps any set of initial conditions (say [-1,1]^2

) bounded over time.

Towards a solution, we might ask, given any Ain mathsf{SL}(2,mathbf{R})

what is the nearest (in some norm) rotation matrix? This turns out to be a classic question, starting from orthogonal matrices, the solution to $mathcal{P}:min_{widehat{A}in mathsf{O}(n,mathbf{R})}|A-widehat{A}|_F^2$ is given by $widehat{A}=UV^{top}$ , where (U,V)

follows from the SVD of

, $A=USigma V^{top}$ . Differently put, when we use a polar decomposition of

, then the solution is given by widetilde{U}

. See this note for a quick multiplier proof. An interesting sidenote, problem mathcal{P}

is well-defined since mathsf{O}(n,mathbf{R})

is compact. To see this, recall that for any Qin mathsf{O}(n,mathbf{R})

we have $|Q|_F = sqrt{n}$ , hence the set is bounded, plus mathsf{O}(n,mathbf{R})

is the pre-image of I_n

under $varphi:Amapsto A^{top}A$ , Ain mathsf{GL}(n,mathbf{R})

, hence the set is closed as well.

This is great, however, we would like to optimize over mathsf{SO}subset mathsf{O}

instead. To do so, one usually resorts to simply checking the sign - and if necessary - flipping it via finding the closest matrix with a change in determinant sign. We will see that by selecting appropriate coordinates we can find the solution without this checking of signs.

For today we look at mathsf{SL}(2,mathbf{R})

and largely follow (DK1999), for such a

-dimensional matrix, the determinant requirement translates to ad-bc=1

. Under the following (invertible) linear change of coordinates

becomes

, i.e., for any pair $(q,r)in mathbf{R}^2$ the point (p,s)

runs over a circle with radius $(q^2+r^2+1)^{1/2}$ . Hence, we obtain the diffeomorphism $mathsf{SL}(2,mathbf{R})simeq mathbf{S}^1 times mathbf{R}^2$ . We can use however a more pretty diffeomorphism of the sphere and an open set in $mathbf{R}^2$ . To that end, use the diffeomorphism:

$(theta,(u,v)) mapsto frac{1}{sqrt{1-u^2-v^2}} left[ begin{array}{ll} cos(theta)+u & -sin(theta)+v sin(theta)+v & cos(theta)-u end{array}right] in mathsf{SL}(2,mathbf{R}),$

for

(formal way of declaring that theta

should not be any real number) and the pair (u,v)

being part of $mathbf{D}_o:={(u,v)in mathbf{R}^2;|; u^2+v^2<1}$ (the open unit-disk). To see this, recall that the determinant is not a linear operator. Since we have a

-dimensional example we can explicitly compute the eigenvalues of any Ain mathsf{SL}(2,mathbf{R})

(plug

into the characteristic equation) and obtain:

$lambda_{1,2} = frac{mathrm{Tr}(A)pmsqrt{mathrm{Tr}(A)^2-4}}{2},quad mathrm{Tr}(A) = frac{2cos(theta)}{sqrt{1-u^2-v^2}}.$

At this point, the only demand on the eigenvalues is that lambda_1 cdot lambda_2 =1

. Therefore, when we would consider

within the context of a discrete linear dynamical system $x_{k+1}=Ax_k$ ,

is either a saddle-point, or we speak of a marginally stable system ( $|lambda_{1,2}|=1$ ). We can visualize the set of all marginally stable Ain mathsf{SL}

, called

, defined by all $big(theta,(u,v)big)in mathbf{R}/2pimathbf{Z}times mathbf{D}_o$ satisfying

$left|frac{cos(theta)}{sqrt{1-u^2-v^2}}pm left(frac{1-u^2-v^2-sin(theta)^2}{1-u^2-v^2} right)^{1/2} right|=1$

(DK1999) J.J. Duistermaat and J.A.C. Kolk : ‘‘Lie Groups’’, 1999 Springer.

Simple Flow Algorithm to Compute the Operator Norm |4 Nov. 2019|
tags: math.LA

Using the ideas from the previous post, we immediately obtain a dynamical system to compute |A|_2

for any $Ain mathcal{S}^n_{++}$ (symmetric positive definite matrices). Here we use the vector field dot{x}=f(x)

, defined by $f(x)=big(A-(x^{top}Ax)I_nbig)x$ and denote a solution by x(t,x_0)

, where

is the initial condition. Moreover, define a map $g:mathbf{R}^nto mathbf{R}_{geq 0}$ by g(x)=|Ax|_2

. Then $lim_{tto infty}gbig(x(t,x_0) big)=|A|_2$ $forall;x_0in mathbf{S}^{n-1}setminus{v_2,dots,v_n}$ , where v_1

is the eigenvector corresponding to the largest eigenvalue.

Finite Calls to Stability | 31 Oct. 2019 |
tags: math.LA, math.DS

Consider an unknown dynamical system $x_{k+1}=Ax_k$ parametrized by Ain mathsf{Sym}(n,mathbf{R})

. In this short note we look at a relatively simple, yet neat method to assess stability of

via finite calls to a particular oracle. Here, we will follow the notes by Roman Vershynin (RV2011).

Assume we have no knowledge of

, but we are able to obtain langle Ay, y rangle

for any desired $yin mathbf{R}^n$ . It turns out that we can use this oracle to bound rho(A)

from above.

First, recall that since

is symmetric, $|A|_2=sigma_{mathrm{max}}(A)=|lambda_{mathrm{max}}(A)|=rho(A)$ . Now, the common definition of |A|_2

is $sup_{xin mathbf{R}^nsetminus{0}}|Ax|_2/|x|_2$ . The constraint set $mathbf{R}^nsetminus{0}$ is not per se convenient from a computational point of view. Instead, we can look at a compact constraint set: $|A|_2=sup_{xin mathbf{S}^{n-1}} |Ax|_2$ , where $mathbf{S}^{n-1}$ is the sphere in $mathbf{R}^n$ . The latter formulation has an advantage, we can discretize these compact constraints sets and tame functions over the individual chunks.

To do this formally, we can use varepsilon

-nets, which are convenient in quantitative analysis on compact metric spaces (X,d)

. Let $mathcal{N}_{varepsilon}subseteq X$ be a

-net for

if $forall xin X;exists yin mathcal{N}_{varepsilon}:d(x,y)leq varepsilon$ . Hence, we measure the amount of balls to cover

For example, take $(mathbf{S}^{n-1},ell^n_2)$ as our metric space where we want to bound $|mathcal{N}_{varepsilon}|$ (the cardinality, i.e., the amount of balls). Then, to construct a minimal set of balls, consider an

-net of balls $B_{varepsilon/2}(y)$ , with $d(y_1,y_2)geq varepsilon;forall y_1,y_2in mathcal{N}_{varepsilon}$ . Now the insight is simply that $mathrm{vol}(B_{varepsilon/2})|mathcal{N}_{varepsilon}|leq mathrm{vol}(B_{1+varepsilon/2})-mathrm{vol}(B_{1-varepsilon/2})$ . Luckily, we understand the volume of

-dimensional Euclidean balls well such that we can establish

$|mathcal{N}_{varepsilon}| leq frac{mathrm{vol}(B_{1+varepsilon/2})-mathrm{vol}(B_{1-varepsilon/2})}{mathrm{vol}(B_{varepsilon/2})}= (1+2/varepsilon)^n-(1-2/varepsilon)^n.$

Next, we recite Lemma 5.4 from (RV2011). Given an varepsilon

-net for $mathbf{S}^{n-1}$ with varepsilonin[0,1)

we have

$|A|_2 = sup_{xin mathbf{S}^{n-1}}|langle Ax,xrangle | leq frac{1}{(1-2varepsilon)}sup_{xin mathcal{N}_{varepsilon}}|langle Ax,x rangle |$

This can be shown using the Cauchy-Schwartz inequality. Now, it would be interesting to see how sharp this bound is. To that end, we do a

-dimensional example ( x=(x_1,x_2)

). Note, here we can easily find an optimal grid using polar coordinates. Let the unknown system be parametrized by

such that

. In the figures below we compute the upper-bound to |A|_2

for several nets $mathcal{N}_{varepsilon}$ . Indeed, for finer nets, we see the upper-bound approaching 0.85

. However, we see that the convergence is slow while being inversely related to the exponential growth in $|mathcal{N}_{varepsilon}|$ .

Overall, we see that after about a 100 calls to our oracle we can be sure about

being exponentially stable. Here we just highlighted an example from (RV2011), at a later moment we will discuss its use in probability theory.

Help the (skinny) Hedgehogs | 31 Oct. 2019 |
tags: math.LA, other.Nature

In this first post we will see that the ‘‘almost surely’’ in the title is not only hinting at stochasticity.

For a long time this picture - as taken from Gabriel Peyré his outstanding twitter feed - has been on a wall in our local wildlife (inc. hedgehogs) shelter. This has lead to a few fun discussions and here we try to explain the picture a bit more.

Let $B^{ell_1^n}_1 := {xin mathbf{R}^n;:;sum^n_{i=1}|x_i|leq 1}$ . Now, we want to find the largest ball $B^{ell_2^n}_r:={xin mathbf{R}^n;:; sum^n_{i=1}x_i^2leq r^2}$ within $B^{ell_1^n}_1$ . We can find that $min_{xin partial B^{ell_1^n}_1}|x|_2$ is given by $|x_i^{star}|=n^{-1};forall iin overline{n}$ with $|x^{star}|_2=sqrt{1/n}$ . This follows the most easily from geometric intuition.

Besides being a fun visualization of concentration, this picture means a bit more to me. For years I have been working at the Wildlife shelter in Delft and the moment has come where the lack of financial support from surrounding municipalities is becoming critical. As usual, this problem can be largely attributed to bureaucracy and the exorbitant amount of managers this world has. Nevertheless, the team in Delft and their colleagues around the country do a fantastic job in spreading the word (see Volkskrant, RTLnieuws). Lets hope Delft, and it neighbours, finally see value in protecting the local wildlife we still have.

Posts (9) containing the 'math.LA’ (Linear Algebra) tag:

On the complexity of matrix multiplications |10 October 2022|
`tags: math.LA`

Endomorphisms, Tensors in Disguise |18 August 2020|
`tags: math.LA`

Optimal coordinates |24 May 2020|
`tags: math.LA, math.OC, math.DS`

A Peculiar Lower Bound to the Spectral Radius |7 March 2020|
`tags: math.LA`

Spectral Radius vs Spectral Norm |28 Febr. 2020|
`tags: math.LA, math.DS`

A Special Group, Volume Preserving Feedback |16 Nov. 2019|
`tags: math.LA, math.DS`

Simple Flow Algorithm to Compute the Operator Norm |4 Nov. 2019|
`tags: math.LA`

Finite Calls to Stability | 31 Oct. 2019 |
`tags: math.LA, math.DS`

Help the (skinny) Hedgehogs | 31 Oct. 2019 |
`tags: math.LA, other.Nature`

Posts (9) containing the 'math.LA’ (Linear Algebra) tag:

On the complexity of matrix multiplications |10 October 2022|tags: math.LA

Endomorphisms, Tensors in Disguise |18 August 2020|tags: math.LA

Optimal coordinates |24 May 2020|tags: math.LA, math.OC, math.DS

A Peculiar Lower Bound to the Spectral Radius |7 March 2020|tags: math.LA

Spectral Radius vs Spectral Norm |28 Febr. 2020|tags: math.LA, math.DS

A Special Group, Volume Preserving Feedback |16 Nov. 2019|tags: math.LA, math.DS

Simple Flow Algorithm to Compute the Operator Norm |4 Nov. 2019|tags: math.LA

Finite Calls to Stability | 31 Oct. 2019 |tags: math.LA, math.DS

Help the (skinny) Hedgehogs | 31 Oct. 2019 |tags: math.LA, other.Nature

On the complexity of matrix multiplications |10 October 2022|
`tags: math.LA`

Endomorphisms, Tensors in Disguise |18 August 2020|
`tags: math.LA`

Optimal coordinates |24 May 2020|
`tags: math.LA, math.OC, math.DS`

A Peculiar Lower Bound to the Spectral Radius |7 March 2020|
`tags: math.LA`

Spectral Radius vs Spectral Norm |28 Febr. 2020|
`tags: math.LA, math.DS`

A Special Group, Volume Preserving Feedback |16 Nov. 2019|
`tags: math.LA, math.DS`

Simple Flow Algorithm to Compute the Operator Norm |4 Nov. 2019|
`tags: math.LA`

Finite Calls to Stability | 31 Oct. 2019 |
`tags: math.LA, math.DS`

Help the (skinny) Hedgehogs | 31 Oct. 2019 |
`tags: math.LA, other.Nature`