Optimal coordinates |24 May 2020|
tags: math.LA, math.OC, math.DS

There was a bit of radio silence, but as indicated here, some interesting stuff is on its way.

In this post we highlight this 1976 paper by Mullis and Roberts on ’Synthesis of Minimum Roundoff Noise Fixed Point Digital Filters’.

Let us be given some single-input single-output (SISO) dynamical system sigma in Sigma

 sigma :left{ begin{array}{ll} x(t+1) &= Ax(t) + bu(k) y(t) &= cx(t) + du(k)  end{array}right. .

It is known that the input-output behaviour of any sigma, that is, the map from u(t) to y(t) is invariant under a similarity transform. To be explicit, the tuples (A,b,c,d) and (TAT^{-1},Tb,cT^{-1},d), which correspond to the change of coordinates z(t)=Tx(t) for some Tin mathsf{GL}_n, give rise to the same input-output behaviour. Hence, one can define the equivalence relation sigma sim sigma' by imposing that the input-output maps of sigma and sigma' are equivalent. By doing so we can consider the quotient Sigma / mathsf{GL}_n. However, in practice, given a sigma, is any sigma' such that sigma'sim sigma equally useful? For example, the following A and A' are similar, but clearly, A' is preferred from a numerical point of view:

 A = left[begin{array}{ll} 0.5 & 10^9 0 & 0.5  end{array}right],quad  A' = left[begin{array}{ll} 0.5 & 1 0 & 0.5  end{array}right].

In what follows, we highlight the approach from Mullis and Roberts and conclude by how to optimally select Tin mathsf{GL}_n. Norms will be interpreted in the ell_2 sense, that is, for {x(t)}_{tin mathbf{N}}, |x|^2=sum^{infty}_{t=0}|x(t)|_2. Also, in what follows we assume that x(0)=0 plus that A is stable, which will mean rho(A)<1 and that sigma corresponds to a minimal realization.

The first step is to quantify the error. Assume we have allocated m_i bits at our disposal to present the i^{mathrm{th}} component of x(t). These values for m_i can differ, but we constrain the average by sum^n_{i=1}m_i = nm for some m. Let mull 1 be a 'step-size’, such that our dynamic range of x_i(t) is bounded by pm mu 2^{m_i-1} (these are our possible representations). Next we use the modelling choices from Mullis and Roberts, of course, they are debatable, but still, we will infer some nice insights.

First, to bound the dynamic range, consider solely the effect of an input, that is, define f_i(t) by x(t)=sum^{infty}_{k=1}A^{k-1}b u(t-k), x_i(t)=sum^{infty}_{k=1}f_i(t)u(t-k). Then we will impose the bound pm delta |f_i| on x_i(t). In combination with the step size (quantization), this results in delta^2 |f_i|^2 = (mu 2^{m_i-1})^2. Let u be a sequence of i.i.d. sampled random variables from mathcal{N}(0,1). Then we see that mathrm{var}big(x_i(t)big)= |f_i|^2. Hence, one can think of delta as scaling parameter related to the probability with which this dynamic range bound is true.

Next, assume that all the round-off errors are independent and have a variance of mu^2. Hence, the worst-case variance of computating x_i(t+1) is mu^2(n+1). To evaluate the effect of this error on the output, assume for the moment that u(t) is identically 0. Then, y(t) = cA^t x(0). Similar to f_i(t), define g_i(t) as the i^{mathrm{th}} component of cA^t. As before we can compute the variance, this time of the full output signal, which yields sigma_y^2:=mathrm{var}(y) = mu^2(n+1)sum^n_{i=1}|g_i|^2. Note, these expressions hinge on the indepedence assumption.

Next, define the (infamous) matrices W_c, W_o by

 begin{array}{lll} W^{(c)} &= AW^{(c)}A^{top} + bb^{top} &= sum^{infty}_{k=0} A^k bb^{top} (A^k)^{top},  W^{(o)} &= A^{top}W^{(o)} A + c^{top}c &=sum^{infty}_{k=0} (A^k)^{top}c^{top}c A.  end{array}

If we assume that the realization is not just minimal, but that (A,b) is a controllable pair and that (A,c) is an observable pair, then, W^{(c)}succ 0 and W^{(o)}succ 0. Now the key observation is that W^{(c)}_{ij} = langle f_i, f_j rangle and similarly W^{(o)}_{ij} = langle g_i, g_j rangle. Hence, we can write delta^2 |f_i|^2 = (mu 2^{m-1})^2 as delta^2 K_ii = (mu 2^{m-1})^2 and indeed sigma_y^2 = mu^2(n+1)sum^n_{i=1}W_{ii}. Using these Lyapunov equations we can say goodbye to the infinite-dimensional objects.

To combine these error terms, let say we have apply a coordinate transformation x'=Tx, for some Tin mathsf{GL}_n. Specifically, let T be diagonal and defined by T_{ii}=2delta sqrt{K_{ii}}/(mu 2^m). Then one can find that K'_{ii}=K_{ii}/T_{ii}^2, W'_{ii}=W_{ii}T_{ii}^2. Where the latter expressions allows to express the output error (variance) by

 sigma_y^2 = (n+1)delta^2 sum^n_{i=1}frac{W^{(c)}_{ii}W^{(o)}_{ii}}{(2^{m_i})^2}.

Now we want to minimize sigma_y^2 over all sigma'sim sigma plus some optimal selection of m_i. At this point it looks rather painful. To make it work we first reformulate the problem using the well-known arithmetic-geometric mean inequality^{[1]} for non-negative sequences {a_i}_{iin [n]}:

 frac{1}{n} sum^n_{i=1}a_i geq left( prod^n_{i=1}a_i right)^{1/n}.

This inequality yields

 sigma_y^2 = n(n+1)delta^2 frac{1}{n}sum^n_{i=1}frac{W^{(c)}_{ii}W^{(o)}_{ii}}{(2^{m_i})^2}geq n(n+1)left(frac{delta}{2^m}right)^2left(prod^n_{i=1}{W^{(c)}_{ii}W^{(o)}_{ii}}right)^{1/n}qquad (1).

See that the right term is independent of m_i, hence this is a lower-bound with respect to minimization over m_i. To achieve this (to make the inequality from (1) an equality), we can select

 m_i = m + frac{1}{2} left(log_2 W_{ii}^{(c)}W_{ii}^{(o)} - frac{1}{2}sum^n_{j=1}log_2 W_{ii}^{(c)}W_{ii}^{(o)} right).

Indeed, as remarked in the paper, m_i is not per se an integer. Nevertheless, by this selection we find the clean expression from (1) to minimize over systems equivalent to sigma, that is, over some transformation matrix T. Define a map f:mathcal{S}^n_{succ 0}to (0,1] by

 f(P) = left( frac{mathrm{det}(P)}{prod^n_{i=1}P_{ii}}right)^{1/2}.

It turns out that f(P)=1 if and only if P is diagonal. This follows^{[2]} from Hadamard's inequality. We can use this map to write

 sigma_y^2 = n(n+1) left(frac{delta}{2^m} right)^2 frac{mathrm{det}(W^{(c)}W^{(o)})^{1/n}}{big(f(W^{(c)})f(W^{(o)})big)^{2/n}} geq n(n+1) left(frac{delta}{2^m}right)^2 mathrm{det}(W^{(c)}W^{(o)})^{1/n}.

Since the term mathrm{det}(W^{(c)}W^{(o)}) is invariant under a transformation T, we can only optimize sigma_y^2 over a structural change in realization tuple (A,b,c,d), that is, we need to make W^{(c)} and W^{(o)} simultaneous diagonal! It turns out that this numerically 'optimal’ realization, denoted sigma^{star}, is what is called a principal axis realization.

To compute it, diagonalize W^{(o)}, W^{(o)}=Q_o Lambda_o Q_o^{top} and define T_1^{-1}:=Lambda_o^{1/2}Q_o^{top}. Next, construct the diagonalization T_1^{-1}W^{(c)}T_1^{-top}=Q_{1,c}Lambda_{1,c}Q_{1,c}^{top}. Then our desired transformation is T:=T_1 Q_{1,c}. First, recall that under any Tin mathsf{GL}_n the pair (W^{(c)},W^{(o)}) becomes (T^{-1}W^{(c)}T^{-top},T^{top}W^{(o)}T). Plugging in our map T yields the transformed matrices (Lambda_{1,c},I_n), which are indeed diagonal.

Errors 

At last, we do a numerical test in Julia. Consider a linear system sigma defined by the tuple (A,b,c,d):

 A = left[begin{array}{ll} 0.8 & 0.001  0 & -0.5 end{array} right],quad b = left[begin{array}{l} 10  0.1 end{array} right],
 c=left[begin{array}{ll} 10 & 0.1 end{array} right],quad d=0.

To simulate numerical problems, we round the maps f(x,u)=Ax+bu and h(x)=cx to the closest integer where u(t) is a noisy sinusoidal input. We compare a perfect (no numerical errors) realization sigma to the rounded, denoted by [cdot], naive realization [sigma] and to the optimized one [sigma^{star}]. We do 100 experiments and show the mean plus the full confidence interval (of the input-output behaviour), the optimized representation is remarkably better. Numerically we observe that for sigma^{star} we have A^{star}_{21}neq 0, which is precisely where the naive A struggles.

[1]: To show this AM-GM inequality, we can use Jensen's inequality for concave functions, that is, mathbf{E}[g(x)]leq g(mathbf{E}[x]). We can use the logarithm as our prototypical concave function on mathbf{R}_{>0} and find logbig(frac{1}{n}sum^n_{i=1}x_ibig)geq frac{1}{n}sum^n_{i=1}log(x_i) = logbig((prod^n_{i=1}x_i)^{1/n} big). Then, the result follows.
[2]: The inequality attributed to Hadamard is slightly more difficult to show. In its general form the statement is that |mathrm{det}(A)|leq prod^n_{i=1}|a_i|_2, for a_i the i^{mathrm{th}} column of A. The inequality becomes an equality when the columns are mutually orthogonal. The intuition is clear if one interprets the determinant as the signed volume spanned by the columns of A, which. In case Ain mathcal{S}^n_{succ 0}, we know that there is L such that A=L^{top}L, hence, by this simple observation it follows that

 mathrm{det}(A)=mathrm{det}(L)^2 leq prod^n_{i=1}|ell_i|_2^2 = prod^n_{i=1}a_{ii}.

Equality holds when the columns are orthogonal, so AA^{top} must be diagonal, but A=A^{top} must also hold, hence, A^2 must be diagonal, and thus A must be diagonal, which is the result we use.