# Optimal coordinates |24 May 2020| tags: math.LA, math.OC, math.DS

There was a bit of radio silence, but as indicated here, some interesting stuff is on its way.

In this post we highlight this 1976 paper by Mullis and Roberts on ’Synthesis of Minimum Roundoff Noise Fixed Point Digital Filters’.

Let us be given some single-input single-output (SISO) dynamical system

It is known that the input-output behaviour of any , that is, the map from to is invariant under a similarity transform. To be explicit, the tuples and , which correspond to the change of coordinates for some , give rise to the same input-output behaviour. Hence, one can define the equivalence relation by imposing that the input-output maps of and are equivalent. By doing so we can consider the quotient . However, in practice, given a , is any such that equally useful? For example, the following and are similar, but clearly, is preferred from a numerical point of view:

In what follows, we highlight the approach from Mullis and Roberts and conclude by how to optimally select . Norms will be interpreted in the sense, that is, for , . Also, in what follows we assume that plus that is stable, which will mean and that corresponds to a minimal realization.

The first step is to quantify the error. Assume we have allocated bits at our disposal to present the component of . These values for can differ, but we constrain the average by for some . Let be a 'step-size’, such that our dynamic range of is bounded by (these are our possible representations). Next we use the modelling choices from Mullis and Roberts, of course, they are debatable, but still, we will infer some nice insights.

First, to bound the dynamic range, consider solely the effect of an input, that is, define by , . Then we will impose the bound on . In combination with the step size (quantization), this results in . Let be a sequence of i.i.d. sampled random variables from . Then we see that . Hence, one can think of as scaling parameter related to the probability with which this dynamic range bound is true.

Next, assume that all the round-off errors are independent and have a variance of . Hence, the worst-case variance of computating is . To evaluate the effect of this error on the output, assume for the moment that is identically . Then, . Similar to , define as the component of . As before we can compute the variance, this time of the full output signal, which yields . Note, these expressions hinge on the indepedence assumption.

Next, define the (infamous) matrices , by

If we assume that the realization is not just minimal, but that is a controllable pair and that is an observable pair, then, and . Now the key observation is that and similarly . Hence, we can write as and indeed . Using these Lyapunov equations we can say goodbye to the infinite-dimensional objects.

To combine these error terms, let say we have apply a coordinate transformation , for some . Specifically, let be diagonal and defined by . Then one can find that , . Where the latter expressions allows to express the output error (variance) by

Now we want to minimize over all plus some optimal selection of . At this point it looks rather painful. To make it work we first reformulate the problem using the well-known arithmetic-geometric mean inequality for non-negative sequences :

This inequality yields

See that the right term is independent of , hence this is a lower-bound with respect to minimization over . To achieve this (to make the inequality from an equality), we can select

Indeed, as remarked in the paper, is not per se an integer. Nevertheless, by this selection we find the clean expression from to minimize over systems equivalent to , that is, over some transformation matrix . Define a map by

It turns out that if and only if is diagonal. This follows from Hadamard's inequality. We can use this map to write

Since the term is invariant under a transformation , we can only optimize over a structural change in realization tuple , that is, we need to make and simultaneous diagonal! It turns out that this numerically 'optimal’ realization, denoted , is what is called a principal axis realization.

To compute it, diagonalize , and define . Next, construct the diagonalization . Then our desired transformation is . First, recall that under any the pair becomes . Plugging in our map yields the transformed matrices , which are indeed diagonal.

 At last, we do a numerical test in Julia. Consider a linear system defined by the tuple : To simulate numerical problems, we round the maps and to the closest integer where is a noisy sinusoidal input. We compare a perfect (no numerical errors) realization to the rounded, denoted by , naive realization and to the optimized one . We do 100 experiments and show the mean plus the full confidence interval (of the input-output behaviour), the optimized representation is remarkably better. Numerically we observe that for we have , which is precisely where the naive struggles.

: To show this AM-GM inequality, we can use Jensen's inequality for concave functions, that is, . We can use the logarithm as our prototypical concave function on and find . Then, the result follows.
: The inequality attributed to Hadamard is slightly more difficult to show. In its general form the statement is that , for the column of . The inequality becomes an equality when the columns are mutually orthogonal. The intuition is clear if one interprets the determinant as the signed volume spanned by the columns of , which. In case , we know that there is such that , hence, by this simple observation it follows that

Equality holds when the columns are orthogonal, so must be diagonal, but must also hold, hence, must be diagonal, and thus must be diagonal, which is the result we use.