Posts (6) containing the 'math.DS’ (Dynamical Systems) tag:Optimal coordinates 24 May 2020

At last, we do a numerical test in Julia. Consider a linear system defined by the tuple : To simulate numerical problems, we round the maps and to the closest integer where is a noisy sinusoidal input. We compare a perfect (no numerical errors) realization to the rounded, denoted by , naive realization and to the optimized one . We do 100 experiments and show the mean plus the full confidence interval (of the inputoutput behaviour), the optimized representation is remarkably better. Numerically we observe that for we have , which is precisely where the naive struggles. 
: To show this AMGM inequality, we can use Jensen's inequality for concave functions, that is, . We can use the logarithm as our prototypical concave function on and find . Then, the result follows.
: The inequality attributed to Hadamard is slightly more difficult to show. In its general form the statement is that , for the column of . The inequality becomes an equality when the columns are mutually orthogonal. The intuition is clear if one interprets the determinant as the signed volume spanned by the columns of , which. In case , we know that there is such that , hence, by this simple observation it follows that
Equality holds when the columns are orthogonal, so must be diagonal, but must also hold, hence, must be diagonal, and thus must be diagonal, which is the result we use.
Lately, linear discretetime control theory has start to appear in areas far beyond the ordinary. As a byproduct, papers start to surface which claim that stability of linear discretetime systems
is characterized by . The confunsion is complete by calling this object the spectral norm of a matrix . Indeed, stability of is not characterized by , but by . If , then all absolute eigenvalues of are strictly smaller than one, and hence for , . This follows from the Jordan form of in combination with the simple observation that for , .
To continue, define two sets; and . Since we have . Now, the main question is, how much do we lose by approximating with ? Motivation to do so is given by the fact that is a norm and hence a convex function, i.e., when given a convex polytope with vertices , if , then . Note that is not a norm, can be without (consider an uppertriangular matrix with zeros on the main diagonal), the triangle inequality can easily fail as well. For example, let
Then , but , hence fails to hold. A setting where the aforementioned property of might help is Robust Control, say we want to assess if some algorithm rendered a compact convex set , filled with 's, stable. As highlighted before, we could just check if all extreme points of are members of , which might be a small and finite set. Thus, computationally, it appears to be attractive to consider over the generic .
As a form of critique, one might say that is a lot larger than . Towards such an argument, one might recall that . Indeed, if , but . Therefore, it seems like the set of for which considering over is reasonable, is negligibly small. To say a bit more, since we see that we can always find a ball with nonzero volume fully contained in . Hence, is at least locally dense in . The same holds for (which is more obvious). So in principle we could try to investigate . For , the sets agree, which degrades asymptotically. However, is this the way to go? Lets say we consider the set . Clearly, the sets and are different, even in volume, but for sufficiently large , should we care? The behaviour they parametrize is effectively the same.
We will stress that by approximating with , regardless of their volumetric difference, we are ignoring a full class of systems and miss out on a nonneglible set of behaviours. To see this, any system described by is contractive in the sense that , while systems in are merely asymptotically stable. They might wander of before they return, i.e., there is no reason why for all we must have . We can do a quick example, consider the matrices
Then , and both and have . We observe that indeed is contractive, for any initial condition on , we move strictly inside the sphere, whereas for , when starting from the same initial condition, we take a detour outside of the sphere before converging to . So although and have the same spectrum, they parametrize different systems.
In our deterministic setting this would mean that we would confine our statespace to a (solid) sphere with radius , instead of . Moreover, in linear optimal control, the resulting closedloop system is usually not contractive. Think of the infamous pendulum on a cart. Being energy efficient has usually nothing to do with taking the shortest, in the Euclidean sense, path.
: Recall, . Then, let be some eigenvector of . Now we have . Since this eigenvalue is arbitrary it follows that .
: Let then . This follows from the being a norm.
: Clearly, if , we have . Now, when , does this imply that ? The answer is no, consider
Then, , yet, . For the full set of conditions on such that see this paper by Goldberg and Zwas.
: Recall that . This expression is clearly larger or equal to .
This short note is inspired by the beautiful visualization techniques from Duistermaat and Kolk (DK1999). Let's say we have a dimensional linear discretetime dynamical system , which preserves the volume of the cube under the dynamics, i.e. for any .
Formally put, this means that is part of a certain group, specifically, consider some field and define the Special Linear group by
Now, assume that we are only interested in matrices such that the cube remains bounded under the dynamics, i.e., is bounded. In this scenario we restrict our attention to . To see why this is needed, consider and both with determinant :
If we look at the image of under both these maps (for several iterations), we see that volume is preserved, but also clearly that the set is extending indefinitely in the horizontal direction.
To have a somewhat interesting problem, assume we are given with while it is our task to find a such that , hence, find feedback which not only preserves the volume, but keeps any set of initial conditions (say ) bounded over time.
Towards a solution, we might ask, given any what is the nearest (in some norm) rotation matrix? This turns out to be a classic question, starting from orthogonal matrices, the solution to is given by , where follows from the SVD of , . Differently put, when we use a polar decomposition of , , then the solution is given by . See this note for a quick multiplier proof. An interesting sidenote, problem is welldefined since is compact. To see this, recall that for any we have , hence the set is bounded, plus is the preimage of under , , hence the set is closed as well.
This is great, however, we would like to optimize over instead. To do so, one usually resorts to simply checking the sign  and if necessary  flipping it via finding the closest matrix with a change in determinant sign. We will see that by selecting appropriate coordinates we can find the solution without this checking of signs.
For today we look at and largely follow (DK1999), for such a dimensional matrix, the determinant requirement translates to . Under the following (invertible) linear change of coordinates
becomes , i.e., for any pair the point runs over a circle with radius . Hence, we obtain the diffeomorphism . We can use however a more pretty diffeomorphism of the sphere and an open set in . To that end, use the diffeomorphism:
for (formal way of declaring that should not be any real number) and the pair being part of (the open unitdisk). To see this, recall that the determinant is not a linear operator. Since we have a dimensional example we can explicitly compute the eigenvalues of any (plug into the characteristic equation) and obtain:
At this point, the only demand on the eigenvalues is that . Therefore, when we would consider within the context of a discrete linear dynamical system , is either a saddlepoint, or we speak of a marginally stable system (). We can visualize the set of all marginally stable , called , defined by all satisfying
(DK1999) J.J. Duistermaat and J.A.C. Kolk : ‘‘Lie Groups’’, 1999 Springer.
Previously we looked at ,  and its resulting flow  as the result from mapping to the sphere. However, we saw that for this flow convergences to , with such that for we have . Hence, it is interesting to look at the flow from an optimization point of view: .
A fair question would be, is our flow not simply Riemannian gradient ascent for this problem? As common with these kind of problems, such a hypothesis is something you just feel.
Now, using the tools from (CU1994, p.311) we can compute the gradient of (on the sphere) via , where is a vector field normal to , e.g., . From there we obtain
To make our life a bit easier, use instead of the map . Moreover, set . Then it follows that
Of course, to make the computation cleaner, we changed to , but the relation between and is beautiful. Somehow, mapping trajectories of , for some , to the sphere corresponds to (Riemannian) gradient ascent applied to the problem .
To do an example, let be given by We can compare with . We see that the gradient flow under the quadratic function takes a clearly ‘‘shorter’’ path. 
Now, we formalize the previous analysis a bit a show how fast we convergence. Assume that the eigenvectors are ordered such that eigenvector corresponds to the largest eigenvalue of . Then, the solution to is given by
Let be expressed in eigenvector coordinates, with all (normalized). Moreover, assume all eigenvalues are distinct. Then, to measure if is near , we compute , which is if and only if is parallel to . To simplify the analysis a bit, we look at , for some perturbation , this yields
Next, take the the (natural) logarithm on both sides:
This logsumexp terms are hard to deal with, but we can apply the socalled ‘‘logsumexp trick’’:
In our case, we set and obtain
We clearly observe that for the LHS approaches from below, which means that from above, like intended. Of course, we also observe that the mentioned method is not completely general, we already assume distinct eigenvalues, but there is more. We do also not convergence when , which is however a set of measure on the sphere .
More interestingly, we see that the convergence rate is largely dictated by the ‘‘spectral gap/eigengap’’ . Specifically, to have a particular projection error , such that , we need
Comparing this to the resulting flow from , , we see that we have the same flow, but with . This is interesting, since and have the same eigenvectors, yet a different (scaled) spectrum. With respect to the convergence rate, we have to compare and for any with (the additional is not so interesting).
It is obvious what we will happen, the crux is, is larger or smaller than ? Can we immediately extend this to a Newtontype algorithm? Well, this fails (globally) since we work in instead of purely with . To be concrete, , we never have degrees of freedom.
Of course, these observations turned out to be far from new, see for example (AMS2008, sec. 4.6).
(AMS2008) P.A. Absil, R. Mahony and R. Sepulchre: ‘‘Optimization Algorithms on Matrix Manifolds’’, 2008 Princeton University Press.
(CU1994) Constantin Udriste: ‘‘Convex Functions and Optimization Methods on Riemannian Manifolds’’, 1994 Kluwer Academic Publishers.
In this post we start a bit with control of systems living on compact metric spaces. This is interesting since we have to rethink what we want, cannot even occur. Let a great arc (circle) on be parametrized by the normal of a hyperplane through , i.e., , is the axis of rotation for . 
To start, consider a linear dynamical systems , , for which the solution is given by . We can map this solution to the sphere via . The corresponding vector field for is given by
The beauty is the following, to find the equilibria, consider . Of course, is not valid since . However, take any eigenvector of , then is still an eigenvector, plus it lives on . If we plug such a scaled (from now on consider all eigenvectors to live on ) into we get , which is rather cool. The eigenvectors of are (at least) the equilibria of . Note, if , then so is , each eigenvector gives rise to two equilibria. We say at least, since if is nontrivial, then for any , we get .
Let , then what can be said about the stability of the equilibrium points? Here, we must look at , where the dimensional system is locally a dimensional linear system. To do this, we start by computing the standard Jacobian of , which is given by
As said before, we do not have a dimensional system. To that end, we assume with diagonalization (otherwise peform a GramSchmidt procedure) and perform a change of basis . Then if we are interested in the qualitative stability properties of (the eigenvector of ), we can simply look at the eigenvalues of the matrix resulting from removing the row and column in , denoted .
We do two examples. In both figures are the initial conditions indicated with a dot while the equilibrium points are stars.
First, let Clearly, for we have and let be . Then using the procedure from before, we find that is stable (attracting), while are locally unstable in one direction while in the other (along the great arc). Hence, this example gives rise to the two stable poles. 
Consider an unknown dynamical system parametrized by . In this short note we look at a relatively simple, yet neat method to assess stability of via finite calls to a particular oracle. Here, we will follow the notes by Roman Vershynin (RV2011).
Assume we have no knowledge of , but we are able to obtain for any desired . It turns out that we can use this oracle to bound from above.
First, recall that since is symmetric, . Now, the common definition of is . The constraint set is not per se convenient from a computational point of view. Instead, we can look at a compact constraint set: , where is the sphere in . The latter formulation has an advantage, we can discretize these compact constraints sets and tame functions over the individual chunks.
To do this formally, we can use nets, which are convenient in quantitative analysis on compact metric spaces . Let be a net for if . Hence, we measure the amount of balls to cover .
For example, take as our metric space where we want to bound (the cardinality, i.e., the amount of balls). Then, to construct a minimal set of balls, consider an net of balls , with . Now the insight is simply that . Luckily, we understand the volume of dimensional Euclidean balls well such that we can establish
Which is indeed sharper than the bound from Lemma 5.2 in (RV2011).
Next, we recite Lemma 5.4 from (RV2011). Given an net for with we have
This can be shown using the CauchySchwartz inequality. Now, it would be interesting to see how sharp this bound is. To that end, we do a dimensional example (). Note, here we can easily find an optimal grid using polar coordinates. Let the unknown system be parametrized by
such that . In the figures below we compute the upperbound to for several nets . Indeed, for finer nets, we see the upperbound approaching . However, we see that the convergence is slow while being inversely related to the exponential growth in .
Overall, we see that after about a 100 calls to our oracle we can be sure about being exponentially stable. Here we just highlighted an example from (RV2011), at a later moment we will discuss its use in probability theory.