Posts (3) containing the 'math.OC’ (Optimization and Control) tag:Optimal coordinates 24 May 2020

At last, we do a numerical test in Julia. Consider a linear system defined by the tuple : To simulate numerical problems, we round the maps and to the closest integer where is a noisy sinusoidal input. We compare a perfect (no numerical errors) realization to the rounded, denoted by , naive realization and to the optimized one . We do 100 experiments and show the mean plus the full confidence interval (of the inputoutput behaviour), the optimized representation is remarkably better. Numerically we observe that for we have , which is precisely where the naive struggles. 
: To show this AMGM inequality, we can use Jensen's inequality for concave functions, that is, . We can use the logarithm as our prototypical concave function on and find . Then, the result follows.
: The inequality attributed to Hadamard is slightly more difficult to show. In its general form the statement is that , for the column of . The inequality becomes an equality when the columns are mutually orthogonal. The intuition is clear if one interprets the determinant as the signed volume spanned by the columns of , which. In case , we know that there is such that , hence, by this simple observation it follows that
Equality holds when the columns are orthogonal, so must be diagonal, but must also hold, hence, must be diagonal, and thus must be diagonal, which is the result we use.
One of the most famous problems in linear control theory is that of designing a control law which minimizes some cost being quadratic in input and state. We all know this as the Linear Quadratic Regulator (LQR) problem. There is however one problem (some would call it a blessing) with this formulation once the dynamics contain some zeromean noise: the control law is independent of this stochasticity. This is easy in proofs, easy in practice, but does it work well? Say, your control law is rather slow, but bringing you back to in the end of time. In the noiseless case, no problem. But now say there is a substantial amount of noise, do you still want such a slow control law? The answer is most likely no since you quickly drift away. The classical LQR formulation does not differentiate between noise intensities and hence can be rightfully called naive as Bertsekas did (Bert76). Unfortunately, this name did not stuck.
Can we do better? Yes. Define the cost function
and consider the problem of finding the minimizing policy in:
Which is precisely the classical LQR problem, but now with the cost wrapped in an exponential utility function parametrized by . This problem was pioneered by most notably Peter Whittle (Wh90).
Before we consider solving this problem, let us interpret the risk parameter . For we speak of a risksensitive formulation while for we are riskseeking. This becomes especially clear when you solve the problem, but a quick way to see this is to consider the approximation of near , which yields , so relates to pessimism and to optimism indeed. Here we skipped a few scaling factors, but the idea remains the same, for a derivation, consider cumulant generating functions and have a look at our problem.
As with standard LQR, we would like to obtain a stabilizing policy and to that end we will be mostly bothered with solving . However, instead of immediately trying to solve the infinite horizon average cost Bellman equation it is easier to consider for finite first. Then, when we can prove monotonicty and upperbound , the infinite horizon optimal policy is given by . The reason being that monotonic sequences which are uniformly bounded converge.
The main technical tool towards finding the optimal policy is the following Lemma similar to one in the Appendix of (Jac73):
Lemma
Consider a noisy linear dynamical system defined by with and let be shorthand notation for . Then, if holds we have
where .
Proof
Let and recall that the shorthand notation for is , then:
Here, the first step follows directly from being a zeromean Gaussian. In the second step we plug in . Then, in the third step we introduce a variable with the goal of making the covariance matrix of a Gaussian with mean . We can make this work for
and additionally
Using this approach we can integrate the latter part to and end up with the final expression. Note that in this case the random variable needs to be Gaussian, since the second to last expression in equals by being a Gaussian probability distribution integrated over its entire domain.
What is the point of doing this? Let and assume that represents the costtogo from stage and state . Then consider
Note, since we work with a sum within the exponent, we must multiply within the righthandside of the Bellman equation. From there it follows that , for
The key trick in simplifying your expressions is to apply the logarithm after minimizing over such that the fraction of determinants becomes a stateindependent affine term in the cost. Now, using a matrix inversion lemma and the pushthrough rule we can remove and construct a map :
such that . See below for the derivations, many (if not all) texts skip them, but if you have never applied the pushthrough rule they are not that obvious.
As was pointed out for the first time by Jacobson (Jac73), these equations are precisely the ones we see in (noncooperative) Dynamic Game theory for isotropic and appropriately scaled .
Especially with this observation in mind there are many texts which show that is welldefined and finite, which relates to finite cost and a stabilizing control law . To formalize this, one needs to assume that is a minimal realization for defined by . Then you can appeal to texts like (BB95).
Numerical Experiment and Robustness Interpretation
To show what happens, we do a small dimensional example. Here we want to solve the RiskSensitive ) and the Riskneutral () infinitehorizon average cost problem for
, , . There is clearly a lot of noise, especially on the second signal, which also happens to be critical for controlling the first state. This makes it interesting. We compute and . Given the noise statistics, it would be reasonable to not take the certainty equivalence control law since you control the first state (which has little noise on its line) via the second state (which has a lot of noise on its line). Let be the state under and the state under .
We see in the plot below (for some arbitrary initial condition) typical behaviour, does take the noise into account and indeed we see the induces a smaller variance.
So, is more robust than in a particular way. It turns out that this can be neatly explained. To do so, we have to introduce the notion of Relative Entropy (KullbackLeibler divergence). We will skip a few technical details (see the references for full details). Given a measure on , then for any other measure , being absolutely continuous with respect to (), define the Relative Entropy as:
Now, for any measurable function on , being bounded from below, it can be shown (see (DP96)) that
For the moment, think of as your standard finitehorizon LQR cost with product measure , then we see that an exponential utility results in the understanding that a control law which minimizes is robust against adversarial noise generated by distributions sufficiently close (measured by ) to the reference .
Here we skipped over a lot of technical details, but the intuition is beautiful, just changing the utility to the exponential function gives a wealth of deep distributional results we just touched upon in this post.
Simplification steps in
We only show to get the simpler representation for the input, the approach to obtain is very similar.
First, using the matrix inversion lemma:
we rewrite into:
Note, we factored our since we cannot assume that is invertible. Our next tool is called the pushthrough rule. Given and we have
You can check that indeed . Now, to continue, plug this expression for into the input expression:
Indeed, we only used factorizations and the pushthrough rule to arrive here.
(Bert76) Dimitri P. Bertsekas : ‘‘Dynamic Programming and Stochastic Control’’, 1976 Academic Press.
(Jac73) D. Jacobson : ‘‘Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games’’, 1973 IEEE TAC.
(Wh90) Peter Whittle : ‘‘Risksensitive Optimal Control’’, 1990 Wiley.
(BB95) Tamer Basar and Pierre Bernhard : ‘‘Optimal Control and Related Minimax Design Problems A Dynamic Game Approach’’ 1995 Birkhauser.
(DP96) Paolo Dai Pra, Lorenzo Meneghini and Wolfgang J. Runggaldier :‘‘Connections Between Stochastic Control and Dynamic Games’’ 1996 Math. Control Signals Systems.
Theo Jansen, designer of the infamous ‘‘strandbeesten’’ explains in a relatively recent series of videos that the design of his linkage system is the result of what could be called an early implementation of genetic programming. Specifically, he wanted the profile that the foot follows to be flat on the bottom. He does not elaborate on this too much, but the idea is that in this way his creatures flow over the surface instead of the more bumpy motion we make. Moreover, you can imagine that in this way the center of mass remains more or less at a constant height, which is indeed energy efficient. Now, many papers have been written on understanding his structure, but I find most of them rather vague, so lets try a simple example.
To that end, consider just 1 leg, 1 linkage system as in the figure(s) below. We can write down an explicit expression for where and . See below for an explicit derivation. Now, with his motivation in mind, what would be an easy and relevant cost function? An idea is to maximize the ratio , of course, subject to still having a functioning leg. To that end we consider the cost function (see the schematic below for notation):
Of course, this is a heuristic, but the intuition is that this should approximate . Let be the set of parameters as given in Jansen his video, then we apply a few steps of (warmstart) gradient ascent: , .
In the figure on the left we compare the linkage as given by Jansen (in black) with the result of just two steps of gradient ascent (in red). You clearly see that the trajectory of the foot becomes more shallow. Nevertheless, we see that the profile starts to become curved on the bottom. It is actually interesting to note that , which is still far from but already remarkably small for . 
Can we conclude anything from here? I would say that we need dynamics, not just kinematics, but more importantly, we need to put the application in the optimization and only Theo Jansen understands the constraints of the beach.
Still, it is fun to see for yourself how this works and like I did you can build a (baby) strandbeest yourself using nothing more than some PVC tubing and a source of heat.
Derivation of :
Essentially, all that you need is the cosinerule plus some feeling for how to make the equations continuous.
To start, let and (just the standard unit vectors), plus let be a standard (counterclockwise) rotation matrix acting on . To explain the main approach, when we want to find the coordinates of for example point , we try to compute the angle such that when we rotate the unit vector with length emanating from point , we get point . See the schematic picture below.
Then we easily obtain , and . Now, to obtain and compute the length of the diagonal, and using the cosine rule , . Then, the trick is to use to compute . To avoid angles larger than , consider an inner product with and add the remaining ; . From there we get , , , . The point is not missing, to check your code it is convenient to compute and check that agrees with for all . In a similar fashion, , , . Then, for , again compute a diagonal plus , , such that for we have . At last, , such that for .