Almost Surely Investigations in Optimal Control and Dynamical Systems,

Hi, my name is Wout and I am a researcher broadly interested in the mathematical side of systems & control theory. This page functions as a (web)log of results I find interesting.
Feel free to reach me in case of any hints, tips, tricks, mistakes, ideas or any other comment.

Since August 2024, I joined the Applied Mathematics department (INMA) at UCLouvain, as part of the larger Information and Communication Technologies, Electronics and Applied Mathematics (ICTEAM) institute, working with Raphael Jungers as a postdoctoral fellow.

‘‘He had one big idea and … that was the idea of relating many kinds of mathematical and scientific problems to the study of vector fields on manifolds and the properties of the related flows. That was his central theme and central mechanism.’’ p. 494 Browder on Smale (1993)

Communities |July 02, 2024|
tags: math.OC

Roughly 1 year ago I got the wonderful advice from Roy Smith that whatever you do, you should be part of a community. This stayed with me as four years ago I wrote on my NCCR Automation profile page that “Control problems of the future cannot be solved alone, you need to work together.” and I finally realized — may it be in retrospect — what this really means, I was being kindly welcomed to several communities.

Just last week at the European Control Conference (ECC24), Angela Fontan and Matin Jafarian organized a lunch session on peer review in our community and as Antonella Ferrara emphasized throughout, you engage in the process to contribute to the community, your community. As such, a high quality — and sustainable — conference is intimately connected to the feeling of community. Especially since the conference itself feeds back into our feeling of community (as I am sure you know, everything is just feedback loops..).

ECC24 was a fantastic example when it comes to strengthening that feeling of community and other examples that come to mind are the inControl podcast by Alberto Padoan, Autonomy Talks by Gioele Zardini, the IEEE CSS magazine, the IEEE CSS road map 2030 and recently, the KTH reading group on Control Theory: Twenty-Five Seminal Papers (1932-1981) organized by Miguel Aguiar. The historical components are important not only to better understand our community (the why, how and what), but to always remember we build upon our predecessors, we are part of the same community. Of course, the NCCR Automation contributed enormously and far beyond Switzerland (fellowships, symposia, workshops, …), but even closer to home, ever since the previous century, the Dutch Institute of Systems and Control (DISC) has done a remarkable job bringing and keeping all researchers and students together (MSc programs, a national graduate school, Benelux meetings, …).

The importance of community is sometimes linked to being in need of recognition and to working more effective when in competition. I am sure this is true for some and I am even more sure this is enforced in fields where research is expensive and funding is scarce. Yet, that vast majority of members of the scientific community are not PIs, but students, and for us the community is critically important to create and enforce a sense of belonging. The importance of belonging is widely known — not only in the scientific community — and most of the problems we have here on earth can be directly linked to a lack of belonging and community (as beautifully formulated in Pale Blue Dot by Carl Sagan).

Celebrating personal successes is of great importance, we need our heros, but I want to be part of a scientific community where we focus on progress of the community, what did we solve, what are our open problems, and how do we go about solving them? Looking back, I am happy to see it seems I am part of such a community indeed. Thank you!

Weekend read |May 14, 2023|
tags: math.OC

The ‘‘Control for Societal-scale Challenges: Road Map 2030’’ by the IEEE Control Systems Society (Eds. A. M. Annaswamy, K. H. Johansson, and G. J. Pappas) was published a week ago. This roadmap is a remarkable piece of work, over 250 pages, an outstanding list of authors and coverage of almost anything you can imagine, and more.

If you somehow found your way to this website, then I can only strongly recommend reading this document. Despite many of us being grounded in traditional engineering disciplines, I do agree with the sentiment of this roadmap that the most exciting (future) work is interdisciplinary, this is substantiated by many examples from biology. Better yet, it is stressed that besides familiarizing yourself with the foundations, it is of quintessential importance (and fun I would say) to properly dive into the field where you hope to apply these tools.

‘‘Just because you can formulate and solve an optimization problem does not mean that you have the correct or best cost function.’’ p. 32

Section 4.A on Learning and Data-Driven Control also contains many nice pointers, sometimes alluding to a slight disconnect between practice and theory.

‘‘Sample complexity may only be a coarse measure in this regard, since it is not just the number of samples but the “quality” of the samples that matters.’’ p. 142

The section on safety is also inspiring, just stability will not be enough anymore. However, the most exciting part for me was Chapter 6 on education. The simple goal of making students excited early-on is just great. Also, the aspiration to design the best possible material as a community is more than admirable.

Standing on the shoulders of giants |April 3, 2023|
tags: math.OC

One of the most illuminating activities one can undertake is to go back in time and see how the giants in (y)our field shaped the present. Sadly, several of our giants passed away recently and I want to highlight one of them in particular: Roger W. Brockett.

It is very rare to find elegant and powerful theories in sytems and control that are not related to Brockett somehow, better yet, many powerful theories emerged directly from his work. Let me highlight five topics/directions I find remarkably beautiful:

References (all by Brockett).
|1|: Geometric Methods in System Theory - Proceedings of the NATO Advanced Study Institute held at London, England, August 27-Septernber 7, 1973, ed. with Mayne, D. Reidel Publishing Company (1973).
|1.2|: Chapter: Lie Algebras and Lie groups in Control Theory in |1|.
|2|: The early days of geometric nonlinear control, Automatica (2014).
|3|: Asymptotic stability and feedback stabilization, Differential Geometric Control Theory, ed. with Millman and Sussmann, Birkhäuser (1983).
|4|: Control theory and analytical mechanics, Geometric Control Theory, Lie Groups: History, Frontiers and Applications, Vol. VII, ed. Martin and Hermann, Math Sci Press, (1976).
|5|: Lie Theory and Control Systems Defined on Spheres, SIAM Journal on Applied Mathematics (1973).
|6|: Dynamical systems that sort lists and solve linear programming problems, IEEE CDC (1988).

See also this 2022 interview (video) with John Baillieul and the foreword to this book for more on the person behind the researcher.

On Critical Transitions in Nature and Society |29 January 2023|
tags: math.DS

The aim of this post is to highlight the 2009 book Critical Transitions in Nature and Society by Prof. Scheffer, which is perhaps more timely than ever and I strongly recommend this book to essentially anyone.

Far too often we (maybe not you) assume that systems are simply reversible, e.g. based on overly simplified models (think of a model corresponding to a flow). The crux of this book is that for many ecosystems such a reversal might be very hard or even practically impossible. Evidently, this observation has many serious ramifications with respect to understanding and tackling climate change.

Mathematically, this can be understood via bifurcation theory, in particular, via folds and cusps, introducing a form of hysteresis. The hysteresis is important here, as this means that simply restoring some parameters of an ecosystem does not directly imply that the state of the ecosystem will return to what these parameters were previously corresponding to.

The book makes a good point that catastrophe- and chaos theory (old friends of bifurcation theory) themselves went through a critical transition, that is, populair media (and some authors) blew-up those theories. To that end, see the following link for a crystal clear introduction to catastrophe theory by Zeeman (including some controversy), as well as this talk by Ralph Abraham on the history (locally) of chaos theory. One might argue that some populair fields these days are also trapped in a similar situation.

Scheffer stays away from drawing overly strong conclusions and throughout the book one can find many discussions on how and if to relate these models to reality. Overall, Scheffer promotes system theoretic thinking. For example, at first sight we might be inclined to link big (stochastic) events, e.g. a meteor hitting the earth, as the cause of some disruptive event, while actually something else might been slowly degrading, e.g. some basin of attraction shrunk, and this stochastic event was just the final push. This makes the book very timely, climate change is slowly changing ecosystems around us, we are just waiting for a detrimental push. To give another example, in Chapter 5 we find an interesting discussion on the relation between biodiversity and stability. Scheffer highlights two points: (i) robustness (there is plently back-up) and (ii) complementation (if many can perform a task, some of them will be very good at it). Overall, the discussions on ‘‘why we have so many animals’’ are very interesting.

One of the main concepts in the book is that of resilience, which is (re)-defined several times (as part of the storyline in the book), in particular, one page 103 we find it being defined as ‘‘the capacity of a system to absorb disturbance and reorganize while undergoing change so as to still retain essentially the same function, structure, identity, and feedbacks.’’ Qualitatively, this is precisely a combination of having an attractor of some system being structurally stable. However, the goal here is to quantify this structural stability to some extent. Indeed, one enters bifurcation theory, or if you like, topological dynamical systems theory.

Throughout, Scheffer provides incredibly many examples of great inspiration to anyone in dynamical system and control theory, e.g. the basin of attraction, multistability and time-separations are recurring themes. My favourite simple example (of positive feedback) being the ice-Albedo feedback, (too) simply put, less ice means less reflection of light and hence more heat absorption, resulting in even less ice (as such, the positive feedback). More details can be found in Chapter 8. Another detailed series of examples is contained in Chapter 7 on shallow lakes. This chapter explains through a qualitative lens why restoring lakes is inherently difficult. Again, (too) simply put, as with ice, if plants disappear in lakes, turbidity is promoted, making it more difficult for plants to return (they need sunlight and thus prefer clear water). For this lake model, one can find some equations in Appendix 12, which is essentially a Lotka-Volterra model. As such, some readers might be unsatisfied when it comes to the mathematical content*. Moreover, at times Scheffer himself describes bifurcations as exotic dynamical systems jargon and throughout the book one might get the feeling that the dynamical systems community has no clue how to handle time-varying models. Luckily, the latter is not completely true. Perhaps the crux here is that we can use more cross-contamination between scientific communities. Especially recognizing these bifurcations (critical transitions), not just in models, but in reality is still a big and pressing open problem.

*A fantastic and easy-going short book on catastrophe theory is written, of course, by Arnold.

The aim of this book was to have a widely accesible mathematical reference. The introduction to this book reads as: ‘‘This booklet explains what catastrophe theory is about and why it arouses such controversy. It also contains non-controversial results from the mathematical theories of singularities and bifurcation.’’

whereas Arnold ends with: ‘‘A qualitative peculiarity of Thom's papers on catastrophe theory is their original style: as one who presenses the direction of future investigations, Thom not only does not have proofs available, but does not even dispose of the precise formulations of his results. Zeeman, an ardent admirer of this style, observes that the meaning of Thom's words becomes clear only after inserting 99 lines of your own between every two of Thom's.’’

Counterexamples 1/N |9 January 2023|
tags: math.An

As nicely felt through the 1964 book Counterexamples in Analysis by Gelbaum and Olmsted, mathematics is not all about proving theorems, equally important is a large collection of (counter)examples to sharpen our understanding of these theorems. This attitude towards mathematics can also be found in books by Arnold, Halmos and many others.

This short post highlights a few of my favourite examples from the book by Gelbaum and Olmsted.

A function

is locally bounded when for each point in its domain one can find a neighbourhood

such that

is bounded on

. Now consider the function f:mathbf{R}tomathbf{R}

defined by

Now assume that we could locally bound

(some open neighbourhood in mathbf{R}

), that means that

is bounded on such a neighbourhood and hence, so is

. However, that would imply that

only contains finitely many rational numbers! We plot

(approximately) below over [0,1]

for just

points and observe precisely what one would expect.

This example is not that surprising, but it nicely illustrates the pitfall of thinking of sets as ‘‘blobs’’ (in Dutch we would say potatoes) in the plane. Indeed, the counterexample can be interpreted as an asymptotic result. Consider A={(x,y):xy=1}

, this set is closed as (x,y)mapsto xy

is continuous. Then, take B={(x,y):y=0}

, this set is again closed and clearly

and

are disjoint. However, since we can write y=1/x

, then as $lim_{xto infty}1/x=0$ , we find that

approaches

arbitrarily closely. One can create many more examples of this form, e.g., using logarithms.

This last example is again not that surprising if you are used to go beyond Euclidean spaces. Let (X,d)

be a metric space under the metric

Now assume that

is not simply a singleton. Then, let O={yin X:d(x,y)<1}

be the open metric ball of radius

centered at

and similarly, let C={yin X:d(x,y)leq 1}

be the closed metric ball of radius

centered at

. Indeed,

whereas

. Now to construct the closure of

, we must first understand the topology induced by the metric

. In other words, we must understand the limit points of

. Indeed,

is the discrete metric and this metric induces the discrete topology on

. Hence, it follows — which is perhaps the counterintuitive part, that mathrm{cl}(O)=O

and hence

. This example also nicely shows a difference between metrics and norms. As norms satisfy |alpha x|=|alpha||x|

(absolute homogeneity) one cannot have the discontinuous behaviour from the example above.

Further examples I particularly enjoyed are a discontinuous linear function, a function that is discontinuous in two variables, yet continuous in both variables separately and all examples related to Cantor. This post is inspired by TAing for Analysis 1 and will be continued in the near future.

On ‘‘the art of the state’’ |29 December 2022|
tags: math.DS

In the last issue of the IEEE Control Systems Magazine, Prof. Sepulchre wrote an exhilarating short piece on the art of the state, or as he (also) puts it ‘‘the definite victory of computation over modeling in science and engineering’’. As someone with virtually no track record, I am happy to also shed my light on the matter.

We owe dynamical systems theory as we know it today largely to Henri Poincaré (1854-1912), whom started this in part being motivated by describing the solar system. Better yet, the goal was to eventually understand its stability. However, we can be a bit more precise regarding what was driving all this work. In the introduction of Les méthodes nouvelles de la mécanique céleste (EN: The new methods of celestial mechanics) - Vol I by Poincaré, published in 1892 (HP92), we find (loosely translated): ‘‘The ultimate goal of Celestial Mechanics is to resolve this great question of whether Newton's law alone explains all astronomical phenomena; the only way to achieve this is to make observations as precise as possible and then compare them with the results of the calculation.’’ and then after a few words on divergent series: ‘‘these developments should attract the attention of geometers, first for the reasons I have just explained and also for the following: the goal of Celestial Mechanics is not reached when one has calculated more or less approximate ephemerides without being able to realize the degree of approximation obtained. If we notice, in fact, a discrepancy between these ephemerides and the observations, we must be able to recognize if Newton's law is at fault or if everything can be explained by the imperfection of the theory.’’ It is perhaps interesting to note that his 1879 thesis entitled: Sur les propriétés des fonctions définies par les équations aux différences partielles (EN: On the properties of functions defined by partial difference equations) was said to lack examples (p.109 FV12).

In the 3 Volumes and well over a 1000 pages of the new celestial mechanics Poincaré develops a host of tools towards the goal sketched above. Indeed, this line of work contributed to the popularizing of the infamous equation

$frac{mathrm{d}}{mathrm{d}t}x(t) = f(x(t)),quad (1)$

or as used in control theory: x^+=f(x,u)

, for

some ‘‘update’’ operator. However, I would like to argue that the main issue with Equation (1)

is not that it is frequently used, but how it is used, moving from the qualitative to the quantitative.

Unfortunately we cannot ask him, but I believe Poincaré would wholeheartedly agree with Sepulchre. Inspired by the beautiful book by Prof. Verhulst (FV12) I will try to briefly illustrate this claim. As put by Verhulst (p.80 FV12): ‘‘In his writings, Poincaré was interested primarily in obtaining insight into physical reality and mathematical reality — two different things.’’ In fact, the relation between analysis and physics was also the subject of Poincaré’s plenary lecture at the first International Congress of Mathematicians, in Zurich (1897). (p.91 FV12).

In what follows we will have a closer look at La Valeur de la Science (EN: The value of science) by Poincaré, published in 1905 (HP05). This is one of his more philosophical works. We already saw how Poincaré was largely motivated by a better understanding of physics. We can further illustrate how he — being one of the founders of many abstract mathematical fields — thought of physics. To that end, consider the following two quotes:

’’when physicists ask of us the solution of a problem, it is not a duty-service they impose upon us, it is on the contrary we who owe them thanks.’’ (p.82 HP05)

‘‘Fourier's series is a precious instrument of which analysis makes continual use, it is by this means that it has been able to represent discontinuous functions; Fourier invented it to solve a problem of physics relative to the propagation of heat. If this problem had not come up naturally, we should never have dared to give discontinuity its rights; we should still long have regarded continuous functions as the only true functions.’’ (p.81 HP05)

What is more, this monograph contains a variety of discussions of the form: ’’If we assume A, B and C, what does this imply?’’ Both mathematically and physically. For instance, after a discussion on Carnot's Theorem (principle) — or if you like the ramifications of a global flow — Poincaré writes the following:

‘‘On this account, if a physical phenomenon is possible, the inverse phenomenon should be equally so, and one should be able to reascend the course of time. Now, it is not so in nature, …’’ (p.96 HP05)

However, Poincaré continues and argues that although notions like friction seem to lead to an irreversible process this is only so because we look at the process through the wrong lens.

‘‘For those who take this point of view, Carnot's principle is only an imperfect principle, a sort of concession to the infirmity of our senses; it is because our eyes are too gross that we do not distinguish the elements of the blend; it is because our hands are too gross that we can not force them to separate; the imaginary demon of Maxwell, who is able to sort the molecules one by one, could well constrain the world to return backward.’’ (p.97 HP05)

Indeed, the section is concluded by stating that perhaps we should look through the lens of statistical mechanics. All to say that if the physical reality is ought the be described, this must be on the basis of solid principles. Let me highlight another principle: conservation of energy. Poincaré writes the following:

‘‘I do not say: Science is useful, because it teaches us to construct machines. I say: Machines are useful, because in working for us, they will some day leave us more time to make science.’’ (p.88 HP05)

‘‘Well, in regard to the universe, the principle of the conservation of energy is able to render us the same service. The universe is also a machine, much more complicated than all those of industry, of which almost all the parts are profoundly hidden from us; but in observing the motion of those that we can see, we are able, by the aid of this principle, to draw conclusions which remain true whatever may be the details of the invisible mechanism which animates them.’’ (p.94 HP05)

‘‘At least, the principle of the conservation of energy yet remained to us, and this seemed more solid. Shall I recall to you how it was in its turn thrown into discredit?’’ (p.104 HP05)

Poincaré his viewpoint is to some extent nicely captured by the following piece in the book by Verhulst:

‘‘Consider a mature part of physics, classical mechanics (Poincaré 02). It is of interest that there is a difference in style between British and continental physicists. The former consider mechanics an experimental science, while the physicists in continental Europe formulate classical mechanics as a deductive science based on a priori hypotheses. Poincaré agreed with the English viewpoint, and observed that in the continental books on mechanics, it is never made clear which notions are provided by experience, which by mathematical reasoning, which by conventions and hypotheses.’’ (p.89 FV12)

In fact, for both mathematics and physics Poincaré spoke of conventions, to be understood as a convenience in physics. Overall, we see that Poincaré his attitude to applying mathematical tools to practical problems is significantly more critical than what we see nowadays in some communities. For instance:

‘‘Should we simply deduce all the consequences, and regard them as intangible realities? Far from it; what they should teach us above all is what can and what should be changed.’’ (p.79 HP05)

A clear example of this viewpoint appears in his 1909 lecture series at Göttingen on relativity:

‘‘Let us return to the old mechanics that admitted the principle of relativity. Instead of being founded on experiment, its laws were derived from this fundamental principle. These considerations were satisfactory for purely mechanical phenomena, but the old mechanics didn’t work for important parts of physics, for instance optics.’’ (p.209 FV12)

Now it is tempting to believe that Poincaré would have liked to get rid of ‘‘the old’’, but this is far from the truth.

‘‘We should not have to regret having believed in the principles, and even, since velocities too great for the old formulas would always be only exceptional, the surest way in practise would be still to act as if we continued to believe in them. They are so useful, it would be necessary to keep a place for them. To determine to exclude them altogether would be to deprive oneself of a precious weapon.’’ (p.111 HP05)

The previous was about the ‘‘old’’ and ‘‘new’’ mechanics and how differences in mathematical and physical realities might emerge. Most and for all, the purpose of the above is to show that one of the main figures in abstract dynamical system theory — someone that partially found the field motivated by a computational question — greatly cared about correct modelling.

This is one part of the story. The second part entails the recollection that Poincaré his work was largely qualitative. Precisely this qualitative viewpoint allows for imperfections in the modelling process. However, this also means conclusions can only be qualitative. To provide two examples; based on the Hartman-Grobman theorem and the genericity of hyperbolic equilibrium points it is perfectly reasonable to study linear dynamical systems, however, only to understand local qualitative behaviour of the underlying nonlinear dynamical system. The aim to draw quantitative conclusions is often futile and simply not the purpose of the tools employed. Perhaps more interesting and closer to Poincaré, the Lorenz system, being a simplified mathematical model for atmospheric convection, is well-known for being chaotic. Now, weather-forecasters do use this model, not to predict the weather, but to elucidate why their job is hard!

Summarizing, I believe the editorial note by Prof. Sepulchre is very timely as some have forgotten that the initial dynamical systems work by Poincaré and others was to get a better understanding, to adress questions, to learn from, not to end with. Equation (1) is not magical, but a tool to build the bridge, a tool to see if and where the bridge can be built.

On the complexity of matrix multiplications |10 October 2022|
tags: math.LA

Recently, DeepMind published new work on ‘‘discovering’’ numerical linear algebra routines. This is remarkable, but I leave the assesment of this contribution to others. Rather, I like to highlight the earlier discovery of Strassen. Namely, how to improve upon the obvious matrix multiplication algorithm.

To get some intuition, consider multiplying two complex numbers $(a_1+ib_1),(a_2+ib_2)in mathbf{C}times mathbf{C}$ , that is, construct a_1a_2-b_1b_2+i(b_1a_2+a_1b_2)

. In this case, the obvious thing to do would lead to 4 multiplications, namely a_1cdot a_2

and

. However, consider using only (a_1+b_1)cdot (a_2+b_2)

and

. These multiplications suffice to construct (a_1+ib_1)cdot (a_2+ib_2)

(subtract the latter two from the first one). The emphasis is on multiplications as it seems to be folklore knowledge that multiplications (and especially divisions) are the most costly. However, note that we do use more additive operations. Are these operations not equally ‘‘costly’’? The answer to this question is delicate and still changing!

When working with integers, additions are indeed generally ‘‘cheaper’’ than multiplications. However, oftentimes we work with floating points. Roughly speaking, these are numbers fin mathbf{Q}

characterized by some (signed) mantissa

, a base

and some (signed) exponent

, specifically, f=mb^e

. For instance, consider the addition $1.2345cdot 10^{-1}+1.2345cdot 10^{-4}$ . To perform the addition, one normalizes the two numbers, e.g., one writes $1.2345cdot 10^{-1}+0.0012345cdot 10^{-1}$ . Then, to perform the addition one needs to take the appropriate precision into account, indeed, the result is $1.2357cdot 10^{-1}$ . All this example should convey is that we have sequential steps. Now consider multiplying the same numbers, that is, $(1.2345cdot 10^{-1})cdot(1.2345cdot 10^{-4})$ . In this case, one can proceed with two steps in parallel, on the one hand (1.2345)^2

and on the other hand $10^{-1-4}$ . Again, one needs to take the correct precision into account in the end, but we see that by no means multiplication must be significantly slower in general!

$A = left[begin{array}{ll} A_{11} & A_{12} A_{21} & A_{22} end{array}right],quad B = left[begin{array}{ll} B_{11} & B_{12} B_{21} & B_{22} end{array}right],$

$C = left[begin{array}{ll} C_{11} & C_{12} C_{21} & C_{22} end{array}right]= left[begin{array}{ll} A_{11}B_{11}+A_{12}B_{21} & A_{11}B_{12}+A_{12}B_{22} A_{21}B_{11}+A_{22}B_{21} & A_{21}B_{12}+A_{22}B_{22} end{array}right].$

Indeed, the most straightforward matrix multiplication algorithm for $Ain mathbf{R}^{ntimes n}$ and $Bin mathbf{R}^{ntimes n}$ costs you n^3

multiplications and n^2(n-1)

additions. Using the intuition from the complex multiplication we might expect we can do with fewer multiplications. In 1969 (Numer. Math. 13 354–356) Volker Strassen showed that this is indeed true.

Following his short, but highly influential, paper, let $mathrm{I}=(A_{11}+A_{22})(B_{11}+B_{22})$ , $mathrm{II}=(A_{21}+A_{22})B_{11}$ , $mathrm{III}=A_{11}(B_{12}-B_{22})$ , $mathrm{IV}=A_{22}(-B_{11}+B_{21})$ , $mathrm{V}=(A_{11}+A_{12})B_{22}$ , $mathrm{VI}=(-A_{11}+A_{21})(B_{11}+B_{12})$ and $mathrm{VII}=(A_{12}-A_{22})(B_{21}+B_{22})$ . Then, $C_{11}=mathrm{I}+mathrm{IV}-mathrm{V}+mathrm{VII}$ , $C_{12}=mathrm{II}+mathrm{IV}$ , $C_{12}=mathrm{III}+mathrm{V}$ and $C_{22}=mathrm{I}+mathrm{III}-mathrm{II}+mathrm{VI}$ . See that we need

multiplications and

additions to construct

for both

and

being

-dimensional. The ‘‘obvious’’ algorithm would cost

multiplications and

additions. Indeed, one can show now that for matrices of size 2^k

one would need 7^k

multiplications (via induction). Differently put, one needs $n^{log_2(7)}approx n^{2.8}$ multiplications. This bound prevails when including all operations, but only asymptotically, in the sense that $O(n^{2.8})<O(n^3)$ . In practical algorithms, not only multiplications, but also memory and indeed additions play a big role. I am particularly interested how the upcoming DeepMind algorithms will take numerical stability into account.

A corollary |6 June 2022|
tags: math.DS

The lack of mathematically oriented people in the government is an often observed phenomenon |1|. In this short note we clarify that this is a corollary to ‘‘An application of Poincare's recurrence theorem to adademic adminstration’’, by Kenneth R. Meyer. In particular, since the governmental system is qualitatively the same as that of academic administration, it is also conservative. However - due to well-known results also pioneered by Poincare - this implies the following.

Co-observability |6 January 2022|
tags: math.OC

A while ago Prof. Jan H. van Schuppen published his book Control and System Theory of Discrete-Time Stochastic Systems. In this post I would like to highlight one particular concept from the book: (stochastic) co-observability, which is otherwise rarely discussed.

We start with recalling observability. Given a linear-time-invariant system with $xin mathbf{R}^n$ , $yin mathbf{R}^p$ :

$Sigma :left{ begin{array}{ll} x(t+1) &= Ax(t),quad x(0)=x_0 y(t) &= Cx(t) end{array}right.$

one might wonder if the initial state x_0

can be recovered from a sequence of outputs y(0),dots,y(k)

. (This is of great use in feedback problems.) By observing that y(0)=Cx(0)

one is drawn to the observability matrix

$mathcal{O}= left( begin{array}{l} C CA vdots CA^{n-1} end{array} right)$

Without going into detectability, when mathcal{O}

is full-rank, one can (uniquely) recover x_0

(proof by contradiction). If this is the case, we say that Sigma

, or equivalenty the pair (A,C)

, is observable. Note that using a larger matrix (more data) is redudant by the Cayley-Hamilton theorem (If mathcal{O}

would not be full-rank, but by adding CA^n

“from below” it would somehow be full-rank, one would contradict the Cayley-Hamilton theorem.). Also note that in practice one does not “invert” mathcal{O}

but rather uses a (Luenberger) observer (or a Kalman filter).

Now lets continue with a stochastic (Gaussian) system, can we do something similar? Here it will be even more important to only think about observability matrices as merely tools to assert observability. Let $v:Omegatomathbf{R}^v$ be a zero-mean Gaussian random variable with covariance Q_v

and define for some matrices

and

the stochastic system

$Sigma_G :left{ begin{array}{ll} x(t+1) &= Ax(t)+Mv(t),quad x(0)=x_0 y(t) &= Cx(t)+Nv(t) end{array}right.$

We will assume that

is asymptotically (exponentially stable) such that the Lyapunov equation describing the (invariant) state-covariance is defined:

$Q_x = AQ_xA^{mathsf{T}}+MQ_vM^{mathsf{T}}.$

A convenient tool to analyze Sigma_G

will be the characteristic function of a Gaussian random variable

, defined as varphi(omega)=mathbf{E}[mathrm{exp}(ilangle omega, Xrangle)]

. It can be shown that for a Gaussian random variable sim G(mu,Q)

$varphi(omega)=mathrm{exp}(ilangle omega,murangle - frac{1}{2}langle omega, Qomega rangle)$ . With this notation fixed, we say that $Sigma_{G}$ is stochastically observable on the internal t,dots,t+k

if the map

$x(t)mapsto mathbf{E}[mathrm{exp}(ilangle omega,bar{y}rangle|mathcal{F}^{x(t)}],quad bar{y}=(y(t),dots,y(t+k))in mathbf{R}^{pcdot(k+1)}quad forall omega$

is injective on the support of

(note the

). The intuition is the same as before, but now we want the state to give rise to an unique (conditional) distribution. At this point is seems rather complicated, but as it turns out, the conditions will be similar to ones from before. We start by writing down explicit expressions for bar{y}

, as

$y(t+s) = CA^s x(t) + sum^{t+s-1-i}_{i=0}CA^i M v(t+s-1-i) + N v(t+s)$

$bar{y} = mathcal{O}_k x(t) + mathcal{M}_k bar{v},$

for $mathcal{O}_k$ the observability matrix corresponding to the data (length) of bar{y}

, $mathcal{M}_k$ a matrix containing all the noise related terms and bar{v}

a stacked vector of noises similar to bar{y}

. It follows that $x(t)mapsto mathbf{E}[mathrm{exp}(ilangle omega,bar{y})|mathcal{F}^{x(t)}]$ is given by $mathrm{exp}(ilangle omega, mathcal{O}_k x(t)rangle - frac{1}{2}langle omega, mathcal{Q}_komega rangle)$ , for $mathcal{Q}_k = mathbf{E}[mathcal{M}_kbar{v}bar{v}^{mathsf{T}}mathcal{M}_k^{mathsf{T}}]$ . Injectivity of this map clearly relates directly to injectivity of $x(t)mapsto mathcal{O}_k x(t)$ . As such (taking the support into account), a neat characterization of stochastic observability is that $mathrm{ker}(mathcal{O}_kQx)=mathrm{ker}(Q_x)$ .

Then, to introduce the notion of stochastic co-observability we need to introduce the backward representation of a system. We term system representations like Sigma

and

“forward” representations as tto +infty

. Assume that Q_xsucc 0

, then see that the forward representation of a system matrix, denoted A^f

is given by $A^f = mathbf{E}[x(t+1)x(t)^{mathsf{T}}]Q_x^{-1}$ . In a similar vein, the backward representation is given by $A^b=mathbf{E}[x(t-1)x(t)^{mathsf{T}}]Q_x^{-1}$ . Doing the same for the output matrix

yields $C^b=mathbf{E}[y(t-1)x(t)^{mathsf{T}}]Q_x^{-1}$ and thereby the complete backward system

$Sigma^b_G :left{ begin{array}{ll} x(t-1) &= A^bx(t)+Mv(t),quad x(0)=x_0 y(t-1) &= C^bx(t)+Nv(t) end{array}right.$

Note, to keep

and

fixed, we adjust the distribution of

. Indeed, when Q_x

is not full-rank, the translation between forward and backward representations is not well-defined. Initial conditions $x_0in mathrm{ker}(Q_x)$ cannot be recovered.

To introduce co-observability, ignore the noise for the moment and observe that y(t-1)=C^bx(t)

, and so forth. We see that when looking at observability using the backward representation, we can ask if it possible to recover the current state using past outputs. Standard observability looks at past states instead. With this in mind we can define stochastic co-observability on the interval t-k-1,dots,t-1

be demanding that the map

$x(t)mapsto mathbf{E}[mathrm{exp}(ilangle omega,bar{y}^brangle|mathcal{F}^{x(t)}],quad bar{y}^b=(y(t-1),dots,y(t-k-1))in mathbf{R}^{pcdot(k+1)}quad forall omega$

is injective on the support of

(note the

). Of course, one needs to make sure that y(t-k-1)

is defined. It is no surprise that the conditions for stochastic co-observability will also be similar, but now using the co-observability matrix. What is however remarkable, is that these notions do not always coincide.

Lets look at when this can happen and what this implies. One reason to study these kind of questions is to say something about (minimal) realizations of stochastic processes. Simply put, what is the smallest (as measured by the dimension of the state

) system

that gives rise to a certain output process. When observability and co-observability do not agree, this indicates that the representation is not minimal. To get some intuition, we can do an example as adapted from the book. Consider the scalar (forward) Gaussian system

$Sigma^f_G :left{ begin{array}{ll} x(t+1) &= ax(t)+mv(t),quad x(0)=x_0,quad ain (-1,1) setminus{0},quad m=(a^2-1)/a y(t) &= x(t)+v(t) end{array}right.$

for

. The system is stochastically observable as cneq 0

and

. Now for stochastic co-observability we see that $c^b=mathbf{E}[y(t-1)x(t)]q_x^{-1}]=0$ , as such the system is not co-observable. What this shows is that y(t)

behaves as a Gaussian random variable, no internal dynamics are at play and such a minimal realization is of dimension n=0

Fair convex partitioning |26 September 2021|
tags: math.OC

When learning about convex sets, the definitions seem so clean that perhaps you think all is known what could be known about finite-dimensional convex geometry. In this short note we will look at a problem which is still largely open beyond the planar case. This problem is called the fair partitioning problem.

In the

-dimensional case, the question is the following, given any integer

, can a convex set $mathcal{K}subset mathbf{R}^2$ be divided into

convex sets all of equal area and perimeter. Differently put, does there exist a fair convex partitioning, see Figure 1.

As you can see, this work was updated just a few months ago. The general proof is involved, lets see if we can do the case for a compact set and n=2

First of all, when splitting mathcal{K}

(into just two sets) you might think of many different methods to do so. What happens when the line is curved? The answer is that when the line is curved, one of the resulting sets must be non-convex, compare the options in Figure 2.

This observation is particularly useful as it implies we only need to look at two points on the boundary of mathcal{K}

(and the line between them). As mathcal{K}

is compact we can always select a cut such that the resulting perimeters of $mathcal{K}_1$ and $mathcal{K}_2$ are equal.

Let us assume that the points

and

in Figure 3.(i) are like that. If we start moving them around with equal speed, the resulting perimeters remain fixed. Better yet, as the cut is a straight-line, the volumes (area) of the resulting set $mathcal{K}_1$ and $mathcal{K}_2$ change continuously. Now the result follows from the Intermediate Value Theorem and seeing that we can flip the meaning of $mathcal{K}_1$ and $mathcal{K}_2$ , see Figure 3.(iii).

Solving Linear Programs via Isospectral flows |05 September 2021|
tags: math.OC, math.DS, math.DG

In this post we will look at one of the many remarkable findings by Roger W. Brockett. Consider a Linear Program (LP)

$mathrm{maximize}_{xin X}quad langle c,x ranglequad (1)$

parametrized by the compact set $X={xin mathbf{R}^n:Axleq b}$ and a suitable triple (A,b,c)

. As a solution to (1)

can always be found to be a vertex of

, a smooth method to solve (1)

seems somewhat awkward. We will see that one can construct a so-called isospectral flow that does the job. Here we will follow Dynamical systems that sort lists, diagonalize matrices and solve linear programming problems, by Roger. W. Brockett (CDC 1988) and the book Optimization and Dynamical Systems edited by Uwe Helmke and John B. Moore (Springer 2ed. 1996). Let

have

vertices, then one can always find a map $T:mathbf{R}^kto mathbf{R}^n$ , mapping the simplex $S={xin mathbf{R}_{geq 0}^k:sum_{j}x_j=1}$ onto

. Indeed, with some abuse of notation, let

be a matrix defined as T=(v_1,dots,v_k)

, for ${v_j}_{j=1}^k$ , the vertices of

Before we continue, we need to establish some differential geometric results. Given the Special Orthogonal group $mathsf{SO}(n)={Qin mathbf{R}^{ntimes n}:Q^{mathsf{T}}Q=I_n}cap mathsf{GL}^+(n,mathbf{R})$ , the tangent space is given by $T_Q mathsf{SO}(n)={QOmega : Omegain mathrm{skew}(n,mathbf{R})}$ . Note, this is the explicit formulation, which is indeed equivalent to shifting the corresponding Lie Algebra. The easiest way to compute this is to look at the kernel of the map defining the underlying manifold.

Now, following Brockett, consider the function f:mathsf{SO}(n)to mathbf{R}

defined by $f:Thetamapsto mathrm{Tr}(QTheta NTheta^{mathsf{T}})$ for some Q,Nin mathsf{Sym}(n)

. This approach is not needed for the full construction, but it allows for more intuition and more computations. To construct the corresponding gradient flow, recall that the (Riemannian) gradient at Thetain mathsf{SO}(n)

is defined via $df(Theta)[V]=langle mathrm{grad},f(Theta), Vrangle_{Theta}$ for all $Vin T_{Theta} mathsf{SO}(n)$ . Using the explicit tangent space representation, we know that V=Theta Omega

with $Omega = -Omega^{mathsf{T}}$ . Then, see that by using

$f(Theta+tV) = mathrm{Tr}(QTheta(I_n+tOmega)N(I_n-tOmega)Theta^{mathsf{T}})$

$df(Theta)[V]=lim_{tdownarrow 0}frac{f(Theta+tV)-f(Theta)}{t} = langle QTheta N, Theta Omega rangle - langle Theta N Theta^{mathsf{T}} Q Theta, Theta Omega rangle.$

This means that (the minus is missing in the paper) the (a) gradient flow becomes

$dot{Theta} = mathrm{grad},f(Theta) = QTheta N-Theta NTheta^{mathsf{T}}QTheta, quad Theta(0)in mathsf{SO}(n).$

Consider the standard commutator bracket [A,B]=AB-BA

and see that for $H(t)=Theta(t)^{mathsf{T}}QTheta(t)$ one obtains from the equation above (typo in the paper)

$dot{H}(t) = [H(t),[H(t),N]],quad H(0)=Theta^{mathsf{T}}QThetaquad (2).$

Hence,

can be seen as a reparametrization of a gradient flow. It turns out that (2)

has a variety of remarkable properties. First of all, see that H(t)

preserves the eigenvalues of

. Also, observe the relation between extremizing

and the function

defined via $g:H=Theta^{mathsf{T}}QTheta mapsto -frac{1}{2}|N-H|_F^2$ . The idea to handle LPs is now that the limiting H(t)

will relate to putting weight one the correct vertex to get the optimizer,

gives you this weight as it will contain the corresponding largest costs.

In fact, the matrix

can be seen as an element of the set $mathsf{M}(Q)={Theta^{mathsf{T}}QTheta:Thetain mathsf{O}(n)}$ . This set is in fact a $C^{infty}$ -smooth compact manifold as it can be written as the orbit space corresponding to the group action $sigma:mathsf{O}(n)times mathbf{R}^{ntimes n}to mathsf{R}^{ntimes}$ , $sigma:(Theta,Q)mapsto Theta^{mathsf{T}}QTheta$ , one can check that this map satisfies the group properties. Hence, to extremize

over

, it appears to be appealing to look at Riemannian optimization tools indeed. When doing so, it is convenient to understand the tangent space of mathsf{M}(Q)

. Consider the map defining the manifold h:mathsf{O}(n)to mathsf{M}(Q)

, $h:Theta mapsto Theta^{mathsf{T}}QTheta$ . Then by the construction of $T_Theta mathsf{O}(n)$ , see that dh(Theta)[V]=0

yields the relation [H,Omega]=0

for any

For the moment, let $Q=mathrm{diag}(lambda_{1}I_{n_1},dots,lambda_{r}I_{n_r})in mathsf{Sym}(n)$ such that $lambda_{1}>cdots>lambda_{r}$ and $sum_{n_i}n_i=n$ . First we consider the convergence of (2)

. Let

have only distinct eigenvalues, then $H_{infty}:=lim_{tto infty}H(t)$ exists and is diagonal. Using the objective

from before, consider f(H)=mathrm{Tr}(HN)

and see that by using the skew-symmetry one recovers the following

$begin{array}{lll} frac{d}{dt}mathrm{Tr}(H(t)N) &=& mathrm{Tr}(N [H,[H,N]]) &=& -mathrm{Tr}((HN-NH)^2) &=& |HN-NH|_F^2. end{array}$

This means the cost monotonically increases, but by compactness converges to some point $H_{infty}$ . By construction, this point must satisfy $[H_{infty},N]=0$ . As

has distinct eigenvalues, this can only be true if $H_{infty}$ itself is diagonal (due to the distinct eigenvalues).

More can be said about $H_{infty}$ , let (lambda_1,dots,lambda_n)

be the eigenvalues of H(0)

, that is, they correspond to the eigenvalues of

as defined above. Then as H(t)

preserves the eigenvalues of

(

), we must have $H_{infty}=pi Q pi^{mathsf{T}}$ , for

a permutation matrix. This is also tells us there is just a finite number of equilibrium points (finite number of permutations). We will write this sometimes as $H_{infty}=mathrm{diag}(lambda_{pi(1)},dots,lambda_{pi(n)})$ .

Now as

is one of those points, when does H(t)

converge to

? To start this investigation, we look at the linearization of (2)

, which at an equilibrium point $H_{infty}$ becomes

$dot{xi}_{ij} = -(lambda_{pi(i)}-lambda_{pi(j)})(mu_i-mu_j)xi_{ij}$

for $xiin T_{H_{infty}}mathsf{M}(Q)$ . As we work with matrix-valued vector fields, this might seems like a duanting computation. However, at equilibrium points one does not need a connection and can again use the directional derivative approach, in combination with the construction of $T_{H_{infty}}mathsf{M}(Q)$ , to figure out the linearization. The beauty is that from there one can see that

is the only asymptotically stable equilibrium point of (2)

. Differently put, almost all initial conditions H(0)in mathsf{M}(Q)

will converge to

with the rate captured by spectral gaps in

and

. If

does not have distinct eigenvalues and we do not impose any eigenvalue ordering on

, one sees that an asymptotically stable equilibrium point $H_{infty}$ must have the same eigenvalue ordering as

. This is the sorting property of the isospectral flow and this is of use for the next and final statement.

Theorem: Consider the LP (1)

with

for all

, then, there exist diagonal matrices

and

such that

converges for almost any H(0)in mathsf{M}(Q)

to $H_{infty}=mathrm{diag}(d)$ with the optimizer of (1)

being $x^{star}=Td$ .

Proof: Global convergence is prohibited by the topology of mathsf{M}(Q)

. Let $N=mathrm{diag}(T^{mathsf{T}}c)$ and let Q=(1,0,dots,0)in mathsf{Sym}(k)

. Then, the isospectral flow will converge from almost everywhere to $H_{infty}=mathrm{diag}(0,dots,0,1,0,dots,0)$ (only $(H_{infty})_{ii}=1$ ), such that $x^{star}=v_i$ .

The Imaginary trick |12 March 2021|
tags: math.OC

Most of us will learn at some point in our life that sqrt{-1}

is problematic, as which number multiplied by itself can ever be negative? To overcome this seemingly useless defiency one learns about complex numbers and specifically, the imaginary number

, which is defined to satisfy i^2=-1

. At this point you should have asked yourself when on earth is this useful?

In this short post I hope to highlight - from a slightly different angle most people grounded in physics would expect - that the complex numbers are remarkably useful.

Complex numbers, and especially the complex exponential, show up in a variety of contexts, from signal processing to statistics and quantum mechanics. With of course most notably, the Fourier transformation.

We will however look at something completely different. It can be argued that in the late 60s James Lyness and Cleve Moler brought to life a very elegant new approach to numerical differentiation. To introduce this idea, recall that even nowadays the most well-known approach in numerical differentiation is to use some sort of finite-difference method, for example, for any $fin C^1(mathbf{R};mathbf{R})$ one could use the central-difference method

$partial_x f(x) = frac{f(x+h)-f(x-h)}{2h} + O(h^2)quad (1).$

Now one might be tempted to make

extremely small, as then the error must vanish! However, numerically, for a very small

the two function evaluations f(x+h)

and

will be indistinguishable. So although the error scales as O(h^2)

there is some practical lower bound on this error based on the machine precision of your computer. One potential application of numerical derivatives is in the context of zeroth-order (derivative-free) optimization. Say you want to adjust van der Poel his Canyon frame such that he goes even faster, you will not have access to explicit gradients, but you can evaluate the performance of a change in design, for example in a simulator. So what you usually can obtain is a set of function evaluations f(x_0),f(x_1),dots,f(x_K)

. Given this data, a somewhat obvious approach is to mimick first-order algorithms

$x_{k+1}=x_k - eta_k nabla f(x_k)quad (2)$

where

is some stepsize. For example, one could replace nabla f(x_k)

by the central-difference approximation (1)

. Clearly, if the objective function

is well-behaved and the approximation of nabla f(x_k)

is reasonably good, then something must come out? As was remarked before, if your approximation - for example due to numerical cancellation errors - will always have a bias it is not immediate how to construct a high-performance zeroth-order optimization algorithm. Only if there was a way to have a numerical approximation without finite differencing?

Let us assume that f:mathbf{R}to mathbf{R}

is sufficiently smooth, let

be the imaginary number and consider the following series

$f(x+ih) = f(x) + partial_x f(x) ih - frac{1}{2}partial_x^2 f(x) h^2 - frac{1}{6}partial_x^3 f(x) ih^3 + O(h^4).$

$partial_x f(x) = frac{Imbig(f(x+ih) big)}{h}+O(h^2).$

So we see that by passing to the complex domain and projecting the imaginary part back, we obtain a numerical method to construct approximations of derivatives without even the possibility of cancellation errors. This remarkable property makes it a very attractive candidate to be used in zeroth-order optimization algorithms, which is precisely what we investigated in our new pre-print. It turns out that convergence is not only robust, but also very fast!

Topological Considersations in Stability Analysis |17 November 2020|
tags: math.DS

In this post we discuss very classical, yet, highly underappreciated topological results.

$dot{x}=f(x,u),quad uin mathcal{U},quad xin mathsf{M}subseteq mathbf{R}^n,$

As it turns out, topological properties of mathsf{M}

encode a great deal of fundamental control possibilities, most notably, if a continuous globally asymptotically stabilizing control law exists or not (usually, one can only show the negative). In this post we will work towards two things, first, we show that a technically interesting nonlinear setting is that of a dynamical system evolving on a compact manifold, where we will discuss that the desirable continuous feedback law can never exist, yet, the boundedness implied by the compactness does allow for some computations. Secondly, if one is determined to work on systems for which a continuous globally asymptotically stable feedback law exists, then a wide variety of existential sufficient conditions can be found, yet, the setting is effectively Euclidean, which is the most basic nonlinear setting.

Definition [Contractability]. Given a topological manifold mathsf{X}

. If the map $mathsf{id}_{mathsf{X}}:xmapsto x$ is null-homotopic, then,

is contractible.

To say that a space is contractible when it has no holes (genus

) is wrong, take for example the sphere. The converse is true, any finite-dimensional space with a hole cannot be contractible. See Figure 1 below.

Topological Obstructions to Global Asymptotic Stability

In this section we highlight a few (not all) of the most elegant topological results in stability theory. We focus on results without appealing to the given or partially known vector field

. This line of work finds its roots in work by Bhatia, Szegö, Wilson, Sontag and many others.

Theorem [The domain of attraction is contractible, Theorem 21] Let the flow varphi(t,x)

be continuous and let $x^{star}$ be an asymptotically stable equilibrium point. Then, the domain of attraction ${xin mathsf{M};:;lim_{tto infty}varphi (t,x)=x^{star}}$ is contractible.

This theorem by Sontag states that the underlying topology of a dynamical system, better yet, the space on which the dynamical system evolves, dictates what is possible. Recall that linear manifolds are simply hyperplanes, which are contractible and hence, as we all know, global asymptotic stability is possible for linear systems. However, if mathsf{M}

is not contractible, there does not exist a globally asymptotically stabilizable continuous-time system on mathsf{M}

, take for example any sphere.

The next example shows that one should not underestimate ‘‘linear systems’’ either.

Example [Dynamical Systems on ] Recall that $mathsf{GL}(n,mathbf{R}):={Xin mathbf{R}^{ntimes n}:mathrm{det}(X)neq 0}$ is a smooth n^2

-dimensional manifold (Lie group). Then, consider for some $Ain mathbf{R}^{ntimes n}$ , n>1

the (right-invariant) system

$dot{X}=AX,quad X_0in mathsf{GL}(n,mathbf{R}) quad (1)$

Indeed, the solution to (1) is given by $X(t)=mathrm{exp}(tcdot A)X_0 in mathsf{GL}(n,mathbf{R})=:mathsf{M}$ . Since this group consists out of two disjoint components there does not exist a matrix

, or continuous nonlinear map for that matter, which can make a vector field akin to (1) globally asymptotically stable. This should be contrasted with the closely related ODE dot{x}=Ax

, $xin mathbf{R}^n=:mathsf{M}$ . Even for the path connected component $mathsf{GL}^+(n,mathbf{R}):={Xin mathbf{R}^{ntimes n}:mathrm{det}(X)>0}$ , The theorem by Sontag obstructs the existence of a continuous global asymptotically stable vector field. This because the group is not simply connected for n>1

(This follows most easily from establishing the homotopy equivalence between $mathsf{GL}^+(n,mathbf{R})$ and mathsf{SO}(n)

via (matrix) Polar Decomposition), hence the fundamental group is non-trivial and contractibility is out of the picture. See that if one would pick

to be stable (Hurwitz), then for $X_0in mathsf{GL}^+(n,mathbf{R})$ we have $X(t)in mathsf{GL}^+(n,mathbf{R})$ , however $lim_{tto infty}X(t)notin mathsf{GL}^+(n,mathbf{R})$ .

Following up on Sontag, Bhat and Bernstein revolutionized the field by figuring out some very important ramifications (which are easy to apply). The key observation is the next lemma (that follows from intersection theory arguments, even for non-orientable manifolds).

Lemma [Proposition 1] Compact, boundaryless, manifolds are never contractible.

Clearly, we have to ignore

-dimensional manifolds here. Important examples of this lemma are the

-sphere $mathbf{S}^{n-1}$ and the rotation group mathsf{SO}(3)

as they make a frequent appearance in mechanical control systems, like a robotic arm. Note that the boundaryless assumption is especially critical here.

Now the main generalization by Bhat and Bernstein with respect to the work of Sontag is to use a (vector) bundle construction of mathsf{M}

. Loosely speaking, given a base mathsf{Q}

and total manifold mathsf{M}

is a vector bundle when for the surjective map pi:mathsf{M}to mathsf{Q}

and each

we have that the fiber $pi^{-1}(q)$ is a finite-dimensional vector space (see the great book by Abraham, Marsden and Ratiu Section 3.4 and/or the figure below).

Intuitively, vector bundles can be thought of as manifolds with vector spaces attached at each point, think of a cylinder living in $mathbf{R}^3$ , where the base manifold is $mathbf{S}^1subset mathbf{R}^2$ and each fiber is a line through $mathbf{S}^1$ extending in the third direction, as exactly what the figure shows. Note, however, this triviality only needs to hold locally.

A concrete example is a rigid body with $mathsf{M}=mathsf{SO}(3)times mathbf{R}^3$ , for which a large amount of papers claim to have designed continuous globally asymptotically stable controllers. The fact that this was so enormously overlooked makes this topological result not only elegant from a mathematical point of view, but it also shows its practical value. The motivation of this post is precisely to convey this message, topological insights can be very fruitful.

Theorem [Theorem 1] Let mathsf{K}

be a compact, boundaryless, manifold, being the base of the vector bundle pi:mathsf{M}to mathsf{K}

with

, then, there is no continuous vector field on mathsf{M}

with a global asymptotically stable equilibrium point.

Indeed, compactness of mathsf{K}

can also be relaxed to mathsf{K}

not being contractible as we saw in the example above, however, compactness is in most settings more tangible to work with.

Example [The Grassmannian manifold] The Grassmannian manifold, denoted by $mathsf{GR}_{k,n}$ , is the set of all

-dimensional subspaces $mathcal{S}subseteq mathbf{R}^n$ . One can identify $mathsf{GR}_{k,n}$ with the Stiefel manifold $mathsf{GF}_{k,n}$ , the manifold of all orthogonal

-frames, in the bundle sense of before, that is $pi:mathsf{GF}_{k,n}to mathsf{GR}_{k,n}$ such that the fiber $pi^{-1}(mathcal{S})$ represent all

-frames generating the subspace mathcal{S}

. In its turn, $mathsf{GF}_{k,n}$ can be indentified with the compact Lie group mathsf{O}(n)

(via a quotient), such that indeed $mathsf{GR}_{k,n}$ is compact. This manifold shows up in several optimization problems and from our continuous-time point of view we see that one can never find, for example, a globally converging gradient-flow like algorithm.

Example [Tangent Bundles] By means of a Lagrangian viewpoint, a lot of mechanical systems are of a second-order nature, this means they are defined on the tangent bundle of some mathsf{Q}

, that is, pi:Tmathsf{Q}to mathsf{Q}

. However, then, if the configuration manifold mathsf{Q}

is compact, we can again appeal to the Theorem by Bhat and Bernstein. For example, Figure 2 can relate to the manifold over which the pair (theta,dot{theta})

is defined.

Global Isomorphisms

One might wonder, given a vector field

over a manifold mathsf{M}

, since we need contractibility of mathsf{M}

for

to have a global asymptotically stable equilibrium points, how interesting are these manifolds? As it turns out, for ngeq 5

, which some might call the high-dimensional topological regime, contractible manifolds have a particular structure.

A topological space

being simply connected is equivalent to the corresponding fundamental group being trivial. However, to highlight the work by Stallings, we need a slightly less well-known notion.

Definition [Simply connected at infinity] The topological space mathsf{X}

is simply connected at infinity, when for any compact subset mathcal{K}subseteq mathsf{X}

there exists a mathcal{K}'

where

mathcal{K}subseteq mathcal{K}'subseteq mathsf{X}

such that

is simply connected.

This definition is rather complicated to parse, however, we can give a more tangible description. Let mathsf{X}

be a non-compact topological space, then, mathsf{X}

is said to have one end when for any compact mathcal{K}

there is a mathcal{K}'

where

such that

is connected. So, mathbf{R}

, as expected, fails to have one end, while $mathbf{R}^2$ does have one end. Now, Proposition 2.1 shows that if mathsf{X}

and

are simply connected and both have one end, then mathsf{X}times mathsf{Y}

is simply connected at infinity. This statement somewhat clarifies why dimension

and above are somehow easier to parse in the context of topology. A similar statement can be made about the cylindrical space mathsf{X}times mathbf{R}

Lemma [Product representation Proposition 2.4] Let mathsf{X}

and

be manifolds of dimension greater or equal than

. If

is contractible and mathrm{dim}(mathsf{M})geq 3

, then,

is simply connected at infinity.

Now, most results are usually stated in the context of a piecewise linear (PL) topology. By appealing to celebrated results on triangulization (by Whitehead) we can state the following.

Theorem [Global diffeomorphism Theorem 5.1] Let mathsf{M}

be a contractible C^r

-dimensional manifold which is simply connected at infinity. If ngeq 5

, then,

is diffeomorphic to $mathbf{R}^n$ .

You might say that this is all obvious, as in Figure 3, we can always do it, also for lower-dimensions. However, there is famous counterexample by Whitehead, which provided lots of motivation for the community:

‘‘There is an open, -dimensional manifold which is contractible, but not homeomorphic to $mathbf{R}^3$ .’’

In fact, this example was part of the now notorious program to solve the Poincaré conjecture. All in all, this shows that contractible manifolds are not that interesting from a nonlinear dynamical systems point of view, compact manifolds is a class of manifolds providing for more of a challenge.

Remark on the Stabilization of Sets

So far, the discussion has been on the stabilization of (equilibrium) points, now what if we want to stabilize a curve or any other desirable set, that is, a non-trivial set mathcal{X}

. It seems like the shape of these objects, with respect to the underlying topology of mathsf{M}

, must indicate if this is possible again? Let U_x

denote an open neighbourhood of some xin mathcal{X}

. In what follows, the emphasis is on $U_{mathcal{X}}:=cup_xU_x$ and

having the same topology. We will be brief and follow Moulay and Bhat.

Lemma [Theorem 5] Consider a closed-loop dynamical system given rise to a continuous flow. Suppose that the compact set mathcal{K}subseteq mathsf{M}

is asymptotically stable with domain of attraction mathcal{A}subseteq mathsf{M}

. Then,

is a weak deformation retract of mathcal{A}

In rather loose terms, mathcal{K}

being a weak deformation retract of mathcal{A}

means that one can continuously ‘‘retract’’ points from mathcal{A}

and morph them into the set mathcal{K}

. So, this works for $mathbf{S}^{n-1}$ and $mathbf{R}^nsetminus{0}$ , due to $mathbf{R}^n$ being punctured. See p.200 for more information on retracts. In Figure 4 below we show that since this lemma is mostly concerned with the local topology, the global topology is less important. For mathsf{M}

, the set

is not a deformation retract of the set $U_{mathcal{X}}$ such that $U_{mathcal{X}}$ can never be a region of attraction, this should be contrasted with the setting on the Torus $mathbf{T}^1$ .

The beauty of all of this is that even if you are not so sure about the exact dynamical system at hand, the underlying topology can already provide for some answers. In particular, to determine, what can, or cannot be done.

Endomorphisms, Tensors in Disguise |18 August 2020|
tags: math.LA

Unfortunately the radio silence remained, the good news is that even more cool stuff is on its way*!

For now, we briefly mention a very beautiful interpretation of (1,1)

-tensors. Given some vectorspace mathcal{V}

, let

be the set of linear endomorphisms from mathcal{V}

. Indeed, for finite-dimensional vector spaces one can just think of these maps as being parametrized by matrices. Now, we recall that (1,1)

-tensors are multilinear maps which take an element of mathcal{V}

and its dual space $mathcal{V}^{star}$ and map it to some real number, that is, $T^1_1:mathcal{V}times mathcal{V}^{star}to mathbf{R}$ .

The claim is that for finite-dimensional mathcal{V}

the set

is isomorphic to the set of (1,1)

-tensors $mathcal{T}_1^1(mathcal{V})$ . This is rather intuitive to see, let vin mathcal{V}

and $v^{star}in mathcal{V}^{star}$ , then, we can write $langle v^{star},T_1^1 vrangle$ and think of $T_1^1in mathcal{T}^1_1(mathcal{V})$ as a matrix. This can be formalized, as is done here by showing there is a bijective map $phi:mathrm{End}(V)to mathcal{T}^1_1(mathcal{V})$ . So, we can identify any T_1^1

with some $Ain mathbf{R}^{ntimes n}$ , that is $T_1^1(v,v^{star})=v^{star}(Av)=langle v^{star},Av rangle$ . Similarly, we can fix $v^{star}$ and identify T_1^1

with a linear map in the dual space, that is $T_1^1(v^{star})(v)=(A^{star}v^{star})(v)=langle A^{star}v^{star},vrangle$ . Regardless, both represent the same tensor evaluation, hence

$langle v^{star},Av rangle = langle A^{star}v^{star},v rangle,$

Now we use this relation, recall that discrete-time linear systems are nothing more than maps vmapsto Av

, that is, linear endomorphims, usually on $mathbf{R}^n$ . Using the tensor interpretation we recover what is sometimes called the ‘‘dual system’’ to vmapsto Av

, namely $v^{star}mapsto A^{top}v^{star}$ (suggestive notation indeed). Using more traditional notation for the primal: $x_{k+1}=Ax_k$ and dual: $z_{k+1}=A^{top}z_k$ system, we recover that $langle z_{k+1},x_k rangle = langle z_k, x_{k+1}rangle forall kin mathbf{Z}$ . Then, let z_0=w

and

for

and

being a left- and right-eigenvector of

, respectively. We see that this implies that mu_j^k langle w_j, v_i rangle = lambda_i^k langle w_j, v_i rangle

, for

the eigenvalue corresponding to w_j

and

the eigenvalue corresponding to v_i

. Hence, in general, the eigenspaces of

and $A^{top}$ are generically orthogonal.

In other words, we recover the geometric interpretation of mapping

-volumes under

versus $A^{top}$ .

Optimal coordinates |24 May 2020|
tags: math.LA, math.OC, math.DS

There was a bit of radio silence, but as indicated here, some interesting stuff is on its way.

In this post we highlight this 1976 paper by Mullis and Roberts on ’Synthesis of Minimum Roundoff Noise Fixed Point Digital Filters’.

It is known that the input-output behaviour of any sigma

, that is, the map from u(t)

is invariant under a similarity transform. To be explicit, the tuples (A,b,c,d)

and $(TAT^{-1},Tb,cT^{-1},d)$ , which correspond to the change of coordinates z(t)=Tx(t)

for some $Tin mathsf{GL}_n$ , give rise to the same input-output behaviour. Hence, one can define the equivalence relation sigma sim sigma'

by imposing that the input-output maps of sigma

and

are equivalent. By doing so we can consider the quotient $Sigma / mathsf{GL}_n$ . However, in practice, given a sigma

, is any

such that

equally useful? For example, the following

and

are similar, but clearly,

is preferred from a numerical point of view:

$A = left[begin{array}{ll} 0.5 & 10^9 0 & 0.5 end{array}right],quad A' = left[begin{array}{ll} 0.5 & 1 0 & 0.5 end{array}right].$

In what follows, we highlight the approach from Mullis and Roberts and conclude by how to optimally select $Tin mathsf{GL}_n$ . Norms will be interpreted in the ell_2

sense, that is, for ${x(t)}_{tin mathbf{N}}$ , $|x|^2=sum^{infty}_{t=0}|x(t)|_2$ . Also, in what follows we assume that x(0)=0

plus that

is stable, which will mean rho(A)<1

and that

corresponds to a minimal realization.

The first step is to quantify the error. Assume we have allocated m_i

bits at our disposal to present the $i^{mathrm{th}}$ component of x(t)

. These values for m_i

can differ, but we constrain the average by $sum^n_{i=1}m_i = nm$ for some

. Let

be a 'step-size’, such that our dynamic range of x_i(t)

is bounded by $pm mu 2^{m_i-1}$ (these are our possible representations). Next we use the modelling choices from Mullis and Roberts, of course, they are debatable, but still, we will infer some nice insights.

First, to bound the dynamic range, consider solely the effect of an input, that is, define f_i(t)

by $x(t)=sum^{infty}_{k=1}A^{k-1}b u(t-k)$ , $x_i(t)=sum^{infty}_{k=1}f_i(t)u(t-k)$ . Then we will impose the bound pm delta |f_i|

. In combination with the step size (quantization), this results in $delta^2 |f_i|^2 = (mu 2^{m_i-1})^2$ . Let

be a sequence of i.i.d. sampled random variables from mathcal{N}(0,1)

. Then we see that $mathrm{var}big(x_i(t)big)= |f_i|^2$ . Hence, one can think of delta

as scaling parameter related to the probability with which this dynamic range bound is true.

Next, assume that all the round-off errors are independent and have a variance of mu^2

. Hence, the worst-case variance of computating x_i(t+1)

. To evaluate the effect of this error on the output, assume for the moment that u(t)

is identically

. Then,

. Similar to f_i(t)

, define

as the $i^{mathrm{th}}$ component of cA^t

. As before we can compute the variance, this time of the full output signal, which yields $sigma_y^2:=mathrm{var}(y) = mu^2(n+1)sum^n_{i=1}|g_i|^2$ . Note, these expressions hinge on the indepedence assumption.

$begin{array}{lll} W^{(c)} &= AW^{(c)}A^{top} + bb^{top} &= sum^{infty}_{k=0} A^k bb^{top} (A^k)^{top}, W^{(o)} &= A^{top}W^{(o)} A + c^{top}c &=sum^{infty}_{k=0} (A^k)^{top}c^{top}c A. end{array}$

If we assume that the realization is not just minimal, but that (A,b)

is a controllable pair and that (A,c)

is an observable pair, then, $W^{(c)}succ 0$ and $W^{(o)}succ 0$ . Now the key observation is that $W^{(c)}_{ij} = langle f_i, f_j rangle$ and similarly $W^{(o)}_{ij} = langle g_i, g_j rangle$ . Hence, we can write $delta^2 |f_i|^2 = (mu 2^{m-1})^2$ as $delta^2 K_ii = (mu 2^{m-1})^2$ and indeed $sigma_y^2 = mu^2(n+1)sum^n_{i=1}W_{ii}$ . Using these Lyapunov equations we can say goodbye to the infinite-dimensional objects.

To combine these error terms, let say we have apply a coordinate transformation x'=Tx

, for some $Tin mathsf{GL}_n$ . Specifically, let

be diagonal and defined by $T_{ii}=2delta sqrt{K_{ii}}/(mu 2^m)$ . Then one can find that $K'_{ii}=K_{ii}/T_{ii}^2$ , $W'_{ii}=W_{ii}T_{ii}^2$ . Where the latter expressions allows to express the output error (variance) by

$sigma_y^2 = (n+1)delta^2 sum^n_{i=1}frac{W^{(c)}_{ii}W^{(o)}_{ii}}{(2^{m_i})^2}.$

Now we want to minimize sigma_y^2

over all

plus some optimal selection of m_i

. At this point it looks rather painful. To make it work we first reformulate the problem using the well-known arithmetic-geometric mean inequality $^{[1]}$ for non-negative sequences ${a_i}_{iin [n]}$ :

$frac{1}{n} sum^n_{i=1}a_i geq left( prod^n_{i=1}a_i right)^{1/n}.$

$sigma_y^2 = n(n+1)delta^2 frac{1}{n}sum^n_{i=1}frac{W^{(c)}_{ii}W^{(o)}_{ii}}{(2^{m_i})^2}geq n(n+1)left(frac{delta}{2^m}right)^2left(prod^n_{i=1}{W^{(c)}_{ii}W^{(o)}_{ii}}right)^{1/n}qquad (1).$

See that the right term is independent of m_i

, hence this is a lower-bound with respect to minimization over m_i

. To achieve this (to make the inequality from (1)

an equality), we can select

$m_i = m + frac{1}{2} left(log_2 W_{ii}^{(c)}W_{ii}^{(o)} - frac{1}{2}sum^n_{j=1}log_2 W_{ii}^{(c)}W_{ii}^{(o)} right).$

Indeed, as remarked in the paper, m_i

is not per se an integer. Nevertheless, by this selection we find the clean expression from (1)

to minimize over systems equivalent to sigma

, that is, over some transformation matrix

. Define a map $f:mathcal{S}^n_{succ 0}to (0,1]$ by

$f(P) = left( frac{mathrm{det}(P)}{prod^n_{i=1}P_{ii}}right)^{1/2}.$

It turns out that f(P)=1

if and only if

is diagonal. This follows $^{[2]}$ from Hadamard's inequality. We can use this map to write

$sigma_y^2 = n(n+1) left(frac{delta}{2^m} right)^2 frac{mathrm{det}(W^{(c)}W^{(o)})^{1/n}}{big(f(W^{(c)})f(W^{(o)})big)^{2/n}} geq n(n+1) left(frac{delta}{2^m}right)^2 mathrm{det}(W^{(c)}W^{(o)})^{1/n}.$

Since the term $mathrm{det}(W^{(c)}W^{(o)})$ is invariant under a transformation

, we can only optimize sigma_y^2

over a structural change in realization tuple (A,b,c,d)

, that is, we need to make $W^{(c)}$ and $W^{(o)}$ simultaneous diagonal! It turns out that this numerically 'optimal’ realization, denoted $sigma^{star}$ , is what is called a principal axis realization.

To compute it, diagonalize $W^{(o)}$ , $W^{(o)}=Q_o Lambda_o Q_o^{top}$ and define $T_1^{-1}:=Lambda_o^{1/2}Q_o^{top}$ . Next, construct the diagonalization $T_1^{-1}W^{(c)}T_1^{-top}=Q_{1,c}Lambda_{1,c}Q_{1,c}^{top}$ . Then our desired transformation is $T:=T_1 Q_{1,c}$ . First, recall that under any $Tin mathsf{GL}_n$ the pair $(W^{(c)},W^{(o)})$ becomes $(T^{-1}W^{(c)}T^{-top},T^{top}W^{(o)}T)$ . Plugging in our map

yields the transformed matrices $(Lambda_{1,c},I_n)$ , which are indeed diagonal.

: To show this AM-GM inequality, we can use Jensen's inequality for concave functions, that is, mathbf{E}[g(x)]leq g(mathbf{E}[x])

. We can use the logarithm as our prototypical concave function on $mathbf{R}_{>0}$ and find $logbig(frac{1}{n}sum^n_{i=1}x_ibig)geq frac{1}{n}sum^n_{i=1}log(x_i) = logbig((prod^n_{i=1}x_i)^{1/n} big)$ . Then, the result follows.
[2]

: The inequality attributed to Hadamard is slightly more difficult to show. In its general form the statement is that $|mathrm{det}(A)|leq prod^n_{i=1}|a_i|_2$ , for a_i

the $i^{mathrm{th}}$ column of

. The inequality becomes an equality when the columns are mutually orthogonal. The intuition is clear if one interprets the determinant as the signed volume spanned by the columns of

, which. In case $Ain mathcal{S}^n_{succ 0}$ , we know that there is

such that $A=L^{top}L$ , hence, by this simple observation it follows that

$mathrm{det}(A)=mathrm{det}(L)^2 leq prod^n_{i=1}|ell_i|_2^2 = prod^n_{i=1}a_{ii}.$

Equality holds when the columns are orthogonal, so $AA^{top}$ must be diagonal, but $A=A^{top}$ must also hold, hence, A^2

must be diagonal, and thus

must be diagonal, which is the result we use.

A Peculiar Lower Bound to the Spectral Radius |7 March 2020|
tags: math.LA

As a follow up to the previous post, we discuss a lower bound, given as exercise P7.1.14 in (GvL13). I have never seen this bound before, especially not with the corresponding proof, so here we go. Given any $Ain mathbf{R}^{ntimes n}$ , we have the following lower bound:

$rho(A)geq (sigma_1cdots sigma_n)^{1/n},$

which can be thought of as a sharpening of the bound $|lambda_i(A)|geq sigma_{mathrm{min}}(A)$ . In the case of a singular

, the bound is trivial (rho(A)geq 0)

, but in the general setting it is less obvious. To prove that this holds we need a few ingredients. First, we need that for any $X,Yin mathbf{R}^{ntimes n}$ we have mathrm{det}(XY)=mathrm{det}(X)mathrm{det}(Y)

mathrm{det}(XY)=mathrm{det}(X)mathrm{det}(Y)

. Then, we construct two decompositions of

, (1) the Jordan Normal form $A=TJT^{-1}$ , and (2) the Singular Value Decomposition $A=USigma V^{top}$ . Using the multiplicative property of the determinant, we find that

$|mathrm{det}(A)| = left|prod^n_{i=1}lambda_i(A) right| = left| prod^n_{i=1}sigma_i(A) right|.$

Hence, since $rho(A)=max_i{|lambda_i(A)|}$ it follows that $rho(A)^n geq |prod^n_{i=1}sigma_i(A)|$ and thus we have the desired result.

(GvL13) G.H. Golub and C.F. Van Loan: ‘‘Matrix Computations’’, 2013 John Hopkins University Press.

Spectral Radius vs Spectral Norm |28 Febr. 2020|
tags: math.LA, math.DS

Lately, linear discrete-time control theory has start to appear in areas far beyond the ordinary. As a byproduct, papers start to surface which claim that stability of linear discrete-time systems

is characterized by |A|_2<1

. The confunsion is complete by calling this object the spectral norm of a matrix $Ain mathbf{R}^{ntimes n}$ . Indeed, for fixed coordinates, stability of

is not characterized by $|A|_2:=sup_{xin mathbf{S}^{n-1}}|Ax|_2$ , but by $rho(A):=max_{iin [n]}|lambda_i(A)|$ . If rho(A)<1

, then, all absolute eigenvalues of

are strictly smaller than one, and hence for x_k = A^k x_0

, $lim_{kto infty}x_k = 0$ . This follows from the Jordan form of

in combination with the simple observation that for lambda in mathbf{C}

, $lim_{kto infty} lambda^k = 0 iff |lambda|<1$ .

To continue, define two sets; $mathcal{A}_{rho}:={Ain mathbf{R}^{ntimes n};:;rho(A)<1}$ and $mathcal{A}_{ell_2}:={Ain mathbf{R}^{ntimes n};:;|A|_2<1}$ . Since $^{[1]}$ rho(A)leq |A|_p

we have $mathcal{A}_{ell_2}subset mathcal{A}_{rho}$ . Now, the main question is, how much do we lose by approximating $mathcal{A}_{rho}$ with $mathcal{A}_{ell_2}$ ? Motivation to do so is given by the fact that |cdot|_2

is a norm and hence a convex function $^{[2]}$ , i.e., when given a convex polytope mathcal{P}

with vertices A_i

, if ${A_i}_isubset mathcal{A}_{ell_2}$ , then $mathcal{P}subset mathcal{A}_{ell_2}$ . Note that rho(A)

is not a norm, rho(A)

can be

without

(consider an upper-triangular matrix with zeros on the main diagonal), the triangle inequality can easily fail as well. For example, let

$A_1 = left[begin{array}{ll} 0 & 1 0 & 0 end{array}right], quad A_2 = left[begin{array}{ll} 0 & 0 1 & 0 end{array}right],$

Then

, but

, hence

fails to hold. A setting where the aforementioned property of |cdot|_2

might help is Robust Control, say we want to assess if some algorithm rendered a compact convex set mathcal{K}

, filled with

's, stable. As highlighted before, we could just check if all extreme points of mathcal{K}

are members of $A_{ell_2}$ , which might be a small and finite set. Thus, computationally, it appears to be attractive to consider $mathcal{A}_{ell_2}$ over the generic $mathcal{A}_{rho}$ .

As a form of critique, one might say that $mathcal{A}_{rho}$ is a lot larger than $mathcal{A}_{ell_2}$ . Towards such an argument, one might recall that $|A|_2=sqrt{lambda_{mathrm{max}}(A^{top}A)}$ . Indeed, |A|_2=rho(A)

if $^{[3]}$ $Ain mathsf{Sym}_n$ , but $mathsf{Sym}_n simeq mathbf{R}^{n(n+1)/2}$ . Therefore, it seems like the set of

for which considering |cdot|_2

over

is reasonable, is negligibly small. To say a bit more, since $^{[4]}$ |A|_2leq |A|_F

we see that we can always find a ball with non-zero volume fully contained in $mathcal{A}_{rho}$ . Hence, $mathcal{A}_{rho}$ is at least locally dense in $mathbf{R}^{ntimes n}$ . So in principle we could try to investigate $mathrm{vol}(mathcal{A}_{ell_2})/mathrm{vol}(mathcal{A}_{rho})$ . For n=1

, the sets agree, which degrades asymptotically. However, is this the way to go? Lets say we consider the set $widehat{mathcal{A}}_{rho,N}:={Ain mathbf{R}^{ntimes n};:;rho(A)<N/(N+1)}$ . Clearly, the sets $widehat{mathcal{A}}_{rho,N}$ and $mathcal{A}_{rho}$ are different, even in volume, but for sufficiently large Nin mathbf{N}

, should we care? The behaviour they parametrize is effectively the same.

We will stress that by approximating $mathcal{A}_{rho}$ with $mathcal{A}_{ell_2}$ , regardless of their volumetric difference, we are ignoring a full class of systems and miss out on a non-neglible set of behaviours. To see this, any system described by $mathcal{A}_{ell_2}$ is contractive in the sense that |Ax_k|_2leq |x_k|_2

, while systems in $mathcal{A}_{rho}$ are merely asymptotically stable. They might wander of before they return, i.e., there is no reason why for all kin mathbf{N}

we must have $|x_{k+1}|_pleq |x_{k}|_p$ . We can do a quick example, consider the matrices

$A_2 = left[begin{array}{lll} 0 & -0.9 & 0.1 0.9 & 0 & 0 0 & 0 & -0.9 end{array}right], quad A_{rho} = left[begin{array}{lll} 0.1 & -0.9 & 1 0.9 & 0 & 0 0 & 0 & -0.9 end{array}right].$

Then $A_2in mathcal{A}_{ell_2}$ , $A_{rho}in mathcal{A}_{rho}setminus{mathcal{A}_{ell_2}}$ and both A_2

and $A_{rho}$ have |lambda_i|=0.9 forall i

. We observe that indeed A_2

is contractive, for any initial condition on $mathbf{S}^2$ , we move strictly inside the sphere, whereas for $A_{rho}$ , when starting from the same initial condition, we take a detour outside of the sphere before converging to

. So although A_2

and $A_{rho}$ have the same spectrum, they parametrize different systems.

In our deterministic setting this would mean that we would confine our statespace to a (solid) sphere with radius |x_0|_2

, instead of $mathbf{R}^n$ . Moreover, in linear optimal control, the resulting closed-loop system is usually not contractive. Think of the infamous pendulum on a cart. Being energy efficient has usually nothing to do with taking the shortest, in the Euclidean sense, path.

(update June 06): As suggested by Pedro Zattoni, we would like to add some nuance here. As highlighted in the first 12 minutes of this lecture by Pablo Parrilo, if

is stable, then, one can always find a (linear) change of coordinates such that the transformed system matrix is contractive. Although a very neat observation, you have to be careful, since your measurements might not come from this transformed system. So either you assume merely that

is stable, or you assume that

is contractive, but then you might need to find an additional map, mapping your states to your measurements.

: Recall, $|A|_p:=sup_{xin mathbf{S}^{n-1}}|Ax|_p$ . Then, let $xin mathbf{S}^{n-1}$ be some eigenvector of

. Now we have |A|_pgeq |Ax|_p = |lambda x|_p = |lambda |

. Since this eigenvalue is arbitrary it follows that |A|_pgeq rho(A)

: Let $(A_1,A_2)in mathcal{A}_{ell_2}$ then |theta A_1 + (1-theta)A_2|_2 leq |theta||A_1|_2 + |1-theta||A_2|_2 < 1

. This follows from the |cdot|_2

being a norm.
[3]

: Clearly, if $Ain mathsf{Sym}_n$ , we have $sqrt{lambda_{mathrm{max}}(A^{top}A)}=sqrt{lambda_{mathrm{max}}(A^2)}=rho(A)$ . Now, when rho(A)=|A|_2

, does this imply that $Ain mathsf{Sym}_n$ ? The answer is no, consider

Then, $A'notin mathsf{Sym}_n$ , yet, rho(A)=|A|_2=0.9

. For the full set of conditions on

such that

see this paper by Goldberg and Zwas.
[4]

: Recall that $|A|_F = sqrt{mathrm{Tr}(A^{top}A)}=sqrt{sum_{i=1}^n lambda_i(A^{top}A)}$ . This expression is clearly larger or equal to $sqrt{lambda_{mathrm{max}}(A^{top}A)}=|A|_2$ .

Risky Business in Stochastic Control: Exponential Utility |12 Jan. 2020|
tags: math.OC

One of the most famous problems in linear control theory is that of designing a control law which minimizes some cost being quadratic in input and state. We all know this as the Linear Quadratic Regulator (LQR) problem. There is however one problem (some would call it a blessing) with this formulation once the dynamics contain some zero-mean noise: the control law is independent of this stochasticity. This is easy in proofs, easy in practice, but does it work well? Say, your control law is rather slow, but bringing you back to

in the end of time. In the noiseless case, no problem. But now say there is a substantial amount of noise, do you still want such a slow control law? The answer is most likely no since you quickly drift away. The classical LQR formulation does not differentiate between noise intensities and hence can be rightfully called naive as Bertsekas did (Bert76). Unfortunately, this name did not stuck.

$gamma_{T-1}(theta) = frac{2}{theta T}log mathbf{E}_{xi}left[mathrm{exp}left(frac{theta}{2}sum^{T-1}_{t=0}x_t^{top}Qx_t + u_t^{top}Ru_t right) right]=frac{2}{theta T}log mathbf{E}_{xi}left[mathrm{exp}left(frac{theta}{2}Psi right) right]$

and consider the problem of finding the minimizing policy $pi^{star}_{T-1}=u_0,u_1,dots,u_{T-1}$ in:

$mathcal{J}_{T-1}:=inf_{pi_{T-1}} gamma_{T-1}(theta)quad mathrm{s.t.} quad x_{t+1}=Ax_t+Bu_t+xi_t,quad xi_t {sim} mathcal{N}(0,Sigma_{xi}^{-1}).$

Which is precisely the classical LQR problem, but now with the cost wrapped in an exponential utility function parametrized by thetain mathbf{R}

. This problem was pioneered by most notably Peter Whittle (Wh90).

Before we consider solving this problem, let us interpret the risk parameter thetain mathbf{R}

. For

we speak of a risk-sensitive formulation while for theta<0

we are risk-seeking. This becomes especially clear when you solve the problem, but a quick way to see this is to consider the approximation of gamma(theta)

near

, which yields $gammaapprox mathbf{E}[Psi]+frac{1}{4}{theta}mathbf{E}[Psi]^2$ , so theta>0

relates to pessimism and theta<0

to optimism indeed. Here we skipped a few scaling factors, but the idea remains the same, for a derivation, consider cumulant generating functions and have a look at our problem.

As with standard LQR, we would like to obtain a stabilizing policy and to that end we will be mostly bothered with solving $limsup_{Tto infty}mathcal{J}_T$ . However, instead of immediately trying to solve the infinite horizon average cost Bellman equation it is easier to consider $gamma_{T}(theta)$ for finite

first. Then, when we can prove monotonicty and upper-bound $limsup_{Tto infty}mathcal{J}_{T}$ , the infinite horizon optimal policy is given by $lim_{Tto infty}pi_{T}^{star}$ . The reason being that monotonic sequences which are uniformly bounded converge.

The main technical tool towards finding the optimal policy is the following Lemma similar to one in the Appendix of (Jac73):
Lemma

Consider a noisy linear dynamical system defined by $x_{t+1}=Ax_k+Bu_t+Dxi_t$ with $xi_t {sim} mathcal{N}(0,Sigma_{xi}^{-1})$ and let |A|

be shorthand notation for mathrm{det}(A)

. Then, if $(Sigma_{xi}-theta D^{top}PD)^{-1}succ 0$ holds we have

$mathbf{E}_{xi}left[mathrm{exp}left(frac{theta}{2}x_{k+1}^{top}Px_{k+1} right)|x_kright] = frac{|(Sigma_{xi}-theta D^{top}PD)^{-1}|^{1/2}}{|Sigma_{xi}^{-1}|^{1/2}}mathrm{exp}left(frac{theta}{2}(Ax_k+Bu_k)^{top}widetilde{P}(Ax_k+Bu_k) right)$

Proof

Let $z:= mathbf{E}_{xi}left[mathrm{exp}left(frac{theta}{2}x_{t+1}^{top}Px_{t+1} right)|x_tright]$ and recall that the shorthand notation for mathrm{det}(A)

, then:

$begin{array}{ll} z &= frac{1}{(2pi)^{d/2}|Sigma_{xi}^{-1}|^{1/2}}int_{mathbf{R}^d} mathrm{exp}left(frac{theta}{2}x_{t+1}^{top}Px_{t+1} right)mathrm{exp}left(-frac{1}{2}xi_t^{top}Sigma_{xi}xi_t right) dxi_t &= frac{1}{(2pi)^{d/2}|Sigma_{xi}^{-1}|^{1/2}}Big{mathrm{exp}left(frac{theta}{2}(Ax_t+Bu_t)^{top}P(Ax_t+Bu_t) right) &quad cdotint_{mathbf{R}^d} mathrm{exp}left( theta (Ax_t+Bu_t)^{top}PDxi_t - frac{1}{2}xi_t^{top}(Sigma_{xi}-theta D^{top}PD)xi_tright)dxi_tBig} &=frac{|(Sigma_{xi}-theta D^{top}PD)^{-1}|^{1/2}}{|Sigma_{xi}^{-1}|^{1/2}}mathrm{exp}left(frac{theta}{2}(Ax_t+Bu_t)^{top}widetilde{P}(Ax_t+Bu_t) right) &quad cdot frac{1}{(2pi)^{d/2}|(Sigma_{xi}-theta D^{top}PD)^{-1}|^{1/2}}int_{mathbf{R}^d} mathrm{exp}left(-frac{1}{2}(xi_t-overline{xi}_t)^{top}(Sigma_{xi}-theta D^{top}PD)(xi_t-overline{xi}_t)right) dxi_t &=frac{|(Sigma_{xi}-theta D^{top}PD)^{-1}|^{1/2}}{|Sigma_{xi}^{-1}|^{1/2}}mathrm{exp}left(frac{theta}{2}(Ax_t+Bu_t)^{top}widetilde{P}(Ax_t+Bu_t) right). end{array}$

Here, the first step follows directly from

being a zero-mean Gaussian. In the second step we plug in $x_{t+1}=Ax_t+Bu_t+Dxi_t$ . Then, in the third step we introduce a variable $overline{xi}_t$ with the goal of making $(Sigma_{xi}-theta D^{top}PD)$ the covariance matrix of a Gaussian with mean $overline{xi}_t$ . We can make this work for

$overline{xi}_t = theta( Sigma_{xi}-theta D^{top}PD)^{-1}D^{top}P(Ax_t+Bu_t)$

$widetilde{P} = P +theta PD(Sigma_{xi}-theta D^{top}PD)^{-1}D^{top}P.$

Using this approach we can integrate the latter part to

and end up with the final expression. Note that in this case the random variable xi_t

needs to be Gaussian, since the second to last expression in

equals

by being a Gaussian probability distribution integrated over its entire domain.

What is the point of doing this? Let f(x,u)=Ax+Bu+xi

and assume that $r_tmathrm{exp}big(frac{theta}{2} x^{top}P_txbig)$ represents the cost-to-go from stage

and state

. Then consider

$r_{t-1}mathrm{exp}left(frac{theta}{2}x^{top}P_{t-1} xright) = inf_{u} left{mathrm{exp}left(frac{theta}{2}(x^{top}Qx+u^{top}Ru)right)mathbf{E}_{xi}left[r_tmathrm{exp}left( frac{theta}{2}f(x,u)^{top}P_{t}f(x,u)right);|;xright]right}.$

Note, since we work with a sum within the exponent, we must multiply within the right-hand-side of the Bellman equation. From there it follows that $u^{star}_t=-(R+B^{top}widetilde{P}_tB)^{-1}B^{top}widetilde{P}_tAx_t$ , for

$P_{t-1} = Q + A^{top}widetilde{P}_tA - A^{top}widetilde{P}_tB(R+ B^{top}widetilde{P}_tB)^{-1}B^{top}widetilde{P}_tA,$

$widetilde{P}_t = P_t +theta P_tD(Sigma_{xi}-theta D^{top}P_tD)^{-1}D^{top}P_t.$

The key trick in simplifying your expressions is to apply the logarithm after minimizing over

such that the fraction of determinants becomes a state-independent affine term in the cost. Now, using a matrix inversion lemma and the push-through rule we can remove $widetilde{P}_t$ and construct a map $P_{t-1}=f(P_t)$ :

$P_{t-1} = Q + A^{top}P_t left(I_n + (BR^{-1}B^{top}-theta DSigma_{xi}^{-1}D^{top})P_t right)^{-1}A = Q+A^{top}P_tLambda_t^{-1}A.$

such that $u_t^{star} = -R^{-1}B^{top}P_t Lambda_t A x_t$ . See below for the derivations, many (if not all) texts skip them, but if you have never applied the push-through rule they are not that obvious.

As was pointed out for the first time by Jacobson (Jac73), these equations are precisely the ones we see in (non-cooperative) Dynamic Game theory for isotropic $Sigma_{xi}$ and appropriately scaled theta geq 0

Especially with this observation in mind there are many texts which show that $lim_{tdownarrow 0}P_t=P$ is well-defined and finite, which relates to finite cost and a stabilizing control law $u^{star}_t=-R^{-1}B^{top}PLambda^{-1}Ax_t$ . To formalize this, one needs to assume that (A,B,C)

is a minimal realization for

defined by $C^{top}C=Q$ . Then you can appeal to texts like (BB95).

Numerical Experiment and Robustness Interpretation
To show what happens, we do a small

-dimensional example. Here we want to solve the Risk-Sensitive (theta=0.0015

) and the Risk-neutral ( theta=0

) infinite-horizon average cost problem for

$A = left[ begin{array}{ll} 2 & 1 0 & 2 end{array}right],quad B = left[ begin{array}{l} 0 1 end{array}right],quad Sigma_{xi}^{-1} = left[ begin{array}{ll} 10^{-1} & 200^{-1} 200^{-1} & 10 end{array}right],$

. There is clearly a lot of noise, especially on the second signal, which also happens to be critical for controlling the first state. This makes it interesting. We compute $K^{star}|_{theta=0}=:K^{star}$ and $K^{star}|_{theta=0.0015}=:K^{star}_{theta}$ . Given the noise statistics, it would be reasonable to not take the certainty equivalence control law $K^{star}$ since you control the first state (which has little noise on its line) via the second state (which has a lot of noise on its line). Let $x^{(i)}$ be the $i^{mathrm{th}}$ state under $K^{star}$ and $x^{(i)}_{theta}$ the $i^{mathrm{th}}$ state under $K^{star}_{theta}$ .

We see in the plot below (for some arbitrary initial condition) typical behaviour, $K^{star}_{theta}$ does take the noise into account and indeed we see the $K^{star}_{theta}$ induces a smaller variance.

So, $K^{star}$ is more robust than $K^{star}$ in a particular way. It turns out that this can be neatly explained. To do so, we have to introduce the notion of Relative Entropy (Kullback-Leibler divergence). We will skip a few technical details (see the references for full details). Given a measure

, then for any other measure

, being absolutely continuous with respect to

(

), define the Relative Entropy as:

$h(nu || mu ) = int_{mathcal{V}}log. frac{dnu}{dmu} dnu$

Now, for any measurable function Psi

, being bounded from below, it can be shown (see (DP96)) that

$log int e^{Psi}dmu = sup_{nu}left{ int Psi dnu - h(nu || mu) ; : ; h(nu || mu )<infty right}$

For the moment, think of Psi

as your standard finite-horizon LQR cost with product measure dnu

, then we see that an exponential utility results in the understanding that a control law which minimizes $log int e^{Psi}dmu$ is robust against adversarial noise generated by distributions sufficiently close (measured by

) to the reference

Here we skipped over a lot of technical details, but the intuition is beautiful, just changing the utility to the exponential function gives a wealth of deep distributional results we just touched upon in this post.

Simplification steps in $u_t^{star}$
We only show to get the simpler representation for the input, the approach to obtain $P_{k-1}=f(P_k)$ is very similar.

$(A+BCD)^{-1}=A^{-1}-A^{-1}B(C^{-1}+DA^{-1}B)^{-1}DA^{-1}$

$widetilde{P}_t= (I_n -theta P_t D Sigma_{xi}^{-1}D^{top})^{-1}P_t=X^{-1}P_t.$

Note, we factored our P_t

since we cannot assume that P_t

is invertible. Our next tool is called the push-through rule. Given $Qin mathbf{R}^{mtimes n}$ and $Pin mathbf{R}^{ntimes m}$ we have

$(I_m + QP)^{-1}Q = Q(I_n + PQ)^{-1}.$

You can check that indeed Q(I_n+PQ) = (I_m +QP)Q

. Now, to continue, plug this expression for $widetilde{P}_t$ into the input expression:

$begin{array}{ll} -(R+B^{top}widetilde{P}_t B)^{-1}B^{top}widetilde{P}_t A &= -(R+B^{top}X^{-1}P_t B)^{-1}B^{top}X^{-1}P_t A &= -big( (I_m+B^{top}X^{-1}P_t BR^{-1})Rbig)^{-1}B^{top}X^{-1}P_t A &= -R^{-1}(I_m+B^{top}X^{-1}P_t BR^{-1})^{-1}B^{top}X^{-1}P_t A &= -R^{-1}B^{top}(I_n+X^{-1}P_t BR^{-1}B^{top})^{-1}X^{-1}P_t A &= -R^{-1}B^{top}(X+P_t BR^{-1}B^{top})^{-1}P_t A &= -R^{-1}B^{top}P_t Lambda_t A. end{array}$

(Bert76) Dimitri P. Bertsekas : ‘‘Dynamic Programming and Stochastic Control’’, 1976 Academic Press.
(Jac73) D. Jacobson : ‘‘Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games’’, 1973 IEEE TAC.
(Wh90) Peter Whittle : ‘‘Risk-sensitive Optimal Control’’, 1990 Wiley.
(BB95) Tamer Basar and Pierre Bernhard : ‘‘ $H_{infty}$ -Optimal Control and Related Minimax Design Problems A Dynamic Game Approach’’ 1995 Birkhauser.
(DP96) Paolo Dai Pra, Lorenzo Meneghini and Wolfgang J. Runggaldier :‘‘Connections Between Stochastic Control and Dynamic Games’’ 1996 Math. Control Signals Systems.

Are Theo Jansen his Linkages Optimal? |16 Dec. 2019|
tags: math.OC

Theo Jansen, designer of the infamous ‘‘strandbeesten’’ explains in a relatively recent series of videos that the design of his linkage system is the result of what could be called an early implementation of genetic programming. Specifically, he wanted the profile that the foot follows to be flat on the bottom. He does not elaborate on this too much, but the idea is that in this way his creatures flow over the surface instead of the more bumpy motion we make. Moreover, you can imagine that in this way the center of mass remains more or less at a constant height, which is indeed energy efficient. Now, many papers have been written on understanding his structure, but I find most of them rather vague, so lets try a simple example.

To that end, consider just 1 leg, 1 linkage system as in the figure(s) below. We can write down an explicit expression for $p(L,theta)in mathbf{R}^2$ where L=(a,b,c,d,e,f,g,h,i,j,k,l,m)

and

. See below for an explicit derivation. Now, with his motivation in mind, what would be an easy and relevant cost function? An idea is to maximize the ratio p_w/p_h

, of course, subject to still having a functioning leg. To that end we consider the cost function (see the schematic below for notation):

$c(L) = logleft(frac{leftlangle e_1,p(L,1/2pi)-p(L,-1/2pi)rightrangle}{leftlangle e_2,p(L,pi)-p(L,0)rightrangle} right).$

Of course, this is a heuristic, but the intuition is that this should approximate p_w/p_h

. Let

be the set of parameters as given in Jansen his video, then we apply a few steps of (warm-start) gradient ascent: $L_{k+1}=L_k + k^{-2}nabla c(L_k)$ , k=1,2,dots

Can we conclude anything from here? I would say that we need dynamics, not just kinematics, but more importantly, we need to put the application in the optimization and only Theo Jansen understands the constraints of the beach.

Still, it is fun to see for yourself how this works and like I did you can build a (baby) strandbeest yourself using nothing more than some PVC tubing and a source of heat.

Derivation of :
Essentially, all that you need is the cosine-rule plus some feeling for how to make the equations continuous. To start, let e_1=(1,0)

and

(just the standard unit vectors), plus let R(theta)

be a standard (counter-clockwise) rotation matrix acting on $(x,y)in mathbf{R}^2$ . To explain the main approach, when we want to find the coordinates of for example point p_4

, we try to compute the angle theta_2

such that when we rotate the unit vector e_1

with length

emanating from point p_3

, we get point p_4

. See the schematic picture below.

Then we easily obtain p_1=(-a,-l)

and

. Now, to obtain p_4

and

compute the length of the diagonal, n=|p_1-p_3|_2

and using the cosine rule $beta_1=arccosbig( -(2nj)^{-1}(b^2-n^2-j^2) big)$ , $beta_2=arccosbig( -(2nk)^{-1}(c^2-n^2-k^2) big)$ . Then, the trick is to use cos(theta)=langle a,brangle/ (|a|_2|b|_2)

to compute theta_4:=theta_2+beta_1

. To avoid angles larger than

, consider an inner product with e_2

and add the remaining 1/2pi

; $theta_4=arccosbig(n^{-1}langle e_2,(p_1-p_3)rangle big)+1/2pi$ . From there we get theta_2=theta_4-beta_1

. The point p_6

is not missing, to check your code it is convenient to compute p_6=p_3+R(theta_4)ne_1

and check that p_6

agrees with p_1

for all

. In a similar fashion, $gamma_1=arccosbig(-(2db)^{-1}(e^2-d^2-b^2) big)$ , $delta_1=arccosbig(b^{-1}langle e_1,(p_4-p_1)rangle big)$ , p_7=p_1+R(gamma_1+delta_1)de_1

. Then, for p_8

, again compute a diagonal q=|p_5-p_7|_2

plus $alpha_1=arccosbig(-(2qc)^{-1}(d^2-q^2-c^2) big)$ , $alpha_2=arccosbig(c^{-1}langle e_1,(p_1-p_5)rangle big)$ $alpha_3=arccosbig(-(2qg)^{-1}(f^2-q^2-q^2) big)$ , such that for alpha=alpha_1+alpha_2+alpha_3

we have

. At last, $psi_1=arccosbig(-(2gi)^{-1}(h^2-i^2-g^2) big)$ , such that p(L,theta)=p_5+R(psi)ie_1

for

A Special Group, Volume Preserving Feedback |16 Nov. 2019|
tags: math.LA, math.DS

This short note is inspired by the beautiful visualization techniques from Duistermaat and Kolk (DK1999) (themselves apparently inspired by Atiyah).

Let's say we have a

-dimensional linear discrete-time dynamical system $x_{k+1}=Ax_k$ , which preserves the volume of the cube [-1,1]^2

under the dynamics, i.e. $mathrm{Vol}([-1,1]^2)=mathrm{Vol}(A^k[-1,1]^2)$ for any kin mathbf{Z}

Formally put, this means that

is part of a certain group, specifically, consider some field mathbf{F}

and define the Special Linear group by

Now, assume that we are only interested in matrices Ain mathsf{SL}(2,mathbf{R})

such that the cube [-1,1]^2

remains bounded under the dynamics, i.e., $lim_{kto infty}A^k[-1,1]^2$ is bounded. In this scenario we restrict our attention to mathsf{SO}(2,mathbf{R})subset mathsf{SL}(2,mathbf{R})

mathsf{SO}(2,mathbf{R})subset mathsf{SL}(2,mathbf{R})

. To see why this is needed, consider A_1

and

both with determinant

$A_1 = left[ begin{array}{ll} 2 & 0 0 & frac{1}{2} end{array}right], quad A_2 = left[ begin{array}{ll} 1 & 2 0 & 1 end{array}right],$

If we look at the image of [-1,1]^2

under both these maps (for several iterations), we see that volume is preserved, but also clearly that the set is extending indefinitely in the horizontal direction.

To have a somewhat interesting problem, assume we are given $x_{k+1}=(A+BK)x_k$ with Ain mathsf{SL}

while it is our task to find a

such that

, hence, find feedback which not only preserves the volume, but keeps any set of initial conditions (say [-1,1]^2

) bounded over time.

Towards a solution, we might ask, given any Ain mathsf{SL}(2,mathbf{R})

what is the nearest (in some norm) rotation matrix? This turns out to be a classic question, starting from orthogonal matrices, the solution to $mathcal{P}:min_{widehat{A}in mathsf{O}(n,mathbf{R})}|A-widehat{A}|_F^2$ is given by $widehat{A}=UV^{top}$ , where (U,V)

follows from the SVD of

, $A=USigma V^{top}$ . Differently put, when we use a polar decomposition of

, then the solution is given by widetilde{U}

. See this note for a quick multiplier proof. An interesting sidenote, problem mathcal{P}

is well-defined since mathsf{O}(n,mathbf{R})

is compact. To see this, recall that for any Qin mathsf{O}(n,mathbf{R})

we have $|Q|_F = sqrt{n}$ , hence the set is bounded, plus mathsf{O}(n,mathbf{R})

is the pre-image of I_n

under $varphi:Amapsto A^{top}A$ , Ain mathsf{GL}(n,mathbf{R})

, hence the set is closed as well.

This is great, however, we would like to optimize over mathsf{SO}subset mathsf{O}

instead. To do so, one usually resorts to simply checking the sign - and if necessary - flipping it via finding the closest matrix with a change in determinant sign. We will see that by selecting appropriate coordinates we can find the solution without this checking of signs.

For today we look at mathsf{SL}(2,mathbf{R})

and largely follow (DK1999), for such a

-dimensional matrix, the determinant requirement translates to ad-bc=1

. Under the following (invertible) linear change of coordinates

becomes

, i.e., for any pair $(q,r)in mathbf{R}^2$ the point (p,s)

runs over a circle with radius $(q^2+r^2+1)^{1/2}$ . Hence, we obtain the diffeomorphism $mathsf{SL}(2,mathbf{R})simeq mathbf{S}^1 times mathbf{R}^2$ . We can use however a more pretty diffeomorphism of the sphere and an open set in $mathbf{R}^2$ . To that end, use the diffeomorphism:

$(theta,(u,v)) mapsto frac{1}{sqrt{1-u^2-v^2}} left[ begin{array}{ll} cos(theta)+u & -sin(theta)+v sin(theta)+v & cos(theta)-u end{array}right] in mathsf{SL}(2,mathbf{R}),$

for

(formal way of declaring that theta

should not be any real number) and the pair (u,v)

being part of $mathbf{D}_o:={(u,v)in mathbf{R}^2;|; u^2+v^2<1}$ (the open unit-disk). To see this, recall that the determinant is not a linear operator. Since we have a

-dimensional example we can explicitly compute the eigenvalues of any Ain mathsf{SL}(2,mathbf{R})

(plug

into the characteristic equation) and obtain:

$lambda_{1,2} = frac{mathrm{Tr}(A)pmsqrt{mathrm{Tr}(A)^2-4}}{2},quad mathrm{Tr}(A) = frac{2cos(theta)}{sqrt{1-u^2-v^2}}.$

At this point, the only demand on the eigenvalues is that lambda_1 cdot lambda_2 =1

. Therefore, when we would consider

within the context of a discrete linear dynamical system $x_{k+1}=Ax_k$ ,

is either a saddle-point, or we speak of a marginally stable system ( $|lambda_{1,2}|=1$ ). We can visualize the set of all marginally stable Ain mathsf{SL}

, called

, defined by all $big(theta,(u,v)big)in mathbf{R}/2pimathbf{Z}times mathbf{D}_o$ satisfying

$left|frac{cos(theta)}{sqrt{1-u^2-v^2}}pm left(frac{1-u^2-v^2-sin(theta)^2}{1-u^2-v^2} right)^{1/2} right|=1$

(DK1999) J.J. Duistermaat and J.A.C. Kolk : ‘‘Lie Groups’’, 1999 Springer.

Riemannian Gradient Flow |5 Nov. 2019|
tags: math.DG, math.DS

Previously we looked at dot{x}=f(x)

, $f(x)=big(A-(x^{top}Ax)I_n big)x$ - and its resulting flow - as the result from mapping dot{x}=Ax

to the sphere. However, we saw that for $Ain mathcal{S}^n_{++}$ this flow convergences to pm v_1

, with $Av_1=lambda_{mathrm{max}}(A)v_1$ such that for g(x)=|Ax|_2

we have $lim_{tto infty}gbig(x(t,x_0) big)=|A|_2$ $forall;x_0in mathbf{S}^{n-1}setminus{v_2,dots,v_n}$ . Hence, it is interesting to look at the flow from an optimization point of view: $|A|_2=sup_{xin mathbf{S}^{n-1}}|Ax|_2$ .

A fair question would be, is our flow not simply Riemannian gradient ascent for this problem? As common with these kind of problems, such a hypothesis is something you just feel.

Now, using the tools from (CU1994, p.311) we can compute the gradient of g(x)

(on the sphere) via $mathrm{grad}_{mathbf{S}^2}(g|_{mathbf{S}^2})=mathrm{grad}_{mathbf{R}^3}(g)-mathrm{d}g(xi)xi$ , where

is a vector field normal to $mathbf{S}^2$ , e.g., xi=(x,y,z)

. From there we obtain

$mathrm{grad}_{mathbf{S}^2}(g|_{mathbf{S}^2}) = left[ begin{array}{lll} (1-x^2) & -xy & -xz -xy & (1-y^2) & -yz -xz & -yz & (1-z^2) end{array}right] left[ begin{array}{l}partial_x g partial_y g partial_z g end{array}right]=G(x,y,z)nabla g.$

To make our life a bit easier, use instead of

the map

. Moreover, set $s=(x,y,z)in mathbf{R}^3$ . Then it follows that

$begin{array}{ll} mathrm{grad}_{mathbf{S}^2}(h|_{mathbf{S}^2}) &= left(I_3-mathrm{diag}(s)left[begin{array}{l} s^{top}s^{top}s^{top} end{array}right]right) 2A^{top}As &= 2left(A^{top}A-(s^{top}A^{top}Asright)I_3)s. end{array}$

Of course, to make the computation cleaner, we changed

, but the relation between $mathrm{grad}_{mathbf{S}^2}(h|_{mathbf{S}^2})$ and

is beautiful. Somehow, mapping trajectories of dot{x}=Ax

, for some $Ain mathcal{S}^n_{++}$ , to the sphere corresponds to (Riemannian) gradient ascent applied to the problem $sup_{xin S^{2}}|Ax|_2$ .

Now, we formalize the previous analysis a bit a show how fast we convergence. Assume that the eigenvectors are ordered such that eigenvector q_1

corresponds to the largest eigenvalue of $Ain mathcal{S}^n_{++}$ . Then, the solution to $dot{x}=big(A-(x^{top}Ax)I_nbig)x$ is given by

$x(t) = frac{exp({At})x_0}{|exp({At})x_0|_2}.$

Let $x_0=sum^{n}_{i=1}c_iq_i$ be x_0

expressed in eigenvector coordinates, with all $q_iin mathbf{S}^{n-1}$ (normalized). Moreover, assume all eigenvalues are distinct. Then, to measure if x(t)

is near

, we compute 1-|langle x(t),q_1 rangle|

, which is

if and only if q_1

is parallel to x(t)

. To simplify the analysis a bit, we look at 1-langle x(t),q_1 rangle ^2=varepsilon

, for some perturbation $varepsilonin mathbf{R}_{geq 0}$ , this yields

$frac{c_1^2 exp({2lambda_1 t})}{sum^{n}_{i=1}c_i^2 exp({2lambda_i t})} = 1-varepsilon.$

$2log (|c_1|)+2lambda_1 t - log left( sum^{n}_{i=1}expleft{2lambda_it + 2log(|c_i|)right}right) = log(1-varepsilon).$

This log-sum-exp terms are hard to deal with, but we can apply the so-called ‘‘log-sum-exp trick’’:

$log sum_{iin I}exp (x_i) = x^{star}+log sum_{iin I}exp(x_i-x^{star}), quad forall x^{star}in mathbf{R}^n.$

$-log left(sum^{n}_{i=1}exp left{2 left[(lambda_i-lambda_1)t + logleft(frac{|c_i|}{|c_1|} right) right] right}+1 right) = log(1-varepsilon).$

We clearly observe that for tto infty

the LHS approaches

from below, which means that varepsilonto 0

from above, like intended. Of course, we also observe that the mentioned method is not completely general, we already assume distinct eigenvalues, but there is more. We do also not convergence when $x_0 in mathrm{span}{q_2,dots,q_n}$ , which is however a set of measure

on the sphere $mathbf{S}^{n-1}$ .

More interestingly, we see that the convergence rate is largely dictated by the ‘‘spectral gap/eigengap’’ lambda_2-lambda_1

. Specifically, to have a particular projection error varepsilon

, such that 1gg varepsilon>0

, we need

$t approx log left( frac{varepsilon}{1-varepsilon}right)frac{1}{lambda_2-lambda_1}.$

Comparing this to the resulting flow from $dot{s}=2left(A^{top}A-(s^{top}A^{top}Asright)I_3)s$ , $s(0)in mathbf{S}^2$ , we see that we have the same flow, but with Amapsto 2 A^2

. This is interesting, since

and

have the same eigenvectors, yet a different (scaled) spectrum. With respect to the convergence rate, we have to compare (lambda_2-lambda_1)

and

for any $lambda_1,lambda_2in mathbf{R}_{> 0}$ with lambda_1>lambda_2

(the additional

is not so interesting).

It is obvious what we will happen, the crux is, is lambda

larger or smaller than

? Can we immediately extend this to a Newton-type algorithm? Well, this fails (globally) since we work in $mathbf{R}^3$ instead of purely with $mathbf{S}^2$ . To be concrete, $mathrm{det}(G(x,y,z))|_{mathbf{S}^2}=0$ , we never have

degrees of freedom.

Of course, these observations turned out to be far from new, see for example (AMS2008, sec. 4.6).

(AMS2008) P.A. Absil, R. Mahony and R. Sepulchre: ‘‘Optimization Algorithms on Matrix Manifolds’’, 2008 Princeton University Press.
(CU1994) Constantin Udriste: ‘‘Convex Functions and Optimization Methods on Riemannian Manifolds’’, 1994 Kluwer Academic Publishers.

Simple Flow Algorithm to Compute the Operator Norm |4 Nov. 2019|
tags: math.LA

Using the ideas from the previous post, we immediately obtain a dynamical system to compute |A|_2

for any $Ain mathcal{S}^n_{++}$ (symmetric positive definite matrices). Here we use the vector field dot{x}=f(x)

, defined by $f(x)=big(A-(x^{top}Ax)I_nbig)x$ and denote a solution by x(t,x_0)

, where

is the initial condition. Moreover, define a map $g:mathbf{R}^nto mathbf{R}_{geq 0}$ by g(x)=|Ax|_2

. Then $lim_{tto infty}gbig(x(t,x_0) big)=|A|_2$ $forall;x_0in mathbf{S}^{n-1}setminus{v_2,dots,v_n}$ , where v_1

is the eigenvector corresponding to the largest eigenvalue.

Dynamical Systems on the Sphere |3 Nov. 2019|
tags: math.DS

To start, consider a linear dynamical systems dot{x}=Ax

, for which the solution is given by $x(t)=mathrm{exp}(At)x_0$ . We can map this solution to the sphere via x(t) mapsto x(t)/|x|_2=:y(t)

. The corresponding vector field for y(t)

is given by

$begin{array}{ll} dot{y} &= f(y) &= frac{dot{x}}{|x|_2}+x frac{partial}{partial t}left(frac{1}{|x|_2} right) &= frac{dot{x}}{|x|_2}-x frac{1}{|x|_2^3}x^{top}dot{x} &= A frac{x}{|x|_2} - frac{x}{|x|_2} frac{x^{top}}{|x|_2}Afrac{x}{|x|_2} &= left(A-(y^{top}Ay) I_nright) y. end{array}$

The beauty is the following, to find the equilibria, consider $(A-(y^{top}Ay)I_n)y = 0$ . Of course, y=0

is not valid since $0notin mathbf{S}^{n-1}$ . However, take any eigenvector

, then

is still an eigenvector, plus it lives on $mathbf{S}^{n-1}$ . If we plug such a scaled

(from now on consider all eigenvectors to live on $mathbf{S}^{n-1}$ ) into f(y) = 0

we get

, which is rather cool. The eigenvectors of

are (at least) the equilibria of

. Note, if f(v)=0

, then so is f(-v)=0

, each eigenvector gives rise to two equilibria. We say at least, since if mathrm{ker}(A)

is non-trivial, then for any $yin mathrm{ker}(A)subset mathbf{R}^n$ , we get f(y)=0

Let

, then what can be said about the stability of the equilibrium points? Here, we must look at $T_vmathbf{S}^{n-1}$ , where the

-dimensional system is locally a n-1

-dimensional linear system. To do this, we start by computing the standard Jacobian of

, which is given by

$Df(y) = A-I_n (y^{top}Ay) - 2 yy^{top}A.$

As said before, we do not have a

-dimensional system. To that end, we assume Ain mathsf{Sym}(n,mathbf{R})

with diagonalization $A=QLambda Q^{top}$ (otherwise peform a Gram-Schmidt procedure) and perform a change of basis $Df'=Q^{top}DfQ$ . Then if we are interested in the qualitative stability properties of v_i

(the $i^{mathrm{th}}$ eigenvector of

), we can simply look at the eigenvalues of the matrix resulting from removing the $i^{mathrm{th}}$ row and column in $Df'|_{v_i}$ , denoted $Df'|_{T_{v_i}mathbf{S}^{n-1}}$ .

We do two examples. In both figures are the initial conditions indicated with a dot while the equilibrium points are stars.

Finite Calls to Stability | 31 Oct. 2019 |
tags: math.LA, math.DS

Consider an unknown dynamical system $x_{k+1}=Ax_k$ parametrized by Ain mathsf{Sym}(n,mathbf{R})

. In this short note we look at a relatively simple, yet neat method to assess stability of

via finite calls to a particular oracle. Here, we will follow the notes by Roman Vershynin (RV2011).

Assume we have no knowledge of

, but we are able to obtain langle Ay, y rangle

for any desired $yin mathbf{R}^n$ . It turns out that we can use this oracle to bound rho(A)

from above.

First, recall that since

is symmetric, $|A|_2=sigma_{mathrm{max}}(A)=|lambda_{mathrm{max}}(A)|=rho(A)$ . Now, the common definition of |A|_2

is $sup_{xin mathbf{R}^nsetminus{0}}|Ax|_2/|x|_2$ . The constraint set $mathbf{R}^nsetminus{0}$ is not per se convenient from a computational point of view. Instead, we can look at a compact constraint set: $|A|_2=sup_{xin mathbf{S}^{n-1}} |Ax|_2$ , where $mathbf{S}^{n-1}$ is the sphere in $mathbf{R}^n$ . The latter formulation has an advantage, we can discretize these compact constraints sets and tame functions over the individual chunks.

To do this formally, we can use varepsilon

-nets, which are convenient in quantitative analysis on compact metric spaces (X,d)

. Let $mathcal{N}_{varepsilon}subseteq X$ be a

-net for

if $forall xin X;exists yin mathcal{N}_{varepsilon}:d(x,y)leq varepsilon$ . Hence, we measure the amount of balls to cover

For example, take $(mathbf{S}^{n-1},ell^n_2)$ as our metric space where we want to bound $|mathcal{N}_{varepsilon}|$ (the cardinality, i.e., the amount of balls). Then, to construct a minimal set of balls, consider an

-net of balls $B_{varepsilon/2}(y)$ , with $d(y_1,y_2)geq varepsilon;forall y_1,y_2in mathcal{N}_{varepsilon}$ . Now the insight is simply that $mathrm{vol}(B_{varepsilon/2})|mathcal{N}_{varepsilon}|leq mathrm{vol}(B_{1+varepsilon/2})-mathrm{vol}(B_{1-varepsilon/2})$ . Luckily, we understand the volume of

-dimensional Euclidean balls well such that we can establish

$|mathcal{N}_{varepsilon}| leq frac{mathrm{vol}(B_{1+varepsilon/2})-mathrm{vol}(B_{1-varepsilon/2})}{mathrm{vol}(B_{varepsilon/2})}= (1+2/varepsilon)^n-(1-2/varepsilon)^n.$

Next, we recite Lemma 5.4 from (RV2011). Given an varepsilon

-net for $mathbf{S}^{n-1}$ with varepsilonin[0,1)

we have

$|A|_2 = sup_{xin mathbf{S}^{n-1}}|langle Ax,xrangle | leq frac{1}{(1-2varepsilon)}sup_{xin mathcal{N}_{varepsilon}}|langle Ax,x rangle |$

This can be shown using the Cauchy-Schwartz inequality. Now, it would be interesting to see how sharp this bound is. To that end, we do a

-dimensional example ( x=(x_1,x_2)

). Note, here we can easily find an optimal grid using polar coordinates. Let the unknown system be parametrized by

such that

. In the figures below we compute the upper-bound to |A|_2

for several nets $mathcal{N}_{varepsilon}$ . Indeed, for finer nets, we see the upper-bound approaching 0.85

. However, we see that the convergence is slow while being inversely related to the exponential growth in $|mathcal{N}_{varepsilon}|$ .

Overall, we see that after about a 100 calls to our oracle we can be sure about

being exponentially stable. Here we just highlighted an example from (RV2011), at a later moment we will discuss its use in probability theory.

Help the (skinny) Hedgehogs | 31 Oct. 2019 |
tags: math.LA

In this first post we will see that the ‘‘almost surely’’ in the title is not only hinting at stochasticity.

For a long time this picture - as taken from Gabriel Peyré his outstanding twitter feed - has been on a wall in our local wildlife (inc. hedgehogs) shelter. This has lead to a few fun discussions and here we try to explain the picture a bit more.

Let $B^{ell_1^n}_1 := {xin mathbf{R}^n;:;sum^n_{i=1}|x_i|leq 1}$ . Now, we want to find the largest ball $B^{ell_2^n}_r:={xin mathbf{R}^n;:; sum^n_{i=1}x_i^2leq r^2}$ within $B^{ell_1^n}_1$ . We can find that $min_{xin partial B^{ell_1^n}_1}|x|_2$ is given by $|x_i^{star}|=n^{-1};forall iin overline{n}$ with $|x^{star}|_2=sqrt{1/n}$ . This follows the most easily from geometric intuition.

Besides being a fun visualization of concentration, this picture means a bit more to me. For years I have been working at the Wildlife shelter in Delft and the moment has come where the lack of financial support from surrounding municipalities is becoming critical. As usual, this problem can be largely attributed to bureaucracy and the exorbitant amount of managers this world has. Nevertheless, the team in Delft and their colleagues around the country do a fantastic job in spreading the word (see Volkskrant, RTLnieuws). Lets hope Delft, and it neighbours, finally see value in protecting the local wildlife we still have.

Almost Surely Investigations in Optimal Control and Dynamical Systems,

Communities |July 02, 2024|
`tags: math.OC`

Weekend read |May 14, 2023|
`tags: math.OC`

Standing on the shoulders of giants |April 3, 2023|
`tags: math.OC`

On Critical Transitions in Nature and Society |29 January 2023|
`tags: math.DS`

Counterexamples 1/N |9 January 2023|
`tags: math.An`

On ‘‘the art of the state’’ |29 December 2022|
`tags: math.DS`

On the complexity of matrix multiplications |10 October 2022|
`tags: math.LA`

A corollary |6 June 2022|
`tags: math.DS`

Co-observability |6 January 2022|
`tags: math.OC`

Fair convex partitioning |26 September 2021|
`tags: math.OC`

Solving Linear Programs via Isospectral flows |05 September 2021|
`tags: math.OC, math.DS, math.DG`

The Imaginary trick |12 March 2021|
`tags: math.OC`

Topological Considersations in Stability Analysis |17 November 2020|
`tags: math.DS`

Topological Obstructions to Global Asymptotic Stability

Global Isomorphisms

Remark on the Stabilization of Sets

Endomorphisms, Tensors in Disguise |18 August 2020|
`tags: math.LA`

Optimal coordinates |24 May 2020|
`tags: math.LA, math.OC, math.DS`

A Peculiar Lower Bound to the Spectral Radius |7 March 2020|
`tags: math.LA`

Spectral Radius vs Spectral Norm |28 Febr. 2020|
`tags: math.LA, math.DS`

Risky Business in Stochastic Control: Exponential Utility |12 Jan. 2020|
`tags: math.OC`

Are Theo Jansen his Linkages Optimal? |16 Dec. 2019|
`tags: math.OC`

A Special Group, Volume Preserving Feedback |16 Nov. 2019|
`tags: math.LA, math.DS`

Riemannian Gradient Flow |5 Nov. 2019|
`tags: math.DG, math.DS`

Simple Flow Algorithm to Compute the Operator Norm |4 Nov. 2019|
`tags: math.LA`

Dynamical Systems on the Sphere |3 Nov. 2019|
`tags: math.DS`

Finite Calls to Stability | 31 Oct. 2019 |
`tags: math.LA, math.DS`

Help the (skinny) Hedgehogs | 31 Oct. 2019 |
`tags: math.LA`

Almost Surely Investigations in Optimal Control and Dynamical Systems,

Communities |July 02, 2024|tags: math.OC

Weekend read |May 14, 2023|tags: math.OC

Standing on the shoulders of giants |April 3, 2023|tags: math.OC

On Critical Transitions in Nature and Society |29 January 2023|tags: math.DS

Counterexamples 1/N |9 January 2023|tags: math.An

On ‘‘the art of the state’’ |29 December 2022|tags: math.DS

On the complexity of matrix multiplications |10 October 2022|tags: math.LA

A corollary |6 June 2022|tags: math.DS

Co-observability |6 January 2022|tags: math.OC

Fair convex partitioning |26 September 2021|tags: math.OC

Solving Linear Programs via Isospectral flows |05 September 2021|tags: math.OC, math.DS, math.DG

The Imaginary trick |12 March 2021|tags: math.OC

Topological Considersations in Stability Analysis |17 November 2020|tags: math.DS

Topological Obstructions to Global Asymptotic Stability

Global Isomorphisms

Remark on the Stabilization of Sets

Endomorphisms, Tensors in Disguise |18 August 2020|tags: math.LA

Optimal coordinates |24 May 2020|tags: math.LA, math.OC, math.DS

A Peculiar Lower Bound to the Spectral Radius |7 March 2020|tags: math.LA

Spectral Radius vs Spectral Norm |28 Febr. 2020|tags: math.LA, math.DS

Risky Business in Stochastic Control: Exponential Utility |12 Jan. 2020|tags: math.OC

Are Theo Jansen his Linkages Optimal? |16 Dec. 2019|tags: math.OC

A Special Group, Volume Preserving Feedback |16 Nov. 2019|tags: math.LA, math.DS

Riemannian Gradient Flow |5 Nov. 2019|tags: math.DG, math.DS

Simple Flow Algorithm to Compute the Operator Norm |4 Nov. 2019|tags: math.LA

Dynamical Systems on the Sphere |3 Nov. 2019|tags: math.DS

Finite Calls to Stability | 31 Oct. 2019 |tags: math.LA, math.DS

Help the (skinny) Hedgehogs | 31 Oct. 2019 |tags: math.LA

Communities |July 02, 2024|
`tags: math.OC`

Weekend read |May 14, 2023|
`tags: math.OC`

Standing on the shoulders of giants |April 3, 2023|
`tags: math.OC`

On Critical Transitions in Nature and Society |29 January 2023|
`tags: math.DS`

Counterexamples 1/N |9 January 2023|
`tags: math.An`

On ‘‘the art of the state’’ |29 December 2022|
`tags: math.DS`

On the complexity of matrix multiplications |10 October 2022|
`tags: math.LA`

A corollary |6 June 2022|
`tags: math.DS`

Co-observability |6 January 2022|
`tags: math.OC`

Fair convex partitioning |26 September 2021|
`tags: math.OC`

Solving Linear Programs via Isospectral flows |05 September 2021|
`tags: math.OC, math.DS, math.DG`

The Imaginary trick |12 March 2021|
`tags: math.OC`

Topological Considersations in Stability Analysis |17 November 2020|
`tags: math.DS`

Endomorphisms, Tensors in Disguise |18 August 2020|
`tags: math.LA`

Optimal coordinates |24 May 2020|
`tags: math.LA, math.OC, math.DS`

A Peculiar Lower Bound to the Spectral Radius |7 March 2020|
`tags: math.LA`

Spectral Radius vs Spectral Norm |28 Febr. 2020|
`tags: math.LA, math.DS`

Risky Business in Stochastic Control: Exponential Utility |12 Jan. 2020|
`tags: math.OC`

Are Theo Jansen his Linkages Optimal? |16 Dec. 2019|
`tags: math.OC`

A Special Group, Volume Preserving Feedback |16 Nov. 2019|
`tags: math.LA, math.DS`

Riemannian Gradient Flow |5 Nov. 2019|
`tags: math.DG, math.DS`

Simple Flow Algorithm to Compute the Operator Norm |4 Nov. 2019|
`tags: math.LA`

Dynamical Systems on the Sphere |3 Nov. 2019|
`tags: math.DS`

Finite Calls to Stability | 31 Oct. 2019 |
`tags: math.LA, math.DS`

Help the (skinny) Hedgehogs | 31 Oct. 2019 |
`tags: math.LA`