Recently, DeepMind published new work on ‘‘discovering’’ numerical linear algebra routines. This is remarkable, but I leave the assessment of this contribution to others. Rather, I would like to highlight the earlier discovery by Strassen, namely, how to improve upon the obvious matrix multiplication algorithm.
To get some intuition, consider multiplying two complex numbers $a + bi$ and $c + di$, that is, construct $(a+bi)(c+di) = (ac - bd) + (ad + bc)i$. In this case, the obvious thing to do would lead to 4 multiplications, namely $ac$, $bd$, $ad$ and $bc$. However, consider using only $(a+b)(c+d)$, $ac$ and $bd$. These multiplications suffice to construct $ad + bc$ (subtract the latter two from the first one), while $ac - bd$ comes from the latter two directly. The emphasis is on multiplications, as it seems to be folklore knowledge that multiplications (and especially divisions) are the most costly. However, note that we do use more additive operations. Are these operations not equally ‘‘costly’’? The answer to this question is delicate and still changing!
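To make the trick concrete, here is a minimal Python sketch (the function name `complex_mult_3` is mine, purely for illustration):

```python
def complex_mult_3(a, b, c, d):
    """Compute (a+bi)(c+di) using 3 real multiplications instead of 4."""
    m1 = (a + b) * (c + d)  # = ac + ad + bc + bd
    m2 = a * c
    m3 = b * d
    real = m2 - m3           # ac - bd
    imag = m1 - m2 - m3      # ad + bc
    return real, imag

# Quick check against Python's built-in complex multiplication.
z = complex(1, 2) * complex(3, 4)          # (1+2i)(3+4i) = -5 + 10i
assert complex_mult_3(1, 2, 3, 4) == (z.real, z.imag)
```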
When working with integers, additions are indeed generally ‘‘cheaper’’ than multiplications. However, oftentimes we work with floating-point numbers. Roughly speaking, these are numbers characterized by some (signed) mantissa $m$, a base $b$ and some (signed) exponent $e$, specifically, $m \cdot b^e$. For instance, consider the addition $1.50 \cdot 10^3 + 1.25 \cdot 10^1$. To perform the addition, one normalizes the two numbers, e.g., one writes $1.25 \cdot 10^1 = 0.0125 \cdot 10^3$. Then, to perform the addition one needs to take the appropriate precision into account; indeed, the result is $(1.50 + 0.0125) \cdot 10^3 = 1.5125 \cdot 10^3$. All this example should convey is that we have sequential steps. Now consider multiplying the same numbers, that is, $(1.50 \cdot 10^3) \cdot (1.25 \cdot 10^1)$. In this case, one can proceed with two steps in parallel: on the one hand $1.50 \cdot 1.25$ and on the other hand $10^{3+1}$. Again, one needs to take the correct precision into account in the end, but we see that by no means must multiplication be significantly slower in general!
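As a toy illustration of these steps, here is a base-10 sketch (the model and the helpers `fp_add`/`fp_mul` are my own simplification; real IEEE-754 arithmetic is binary and considerably more subtle):

```python
# Toy float model: a number is the pair (m, e), representing m * 10**e.
def fp_add(m1, e1, m2, e2, digits=4):
    # Step 1 (sequential): align exponents by shifting the smaller operand.
    if e1 < e2:
        (m1, e1), (m2, e2) = (m2, e2), (m1, e1)
    m2_aligned = m2 * 10 ** (e2 - e1)   # e.g. 1.25e1 -> 0.0125e3
    # Step 2: add mantissas, then round back to the working precision.
    return round(m1 + m2_aligned, digits), e1

def fp_mul(m1, e1, m2, e2, digits=4):
    # The two pieces are independent: mantissas multiply, exponents add.
    return round(m1 * m2, digits), e1 + e2

print(fp_add(1.50, 3, 1.25, 1))  # (1.5125, 3), i.e. 1.5125 * 10**3
print(fp_mul(1.50, 3, 1.25, 1))  # (1.875, 4),  i.e. 1.875  * 10**4
```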
Now back to the matrix multiplication algorithms. Say we have
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix},$$
then, $C = AB$ is given by
$$C = \begin{pmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{pmatrix}.$$
Indeed, the most straightforward matrix multiplication algorithm for $A, B \in \mathbb{R}^{n \times n}$ costs you $n^3$ multiplications and $n^2(n-1)$ additions.
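For reference, here is the schoolbook algorithm as a short Python sketch (illustrative, not optimized):

```python
def matmul_naive(A, B):
    """Schoolbook product of two n-by-n matrices given as lists of rows:
    n**3 multiplications and n**2 * (n - 1) additions."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Each entry c_ij is an inner product of a row of A and a
            # column of B: n multiplications, n - 1 additions.
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_naive(A, B))  # [[19, 22], [43, 50]]
```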
Using the intuition from complex multiplication, we might expect that we can do with fewer multiplications. In 1969 (Numer. Math. 13, 354–356), Volker Strassen showed that this is indeed true.
Following his short, but highly influential, paper, let
$$\begin{aligned}
m_1 &= (a_{11} + a_{22})(b_{11} + b_{22}), \\
m_2 &= (a_{21} + a_{22})\, b_{11}, \\
m_3 &= a_{11}(b_{12} - b_{22}), \\
m_4 &= a_{22}(b_{21} - b_{11}), \\
m_5 &= (a_{11} + a_{12})\, b_{22}, \\
m_6 &= (a_{21} - a_{11})(b_{11} + b_{12}), \\
m_7 &= (a_{12} - a_{22})(b_{21} + b_{22}).
\end{aligned}$$
Then, $c_{11} = m_1 + m_4 - m_5 + m_7$, $c_{12} = m_3 + m_5$, $c_{21} = m_2 + m_4$ and $c_{22} = m_1 - m_2 + m_3 + m_6$.
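One can quickly spot-check these identities numerically; the following Python snippet is such a sketch (variable names are mine):

```python
import random

# Check the seven products m1..m7 against the direct 2x2 product.
a11, a12, a21, a22 = (random.random() for _ in range(4))
b11, b12, b21, b22 = (random.random() for _ in range(4))

m1 = (a11 + a22) * (b11 + b22)
m2 = (a21 + a22) * b11
m3 = a11 * (b12 - b22)
m4 = a22 * (b21 - b11)
m5 = (a11 + a12) * b22
m6 = (a21 - a11) * (b11 + b12)
m7 = (a12 - a22) * (b21 + b22)

assert abs(m1 + m4 - m5 + m7 - (a11 * b11 + a12 * b21)) < 1e-12  # c11
assert abs(m3 + m5 - (a11 * b12 + a12 * b22)) < 1e-12            # c12
assert abs(m2 + m4 - (a21 * b11 + a22 * b21)) < 1e-12            # c21
assert abs(m1 - m2 + m3 + m6 - (a21 * b12 + a22 * b22)) < 1e-12  # c22
```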
See that we need $7$ multiplications and $18$ additions to construct $C$ for both $A$ and $B$ being $2 \times 2$-dimensional. The ‘‘obvious’’ algorithm would cost $8$ multiplications and $4$ additions. Indeed, one can now show (via induction) that for matrices of size $2^k \times 2^k$ one would need $7^k$ multiplications. Differently put, for $n = 2^k$ one needs $7^{\log_2 n} = n^{\log_2 7} \approx n^{2.807}$ multiplications.
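Here is a sketch of the recursion itself, assuming $n = 2^k$ and counting scalar multiplications to exhibit the $7^k$ bound (it uses NumPy for the block arithmetic and pays no attention to performance, memory or stability):

```python
import numpy as np

def strassen(A, B, count):
    """Recursive Strassen multiplication for n = 2**k; count[0] tallies
    the scalar multiplications performed."""
    n = A.shape[0]
    if n == 1:
        count[0] += 1
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    m1 = strassen(A11 + A22, B11 + B22, count)
    m2 = strassen(A21 + A22, B11, count)
    m3 = strassen(A11, B12 - B22, count)
    m4 = strassen(A22, B21 - B11, count)
    m5 = strassen(A11 + A12, B22, count)
    m6 = strassen(A21 - A11, B11 + B12, count)
    m7 = strassen(A12 - A22, B21 + B22, count)
    return np.block([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4, m1 - m2 + m3 + m6]])

k = 3
n = 2 ** k
A, B = np.random.rand(n, n), np.random.rand(n, n)
count = [0]
C = strassen(A, B, count)
assert np.allclose(C, A @ B)
print(count[0], 7 ** k)  # 343 343
```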
This bound prevails when including all operations, but only asymptotically, in the sense that the total operation count is $\mathcal{O}(n^{\log_2 7})$. In practical algorithms, not only multiplications, but also memory and indeed additions play a big role. I am particularly interested in how the upcoming DeepMind algorithms will take numerical stability into account.
*Update: the improved complexity was itself immediately improved upon.