On the complexity of matrix multiplications |10 October 2022|
tags: math.LA

Recently, DeepMind published new work on ‘‘discovering’’ numerical linear algebra routines. This is remarkable, but I leave the assessment of that contribution to others. Rather, I would like to highlight the earlier discovery by Strassen, namely, how to improve upon the obvious matrix multiplication algorithm.

To get some intuition, consider multiplying two complex numbers $(a_1+ib_1),(a_2+ib_2)\in\mathbf{C}\times\mathbf{C}$, that is, construct $a_1a_2-b_1b_2+i(b_1a_2+a_1b_2)$. In this case, the obvious thing to do would lead to 4 multiplications, namely $a_1\cdot a_2$, $b_1\cdot b_2$, $b_1\cdot a_2$ and $a_1\cdot b_2$. However, consider using only $(a_1+b_1)\cdot (a_2+b_2)$, $a_1\cdot a_2$ and $b_1\cdot b_2$. These multiplications suffice to construct $(a_1+ib_1)\cdot (a_2+ib_2)$: the real part is $a_1a_2-b_1b_2$, and the imaginary part is obtained by subtracting the latter two products from the first one. The emphasis is on multiplications, as it seems to be folklore knowledge that multiplications (and especially divisions) are the most costly. However, note that we do use more additive operations. Are these operations not equally ‘‘costly’’? The answer to this question is delicate and still changing!
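To make this concrete, here is a small Python sketch of the three-multiplication trick (the function name `complex_mult_3` is my own, not from any library):

```python
def complex_mult_3(a1, b1, a2, b2):
    """Multiply (a1 + i*b1) by (a2 + i*b2) using only 3 real multiplications."""
    s = (a1 + b1) * (a2 + b2)  # one multiplication, two additions
    p = a1 * a2
    q = b1 * b2
    return p - q, s - p - q    # real part, imaginary part

print(complex_mult_3(1.0, 2.0, 3.0, 4.0))  # (-5.0, 10.0)
print((1 + 2j) * (3 + 4j))                 # (-5+10j), for comparison
```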

When working with integers, additions are indeed generally ‘‘cheaper’’ than multiplications. However, oftentimes we work with floating-point numbers. Roughly speaking, these are numbers $f\in\mathbf{Q}$ characterized by some (signed) mantissa $m$, a base $b$ and some (signed) exponent $e$, specifically, $f=mb^e$. For instance, consider the addition $1.2345\cdot 10^{-1}+1.2345\cdot 10^{-4}$. To perform the addition, one first normalizes the two numbers, e.g., one writes $1.2345\cdot 10^{-1}+0.0012345\cdot 10^{-1}$. Then, to perform the addition one needs to take the appropriate precision into account; indeed, the result rounds to $1.2357\cdot 10^{-1}$. All this example should convey is that the steps are sequential. Now consider multiplying the same numbers, that is, $(1.2345\cdot 10^{-1})\cdot(1.2345\cdot 10^{-4})$. In this case, one can proceed with two steps in parallel: on the one hand $(1.2345)^2$ and on the other hand $10^{-1-4}$. Again, one needs to take the correct precision into account in the end, but we see that multiplication is by no means bound to be significantly slower in general!
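To make the mantissa/exponent split explicit, here is a small Python illustration; it uses the standard library's `math.frexp` and `math.ldexp`, which work in base 2 rather than the base-10 example above.

```python
import math

x, y = 1.2345e-1, 1.2345e-4

# Addition: the exponents must be aligned before the mantissas can be added,
# which is an inherently sequential procedure.
print(x + y)                         # 0.12357345

# Multiplication: mantissas and exponents can be handled independently.
mx, ex = math.frexp(x)               # x = mx * 2**ex with 0.5 <= |mx| < 1
my, ey = math.frexp(y)               # y = my * 2**ey
print(math.ldexp(mx * my, ex + ey))  # reassemble the product
print(x * y)                         # the same value, up to rounding
```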

Now back to the matrix multiplication algorithms. Say we have

$$ A = \left[\begin{array}{ll} A_{11} & A_{12} \\ A_{21} & A_{22} \end{array}\right],\quad B = \left[\begin{array}{ll} B_{11} & B_{12} \\ B_{21} & B_{22} \end{array}\right], $$

then, $C=AB$ is given by

$$ C = \left[\begin{array}{ll} C_{11} & C_{12} \\ C_{21} & C_{22} \end{array}\right] = \left[\begin{array}{ll} A_{11}B_{11}+A_{12}B_{21} & A_{11}B_{12}+A_{12}B_{22} \\ A_{21}B_{11}+A_{22}B_{21} & A_{21}B_{12}+A_{22}B_{22} \end{array}\right]. $$

Indeed, the most straightforward matrix multiplication algorithm for $A\in\mathbf{R}^{n\times n}$ and $B\in\mathbf{R}^{n\times n}$ costs you $n^3$ multiplications and $n^2(n-1)$ additions. Using the intuition from the complex multiplication above, we might expect that we can do with fewer multiplications. In 1969 (Numer. Math. 13, 354–356) Volker Strassen showed that this is indeed true.
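For reference, a minimal Python sketch of this straightforward algorithm; the triple loop below performs exactly $n^3$ scalar multiplications (the function name is my own):

```python
import numpy as np

def naive_matmul(A, B):
    """Textbook multiplication of square matrices: n^3 scalar multiplications."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C
```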

Following his short, but highly influential, paper, let $\mathrm{I}=(A_{11}+A_{22})(B_{11}+B_{22})$, $\mathrm{II}=(A_{21}+A_{22})B_{11}$, $\mathrm{III}=A_{11}(B_{12}-B_{22})$, $\mathrm{IV}=A_{22}(-B_{11}+B_{21})$, $\mathrm{V}=(A_{11}+A_{12})B_{22}$, $\mathrm{VI}=(-A_{11}+A_{21})(B_{11}+B_{12})$ and $\mathrm{VII}=(A_{12}-A_{22})(B_{21}+B_{22})$. Then, $C_{11}=\mathrm{I}+\mathrm{IV}-\mathrm{V}+\mathrm{VII}$, $C_{21}=\mathrm{II}+\mathrm{IV}$, $C_{12}=\mathrm{III}+\mathrm{V}$ and $C_{22}=\mathrm{I}+\mathrm{III}-\mathrm{II}+\mathrm{VI}$. See that we need 7 multiplications and 18 additions to construct $C$ when $A$ and $B$ are $2\times 2$. The ‘‘obvious’’ algorithm would cost 8 multiplications and 4 additions. Applying this construction recursively to the blocks, one can show (via induction) that for matrices of size $2^k$ one needs $7^k$ multiplications. Differently put, one needs $n^{\log_2(7)}\approx n^{2.8}$ multiplications. This bound prevails when including all operations, but only asymptotically, in the sense that $O(n^{2.8})\subset O(n^3)$. In practical algorithms, not only multiplications, but also memory and indeed additions play a big role. I am particularly interested in how the upcoming DeepMind algorithms will take numerical stability into account.
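To make the recursion concrete, here is a minimal Python sketch of Strassen's scheme for matrices of size $2^k$; recursing all the way down to $1\times 1$ blocks is a simplification on my part, a practical implementation would switch to the standard algorithm below some cutoff size.

```python
import numpy as np

def strassen(A, B):
    """Multiply two 2^k x 2^k matrices with Strassen's seven-product recursion."""
    n = A.shape[0]
    if n == 1:
        return A * B
    m = n // 2
    A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
    B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]

    I   = strassen(A11 + A22, B11 + B22)
    II  = strassen(A21 + A22, B11)
    III = strassen(A11, B12 - B22)
    IV  = strassen(A22, B21 - B11)
    V   = strassen(A11 + A12, B22)
    VI  = strassen(A21 - A11, B11 + B12)
    VII = strassen(A12 - A22, B21 + B22)

    C11 = I + IV - V + VII
    C12 = III + V
    C21 = II + IV
    C22 = I - II + III + VI
    return np.block([[C11, C12], [C21, C22]])

# Quick check on a random 8 x 8 instance (k = 3):
A = np.random.rand(8, 8)
B = np.random.rand(8, 8)
print(np.allclose(strassen(A, B), A @ B))  # True, up to rounding
```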

*Update: the improved complexity was itself improved almost immediately.