Recently, DeepMind published new work on ‘‘discovering’’ numerical linear algebra routines. This is remarkable, but I leave the assessment of this contribution to others. Rather, I would like to highlight the earlier discovery by Strassen, namely, how to improve upon the obvious matrix multiplication algorithm.
To get some intuition, consider multiplying two complex numbers $a + bi$ and $c + di$, that is, construct $(a+bi)(c+di) = (ac - bd) + (ad + bc)i$. In this case, the obvious thing to do would lead to 4 multiplications, namely $ac$, $bd$, $ad$ and $bc$. However, consider using only $(a+b)(c+d)$, $ac$ and $bd$. These multiplications suffice to construct $ad + bc$ (subtract the latter two from the first one), while $ac - bd$ comes from the latter two directly. The emphasis is on multiplications, as it seems to be folklore knowledge that multiplications (and especially divisions) are the most costly. However, note that we do use more additive operations. Are these operations not equally ‘‘costly’’? The answer to this question is delicate and still changing!
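To make the trick concrete, here is a minimal Python sketch (the function name `complex_mult_3` is mine, purely for illustration):

```python
def complex_mult_3(a, b, c, d):
    """Compute (a+bi)(c+di) using 3 real multiplications instead of 4."""
    m1 = (a + b) * (c + d)  # = ac + ad + bc + bd
    m2 = a * c
    m3 = b * d
    real = m2 - m3           # ac - bd
    imag = m1 - m2 - m3      # ad + bc
    return real, imag

# Quick check against Python's built-in complex multiplication.
z = complex(1, 2) * complex(3, 4)          # (1+2i)(3+4i) = -5 + 10i
assert complex_mult_3(1, 2, 3, 4) == (z.real, z.imag)
```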
When working with integers, additions are indeed generally ‘‘cheaper’’ than multiplications. However, oftentimes we work with floating-point numbers. Roughly speaking, these are numbers characterized by some (signed) mantissa $m$, a base $b$ and some (signed) exponent $e$, specifically, $m \cdot b^e$. For instance, consider the addition $1.50 \cdot 10^3 + 1.25 \cdot 10^1$. To perform the addition, one normalizes the two numbers, e.g., one writes $1.25 \cdot 10^1 = 0.0125 \cdot 10^3$. Then, to perform the addition one needs to take the appropriate precision into account; indeed, the result is $(1.50 + 0.0125) \cdot 10^3 = 1.5125 \cdot 10^3$. All this example should convey is that we have sequential steps. Now consider multiplying the same numbers, that is, $(1.50 \cdot 10^3) \cdot (1.25 \cdot 10^1)$. In this case, one can proceed with two steps in parallel: on the one hand $1.50 \cdot 1.25$ and on the other hand $10^{3+1}$. Again, one needs to take the correct precision into account in the end, but we see that by no means must multiplication be significantly slower in general!
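As a toy illustration of these steps, here is a base-10 sketch (the model and the helpers `fp_add`/`fp_mul` are my own simplification; real IEEE-754 arithmetic is binary and considerably more subtle):

```python
# Toy float model: a number is the pair (m, e), representing m * 10**e.
def fp_add(m1, e1, m2, e2, digits=4):
    # Step 1 (sequential): align exponents by shifting the smaller operand.
    if e1 < e2:
        (m1, e1), (m2, e2) = (m2, e2), (m1, e1)
    m2_aligned = m2 * 10 ** (e2 - e1)   # e.g. 1.25e1 -> 0.0125e3
    # Step 2: add mantissas, then round back to the working precision.
    return round(m1 + m2_aligned, digits), e1

def fp_mul(m1, e1, m2, e2, digits=4):
    # The two pieces are independent: mantissas multiply, exponents add.
    return round(m1 * m2, digits), e1 + e2

print(fp_add(1.50, 3, 1.25, 1))  # (1.5125, 3), i.e. 1.5125 * 10**3
print(fp_mul(1.50, 3, 1.25, 1))  # (1.875, 4),  i.e. 1.875  * 10**4
```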
Now back to the matrix multiplication algorithms. Say we have
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix},$$
then, $C = AB$ is given by
$$C = \begin{pmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{pmatrix}.$$
Indeed, the most straightforward matrix multiplication algorithm for $A, B \in \mathbb{R}^{n \times n}$ costs you $n^3$ multiplications and $n^2(n-1)$ additions.
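For reference, here is the schoolbook algorithm as a short Python sketch (illustrative, not optimized):

```python
def matmul_naive(A, B):
    """Schoolbook product of two n-by-n matrices given as lists of rows:
    n**3 multiplications and n**2 * (n - 1) additions."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Each entry c_ij is an inner product of a row of A and a
            # column of B: n multiplications, n - 1 additions.
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_naive(A, B))  # [[19, 22], [43, 50]]
```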
Using the intuition from complex multiplication, we might expect that we can do with fewer multiplications. In 1969 (Numer. Math. 13, 354–356), Volker Strassen showed that this is indeed true.
Following his short, but highly influential, paper, let
$$\begin{aligned}
m_1 &= (a_{11} + a_{22})(b_{11} + b_{22}), \\
m_2 &= (a_{21} + a_{22})\, b_{11}, \\
m_3 &= a_{11}(b_{12} - b_{22}), \\
m_4 &= a_{22}(b_{21} - b_{11}), \\
m_5 &= (a_{11} + a_{12})\, b_{22}, \\
m_6 &= (a_{21} - a_{11})(b_{11} + b_{12}), \\
m_7 &= (a_{12} - a_{22})(b_{21} + b_{22}).
\end{aligned}$$
Then, $c_{11} = m_1 + m_4 - m_5 + m_7$, $c_{12} = m_3 + m_5$, $c_{21} = m_2 + m_4$ and $c_{22} = m_1 - m_2 + m_3 + m_6$.
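One can quickly spot-check these identities numerically; the following Python snippet is such a sketch (variable names are mine):

```python
import random

# Check the seven products m1..m7 against the direct 2x2 product.
a11, a12, a21, a22 = (random.random() for _ in range(4))
b11, b12, b21, b22 = (random.random() for _ in range(4))

m1 = (a11 + a22) * (b11 + b22)
m2 = (a21 + a22) * b11
m3 = a11 * (b12 - b22)
m4 = a22 * (b21 - b11)
m5 = (a11 + a12) * b22
m6 = (a21 - a11) * (b11 + b12)
m7 = (a12 - a22) * (b21 + b22)

assert abs(m1 + m4 - m5 + m7 - (a11 * b11 + a12 * b21)) < 1e-12  # c11
assert abs(m3 + m5 - (a11 * b12 + a12 * b22)) < 1e-12            # c12
assert abs(m2 + m4 - (a21 * b11 + a22 * b21)) < 1e-12            # c21
assert abs(m1 - m2 + m3 + m6 - (a21 * b12 + a22 * b22)) < 1e-12  # c22
```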
See that we need $7$ multiplications and $18$ additions to construct $C$ for both $A$ and $B$ being $2 \times 2$-dimensional. The ‘‘obvious’’ algorithm would cost $8$ multiplications and $4$ additions. Indeed, one can now show (via induction) that for matrices of size $2^k \times 2^k$ one would need $7^k$ multiplications. Differently put, for $n = 2^k$ one needs $7^{\log_2 n} = n^{\log_2 7} \approx n^{2.807}$ multiplications.
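Here is a sketch of the recursion itself, assuming $n = 2^k$ and counting scalar multiplications to exhibit the $7^k$ bound (it uses NumPy for the block arithmetic and pays no attention to performance, memory or stability):

```python
import numpy as np

def strassen(A, B, count):
    """Recursive Strassen multiplication for n = 2**k; count[0] tallies
    the scalar multiplications performed."""
    n = A.shape[0]
    if n == 1:
        count[0] += 1
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    m1 = strassen(A11 + A22, B11 + B22, count)
    m2 = strassen(A21 + A22, B11, count)
    m3 = strassen(A11, B12 - B22, count)
    m4 = strassen(A22, B21 - B11, count)
    m5 = strassen(A11 + A12, B22, count)
    m6 = strassen(A21 - A11, B11 + B12, count)
    m7 = strassen(A12 - A22, B21 + B22, count)
    return np.block([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4, m1 - m2 + m3 + m6]])

k = 3
n = 2 ** k
A, B = np.random.rand(n, n), np.random.rand(n, n)
count = [0]
C = strassen(A, B, count)
assert np.allclose(C, A @ B)
print(count[0], 7 ** k)  # 343 343
```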
This bound prevails when including all operations, but only asymptotically, in the sense that the total operation count is $\mathcal{O}(n^{\log_2 7})$. In practical algorithms, not only multiplications, but also memory and indeed additions play a big role. I am particularly interested in how the upcoming DeepMind algorithms will take numerical stability into account.
*Update: the improved complexity was itself immediately improved upon.