A Visual Tutorial for Matrix Multiplication


Mastering the Math Behind AI, Machine Learning and Data Science

Image generated using AI (Gemini)

Introduction

Matrix multiplication is the workhorse of the modern world. From powering deep learning algorithms to rendering photo-realistic graphics in video games — it’s everywhere.

A solid understanding of this process is fundamental for anyone working with data. Unfortunately, we are often taught just one way to perform matrix multiplication: the traditional “row-times-column” method. Seeing that matrix multiplication can be approached in different ways, however, can be a real revelation, one that strengthens both your mathematical skills and your conceptual understanding.

In this guide, I’ll present four distinct ways to perform matrix multiplication and I’ll illustrate them with clear, visual explanations, making it easy to internalize the concepts. While the core ideas about these varied perspectives can be found in excellent books like [1] (which is the main reference for this text), their descriptions can be very short. My contribution is to provide a gentle, step-by-step approach, coupled with dedicated visualizations for each method, along with clear numerical examples. I’ve used AI for proofreading as English is not my first language.

First, let’s quickly establish some common notations I’ll use throughout this guide:

  • Matrix: A matrix will be denoted as an uppercase bold letter, e.g. A denotes a matrix.
  • Columns: I’ll denote the i-th column of a matrix using a lowercase bold letter with a subscript. For example, aᵢ will represent the i-th column of matrix A.
  • Dimensions: The variable m will always refer to the number of rows in a matrix, and n will denote the number of columns. So, if I say A ∈ ℝ³ˣ⁴, it means matrix A has m = 3 rows and n = 4 columns. 
  • Multiplication Compatibility: The product of two matrices, A and B, can only be computed if the number of columns in A matches the number of rows in B. If A is an m × n matrix and B is an n × p matrix, then the resulting product AB will be a new matrix with dimensions m × p (number of rows of A by number of columns of B). Visually, you can think of it like this: if A ∈ ℝᵐˣⁿ and B ∈ ℝⁿˣᵖ, the “inner” dimensions (n) must match, and the “outer” dimensions (m and p) define the size of the result. (A quick NumPy check of this rule follows right after this list.)
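
To make the compatibility rule concrete, here is a minimal NumPy sketch; the matrices below are illustrative placeholders of my own, chosen only to show the shapes:

```python
import numpy as np

A = np.ones((3, 4))   # m = 3 rows, n = 4 columns
B = np.ones((4, 2))   # n = 4 rows, p = 2 columns

C = A @ B             # inner dimensions match: 4 == 4
print(C.shape)        # (3, 2), i.e. m × p

# Mismatched inner dimensions raise an error:
# np.ones((3, 4)) @ np.ones((3, 2))   -> ValueError
```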

Perspective 1: Columns of A, Weighted by Entries of b

Let’s start with something basic that we’ll need for understanding the more complex cases: multiplying a matrix A by a matrix that is just a single column vector. (When B is a single column vector, meaning B ∈ ℝⁿˣ¹, we can also refer to it as b.) The product Ab is simply the sum of the columns of A, each scaled by the corresponding entry of b. More precisely: if A has columns a₁, a₂, …, aₙ and b is a column vector with entries b₁, b₂, …, bₙ, then:

$$\mathbf{A}\mathbf{b} = b_1\mathbf{a}_1 + b_2\mathbf{a}_2 + \cdots + b_n\mathbf{a}_n$$

This relationship is visualized in Figure 1, where you can see how each column of A is “stretched” or “shrunk” by an element of b before being combined.

Figure 1: Visualizing A × B when B is a single column. The product shows each column of A being weighted (scaled) by its corresponding element in B, resulting in a linear combination of A’s columns. Source: Image by the author.

Numerical Example

Let’s consider a numerical example for A × b. Let matrix A be:

A matrix.

And let the column vector b be:

A column vector.

Following Perspective 1, we treat b’s entries as weights for A’s columns. The columns of A are:

Columns of A.

Then, the product Ab is:

The product Ab.

This clearly shows how each column of A is scaled by the corresponding entry from b before they are summed to form the final result.
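
Here is a minimal NumPy sketch of this computation; the values of A and b below are illustrative choices of my own, not necessarily those used above:

```python
import numpy as np

# Illustrative values: A is 2 × 3, b has 3 entries.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([10, 20, 30])

# Perspective 1: Ab is the sum of A's columns, weighted by b's entries.
weighted_sum = sum(b[i] * A[:, i] for i in range(A.shape[1]))

print(weighted_sum)   # [140 320]
print(A @ b)          # same result via the built-in product
```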


Perspective 2: Rows × Columns Dot Products

The second perspective is the most common — it’s likely the method you learned first in school or university. This approach focuses on calculating each individual entry in the resulting matrix, C = AB. In this view, every entry cᵢ,ⱼ (the element in row i and column j of C) is computed as a dot product between the i-th row of A and the j-th column of B.

Recall the Dot Product: The dot product of two column vectors a and b of the same length r is a single scalar number.

$$\mathbf{a} \cdot \mathbf{b} = \mathbf{a}^{T}\mathbf{b} = a_1 b_1 + a_2 b_2 + \cdots + a_r b_r$$

(The superscript T denotes the transpose, which means the column vector turns into a row vector. This is necessary for the dimensions to match.)

Here is a numerical example for two vectors a = [1, 2, 3]ᵀ and b = [4, 5, 6]ᵀ:

$$\mathbf{a}^{T}\mathbf{b} = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 4 + 10 + 18 = 32$$
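
The same dot product in NumPy:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.dot(a, b))   # 32 = 1*4 + 2*5 + 3*6
```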

Thus, when you multiply two matrices, each entry cᵢ,ⱼ of the resulting matrix is defined as:

$$c_{i,j} = (\text{row } i \text{ of } \mathbf{A}) \cdot (\text{column } j \text{ of } \mathbf{B}) = \sum_{k=1}^{n} a_{i,k}\, b_{k,j}$$

This process is visualized in Figure 2, highlighting how the matching row and column “meet” to produce a single number in the result matrix.

Figure 2: In this figure, matrix A is displayed on the left, and matrix B is positioned on top. The resulting product AB appears in the center, visually aligning with the rows of A and the columns of B. This arrangement clearly illustrates which rows of A are multiplied by which columns of B to form the entries of the product matrix. Specifically, the product AB contains the dot product of row i of A and column j of B at position (i, j). This relationship is explicitly demonstrated in the figure for position (2, 1). Source: Image by the author.

Numerical Example

Let’s illustrate the “Rows × Columns Dot Products” perspective with a numerical example. Consider two matrices A and B:

Two matrices.

To find the product C = AB, we compute each entry cᵢ,ⱼ by taking the dot product of the i-th row of A and the j-th column of B. The resulting matrix C will be 2 × 2.
Let’s calculate each entry:

For c₁,₁ (Row 1 of A · Column 1 of B):

Row 1 of A · Column 1 of B.

For c₁,₂ (Row 1 of A · Column 2 of B):

Row 1 of A · Column 2 of B.

For c₂,₁ (Row 2 of A · Column 1 of B):

Row 2 of A · Column 1 of B.

For c₂,₂ (Row 2 of A · Column 2 of B):

Row 2 of A · Column 2 of B.

Therefore, the product matrix C is:

The product matrix C.
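
A minimal NumPy sketch of this entry-by-entry procedure, again with illustrative values of my own:

```python
import numpy as np

# Illustrative 2 × 3 and 3 × 2 matrices (my own values, not the ones above).
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[ 7,  8],
              [ 9, 10],
              [11, 12]])

m, n = A.shape
p = B.shape[1]

# Perspective 2: each entry c[i, j] is the dot product of
# row i of A and column j of B.
C = np.zeros((m, p))
for i in range(m):
    for j in range(p):
        C[i, j] = np.dot(A[i, :], B[:, j])

print(C)                      # [[ 58.  64.]
                              #  [139. 154.]]
print(np.allclose(C, A @ B))  # True
```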

Perspective 3: Matrix × Columns (Column-by-Column Transformation)

This third perspective builds directly upon our first one, where we multiplied a matrix by a single column vector. Here, we recognize that the overall matrix multiplication AB can be broken down into a series of such individual operations. Specifically, each column j in the resulting product matrix C = AB is obtained by multiplying the first matrix A by the j-th column of the second matrix B. So, if B has columns b₁, b₂, …, bₚ, then the product AB will have columns Ab₁, Ab₂, …, Abₚ:

$$\mathbf{A}\mathbf{B} = \mathbf{A}\begin{pmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_p \end{pmatrix} = \begin{pmatrix} \mathbf{A}\mathbf{b}_1 & \mathbf{A}\mathbf{b}_2 & \cdots & \mathbf{A}\mathbf{b}_p \end{pmatrix}$$

This perspective is useful because it highlights how matrix A transforms each column vector of B independently. Each column of the result is a linear combination of the columns of A, with the weights coming from the corresponding column of B. Refer to Figure 3 for a visualization of this transformation.

Figure 3: Again, A is shown on the left and B is positioned at the top, with AB displayed in the center. Each column j in the result represents a matrix-vector multiplication, where A is multiplied by the j-th column of B. Source: Image by the author.

Since I’ve already demonstrated a numerical example of matrix-vector multiplication in Perspective 1, I’ll forgo another one here. The core calculation for each Abⱼ is identical to what we saw previously.
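
Still, a short NumPy sketch makes the column-by-column view concrete (the values are illustrative choices of my own):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # illustrative values
B = np.array([[ 7,  8],
              [ 9, 10],
              [11, 12]])

# Perspective 3: column j of AB is A times column j of B.
C = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])

print(C)
print(np.array_equal(C, A @ B))  # True
```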


Perspective 4: Columns × Rows (Summing Outer Products)

The fourth and final perspective we’ll explore is, in a sense, the opposite of the “dot product view,” as it is built on outer products. This method emphasizes how the product matrix AB can be constructed by summing up a series of smaller matrices. This perspective is essential for understanding concepts like Singular Value Decomposition (SVD) and low-rank approximations!

Recall the Outer Product: An outer product is formed by multiplying a column vector by a row vector, and the result is a matrix. This is distinct from a dot product, which multiplies a row by a column and results in a single scalar. If u is an m × 1 column vector and vᵀ is a 1 × p row vector, their outer product uvᵀ is an m × p matrix.

When we multiply two matrices, A ∈ ℝᵐˣⁿ and B ∈ ℝⁿˣᵖ, we can compute AB by summing up a series of outer products: we take each column of A and multiply it by the corresponding row of B. Let aₖ denote the k-th column of A and bᵀₖ denote the k-th row of B. (The notation might be confusing: bᵀₖ is not the k-th column of B transposed. Rather, it is the k-th column of the transposed B, transposed again, which is simply the k-th row of B.) The product AB is then the sum of these outer products:

$$\mathbf{A}\mathbf{B} = \sum_{k=1}^{n} \mathbf{a}_k \mathbf{b}^{T}_k = \mathbf{a}_1\mathbf{b}^{T}_1 + \mathbf{a}_2\mathbf{b}^{T}_2 + \cdots + \mathbf{a}_n\mathbf{b}^{T}_n$$

Each term aₖbᵀₖ is an m × p matrix, and their sum yields the final m × p product matrix AB. Refer to Figure 4 for a visual demonstration of summing these outer products.

Figure 4: The product of two matrices is the sum of their outer products, meaning each column of A is multiplied by the corresponding row of B. In the figure, the column of A is shown on the left, while the corresponding row of B is displayed at the top, allowing the resulting product — a matrix — to be presented in the center. Source: Image by the author.

Numerical Example

Let’s use a numerical example to demonstrate the “Summing Outer Products” method. Consider two matrices A and B:

$$\mathbf{A} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad \mathbf{B} = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}$$

Here, A has columns a₁ = [1, 3]ᵀ and a₂ = [2, 4]ᵀ, and B has rows bᵀ₁ = [5, 6] and bᵀ₂ = [7, 8]. According to this perspective, AB is the sum of the outer products: a₁bᵀ₁ + a₂bᵀ₂.

1. First Outer Product: a₁bᵀ₁

$$\mathbf{a}_1\mathbf{b}^{T}_1 = \begin{pmatrix} 1 \\ 3 \end{pmatrix} \begin{pmatrix} 5 & 6 \end{pmatrix} = \begin{pmatrix} 5 & 6 \\ 15 & 18 \end{pmatrix}$$

2. Second Outer Product: a₂bᵀ₂

$$\mathbf{a}_2\mathbf{b}^{T}_2 = \begin{pmatrix} 2 \\ 4 \end{pmatrix} \begin{pmatrix} 7 & 8 \end{pmatrix} = \begin{pmatrix} 14 & 16 \\ 28 & 32 \end{pmatrix}$$

Now, sum these two outer product matrices to get the final product AB:

$$\mathbf{A}\mathbf{B} = \begin{pmatrix} 5 & 6 \\ 15 & 18 \end{pmatrix} + \begin{pmatrix} 14 & 16 \\ 28 & 32 \end{pmatrix} = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}$$
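
The same computation in NumPy, as a quick check:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Perspective 4: AB is the sum of outer products
# (column k of A) × (row k of B).
C = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))

print(C)      # [[19 22]
              #  [43 50]]
print(A @ B)  # identical result
```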

This perspective is fundamental across linear algebra and its applications. It’s the core idea behind concepts like Singular Value Decomposition (SVD), where a matrix is decomposed into a sum of simple outer products, allowing for powerful techniques like dimensionality reduction, low-rank approximations, and recommendation systems. Understanding this view unlocks a deeper insight into how matrices can capture and represent complex relationships.


Summing Up: Four Ways to View Matrix Multiplication

We’ve explored four distinct ways to interpret matrix multiplication, moving beyond the single “row-times-column” method. Each perspective offers unique insights:

1. Matrix-Vector Product: AB (where B is a single column) is a linear combination of A’s columns, weighted by B’s entries.

2. Row-by-Column Dot Products: Each entry in AB is the dot product of a row from A and a column from B. (The traditional method).

3. Matrix-by-Columns: Each column of AB is A multiplied by the corresponding column of B.

4. Columns-by-Rows (Sum of Outer Products): AB is the sum of outer products, where each is a column of A times a corresponding row of B.
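
As a closing sanity check, here is a single NumPy sketch (with randomly generated illustrative matrices) verifying that Perspectives 2, 3, and 4 all produce the same product as NumPy’s built-in @ operator; Perspective 1 is the single-column special case of Perspective 3:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 10, size=(3, 4))   # illustrative random matrices
B = rng.integers(0, 10, size=(4, 2))

reference = A @ B

# 2. Row-by-column dot products.
dots = np.array([[np.dot(A[i], B[:, j]) for j in range(B.shape[1])]
                 for i in range(A.shape[0])])

# 3. Matrix-by-columns.
cols = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])

# 4. Sum of outer products (columns of A times rows of B).
outers = sum(np.outer(A[:, k], B[k]) for k in range(A.shape[1]))

print(np.array_equal(dots, reference),
      np.array_equal(cols, reference),
      np.array_equal(outers, reference))   # True True True
```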

Congratulations on making it this far! By exploring these four distinct perspectives, you’ve moved beyond rote calculation and gained a deeper, more intuitive understanding of matrix multiplication, which I’m sure you’ll find useful in the future.


References

[1] Gilbert Strang. Introduction to Linear Algebra, Fifth Edition. Wellesley-Cambridge Press, 2016.
