1 Introduction
Understanding how functions change in different directions is crucial in many fields, for example in neural networks, where gradients are used to update the weights during training. In this article, we will explore the concepts of the directional derivative and the gradient, and we will demonstrate their relationship through the equation: \[\colorbox{yellow}{$ g'(0)=\nabla_u f(\mathbf{x}) = \nabla f(\mathbf{x})\mathbf{u}$}\] This article is best understood if you have some basic knowledge of dot products, derivatives, Leibniz and Lagrange notation, partial derivatives, and the chain rule.
TL;DR: jump to section 6.
2 What are the Gradient and the Directional Derivative?
To begin with, let’s look at the derivative of a univariate function. It’s defined as: \[\frac{df}{dx} = \lim\limits_{h \to 0} \frac{f(x+h)-f(x)}{h}\] (Here we have used Leibniz notation for denoting the derivative, \(\frac{df}{dx}\). This is simply a different notation for the Lagrange notation \(f'(x)\), which many are familiar with from school.) The derivative of a univariate function \(f(x)\) is its slope, defined as the rise (vertical change) over the run (horizontal change), with the latter being infinitesimally small. It indicates how the function changes at a given point when we increase its parameter by an infinitesimal amount. This “change in the function” means we learn how steep the function is at that point and whether it is increasing or decreasing.
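The limit definition can also be tried out numerically. The following is only a minimal sketch: the helper name `numerical_derivative`, the example function `square`, and the step size `h` are assumptions made for illustration, and a small but finite `h` merely approximates the limit.

```python
def numerical_derivative(f, x, h=1e-6):
    """Approximate df/dx at x with the difference quotient (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


def square(x):
    # Hypothetical example function f(x) = x^2, whose exact derivative is 2x.
    return x ** 2


approx = numerical_derivative(square, 3.0)
print(approx)  # close to the exact slope 6 at x = 3
```

The positive result reflects that the function is increasing at \(x = 3\); evaluating at \(x = -3\) instead gives a value close to \(-6\), a decreasing function of the same steepness.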
2.1 The Gradient
The gradient is a generalization of the derivative for the case of a scalar-valued function, \(f: \mathbb{R}^n \to \mathbb{R}\) (many values in, one value out). It’s defined as: \[\nabla f(\mathbf{x}) = \begin{bmatrix} \frac{\partial f}{\partial{x_1}} \\ \frac{\partial f}{\partial{x_2}} \\ \vdots \\ \frac{\partial f}{\partial{x_n}} \end{bmatrix}.\] Thus, the gradient is a vector of all partial derivatives. The symbol \(\nabla\) is called nabla and is read “del”. It can be interpreted as an operator that is applied to the function. Each element of the gradient vector gives the slope of the function with respect to the corresponding variable: \(\frac{\partial f}{\partial x_1}\) gives the slope of the function with respect to \(x_1\), treating all other variables (\(x_2, \dots, x_n\)) as constants. In fact, the gradient points in the direction of steepest ascent. We will look at a proof of this shortly in section 4.
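As a small worked example (the function is chosen here purely for illustration), take \(f(x_1, x_2) = x_1^2 + 3x_1x_2\). Differentiating with respect to one variable while holding the other constant gives: \[\nabla f(\mathbf{x}) = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \end{bmatrix} = \begin{bmatrix} 2x_1 + 3x_2 \\ 3x_1 \end{bmatrix}, \qquad \nabla f(1, 2) = \begin{bmatrix} 8 \\ 3 \end{bmatrix}.\] So at the point \((1, 2)\), the function has slope 8 with respect to \(x_1\) and slope 3 with respect to \(x_2\).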
2.2 The Directional Derivative
The notation for the directional derivative is \(\nabla_u f(\mathbf{x})\), or \(D_u f(\mathbf{x})\), but here we will use the former. Its definition is given by: \[\nabla_u f(\mathbf{x}) = \lim\limits_{s \to 0}\frac{f(\mathbf{x}+s\mathbf{u})-f(\mathbf{x})}{s}. \label{eq:directional_derivative}\] Here, \(\mathbf{u}\) is a unit vector, meaning its magnitude (or length) is 1. (Recall that the magnitude of a vector is the square root of the sum of the squares of its components. It is denoted by double vertical bars around the vector, although some references use single bars: \[\|\mathbf{a}\| = \sqrt{a_1^2 + \dots + a_n^2}\] with \(\mathbf{a} \in \mathbb{R}^n\).) You can see that the definition of the directional derivative is very similar to the univariate case. (I deliberately chose to use the variable \(s\) instead of \(h\) to make the distinction from the univariate case clearer.)
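The definition translates almost literally into code. The sketch below is an illustration under stated assumptions, not a robust implementation: the helper `directional_derivative`, the example function `f`, the point, and the direction are all hypothetical choices, and the limit is replaced by one small fixed step `s`.

```python
import math


def directional_derivative(f, x, u, s=1e-6):
    """Approximate the limit definition (f(x + s*u) - f(x)) / s with a small s."""
    norm = math.sqrt(sum(c * c for c in u))
    u = [c / norm for c in u]  # normalize so that u has magnitude 1
    shifted = [xi + s * ui for xi, ui in zip(x, u)]
    return (f(shifted) - f(x)) / s


def f(x):
    # Hypothetical example function f(x1, x2) = x1^2 + 3*x1*x2.
    return x[0] ** 2 + 3 * x[0] * x[1]


# The direction (3, 4) has length 5, so it is normalized to (0.6, 0.8) first.
value = directional_derivative(f, [1.0, 2.0], [3.0, 4.0])
print(value)  # approximately 7.2
```

Normalizing inside the helper is a design choice: it lets the caller pass any nonzero direction while the computation still uses a unit vector, as the definition requires.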
However, in the case of the directional derivative, the input to \(f\) is \(\mathbf{x}+s\mathbf{u}\), which is the equation of a line. You start from the vector \(\mathbf{x}\) and move \(s\) units in the direction of \(\mathbf{u}\). Thus, the above expression examines how the function changes as we move an infinitesimal amount in the direction of \(\mathbf{u}\). In other words: the directional derivative represents the rate of change of the function at a given point when the point is moved a tiny bit (an infinitesimal amount) in the direction of \(\mathbf{u}\). As mentioned, \(\mathbf{u}\) serves as a directional vector, and this is reflected in the notation \(\nabla_u f(\mathbf{x})\), where \(\mathbf{u}\) appears as a subscript.
For a better understanding of the above, let’s have a look at Figure 1. We see a function of two variables, and the red dot in the plane is an arbitrary but fixed point. Its function value is marked by the blue dot. Now, in the case of a univariate function, we have only a single variable, e.g. \(x\), that is input to the function, and we can only increase or decrease it. When computing the derivative of a univariate function, we imagine computing the rise over the run when the change in \(x\) is infinitesimally small. But in the case of functions that take more than one input variable, e.g. \(x_1\) and \(x_2\), we can change each of these. Maybe we want to go one step in direction \(x_1\) and two steps in direction \(x_2\). So, if we want to make a statement about the change of the value of the function, we must first specify in which direction we want to move. This is shown in the figure as the green vector. It corresponds to the directional vector \(\mathbf{u}\), except that it is not normalized, so the green vector can actually be interpreted as \(s\mathbf{u}\). The scalar \(s\) tells us how far we want to move in the given direction, e.g. 2 times the length of \(\mathbf{u}\) in the direction of \(\mathbf{u}\); hence you can also interpret \(s\) as a scaling factor.
Now that we have set a direction, the entire idea is analogous to the derivative in the univariate case. The directional derivative tells us how the function value changes when we move a tiny bit in the given direction. Imagine the red vector is very, very small (because \(s\) is approaching 0); then it corresponds to the tangent line of the function at the specific point. The slope of this tangent line represents the directional derivative. To make this more concrete, imagine you are standing blindfolded on a mountain (the function) and your goal is to avoid falling off a cliff. You could use one leg or a stick and carefully feel around you, testing the terrain to determine in which direction (the directional vector \(\mathbf{u}\)) it is safe to move.
2.3 The Relationship between Gradient and Directional Derivative
Let’s quickly look at the differences between the gradient and the directional derivative. While the gradient is a vector that indicates the direction in which the rate of change is maximal, the directional derivative is a scalar that tells us the rate of change of the function when moving in a specific direction. If that direction happens to be the direction of steepest ascent, the directional derivative coincides with the magnitude of the gradient, as we will demonstrate shortly in section 4. Also, the directional derivative equals the dot product of the gradient and the directional vector, which we will also show in section 4. The differences and the relationship between the gradient and the directional derivative are summarized in the table below.
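Ahead of the formal argument, the claim about steepest ascent can be checked numerically on one example. This is a hedged sketch, not a proof: the function `f`, the point, the hand-derived `gradient`, the helper `directional_derivative`, and the grid of 3600 candidate directions are all assumptions made for illustration.

```python
import math


def f(x):
    # Hypothetical example function f(x1, x2) = x1^2 + 3*x1*x2.
    return x[0] ** 2 + 3 * x[0] * x[1]


def directional_derivative(f, x, u, s=1e-5):
    """Central-difference approximation of the rate of change of f at x along u."""
    forward = [xi + s * ui for xi, ui in zip(x, u)]
    backward = [xi - s * ui for xi, ui in zip(x, u)]
    return (f(forward) - f(backward)) / (2 * s)


x = [1.0, 2.0]
gradient = [2 * x[0] + 3 * x[1], 3 * x[0]]  # hand-derived partial derivatives: [8, 3]

# Try 3600 unit vectors u = (cos t, sin t) and keep the one with the largest rate of change.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_angle = max(
    angles,
    key=lambda t: directional_derivative(f, x, [math.cos(t), math.sin(t)]),
)
best_value = directional_derivative(f, x, [math.cos(best_angle), math.sin(best_angle)])

print(best_value, math.hypot(gradient[0], gradient[1]))   # largest rate vs. gradient magnitude
print(best_angle, math.atan2(gradient[1], gradient[0]))   # best direction vs. gradient direction
```

For this example, the largest directional derivative found on the grid agrees with the magnitude of the gradient, and the winning direction agrees with the direction of the gradient, up to the resolution of the grid.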

3 How to compute the directional derivative?
In the following we will show – and this is the main point of this article – that: \[\colorbox{yellow}{$ g'(0)=\nabla_u f(\mathbf{x}) = \nabla f(\mathbf{x})\mathbf{u}$}\] But let’s start slowly.
3.1 Limit Definition and Derivative
We have now stated many times (the repetition is intentional) that the directional derivative is the rate of change of the function at a point \(\mathbf{x}\) when moving an infinitesimal amount in the direction of \(\mathbf{u}\). In section 2.2 we defined the directional derivative as a limit. The limit interpretation means that we take two points on the surface of \(f\) (as indicated in Figure 1). One point is moved closer and closer to the point of interest, and from that we learn the rate of change at that point. Of course, for a multivariable function this only makes sense if we move the point along the line given by \(\mathbf{x}+s\mathbf{u}\).
But we can also interpret this “rate of change of the function at a point when moving an infinitesimal amount in the direction of \(\mathbf{u}\)” differently: Since \(\mathbf{x}\) and \(\mathbf{u}\) are fixed, only \(s\) is a free variable. Therefore, we can interpret \(\mathbf{x}+s\mathbf{u}\) as a line equation and \(f\) as a function that outputs a function value for each value of \(s\). Have a look at Figure 2. Here we see a zoomed-in section of the previous plot. In addition, I marked a few points along the line \(\mathbf{x}+s\mathbf{u}\), and I marked the corresponding function values as dots, too. So you can clearly see that we have a function along a line, parametrized by \(s\). The derivative of this function is, by definition, the rate of change of this function at each point. Take note: the rate of change at the point where \(s=0\) is the “rate of change of the function at a point when moving an infinitesimal amount in the direction of \(\mathbf{u}\)”. Therefore, it is the same as the directional derivative.
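The idea of a function along a line can be made tangible by evaluating it for a few values of \(s\). The sketch below uses assumed values throughout: the function `f`, the point `x`, the unit vector `u`, and the helper `value_on_line` are hypothetical choices for illustration only.

```python
def f(x):
    # Hypothetical example function f(x1, x2) = x1^2 + 3*x1*x2.
    return x[0] ** 2 + 3 * x[0] * x[1]


x = [1.0, 2.0]  # fixed point
u = [0.6, 0.8]  # fixed unit vector, since 0.6^2 + 0.8^2 = 1


def value_on_line(s):
    """Evaluate f at the point x + s*u; s is the only free variable."""
    return f([xi + s * ui for xi, ui in zip(x, u)])


# Function values at a few points along the line, like the dots in a plot.
for s in (-0.2, -0.1, 0.0, 0.1, 0.2):
    print(s, value_on_line(s))

# Rise over run between s = 0 and a second point that moves closer and closer.
quotients = [(value_on_line(h) - value_on_line(0.0)) / h for h in (1e-1, 1e-2, 1e-3, 1e-4)]
print(quotients)  # the values settle near 7.2 as h shrinks
```

The shrinking step mirrors the limit: the closer the second point moves to \(s=0\), the closer the rise over run gets to the rate of change at that point.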
This was a very wordy explanation. Next, we do the same reasoning but with math.
3.2 Part 1: Showing that \(g'(0)=\nabla_u f(\mathbf{x})\)
I have previously stated “we can interpret \(\mathbf{x}+s\mathbf{u}\) as a line equation and \(f\) as a function that outputs a function value for each value of \(s\).” To make this easier to understand, we give this alternative interpretation a new name, so that we can distinguish it from the function \(f(\mathbf{x})\). Therefore, we define: \[g(s)=f(\mathbf{x}+s\mathbf{u})\] I’ve also previously stated that the derivative of this function at \(s=0\) is the same as the directional derivative. Thus, what we need to show is: \[\colorbox{yellow}{$ g'(0)=\nabla_u f(\mathbf{x}) $}\]
To show that this statement is true, we take the derivative of our new function \(g\) with respect to \(s\) according to the definition of the derivative of a univariate function. \[g'(s) = \lim\limits_{h \to 0}\frac{g(s+h)-g(s)}{h}\] Next, we plug in 0 for \(s\), because we are looking for \(g'(0)\). \[g'(0) = \lim\limits_{h \to 0}\frac{g(0+h)-g(0)}{h} = \lim\limits_{h \to 0}\frac{g(h)-g(0)}{h}\] Now we substitute the definition of \(g\) (i.e. \(g(s)=f(\mathbf{x}+s\mathbf{u})\)). We must be careful not to confuse the symbols here; \(g\) is a function that takes the parameter \(s\). When we write \(g(h)\), it means that the parameter \(s\) takes the specific value \(h\), thus \(g(s=h)\). Therefore, we can substitute \(h\) for \(s\) in the definition: \(g(s=h)=f(\mathbf{x}+h\mathbf{u})\). Similarly, \(g(0)=f(\mathbf{x}+0\mathbf{u}) = f(\mathbf{x})\). Thus, we have: \[g'(0)= \lim\limits_{h \to 0}\frac{g(0+h)-g(0)}{h} = \lim\limits_{h \to 0}\frac{g(h)-g(0)}{h} = \lim\limits_{h \to 0}\frac{f(\mathbf{x}+h\mathbf{u})-f(\mathbf{x})}{h}\] This expression is just the definition of the directional derivative, with the difference that the variable that goes to 0 in the limit is called \(h\) instead of \(s\). However, the name of the variable is arbitrary. We can rename it to \(s\), leading us to: \[g'(0)= \lim\limits_{h \to 0}\frac{f(\mathbf{x}+h\mathbf{u})-f(\mathbf{x})}{h} = \lim\limits_{s \to 0}\frac{f(\mathbf{x}+s\mathbf{u})-f(\mathbf{x})}{s} = \nabla_u f(\mathbf{x})\] This is what we wanted to show: \[g'(0)=\nabla_u f(\mathbf{x})\]
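To see the argument at work on a concrete case, here is a small worked example. The function, the point, and the direction are assumptions chosen for illustration: \(f(x_1,x_2) = x_1^2 + 3x_1x_2\), \(\mathbf{x} = (1, 2)\), and the unit vector \(\mathbf{u} = (0.6, 0.8)\). \[g(s) = f(\mathbf{x}+s\mathbf{u}) = (1+0.6s)^2 + 3(1+0.6s)(2+0.8s) = 7 + 7.2s + 1.8s^2\] \[g'(0) = \lim\limits_{h \to 0}\frac{g(h)-g(0)}{h} = \lim\limits_{h \to 0}\frac{7.2h + 1.8h^2}{h} = \lim\limits_{h \to 0}(7.2 + 1.8h) = 7.2\] The quotient \(\frac{g(h)-g(0)}{h}\) is exactly \(\frac{f(\mathbf{x}+h\mathbf{u})-f(\mathbf{x})}{h}\), so for this choice of \(f\), \(\mathbf{x}\), and \(\mathbf{u}\) we get \(\nabla_u f(\mathbf{x}) = 7.2\).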
3.3 Part 2: Showing that \(\nabla_u f(\mathbf{x}) = \nabla f(\mathbf{x})\mathbf{u}\)
Now, we will show that the second part of the statement is true, i.e. \(\nabla_u f(\mathbf{x}) = \nabla f(\mathbf{x})\mathbf{u}\). In other words, the directional derivative is the dot product of the gradient and the directional vector. We will reuse our new function \(g(s)\). What we want to show is: \[\colorbox{yellow}{$\nabla_u f(\mathbf{x}) = \nabla f(\mathbf{x})\mathbf{u}$}\] Let’s do it: \[ \begin{align} g(s) &= f (\mathbf{x} + s\mathbf{u}) \tag{15}\label{eq:15} \\ g'(s) &= \frac{d}{d(\mathbf{x} + s\mathbf{u})} f (\mathbf{x} + s\mathbf{u}) \frac{d}{ds}(\mathbf{x} + s\mathbf{u}) \tag{16}\label{eq:16} \\ g'(s) &= \frac{d}{d(\mathbf{x} + s\mathbf{u})} f (\mathbf{x} + s\mathbf{u})\mathbf{u} \tag{17}\label{eq:17} \\ g'(0) &= \frac{d}{d(\mathbf{x} + 0\mathbf{u})} f (\mathbf{x} + 0\mathbf{u})\mathbf{u} \tag{18}\label{eq:18} \\ g'(0) &= \frac{d}{d(\mathbf{x})} f (\mathbf{x})\mathbf{u} \tag{19}\label{eq:19} \\ g'(0) &= \nabla f (\mathbf{x})\mathbf{u} \tag{20}\label{eq:20} \\ g'(0) &= \nabla_u f (\mathbf{x}) = \nabla f (\mathbf{x})\mathbf{u} \tag{21}\label{eq:21} \end{align} \]
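The final identity can be sanity-checked numerically on one example. The sketch below is not part of the derivation and rests on assumptions: the function `f`, its hand-derived gradient `grad_f`, the point `x`, and the unit vector `u` are hypothetical choices, and \(g'(0)\) is approximated with a central difference rather than computed as a true limit.

```python
def f(x):
    # Hypothetical example function f(x1, x2) = x1^2 + 3*x1*x2.
    return x[0] ** 2 + 3 * x[0] * x[1]


def grad_f(x):
    # Gradient of the example function, derived by hand:
    # df/dx1 = 2*x1 + 3*x2 and df/dx2 = 3*x1.
    return [2 * x[0] + 3 * x[1], 3 * x[0]]


x = [1.0, 2.0]
u = [0.6, 0.8]  # unit vector, since 0.6^2 + 0.8^2 = 1


def g(s):
    """g(s) = f(x + s*u), the function restricted to the line through x along u."""
    return f([xi + s * ui for xi, ui in zip(x, u)])


h = 1e-6
g_prime_at_0 = (g(h) - g(-h)) / (2 * h)  # numerical estimate of g'(0)
gradient_dot_u = sum(gi * ui for gi, ui in zip(grad_f(x), u))  # dot product of gradient and u

print(g_prime_at_0, gradient_dot_u)  # both close to 7.2
```

Agreement on a single example proves nothing in general, but it is a quick way to catch sign or indexing mistakes when applying the formula by hand.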
