The exponential thread - Why e is natural

In Part 1 we put $e \approx 2.71828 \dots$ on stage and called it a special constant. We did not say why.

This part does. The question is direct. Out of every possible base for an exponential, why does this strange irrational number get the name "natural"? Why not $2$ ? Why not $10$ ? What does $e$ have that they don't?

The answer is a property so simple it sounds like cheating: $e^{x}$ is the only exponential function whose slope at every point equals its height at that point. Every other base $a$ produces a function $a^{x}$ whose slope is some constant multiple of its height. Only at $a = e$ does that constant equal exactly $1$ .

The rest of this post unpacks that property and shows why it is not an aesthetic preference but a structural fact about calculus itself. $e^{x}$ is the calibration of the family of exponentials at which the calculus identities lose their correction factors. Every other $a^{x}$ turns out to be a stretched copy of $e^{x}$ , and the natural logarithm exists exactly to record how much each exponential differs from this calibration.

Once we see this, an entire family of physical and computational phenomena lights up. Anything that grows or decays at a rate proportional to how much of it there currently is, is $e^{x}$ in disguise. Carbon dating, capacitor charging, learning-rate schedules, softmax attention, radioactive half-life, drug clearance from the bloodstream: all reach for $e^{x}$ for the same reason.

The family of exponentials

Look at $a^{x}$ for a few different bases on the same axes. The curves share a strong family resemblance. They all pass through the point $(0, 1)$ , since $a^{0} = 1$ for every base. They all stay positive forever. They all grow without bound for $a > 1$ . From far away, you cannot easily tell them apart.

Look closer. Each curve leaves $(0, 1)$ at a different angle. The function $2^{x}$ rises gently. $10^{x}$ climbs much more steeply. Somewhere between them, $e^{x}$ rises at exactly slope $1$. The whole difference between the curves is a single number: the slope at the origin.

Figure: $a^{x}$ for three bases. Solid curves: $a^{x}$. Dashed lines: tangents at $(0, 1)$. Yellow: $2^{x}$ (slope $0.693$). Green: $e^{x}$ (slope $1.000$). Red: $10^{x}$ (slope $2.303$).

So far this is just an observation. There is no obvious reason to prefer one slope over another. The slope $0.693$ from $2^{x}$ is no better or worse than the slope $2.303$ from $10^{x}$. They are different points on a continuum, and any of them could be called "the natural choice" depending on what we are looking for.
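The three slopes can be checked numerically. The sketch below is only an illustration: the helper name `slope_at_origin` and the step size `h` are our own choices, and it estimates each slope with a symmetric difference around $x = 0$.

```python
import math

def slope_at_origin(a, h=1e-6):
    """Estimate the slope of a**x at x = 0 with a symmetric difference."""
    return (a ** h - a ** (-h)) / (2 * h)

# The three bases from the figure: 2, e, and 10.
for name, a in [("2", 2.0), ("e", math.e), ("10", 10.0)]:
    print(f"base {name}: slope at origin is about {slope_at_origin(a):.3f}")
```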

The reason $e$ stands out becomes visible only when we ask what happens to the slope away from the origin. That is the question of the next section.

Differentiating an exponential

The argument below uses the slope of an exponential function at a point, made precise. If you have worked with derivatives before, skip the box. If not, the takeaway is: the derivative of a function $f$ at a point $x$ is the slope of the curve $y = f (x)$ at that one point, found by zooming in until the curve looks like a straight line and reading off the slope of that line.

Box: the derivative in three steps

The derivative is the slope of a curve at one specific point, made precise. Three steps build it.

Step 1: lines. A straight line has a constant slope, defined as rise over run. Pick any two points $(x_{1}, y_{1})$ and $(x_{2}, y_{2})$ on the line. The slope is

slope = \frac{y _{2} - y _{1}}{x _{2} - x _{1}}

This is what you would read off the side of a road sign telling you how steep a hill is. The number does not depend on which two points you picked, because every point on a straight line is equally tilted.

Step 2: curves complicate things. A curve does not have one slope. It has a different slope at every point. The function $y = x^{2}$ is shallow near $x = 0$ and steep out at $x = 5$. The function $y = \sin(x)$ is rising at $x = 0$ and falling at $x = \pi$. To talk about "the slope of a curve" you have to specify which point.

Step 3: zoom in until the curve looks straight. Pick a point $x$ on the curve and a second point a small distance $h$ away. Compute the slope as if those two points were the endpoints of a line:

\frac{f ( x + h ) - f ( x )}{h}

This is an approximation, since the actual curve between the two points is not really straight. But as $h$ gets smaller, the approximation gets better, because over a smaller interval the curve looks more and more like its tangent line. The derivative of $f$ at $x$ is the value this expression converges to in the limit:

f^{'}(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}

Two pieces of notation are interchangeable. $f^{'} (x)$ is read "f prime of x". The Leibniz form $\frac{df}{d x}$ is read "d f d x" and is meant to evoke the picture of a tiny rise $df$ divided by a tiny run $d x$ . They are different symbols for the same object.
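To watch the zooming-in happen, here is a small sketch that evaluates the difference quotient for $f(x) = x^{2}$ at $x = 3$ with shrinking $h$. The function names and the choice of point are ours, picked only for illustration. The quotient should settle toward $6$, the slope of $x^{2}$ at $x = 3$.

```python
def difference_quotient(f, x, h):
    """Slope of the straight line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

def square(x):
    return x * x

# Shrink h and watch the two-point slope approach the slope of the tangent.
for h in [1.0, 0.1, 0.01, 0.001]:
    print(f"h = {h:<6} quotient = {difference_quotient(square, 3.0, h):.4f}")
```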

Pick any base $a$ and compute the derivative of $a^{x}$ from the limit definition:

\frac{d}{dx} a^{x} = \lim_{h \to 0} \frac{a^{x + h} - a^{x}}{h} = \lim_{h \to 0} \frac{a^{x} \cdot a^{h} - a^{x}}{h} = a^{x} \cdot \lim_{h \to 0} \frac{a^{h} - 1}{h}

The factor $a^{x}$ pulled out of the limit because it does not depend on $h$ . The limit on the right depends only on $a$ .

Look at this limit before going further. $\lim_{h \to 0} (a^{h} - 1) / h$ is not a function of $h$. The variable $h$ is bound by the limit operator, the way the index $i$ is bound by the summation symbol in $\sum_{i = 1}^{n} a_{i}$. The whole expression evaluates to a single number for each fixed base $a$, the value the ratio settles into as $h$ shrinks toward zero.

Plug in concrete values for $a = 2$ :

$h$	$(2^{h} - 1) / h$
$0.1$	$0.7177$
$0.01$	$0.6956$
$0.001$	$0.6934$
$0.0001$	$0.6932$

The values converge to $0.6931 \dots$, which is exactly the slope at the origin shown for $2^{x}$ in the figure from the previous section. The same exercise for $a = 10$ would converge to $2.303 \dots$.
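The table is easy to regenerate. A minimal sketch, with the helper name `exp_quotient` being our own label for the ratio $(a^{h} - 1) / h$:

```python
import math

def exp_quotient(a, h):
    """The ratio (a**h - 1) / h whose limit as h -> 0 we are after."""
    return (a ** h - 1) / h

for h in [0.1, 0.01, 0.001, 0.0001]:
    print(f"h = {h:<7} (2^h - 1)/h = {exp_quotient(2.0, h):.4f}")

# For comparison: the closed form quoted later in the text.
print(f"ln 2 = {math.log(2.0):.4f}")
```

Changing `2.0` to `10.0` shows the same settling behavior toward $2.303 \dots$.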

This match is not a coincidence. Set $x = 0$ in the derivative formula. The left-hand side becomes the derivative of $a^{x}$ at $0$, which is the slope at the origin by definition. The right-hand side becomes $a^{0} \cdot \lim_{h \to 0} (a^{h} - 1) / h = \lim_{h \to 0} (a^{h} - 1) / h$, since $a^{0} = 1$. So the slope at the origin and the limit are the same number: two names for the same thing.

The structural argument identifies the limit but does not compute its actual value. The box below builds intuition for what "computing a limit" means in general and explains why our particular limit needs more machinery than algebra to resolve in closed form. The closed-form answer turns out to be $\ln a$, the natural logarithm of the base, with the derivation saved for Part 4. For $a = 2$ this is $\ln 2 \approx 0.6931$, matching the convergence table above. We return to $\ln a$ in the next section.

Box: what 'resolving' a limit actually means

Most limits you meet do not need any tricks. If a function is continuous at the target point (no jumps, no kinks, no asymptotes), the limit is just the function's value there:

\lim_{x \to 2} x^{2} = 2^{2} = 4

\lim_{h \to 0} (1 + h) = 1 + 0 = 1

You "resolve" these by plugging in. They are not the interesting case.

The trouble starts when plugging in produces something undefined. Consider:

\lim_{x \to 2} \frac{x^{2} - 4}{x - 2}

At $x = 2$ both the numerator $x^{2} - 4$ and the denominator $x - 2$ are $0$ . The expression $\frac{0}{0}$ is not defined. But the limit is not $\frac{0}{0}$ ; that is just the form the expression takes at the target point, an indeterminate form that says nothing about the actual value the function approaches. To find the value, you have to do real work. In this case, factor the numerator as $(x - 2) (x + 2)$ and cancel the $x - 2$ with the denominator:

\lim_{x \to 2} \frac{(x - 2)(x + 2)}{x - 2} = \lim_{x \to 2} (x + 2) = 4

The form was an indeterminate $\frac{0}{0}$ , but the value is a clean $4$ . The factoring trick was the resolution.

Our limit $\lim_{h \to 0} (a^{h} - 1) / h$ is the same kind of object: an indeterminate $\frac{0}{0}$, since both the numerator $a^{h} - 1$ and the denominator $h$ vanish at $h = 0$. The value exists, and we have established it numerically (the table for $a = 2$ converges cleanly to $0.6931 \dots$). What it does not allow is the kind of cheap algebraic resolution we just used for $(x^{2} - 4) / (x - 2)$: there is no factoring or cancellation that removes the indeterminacy. The function $a^{h} - 1$ is not a polynomial in $h$, so the trick does not apply.

The proper resolution requires a tool that approximates $a^{h}$ near $h = 0$ in a more subtle way. That tool exists, and we will develop it from scratch in Part 4. For now, the structural argument above is enough. It tells us what the limit is (the slope of $a^{x}$ at the origin), even without giving us its closed form. The closed form, when Part 4 delivers it, will turn out to be $\ln a$.

Substitute back, and the formula reads:

\frac{d}{dx} a^{x} = (\text{slope of } a^{x} \text{ at } x = 0) \cdot a^{x}

Read this carefully. Every time you differentiate an exponential, you get back the same exponential, multiplied by a fixed constant. That constant is the slope at the origin.

Differentiate $2^{x}$ once and you get $0.693 \cdot 2^{x}$. Differentiate again and you get $0.693^{2} \cdot 2^{x}$. Take the third derivative and you get $0.693^{3} \cdot 2^{x}$. The factor $0.693$ keeps showing up in every formula touching $2^{x}$.

For $10^{x}$, the same story plays out with $2.303$ in place of $0.693$.
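One way to see the fixed multiplier is to divide the slope of $2^{x}$ by its height at several points and check that the ratio never changes. This is a rough numerical sketch: the `derivative` helper is a symmetric-difference estimate we introduce for illustration, and the sample points are arbitrary.

```python
import math

def derivative(f, x, h=1e-6):
    """Symmetric-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def two_to_the(x):
    return 2.0 ** x

# Slope divided by height at a few arbitrary points.
ratios = [derivative(two_to_the, x) / two_to_the(x) for x in (-1.0, 0.0, 2.0, 5.0)]
print([round(r, 4) for r in ratios])
print(round(math.log(2.0), 4))  # the value the ratios should all match
```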

For most bases, the multiplier is some awkward irrational number. It gets baked into every derivative, and into the more elaborate formulas we will meet later in this series.

Among all bases, exactly one gives a multiplier of $1$ .

$e$ is the unique base for which $\frac{d}{d x} e^{x} = e^{x}$ .

Differentiate $e^{x}$ and the multiplier is $1$ , so the derivative is just $e^{x}$ itself. Differentiate again: still $e^{x}$ . Compute any higher derivative: it is $e^{x}$ all the way down. No multiplicative correction ever appears. The same cleanness extends to other operations on $e^{x}$ that we will meet in later parts. The differential equation $y^{'} = y$ has $e^{x}$ as its uncluttered solution.

The interactive below lets you find that base by hand. Drag $a$ until the slope at the origin reads $1.00$ . The tangent line will turn green, and the slider will be sitting at $a = e \approx 2.718$ .

Interactive figure: slider for the base $a$ (shown at $a = 2.000$, slope at origin $= 0.69$). Blue: $a^{x}$. Yellow/green: tangent at $(0, 1)$. Green dot: $(0, 1)$.
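The same search can be run numerically. The sketch below uses bisection on the base $a$ between $2$ and $3$, halving the interval until the slope at the origin is pinned to $1$. The helper name, bracket, and iteration count are illustrative choices, not part of the argument.

```python
import math

def slope_at_origin(a, h=1e-6):
    """Estimate the slope of a**x at x = 0 with a symmetric difference."""
    return (a ** h - a ** (-h)) / (2 * h)

# The slope is below 1 at a = 2 and above 1 at a = 3, so the target lies between.
low, high = 2.0, 3.0
for _ in range(40):
    mid = (low + high) / 2
    if slope_at_origin(mid) < 1.0:
        low = mid
    else:
        high = mid

print(f"base with slope 1: {low:.6f}")
print(f"math.e:            {math.e:.6f}")
```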

Every exponential is a stretched e^x

We just saw that $e^{x}$ is structurally distinguished as the only exponential whose derivative equals itself, with no multiplicative correction. The next claim is stronger.

Once you have $e^{x}$ , you do not need any other exponential function. Every $a^{x}$ can be rewritten as $e^{x}$ with the input axis stretched by a single factor. The identity is

a^{x} = e^{x \ln a}

where $\ln a$ is, by definition, the multiplier from the previous section: the slope of $a^{x}$ at the origin. (The function $\ln$ is also the inverse of $e^{x}$, so $\ln(e^{x}) = x$. The two characterizations agree because of how $e$ was defined: the identity $\ln e = 1$ is the same statement as "$e^{x}$ has slope $1$ at the origin".)

Read the identity carefully. Here is why it holds.

The natural log $\ln a$ answers one specific question: "what power do I raise $e$ to in order to get $a$?" In symbols, $\ln a$ is the number $b$ for which $e^{b} = a$. A few concrete examples:

$\ln 10 \approx 2.303$, since $e^{2.303} \approx 10$.
$\ln 2 \approx 0.693$, since $e^{0.693} \approx 2$.
$\ln e = 1$, since $e^{1} = e$.

So any positive number $a$ can be written as $a = e^{\ln a}$.

Now substitute that into $a^{x}$ :

a^{x} = (e^{\ln a})^{x} = e^{x \ln a}

The last step is the familiar exponent rule $(p^{q})^{x} = p^{q x}$ : when you raise a power to another power, the exponents multiply.

The intuition is unit conversion. Read $a^{x}$ as "multiply by $a$ a total of $x$ times". Each one of those multiplications can be replaced by $\ln a$ multiplications by $e$ instead, since $a$ itself is $e^{\ln a}$. So $x$ multiplications by $a$ do the same work as $x \cdot \ln a$ multiplications by $e$. The natural log is the conversion rate between the two ways of counting.

For $a = 10$: $\ln 10 \approx 2.303$, so one multiplication by $10$ equals $2.303$ multiplications by $e$.

For $a = 2$: $\ln 2 \approx 0.693$, so one doubling equals $0.693$ multiplications by $e$.

For $a = e$: $\ln e = 1$, so nothing to convert.
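The conversion can be tested directly. In this sketch, `power_via_e` is a hypothetical helper that computes $a^{x}$ only through $e$ and $\ln$, and the sample bases and exponents are arbitrary.

```python
import math

def power_via_e(a, x):
    """Compute a**x as e**(x * ln a), using only the natural exponential and log."""
    return math.exp(x * math.log(a))

for a, x in [(2.0, 10.0), (10.0, 3.0), (7.5, -1.25)]:
    print(f"a = {a}, x = {x}: a**x = {a ** x:.6f}, via e = {power_via_e(a, x):.6f}")
```

The two columns agree up to floating-point rounding, which is all the identity promises on a computer.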

The picture is direct. Start with the graph of $y = e^{t}$. To read the same curve as $y = a^{x}$, relabel the horizontal axis: put a new tick every $\ln a$ units along the $t$-axis, and call those ticks "$x = 1, 2, 3, \dots$". The curve itself does not change, only the labels do. With $a = 10$, the new ticks land far apart on the $t$-axis (every $2.303$ units): axis compressed. With $a = 2$, the new ticks land closer together (every $0.693$ units): axis stretched. With $a = e$, the new ticks coincide with the original $t$-ticks exactly: no change at all.

Interactive figure: slider for the base $a$ (shown at $a = 2.00$, where $\ln a = 0.693$ and the axis is stretched).

Blue curve: y = e^t. The fixed reference. It does not change when you move the slider.

Colored curve: y = a^t, where a is the slider value. It rises slower than the blue curve when a is smaller than e, faster when a is larger. Color follows: yellow when a < e, green when a = e, red when a > e.

Tick marks below the t-axis: the spots on the t-axis where the labels 1, 2, 3 would sit if you relabeled the horizontal axis using base a instead of base e. They squeeze together when a < e, spread apart when a > e, and line up exactly with the original t-ticks when a = e.

The function $10^{x}$ is $e^{x}$ with the input multiplied by $\ln 10 \approx 2.303$. The function $2^{x}$ is $e^{x}$ with the input multiplied by $\ln 2 \approx 0.693$. The function $e^{x}$ is $e^{x}$ unchanged.

You might wonder if this conversion is special to $e$ at all. It is not. The same identity holds for any positive base:

a^{x} = b^{x \log_{b} a}

for any $b > 0$ with $b \neq 1$. We could write $a^{x}$ in base $2$, base $10$, or any other base just as well. So why does the article always reach for $e$ and $\ln$? Because the calculus singles them out, not the algebra. The derivative of $b^{x}$ is

\frac{d}{dx} b^{x} = b^{x} \ln b

with the natural log appearing on the right regardless of the base $b$. Only $e$ makes this multiplier $1$, since $\ln e = 1$. Even if we stubbornly used base $2$ in the conversion, the derivative would still drag $\ln$ back in. Starting from $a^{x} = 2^{x \log_{2} a}$, the derivative works out to $(\log_{2} a)(\ln 2) \cdot a^{x}$. Now $\log_{2} a = \ln a / \ln 2$ by change of base, so $(\log_{2} a)(\ln 2) = \ln a$, and the derivative simplifies to $a^{x} \ln a$. We get the same answer either way. The natural log is forced out by the calculus, not chosen by us. $e$ is the unique calibration where the algebraic conversion factor and the calculus conversion factor are the same.
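Here is a numerical check of that base-$2$ detour, for $a = 10$ at an arbitrary point $x = 1.5$. The helper names are ours, and the derivative is a symmetric-difference estimate rather than an exact value.

```python
import math

def derivative(f, x, h=1e-6):
    """Symmetric-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

a = 10.0  # arbitrary base for the check

def a_to_the_x_via_base_2(x):
    """a**x written stubbornly in base 2: 2**(x * log2(a))."""
    return 2.0 ** (x * math.log2(a))

x = 1.5
numeric = derivative(a_to_the_x_via_base_2, x)
via_base_2 = math.log2(a) * math.log(2.0) * a ** x  # (log2 a)(ln 2) * a^x
direct = math.log(a) * a ** x                       # (ln a) * a^x

print(numeric, via_base_2, direct)
```

All three numbers agree up to rounding: the product $(\log_{2} a)(\ln 2)$ collapses to $\ln a$ whichever base the conversion went through.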

So the family of exponentials we plotted at the start is not really a family. It is one function, $e^{x}$ , viewed at different timescales. The "different bases" we drew side by side are the same curve compressed or stretched along the horizontal axis.

This is the structural reason $e$ is the natural base. It is not one exponential among many. It is the exponential that the others are derived from. The natural log is the function that records how each $a^{x}$ relates back to $e^{x}$, and the relationship $\ln e = 1$ pins $e$ to the place where the relationship is the identity.

When mathematicians say $e$ is the "natural" base, this is what they mean. $e$ is the unique base at which the derivative of the exponential equals the exponential itself, with no extra factor anywhere. The same cleanness extends to other operations on the exponential and its inverse that we will meet in later parts. Pick any other base and a constant appears in every formula.

The behavioral consequence of all this is one differential equation:

y^{'} = y

The rate of change of $y$ equals $y$ . Its solutions are exactly the functions $y (t) = C e^{t}$ , where $C$ is whatever value $y$ took at $t = 0$ . The generalization

y^{'} = k y

has solutions $y (t) = C e^{k t}$ . Positive $k$ gives exponential growth. Negative $k$ gives exponential decay. The whole world of "rate of change is proportional to size" lives in this one equation, which is why $e$ shows up everywhere it does.
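To see the equation produce the exponential without assuming the answer, we can step $y' = k y$ forward in small increments and compare with $C e^{k t}$. This is a sketch using the simplest possible scheme (forward Euler). The values of `k`, `y0`, and `t_end` are arbitrary illustrative choices.

```python
import math

def euler_solve(k, y0, t_end, steps):
    """Step y' = k * y forward from t = 0 to t_end with forward Euler."""
    dt = t_end / steps
    y = y0
    for _ in range(steps):
        y += k * y * dt  # change in y is proportional to the current y
    return y

k, y0, t_end = -0.5, 3.0, 2.0
exact = y0 * math.exp(k * t_end)

for steps in (10, 100, 10_000):
    approx = euler_solve(k, y0, t_end, steps)
    print(f"{steps:>6} steps: {approx:.6f}   exact: {exact:.6f}")
```

As the steps shrink, the stepped solution closes in on $C e^{k t}$.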

Where e shows up

Here is a tour of the same equation in different domains. Each one has the same underlying pattern: rate of change proportional to current size.

Radioactive decay. In a sample of carbon-14, every atom has the same fixed chance of decaying in any given second. So if you have $N$ atoms now, the rate of decay is proportional to $N$, and the amount remaining drops exponentially over time. The half-life of carbon-14 is about $5{,}730$ years. Carbon dating uses this: a sample with half its expected carbon-14 stopped exchanging carbon with the air about that long ago.
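As a rough sketch of the arithmetic behind carbon dating, the snippet below converts between elapsed time and the fraction of carbon-14 remaining, using the half-life quoted above. The function names are ours, and real dating involves calibration steps this ignores.

```python
import math

HALF_LIFE_YEARS = 5730.0
DECAY_RATE = math.log(2.0) / HALF_LIFE_YEARS  # the k in y' = -k * y, per year

def fraction_remaining(years):
    """Fraction of the original carbon-14 left after the given time."""
    return math.exp(-DECAY_RATE * years)

def age_from_fraction(fraction):
    """Invert the decay curve: how long ago was the fraction equal to 1?"""
    return -math.log(fraction) / DECAY_RATE

print(fraction_remaining(5730.0))  # one half-life
print(age_from_fraction(0.25))     # two half-lives
```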

Capacitor charging. When a battery charges a capacitor through a resistor, the voltage rises quickly at first and slows as the capacitor fills up. The closer it is to "full", the smaller the remaining gap, and the slower the charging. The result is the curve $V(t) = V_{\text{battery}} (1 - e^{-t / (RC)})$, where $RC$ is the time constant that sets how fast the charging goes. The same shape lives inside every analog filter and audio compressor.
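A small sketch of the charging curve, with made-up component values (a $10\ \text{k}\Omega$ resistor, a $100\ \mu\text{F}$ capacitor, a $9$ V battery) chosen only so that the time constant comes out to one second:

```python
import math

V_BATTERY = 9.0       # volts (illustrative value)
R = 10_000.0          # ohms (illustrative value)
C = 100e-6            # farads (illustrative value)
TIME_CONSTANT = R * C # seconds

def capacitor_voltage(t):
    """Voltage across the capacitor t seconds after charging starts."""
    return V_BATTERY * (1.0 - math.exp(-t / TIME_CONSTANT))

for multiple in (1, 2, 3, 5):
    t = multiple * TIME_CONSTANT
    share = capacitor_voltage(t) / V_BATTERY
    print(f"after {multiple} time constant(s): {100 * share:.1f}% charged")
```

After one time constant the capacitor has closed a fraction $1 - 1/e$ of the gap, and each further time constant closes the same fraction of whatever gap remains.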

Newton's law of cooling. A hot cup of coffee in a cool room cools off at a rate that depends on the temperature gap. The hotter the coffee is relative to the room, the faster it loses heat. The gap between coffee and room shrinks exponentially: starting at $80^{\circ}$C in a $20^{\circ}$C room, the temperature follows $T(t) = 20 + 60 e^{-kt}$, approaching $20^{\circ}$C as the gap closes.

Drug clearance. The body removes most drugs from the bloodstream at a rate proportional to the current concentration. So the concentration decays exponentially, just like radioactive material. Dosing schedules are designed around the drug's half-life: with one dose every half-life, the concentration still halves between doses, but the peaks and troughs settle into a steady repeating cycle.
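A toy model of repeated dosing makes the half-life logic concrete. This sketch assumes each dose is absorbed instantly and that exactly one half-life passes between doses. Both are simplifications, and the function name and dose size are illustrative.

```python
def simulate_doses(dose, n_doses):
    """Track the level just after each dose (peak) and just before the next (trough)."""
    level = 0.0
    peaks, troughs = [], []
    for _ in range(n_doses):
        level += dose   # dose absorbed instantly (simplifying assumption)
        peaks.append(level)
        level *= 0.5    # one half-life passes: half of the drug is cleared
        troughs.append(level)
    return peaks, troughs

peaks, troughs = simulate_doses(dose=100.0, n_doses=30)
print([round(p, 1) for p in peaks[:5]])
print([round(t, 1) for t in troughs[:5]])
```

In this model the peaks level off near twice the dose and the troughs near one dose, so the cycle stops drifting after a handful of doses.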

Reverb tail of a struck bell. A bell, a guitar string, or a reverberating room loses energy at a rate proportional to how much energy is left, since the loss is dominated by friction and damping. The amplitude therefore decays exponentially. Audio engineers describe a room's reverb by its RT60, the time it takes for the sound level to drop by $60$ dB.
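RT60 and the exponential decay constant are two descriptions of the same curve. The sketch below converts between them, assuming the usual convention that a level change in dB is $20 \log_{10}$ of the amplitude ratio, so a $60$ dB drop is a factor of $1000$ in amplitude. The function names and the $1.8$ s example are ours.

```python
import math

def amplitude_time_constant(rt60_seconds):
    """Time constant tau in amplitude(t) = exp(-t / tau) for a given RT60."""
    return rt60_seconds / math.log(1000.0)

def level_drop_db(t, rt60_seconds):
    """How many dB the sound level has fallen after t seconds."""
    amplitude = math.exp(-t / amplitude_time_constant(rt60_seconds))
    return -20.0 * math.log10(amplitude)

rt60 = 1.8  # seconds (illustrative value)
print(f"time constant: {amplitude_time_constant(rt60):.3f} s")
print(f"drop after RT60: {level_drop_db(rt60, rt60):.1f} dB")
```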

Softmax in machine learning. Neural networks turn a list of scores into a probability distribution by exponentiating each score and dividing by the sum:

\operatorname{softmax}(x)_{i} = \frac{e^{x_{i}}}{\sum_{j} e^{x_{j}}}

The exponential is what makes the math behind training come out clean. The derivative of $e^{x}$ is $e^{x}$ itself, so when the network's loss function is differentiated, no awkward extra factors clutter the formulas. Transformer models apply softmax to every row of attention scores in every attention layer, so it runs constantly during training.
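A minimal softmax in plain Python, for illustration only. Subtracting the largest score before exponentiating is a common numerical-stability trick that we add here. It does not change the result, because the common factor cancels between numerator and denominator.

```python
import math

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(scores)  # stability trick: keeps exp() from overflowing
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print([round(p, 4) for p in probs])

# Shifting every score by the same amount leaves the probabilities unchanged.
shifted = softmax([1001.0, 1002.0, 1003.0])
print([round(p, 4) for p in shifted])
```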

Reinforcement learning's discount factor. An agent collecting rewards over time treats future rewards as worth less than current ones. The discount is exponential: a reward $t$ steps in the future is worth $e^{-\delta t}$ of an immediate reward, usually written $\gamma^{t}$ with $\gamma = e^{-\delta}$. Same pattern as continuous compound interest, run in reverse.
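The per-step discount factor usually written $\gamma$ and the exponential form describe the same weighting, with $\gamma = e^{-\delta}$. A quick sketch, where the value of `delta` and the reward list are arbitrary:

```python
import math

delta = 0.05                # discount rate per step (illustrative value)
gamma = math.exp(-delta)    # equivalent per-step discount factor

rewards = [1.0, 0.0, 2.0, 5.0]  # reward received at steps 0, 1, 2, 3

return_with_gamma = sum(gamma ** t * r for t, r in enumerate(rewards))
return_with_exp = sum(math.exp(-delta * t) * r for t, r in enumerate(rewards))

print(gamma, return_with_gamma, return_with_exp)
```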

Diffusion models. Modern image generators work by gradually corrupting a training image with random noise, then teaching a model to undo the corruption step by step. The noise is added on a schedule that lets the original signal fade exponentially toward pure noise. The exponential fade is what makes the reverse process tractable.

All of the above is the real-valued story of $e$: growth and decay, with rate proportional to current size. There is also an imaginary-valued story, where the input to $e^{x}$ is an imaginary number rather than a real one. That story turns out to be about oscillation, the unit circle, and the surprise return of $\sin$ and $\cos$. We will get there in Part 5.

A footnote on compound interest

There is a classical route to $e$ that does not go through derivatives at all. If you compound $100\%$ interest more and more frequently, yearly to monthly to daily to every microsecond, the resulting amount converges:

\lim_{N \to \infty} \left(1 + \frac{1}{N}\right)^{N} = e

This is one of the most-quoted ways to introduce the number, and Bernoulli stumbled into it in $1683$ looking at compound interest. It is true and charming, but it does not, on its own, explain why $e$ is structurally distinguished. The compounding limit produces $e$ because compounding is itself a discrete approximation to the differential equation $y^{'} = y$ we already met. Money is just one place to encounter the equation.
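The compounding limit is easy to watch converge. A small sketch, where `compound` is our own name for the amount one unit of money grows to after a year of $100\%$ interest split into $N$ periods:

```python
import math

def compound(n_periods):
    """Value of 1 unit after one year at 100% interest, compounded n_periods times."""
    return (1.0 + 1.0 / n_periods) ** n_periods

for n in (1, 12, 365, 1_000_000):
    value = compound(n)
    print(f"N = {n:>9}: {value:.6f}   gap to e: {math.e - value:.6f}")
```

The gap shrinks roughly in proportion to $1/N$, which is slow: this limit is a fine definition of $e$ but a poor way to compute it.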

What's next

Part 3 takes the next step on the climb. Instead of a number that grows, we look at a point that moves on a circle. The functions $\sin$ and $\cos$ track the coordinates of that point, and from the geometry alone the entire trigonometric arsenal (the sum-of-angles formulas, the double-angle identities, the Pythagorean identity) falls out. None of it will look anything like $e^{x}$.

Then in Part 5, the connection arrives. When we plug an imaginary number into the Taylor series of $e^{x}$, $\sin$ and $\cos$ appear in the result as the real and imaginary parts. The unit circle and the exponential are not separate stories.