Yurii Shevchuk

Knuth's numbers and impractical memory devices

I’ve encountered Knuth’s numbers in the “Concrete Mathematics: A Foundation for Computer Science” book by Ronald Graham, Donald Knuth, and Oren Patashnik. In the book, the numbers are defined by the following recurrence relation:

\[\begin{align} K_{n+1} = 1 + \min \left(2K_{\left\lfloor{\frac{n}{2}}\right\rfloor}, 3K_{\left\lfloor{\frac{n}{3}}\right\rfloor}\right) \end{align}\]

with \(K_0=1\).

The sequence of these numbers can be found on OEIS A007448 and starts with the following terms

1, 3, 3, 4, 7, 7, 7, 9, 9, 10, 13, 13, 13, 15, 15, 19, 19, 19, 19, 21, 21, 22,
27, 27, 27, 27, 27, 28, 31, 31, 31, 39, 39, 39, 39, 39, 39, 39, 39, 40, 43, 43,
43, 45, 45, 46, 55, 55, 55, 55, 55, 55, 55, 55, 55, 57, 57, 58, 63, 63, 63, 63,
63, 64, 67, 67, 67, 79, 79, 79, 79, ...

Although the equation might look boring at first glance, I think its sequence is far more interesting. From the few first terms generated by the sequence, it’s not difficult to spot a few properties that can be proven individually, but I believe there is a better way to understand the sequence. Furthermore, shifting perspectives can allow us to prove some properties and discover new patterns.

In this article, I will show that many of the properties can be understood by imagining a rather impractical memory device that can be directly connected to Knuth’s numbers. In addition, the recursive construction process of the numbers can tell us exactly how to create such a device.

Memory devices

Before we get back to Knuth’s numbers, it might be important to start with an overview of the simplest memory devices and their construction process from a mathematical perspective. However, we’re only interested in theoretical devices in the article, so we should be encouraged to ignore the complexities introduced by actual engineering.

In the remainder of the article, we will focus on the two most crucial properties of the memory devices. Specifically, any memory device considered should be able to have a specific state and allow an external device to retrieve it. Naturally, the most straightforward memory device should be able to store two states represented by two numbers, namely, 0 and 1. Such a device consists of a single binary cell and can be represented in the following way.


Multiple binary cells can be stacked together to create a memory device that can store more than two different states. For example, four binary cells can form the following device

2 2 2 2

Each cell can have precisely two states, and a specific combination of 0s and 1s across all cells can represent a unique state. Since there are four cells, it means that overall such a memory device can have up to \(2^4=16\) states. A bit more generally, if a device consists of \(N\) binary cells, then it will be able to have up to \(2^N\) states.

The number of states can be increased for the fixed number of cells if we replace each binary cell with a cell with more than two states. So, for example, we can replace each binary cell with a trinary cell, and if a device has four cells in it, then it should be able to have up to \(3^4=81\) states.

3 3 3 3

We can even start mixing binary and trinary cells.

2 3 2 2 3

The device represented by the scheme above should be able to store up to \(2^3 3^2 = 72\) states, and generally, we can say that if the device consists of \(L\) binary cells and \(M\) trinary cells, then it should be able to store up to \(2^L 3^M\) states. Notice that the formula suggests that the actual order between binary and trinary cells doesn’t affect the maximum number of states such a device can have. Alternatively, we can say that each device with mixed binary and trinary cells can be represented as one binary memory device (a device that consists of only binary memory cells) and one trinary memory device.

Notice that up until this point, we always estimated the upper bounds on the number of states that each of the memory devices would be able to have. In fact, the estimates are exact if we think each device represents a digit of the number encoded in binary/ternary base or mixed radix. In other words, the relation between cells is multiplicative with respect to the memory’s size. For example, in binary memory devices, the total number of possible states can be obtained by multiplying the memory sizes of all individual cells. Multiplicative relations between cells make them extremely practical in real-world applications since memory size scales exponentially with respect to the number of cells.

As opposed to multiplicative memory devices, we can create additive memory devices. Unlike in multiplicative memory devices, memory size changes additively with each additional cell. For example, we can create a memory device with four distinct states; if we add one more binary cell to it, the device will be able to store five distinct states. Although the device doesn’t utilise cells efficiently, it has useful properties. Such a device can be designed to have any desired number of distinct states, whereas, with binary memory devices, it’s only possible if the number happens to be a power of two. Additive memory devices are easy to design (but perhaps difficult to build in practice). If one wants a memory unit that stores exactly \(N\) distinct states, then we will need \(N-1\) binary cells to store it. If each cell stores 0 or 1, we just have to count the number of cells that were set to 1, and the number will represent the state in which the device is. It’s easy to see that such a device will be able to store exactly \(N\) distinct states.

Everything described above will be needed to understand how Knuth’s number can be related to memory devices. As we will see in the next section, such a device combines multiplicative and additive memory devices into a single device that uses both properties.

Mixing multiplicative and additive memory devices

Adjusted Knuth’s numbers

Before we continue, it will be essential to add a minor adjustment to the sequence, which will not affect the properties of Knuth’s numbers. Specifically, it will be helpful to subtract one from each of the numbers in the sequence.

\[P_{n} = K_{n} - 1\]

or alternatively, we can write it in the following way

\[P_{n} = \min \left(2P_{\left\lfloor{\frac{n-1}{2}}\right\rfloor} + 2, 3P_{\left\lfloor{\frac{n-1}{3}}\right\rfloor} + 3\right)\]

We will continue exploring properties of the “new” sequence, but the same properties will also be valid for Knuth’s numbers.

Adjusted Knuth’s number and memory device

To understand better the connection between the sequence and memory devices, I believe it might be helpful to unroll the recursions and look through a couple of examples.

\[\begin{align} P_{11} &= 3 \cdot 3 + 3 = 12 \\ P_{18} &= 2 \cdot 2 \cdot 3 + 2 \cdot 2 + 2 = 18 \\ P_{24} &= 2 \cdot 3 \cdot 3 + 2 \cdot 3 + 2 = 26 \\ P_{37} &= 2 \cdot 2 \cdot 2 \cdot 3 + 2 \cdot 2 \cdot 2 + 2 \cdot 2 + 2 = 38 \end{align}\]

Notice that in the formulas above, each subsequent addend is 2 or 3 times less than the previous one. It happens because of how numbers are constructed, which is easily seen from the \(P_n\)’s recursive definition. We can rearrange numbers in the 2D grid. For example, \(P_{24}\) can be written in the following way

2 3 3
2 3

If you compare the grid to the \(P_{24}\) formula, you can notice that each addend occupies a separate row, meaning that the relation between numbers within a row is multiplicative, and the relation between rows is additive. Perhaps it can remind you of the discussion in the previous section. Unlike the examples in the previous section, the memory device defined by the grid above is a mixture of multiplicative and additive memory devices. Because of how the grid has been constructed, the same construction process can be applied for any \(P_n\). Each memory device will be arranged in the triangular grid with the following properties.

  1. Each row has one less memory cell compared to the row above it.
  2. Each column contains binary or trinary memory cells and never a mixture.
  3. Last row has only one cell.

Although we can design a memory device for each \(P_n\), it is still not clear what is the relation between the device and number \(n\). We can understand the primary connection from the following inequality \(P_n \ge n \) (See Proof 1). In other words, equality suggests that memory devices with memory capacity \(P_n\) can always have at least \(n\) states. But it’s also evident that there are infinitely many memory devices of this type that can have at least \(n\) states, making the property a bit less surprising. What’s interesting is that \(P_n\) is the memory size of the most optimal memory device in the sense that there are no smaller memory devices that follow the previously defined construction and can have at least \(n\) states (See Proof 2).

These two properties are essential since they indicate that if we want to construct a memory device following our construction process, then for each \(n\), we can find how grid cells of the optimal memory device have to be rearranged. To better illustrate the point, we can return to the example \(P_{24}\).

\[P_{24} = 2 \cdot 3 \cdot 3 + 2 \cdot 3 + 2 = 26\]

Notice that the first addend contains all of the most critical information. The recursion has \(\min\) function that can be thought of as a split point in the binary tree, and within each step, we either take a left path or the right path. The sequence’s first addend \(2 \cdot 3 \cdot 3\) says that recursion goes first through the left branch and then twice through the right branch (it’s essential that the order between numbers is preserved when recursion is being unrolled). This gives us the first row in the memory device grid

2 3 3

And all of the subsequent rows can be reconstructed from the first row. We only have to copy the bottom-most row, remove the last most cell and add it below the previous row. The process should be repeated until we get a row with a single cell.

2 3 3
2 3

Sorting out details

Some readers might have noticed a difficulty associated with the additive property between rows. If you try making such a device and adding states across all rows, you might notice that the memory size is smaller than the value of the \(P_n\). It’s probably easier to see from the example. Let’s say that we want to store six states, and the recursive formula tells us that \(P_6 = 2 \cdot 2

2 2

The first row can store states 0, 1, 2, and 3, and the second can only store 0 and 1. If we store these numbers as we do in binary memory devices and add them together, we can only get five different numbers (0, 1, 2, 3, 4) rather than six. It creates a problem, but luckily, with slight modifications to how cells in the grid interact, we can make sure that exactly \(P_n\) distinct numbers can be stored.

To show how the problem can be resolved, we can imagine that we have a simple computer program that reads a number from memory, increments it by 1, and writes it back into the memory. We can execute the program for \(P_n - 1\) iterations and show that with each iteration, the memory device will have a unique state.

  1. We start with a memory where all cells are equal to zero, the first state.
  2. We create a pointer that points to a row in the memory device grid and set it to 1 (first row)
  3. We keep incrementing the value in the row with the pointer on it (increment and go to Step 3) until the value cannot be incremented further, in which case we move to the Step 4.
  4. There are two possible outcomes
    1. If the pointer points to the last row, we go to Step 5.
    2. Otherwise, we increment the pointer, decrement the value in the first row and increment the value in the row with the pointer and move to Step 3.
  5. Continue incrementing the value in the firstmost row until the maximum number is reached. When that happens, terminate the program.

The state can be extracted from the memory device in the following way. First, we need to determine what number is encoded in each row. Then we add them together as before, but in addition to it, we also add +1 for every non-zero state row, excluding the first one. The last part explains why we need to decrement the value in the first row in step 4.B. Equivalently, we can say that the first row encodes numbers \(0, 1, 2, 3, …\) and each other row encodes numbers \(0, 2, 3, 4, …\). This trick allows us to ensure that the memory device can have exactly \(P_n\) distinct states. Also, notice that in step (4.B), we will never reach 0 in the first row since the memory size of the first row is much larger compared to the number of rows, and therefore memory size of the first row is always greater than the number of rows.


The previous section showed the relation between \(P_n\) numbers and some special memory devices. Since \(P_n\) is one less than Knuth’s number \(K_n\), the same findings can also be applied to Knuth’s numbers. We can also interpret the \(P_n + 1\) differently. For example, additional +1 can be represented by one more binary unit, which encodes one extra state if we decide to combine it additively with the memory device.

The connection between Knuth’s numbers and memory devices can help us to discover and better understand some of the other properties of the sequence.

Repeated numbers and change points

The fact that \(K_n\) equals the memory capacity of the optimal memory device that can have at least \(n\) states instantly explains why we observe repeated streaks of numbers. For example, \(K_4=7\) tells us that \(K_5=K_6=7\). Basically if \(K_{i-1}=b\), \(K_i=c\) and \(b \neq c\), then \(b=i\), \(K_{i} = K_{i+1} = … = K_{c-1} = c \) and \(K_{c-1} \lt K_{c}\). Alternatively, we can say that the difference between two distinct numbers defines the length of the streaks.

Short streaks and increasing density

Similar to the sequence of prime numbers, Knuth’s number gets much more sparse, but at the same time, there are infinitely many values such that \(K_{i+1}-K_{i}=1\). Sparseness can be easily measured by taking some number \(m\) which would be equal to the number of cells in the topmost row, and comparing the largest memory size that can be constructed with \(m\) cells in the row (e.g., memory device with only trinary cells) and compare it to the upper bound of the total number of the distinct memory devices.

Short-length streaks can be easily constructed from any memory device, with 2 in the first column and 3 in the second. For example,

2 3 ... ...
2 3 ...
2 3

And then, we can replace numbers in the first column by 3 and in the second one by 2.

3 2 ... ...
3 2 ...
3 2

Notice that memory size only changed in the last most row, and since it increased by 1 the difference between memory sizes of these devices should be equal to 1. Also, since we know that the construction is optimal, it means that if the first device can be associated with some number \(K_{i}=i+1\) (and perhaps with some numbers \(j < i\)) then that second memory device has to be associated with the number \(K_{i+1}=i+2\) (and that’s the only number that can be associated to it)


Proof 1: \(P_n \ge n \)

If we assume \(k, m, n \in N\) then result can be proved by applying strong induction

  1. \(P_0 = 0\) (e.g. \(P_0 \ge 0\))
  2. Assume \(P_k \ge k \, \forall \, k \lt n\)
  3. In order to show that \(P_n \ge n\) we need to consider two cases
\[\begin{align} P_{\left\lfloor{\frac{n-1}{2}}\right\rfloor} &\ge \left\lfloor{\frac{n-1}{2}}\right\rfloor \\ &= \begin{cases} m - 1,& \text{if } n = 2m \\ m, & \text{if } n = 2m+1 \\ \end{cases} \\ &\ge \frac{n}{2} - 1 \\ P_{\left\lfloor{\frac{n-1}{3}}\right\rfloor} &\ge \left\lfloor{\frac{n-1}{3}}\right\rfloor \\ &= \begin{cases} m - 1,& \text{if } n = 3m \\ m, & \text{if } n = 3m+1 \text{ or } n = 3m+2 \\ \end{cases} \\ &\ge \frac{n}{3} - 1 \\ \end{align}\]

From the two inequalities above follows

\[\begin{align} 2P_{\left\lfloor{\frac{n-1}{2}}\right\rfloor} + 2 &\ge n \\ 3P_{\left\lfloor{\frac{n-1}{3}}\right\rfloor} + 3 &\ge n \\ \end{align}\]

From the last two inequalities and definition of \(P_n\), it follows that \(P_{n} \ge n\), which concludes the proof.

Proof 2: Optimality

The result can be proven by contradiction. Let’s assume that we discovered first \(P_n\) for which memory device construction is not optimal (\(n \gt 0\), since \(P_0\) is optimal). It gives us the following inequalities

\[P_n = x \gt y \ge n\]

where \(x\) is the size of the non-optimal memory device and \(y\) is the size of the most optimal memory device of the given construction.

Because of the way memory devices are being constructed, there are two different forms for \(x\) and \(y\), which means there are four different use cases that we have to consider.

  1. \(x = 2x’+2\) and \(y = 2y’+2\)
  2. \(x = 3x’+3\) and \(y = 3y’+3\)
  3. \(x = 3x’+3\) and \(y = 2y’+2\)
  4. \(x = 2x’+2\) and \(y = 3y’+3\)

Let’s consider the first option. If we plug new values for \(x\), \(y\) and use definition \(P_n\) we can transform our previous inequalities into the following form

\[P_{\left\lfloor{\frac{n-1}{2}}\right\rfloor} = x' \gt y' \ge \frac{n}{2} - 1 \ge \left\lfloor{\frac{n-1}{2}}\right\rfloor\]

We assumed that \(x\) is the first non-optimal value, but since we’ve shown that \(x’\) is not optimal as well, then \(x\) is not a first non-optimal encoding which means we got a contradiction. The same proof can be used for option 2, except all numbers 2 will be replaced by 3.

Option 3 and 4 are not much more difficult; in fact, just like with options 1 and 2 it’s enough to prove option 3, and for option 4, the proof is identical. From option 3 and \(P_n\) definition we know that

\[P_n = x = 3x'+3 = \min(2x''+2, 3x'+3)\]

From this, it follows that

\[2x''+2 \ge P_n = 3x'+3 \gt 2y'+2 \ge n\]

We arrived at the same sequence of inequalities as the one we had in option 1, leading us to the same contradiction, which concludes the proof.

\[x'' = P_{\left\lfloor{\frac{n-1}{2}}\right\rfloor} \gt \left\lfloor{\frac{n-1}{2}}\right\rfloor\]

Share this: