Laaarge numbers, Part II

In our journey for large numbers, we encounter a mythical monster – the Hydra. We will though deal with a well known, more abstract version of it.

Our hydras are just trees, they have a root and heads and will thus look like this:

Image is missing

Hercules has to kill the Hydra by cutting off its heads. However, every time he removes a head, the remaining parent node it was attached to will re-grow several times. If we chop off the indicated head above, the Hydra will re-grow to be

Image is missing

Only the root will not re-grow, so Hercules can remove heads directly attached to the root safely. Can Hercules kill the Hydra? If he is unfortunate, the results will be horrific. If he cuts off the head on the very left, the Hydra will become

Image is missing

But even if he makes good choices, he can expect a hell of a fight. Try it on a piece of paper.

Can the Hydra get overwhelming in size with no chance of ever cutting it down? Let’s code up some fights:

We can represent Hydras as nested lists, for example the first hydra above would be written as

hydra = [ [ [], [ [], [] ] ] ]

How do we fight it? We have to indicate which head to kill, so we can use a path of indices through the Hydra until we reach a head. For example [0, 1, 0] would point to the top-left head which Hercules killed in the example. Now we code the removal of heads

def removehead(hydra, path):

    # Longer path to go? Descend further down
    if len(path) > 2:
        removehead(hydra[path[0]], path[1:])

    # We kill the head, but the parent will re-grow
    elif len(path) == 2:
        parent = hydra[path[0]]
        head = parent[path[1]]
        if head != []:
            raise ValueError("Indicated element is not a head")

        # Remove the head from parent
        del parent[path[1]]

        # Re-grow parent twice

    # We kill a head right at the root, so nothing will re-grow
    elif len(path) == 1:
        head = hydra[path[0]]
        if head != []:
            raise ValueError("Indicated element is not a head")

        del hydra[path[0]]

        raise ValueError("Invalid path")

def copy(hydra):
    return [ copy(child) for child in hydra ]

Let’s try

>>> removehead(hydra,[0,1,0])
>>> hydra
[[[], [[]], [[]], [[]]]]

We can now simulate how long the fight will last. We just need a strategy to select the head we want to kill. At the moment, let’s just always take the leftmost head

def leftmost(hydra):
    if hydra == []:
        return []
        return [0] + leftmost(hydra[0])

Then we can simulate

def hercules(hydra, headstrategy):
    steps = 0
    while hydra != []:
        path = headstrategy(hydra)
        removehead(hydra, path)
        steps += 1

    return steps

>>> hercules(hydra, headstrategy=leftmost)

88000 steps just to kill the tiniest Hydra? Let’s just attach one more head at the end

hydra = [ [ [], [ [], [], [] ] ] ]

and the program won’t be able to kill the thing in acceptable time on my computer. Let’s gather a bit of information on the shape of the hydra in each stage

def size(hydra):
    if hydra == []:
        return 1
        return 1 + sum(size(child) for child in hydra)

def numheads(hydra):
    if hydra == []:
        return 1
        return sum(numheads(child) for child in hydra)

def depth(hydra):
    if hydra == []:
        return 0
        return 1 + max(depth(child) for child in hydra)

def maxdegree(hydra):
    if hydra == []:
        return 0
        return max(len(hydra), max(maxdegree(child) for child in hydra))

and print it

    while hydra != []:
        if steps % 10000 == 0:
            print("STEP %i" % steps)
            print("Size: %i" % size(hydra))
            print("#Heads: %i" % numheads(hydra))
            print("Depth: %i" % depth(hydra))
            print("Max. degree: %i" % maxdegree(hydra))

We get after 100000 steps

Size: 1041575
#Heads: 881450
Depth: 3
Max. degree: 158880

Killing heads can never increase the depth, that’s good. But the size of the thing is enormous, and there is a single node with 158880 children attachted to it. None of these characteristic numbers decreases with each stage.

You cannot lose

The surprising result is: Hercules can not lose against the Hydra. Every strategy will eventually kill the Hydra, but it will take ridiculously long.

The key insight is that by killing any head, we somehow reduce the complexity of the Hydra in a way it can’t make up for it by re-growing a parent whose complexity already has been reduced. The measure of complexity that has to be used goes by the name epsilon-zero, and is not straightforward in any sense. No combination of the characteristics like #Heads, Size etc. will do. The intricate order on Epsilon-zero is recursive and in some sense resembles the nature of the Hydra-game itself.

Knowing that killing the Hydra is always possible in a finite number of steps, we can use the game to give us increadibly fast-growing functions. Which strategy do we use? Why not the worst strategy? But there might be no worst strategy. Start with some Hydra, then strategy 1 could kill it in $10^{10}$ moves, strategy 2 in $10^{20}$, strategy 3 in $10^{30}$ moves and so on. Because every strategy produces new and bigger hydras that we somehow need to cut down, there is no finite set of strategies.

We can show that the above situation cannot happen, which is precisely KÅ‘nig’s lemma. It says

For every game, there is either a longest finite game or there is an infinite game.

Think about it for a second, for games you know. Are there arbitrarily long games? Now, we know that there is no infinite fight against the Hydra, because its complexity gets reduced in every round. So there is a longest fight.

The function “the longest fight against a hydra of size $n$” is our new example for a disgustingly fast-growing function.

QuickCheck and random infinite trees

So I was QuickCheck-ing a bit of code with this amazing library that will automatically produce random test cases for your properties. Often in Haskell, we’ll have recursive, tree-like datastructures, like this simple binary tree

data Tree = Leaf | Branch Tree Tree deriving Show

In order for QuickCheck to generate some random trees, we make this type an instance of Arbitrary:

import Control.Monad
import Control.Applicative
import Test.QuickCheck

instance Arbitrary Tree where
    arbitrary = oneof 
        [ return Leaf
        , Branch <$> arbitrary <*> arbitrary ]

Easy! An arbitrary Tree is either a Leaf or a Branch into two arbitrary trees. I quickly learned that THIS IS NOT THE WAY YOU DO IT. Trying it out with

sample (arbitrary :: Gen Tree)

lets Haskell give us some random trees

Branch (Branch Leaf Leaf) Leaf
Branch Leaf Leaf
Branch Leaf (Branch (Branch (Branch Leaf (Branch (Branch Leaf (Branch (Branch (Branch (Branch Leaf Leaf) Leaf) (Branch Leaf (Branch Leaf Leaf))) (Branch Leaf (Branch Leaf (Branch (Branch (Branch (Branch Leaf (Branch Leaf Leaf)) (Branch (Branch Leaf (Branch Leaf Leaf)) (Branch Leaf (Branch (Branch Leaf (Branch Leaf (Branch (Branch (Branch (Branch (Branch (Branch (Branch (Branch Leaf Leaf) (Branch (Branch (Branch (Branch Leaf Leaf) (Branch Leaf Leaf)) Leaf) Leaf)) (Branch (Branch Leaf (Branch (Branch Leaf Leaf) Leaf)) (Branch (Branch (Branch (Branch Leaf Leaf) Leaf) (Branch (Branch (Branch Leaf (Branch (Branch (Branch (Branch (Branch Leaf (Branch Leaf Leaf)) (Branch Leaf (Branch Leaf (Branch Leaf Leaf)))) (Branch (Branch Leaf Leaf) (Branch (Branch (Branch Leaf (Branch (Branch Leaf (Branch (Branch Leaf Leaf) Leaf)) Leaf)) (Branch Leaf Leaf)) Leaf))) (Branch Leaf (Branch (Branch Leaf Leaf) (Branch (Branch Leaf Leaf)  ...

I have not written out that last tree completely; not even remotely. Let’s have a closer look at what happens

depth :: Tree -> Int
depth Leaf = 0
depth (Branch l r) = 1 + max (depth l) (depth r)

Now we can compute the depth of what we get

sample (depth <$> arbitrary)
> 0,0,862,0,0,12,1,0,1,0,0

A depth of 862? That seems pretty disproportionate. oneof throws a fair coin. On heads, it will just return a Leaf and the process terminates. On tails, it will spawn a branching and repeat the same process independently. Now, in order for the tree to terminate, all active branches have to terminate. Around half of them will die, half of them will branch even further. Do these two processes cancel each other out? Can we expect the trees to terminate at a reasonable depth or is there even a possibility that the program will run indefinately?

Some experimentation

In our example, the dying and spawning of new branches weirdly balance each other, making predictions hard. But our coins don’t have to be fair. We can have QuickCheck favor certain alternatives with some frequency

instance Arbitrary Tree where
    arbitrary = frequency 
        [ (2, return Leaf)
        , (1, Branch <$> arbitrary <*> arbitrary) ]

Now twice as many branches will die than newly spawn. We should expect finite trees now

sample (depth <$> arbitrary)
> 1,0,0,6,0,0,2,0,0,0,0

Fine. A lot of trees terminate right at the root, but no weird runaway recursions happen. On the other hand

instance Arbitrary Tree where
    arbitrary = frequency 
        [ (1, return Leaf)
        , (2, Branch <$> arbitrary <*> arbitrary) ]

will spawn more trees than it will kill, so most trees will send Haskell into an unending recursion. Actually, there is a probability strictly between $0$ and $1$ that the generated tree will be finite. Interestingly, the trees which do end up finite will be very short! What happens in our initial case with the fair coin stil remains unanswered.

Time to set up some maths. Let $p_0$ be the probability of getting a leaf and write $p(d)$ for the probability of the tree having depth $d$. Firstly

$$p(0) = p_0$$

Now a tree has depth precisely $d$ if

  • Option “branch” was chosen
  • both left and right child have depth $d-1$
  • or the left child has depth $d-1$ and the right one has a smaller depth $j$
  • or the right child has depth $d-1$ and the left one has a smaller depth $j$

giving us a recursive formula
$$p(d) = (1-p)\left(p(d – 1)^2 + 2p(d – 1)\sum_{j=0}^{d-2} P(j) \right)$$

Let’s experiment a bit with these

# Random trees

n = 1000000

# d = depth of a random tree

# p0 = probability of producing a leaf/terminating in each step
for p0 in [0.2, 1/3, 0.5, 2/3, 0.8]:
    q0 = 1-p0

    # p[n] = P(d = n)
    p = [0]*n

    # t[j] = P(d < j) = sum(p[i] for i in range(0,j))
    t = [0]*n

    p[0] = p0
    p[1] = q0 * p[0]**2
    t[1] = p[0]

    for i in range(2,n):
        p[i] = q0*(p[i-1]**2 + 2*p[i-1]*t[i-1])
        t[i] = t[i-1] + p[i-1]

    E = sum(d * p[d] for d in range(0,n))

    print("Random tree depth (n=%i, p=%f)" % (n,p0))
    print("P(d < inf) = %f" % t[n-1])
    print("E[d | d < inf] = %f" % E)

For $p=\frac 2 3$, we get

Random tree depth (n=1000000, p=0.666667)
P(d < inf) = 1.000000
E[d | d < inf] = 0.833477

100% of the trees are finite, but their expected depth is only 0.83. On the contrary, for $p=\frac 1 3$, we get

Random tree depth (n=1000000, p=0.333333)
P(d < inf) = 0.500000
E[d | d < inf] = 0.416738

Exactly 50% of the trees will be finite! Of the ones that do, the expected height is only 0.417. You either die young, our you live long enough to see yourself become infinite …

What happens for our original fair coin-toss?

Random tree depth (n=1000000, p=0.500000)
P(d < inf) = 0.999998
E[d | d < inf] = 22.581590

So, most trees seem to terminate, the expected height awkwardly hangs arounds 22. That’s no conclusive result, and the numbers change if we crank up the number of iterations. We need some theory to back up the empiricism.

Maths to the rescue

Our kind of branching tree experiment is an easy special case of a so called Galton-Watson process. In each generation, the nodes are allowed to have a random number of offspring with some probabilities $p_n$. In our case, that’s simply

$$p_0 = p,\,p_2 = 1-p$$

The deciding factor for the process is the expected number $\mu$ of offspring, which in our case is

$$\mu = 0\cdot p_0 + 2\cdot p_2 = 2-2p.$$

We can now apply some theorems:

If $\mu \leq 1$, the process terminates with probability 100%.

The depth $\tau$ of our trees would in the theory of these processes be called extinction time. $\mu \leq 1$ tells us the extinction time is always finite. We want to understand their exact distribution. What’s the probability that $\tau > n$? Proposition 5 says

  • if $\mu < 1$, $P(\tau > n)$ decays exponentially with $n$.
  • if $\mu = 1$, $P(\tau > n)$ decays only linearly with $n$.

We can always compute the expected value from these tail distributions with the formula

$$ \mathbb E[\tau] = \sum_{n=1}^\infty P(\tau \geq n) $$

This means that for $\mu < 1$, the expected tree depth has a finite value, whereas for $\mu = 1$, we get
$$\mathbb E[\tau] \approx \sum_{n=1}^\infty \frac 1 n = \infty.$$

Fair coin tosses leed to trees that all eventually terminate, but we can expect them to take arbitrarily long. As soon as we tweak the frequencies a bit towards termination, we get nice and finite trees.

tl/dr Never let QuickCheck throw fair coins. Tweak the frequencies.

Remark: The other idiomatic solution to the problem is to carry around a size parameter in the generating routine

genTree depth 
    | depth == 0 = return Leaf
    | depth > 0 = 
            [ return Leaf
            , Branch <$> (genTree (depth - 1)) <*> (genTree (depth - 1)) ] 

QuickCheck can even use the size parameters for its internal workings, so in the end we just declare

instance Arbitrary Tree where
    arbitrary = sized genTree

Laaarge numbers, Part I

What is the largest number you’ve come across? A million is a lot when you think of it as money. A billion is a lot when you think of it as operations to do on a computer. $10^{100}$ is a lot when compared to the number of atoms in the universe. $10^{2 000 000}$ is a loot of books. Still, each one of those numbers can be written out by a one and a moderate amount of zeroes. The googolplex $10^{(10^{100})}$ couldn’t be written out in a physical form any more (though in a digital one), and what it would count is unimaginably large. But again, it comes in a nice and compact form that we’ve just written down: $$10^{(10^{100})}$$

We want to get numbers larger than that. And not just a single one of them, but a whole function $f$ that takes a number and computes a large number out of it, growing extremely fast. For example
$$f(n) = 10^{10^{10^n}}$$
is growing quite fast, in fact we already get the googolplex as $f(2)$. We could now play the silly game and include larger and larger power towers, take factorials like
$$f(n) = n^{n^{n!}}$$
and so an. We will be looking for a concept of a fast-growing function that subsumes all of these.

The LOOP language

The LOOP language is a very simple programming language. All of its variables do hold arbitrarily sized integers and are initialized to $0$. The only statements of the language are assignments with some arithmetic (+,-,*,/) like

x = x + 2*(y - z/2)

and loops like

loop n

The loop reads the value of n at the beginning and runs its body exactly as many times (or not at all if the expression isn’t positive). It doesn’t care if we change the value of n inside the loop. So the program

n = 5
loop n
    n= 2*n

doubles n five times, resulting in n = 160.

Let’s convince ourselves that LOOP is surprisingly powerful: For example, we can simulate if switches with loops. Instead of

if n == 5

we write

loop n - 4
    loop 6 - n
        eq = 1
neq = 1
loop eq
    neq = 0
loop neq

We don’t care if the solution is efficient, we just care for a LOOP program that computes what we want.

Brainteaser 1) Can you write a LOOP program that tests if $n$ is prime?
Brainteaser 2) Can you write every LOOP program as one that only uses basic statements var = var + 1, var = var - 1 and loop var?

Let’s now use LOOP to generate large numbers. Surely, we can exponentiate

n = 10
loop 100
   n = n * 10

and take factorials

n = 10
i = n - 1
loop n - 2
   n = n * i
   i = i - 1

and we can get our googolplex

k = 10
loop 100
    k = k * 10
n = 10
loop k
    n = n * 10

We can create incredibly large numbers by nesting loops in worse and worse ways

n = 10
loop 100
   loop n
      n = n*n

Brainteaser 3) Think of how large the output of this program is.

Sure, we have a program that generates an enormous number. What’s so special when we have programs that generate arbitrarily high numbers?

while true
    n = n + 1

But that’s not a LOOP program. And we cannot write any LOOP program that does the same job. This is because

Every LOOP program terminates.

The instruction pointer will inevitably reach the end of the program, and it can only loop back a fixed number of times. There is no endless loop in LOOP. Let that statement sink in! It means the length of a program somehow bounds how long it can run! Bigger numbers require larger programs. Still, already a small program can run extremely long. We want to measure how long.

If we take n to be our dedicated output variable, every loop program will generate a well-defined number. There are only finitely many sourcecodes of length at most k and even fewer of these are syntactically valid LOOP programs. We can therefore define
$$f(k) = \text{ maximum output of a LOOP program of length at most $k$. }$$

Brainteaser 4) Convince yourself that f grows faster than any of the “large” functions we looked at before.
Brainteaser 5) Can there be a LOOP program that, given $k$, computes $f(k)$?
Brainteaser 6) Can you implement a quick parser/interpreter for LOOP?
Brainteaser 7) How would you implement procedures in LOOP? Is recursion possible?