The Futurama Problem


For those of you who don’t know Futurama, it is an animated sci-fi sitcom that have been running regularly on TV.
In one particular episode “The Prisoner of Benda” (s06e10) a character named Professor Farnsworth builds a mind swapping machine (or body swapping machine depending on how you look at it) which operates on two users at a time.
The characters in the show all find they have their personal reasons for wanting to swap bodies with one another. So not long after finding out about the machine they start engaging in a body swapping extravaganza.
Professor Farnsworth switches body with Amy, Amy switches with Leela and Fry switches with Zoidberg etc…


They however soon realize that the machine only works one way.
Once two bodies have been swapped, the same two bodies cannot be swapped back.
The characters eventually start regretting their decision and wish to swap back into their own bodies, but how?

In the words of Professor Farnsworth(now in Benders robot body):
“I’m not sure. I’m afraid we need to use………………MATH” *evil look*.

Unexpectedly by the end, the two super intelligent basketball globetrotters, Ethan “Bubblegum” Tate and “Sweet” Clyde Dixon come to rescue and mathematically prove it can be done using two extra bodies.
So what was their solution and how did they prove it?

This theorem if you will, was constructed exclusively for this episode by Ken Keeler (who by the way holds a PhD in applied mathematics from Harvard University, but instead chose a lucrative career as a story writer…).

The problem clearly revolves around permutations, so to understand the strategy most effectively let’s frame the problem in that language.
I briefly wrote something about permutations in a previous post but I’ll make a more detailed overview for this problem.
If you’re already familiar with permutations, you may skip directly to the theorem.
Otherwise you may benefit from reading on.

Formally a permutation of a set \Omega is bijection \sigma:\Omega \rightarrow \Omega.
Normally however we don’t care about how the set elements are labelled, and simply take \{ 1,...,n \} where n = |\Omega|.
For example if \Omega = \{ Professor\;Farnsworth,Fry,Leela,Zoidberg,Bender,Amy,Hermes\}
then we could relabel \Omega as \{1,...,7\} since it is only the way in which the elements are permuted that matters not the labels themselves.

A permutation cycle is usually denoted \sigma = (a_1\;a_2\;...\;a_k)
meaning \sigma\; a_i = a_{i+1} \; \forall i < k and \sigma\; a_k = a_1.

There are two special kinds of cycles:
If k=1 then we say \sigma fixes a_1.
If k=2 then we call \sigma = (a_1\;a_2) a transposition (i.e a “swap”)

The identity permutation (usually denoted \iota) is the permutation that fixes every element of the set.

Clearly in the problem a series of transpositions have taken place.
Our task is to find transpositions different from the ones already applied to turn the overall permutation back to the identity i.e so that everyone is back into their own body.

Permutations can be composed just like ordinary functions (although algebraists usually like to write composition from right to left in what is called Reverse Polish Notation we will write them from left to right like most of you are probably more familiar with).

Composition is perhaps best illustrated via an example:
Suppose we have two cycles \sigma_1 = (1\;3\;5\;2),\; \sigma_2 = (3\;1\;5\;6\;2)
Then \sigma_1\sigma_2 = (1\;3\;5\;2)(3\;1\;5\;6\;2) = (1\;2\;5\;6)(3) = (1\;2\;5\;6).
To arrive at this one simply looks at what \sigma_1\sigma_2 does to each number.
For example starting with 1 we see that \sigma_2\;1 = 5 and \sigma_1\;5 = 2.
Therefore \sigma_1\sigma_2\;1 = 2 after which the procedure is repeated for 2.
You notice that when we eventually get to 6 we are mapped back to 1 which is where we started, so this completes a cycle. One moreover sees that 3 is a fixed by \sigma_1\sigma_2 even though neither \sigma_1 nor \sigma_2 fixes 3. Also note that if a permutation fixes an element we usually don’t write it out, which is why 3 was omitted on the right hand side.

There are a few useful facts about permutations that I won’t prove here but are fundamental to the study of permutations:

  • Every permutation may be written as a composition of disjoint cycles.
    Two cycles are disjoint if they move (i.e don’t fix) no point in common.
    (The proof essentially goes by strong induction on the number of points moved by the permutation)
    In the example above we saw how the permutation \sigma_1\sigma_2 = (1\;3\;5\;2)(3\;1\;5\;6\;2) was put
    into disjoint cycle form (1\;2\;5\;6)(3).
  • Disjoint cycles commute.
    This means \sigma_1\sigma_2 = \sigma_2\sigma_1 whenever \sigma_1,\;\sigma_2 are disjoint.
    The result should be obvious if you think about it.
    For example (1\;2\;3)(4\;5) = (4\;5)(1\;2\;3) but (1\;2\;3)(3\;4) \neq (3\;4)(1\;2\;3) partly
    because 2 \rightarrow 3 on left hand side whereas 2 \rightarrow 4 on right hand side.

    Ok, now that we have plowed through all that language, let’s put it to good use.

    Futurama Theorem:

    Let \Omega be a finite set.

    Then any permutation \sigma of \Omega can be reduced to the identity by joint composition with distinct transpositions (a\;b) where a \in \Omega \cup \{ x,y \},\; b \in \{ x,y \} and x,y \notin \Omega


    Like the mathematician and problem solving expert George Pólya once said: if you can’t solve a problem then there is probably an easier problem which you can solve, find it.
    So let’s start by considering k-cycles instead of general permutations.

    Let \sigma = (a_1\;...\;a_k) be a k-cycle.
    Then \tau = (a_1\;x)(a_2\;x)\sigma will send a_1 back to itself given that \tau\;a_1 = (a_1\;x)(a_2\;x)\sigma\; a_1 = (a_1\;x)(a_2\;x)\; a_2 = (a_1\;x)\; x = a_1
    Similarly we see that \tau = (a_2\;x)(a_3\;x)\sigma sends a_2 back to itself.
    Moreover (a_1\;x)(a_2\;x)(a_3\;x)\sigma sends both a_1 and a_2 back to themselves.

    Therefore continuing inductively we find that for \tau := (a_1\;x)...(a_1\;a_k)\sigma we have:
    \tau\;a = a\;\;\;  \forall a \in \{ a_1,...,a_{k-1}\}.

    Unfortunately \tau sends a_k \rightarrow x and not back to itself.
    To fix this we can use another auxiliary element y and compose:
    \tau^* := (a_k\;y)(a_1\;x)...(a_k\;x)(a_1\;y)\sigma.

    Then \tau^* has the property that it fixes every a \in \{a_1,...,a_k\} and swaps x,y.
    This in turn can be corrected for by transposing (x\;y) at the end.

    Therefore (x\;y)(a_k\;y)(a_1\;x)...(a_k\;x)(a_1\;y)\sigma acts like the identity.

    This solves the problem for k-cycles.
    For example if the characters initially swap according to \sigma = (1\;2)(2\;3)(3\;4)...(k-1\;k) then the disjoint cycle form of \sigma is just the single k-cycle (1\;2...k).
    Hence (x\;y)(k\;y)(1\;x)...(k\;x)(1\;y) will undo \sigma so that everyone is back into their own bodies, including the globetrotters x,y.

    Now for a general permutation \sigma we can use the fact that every permutation may be written as a composition of disjoint cycles. This brings us back into familiar territory, and in combination with the fact that disjoint cycles commute we can solve the general problem.

    Let \sigma = C_1 C_2...C_n be a general permutation composed of disjoint cycles C_i.
    Moreover let \sigma^*_i be the permutation corresponding to C_i s.t \sigma^*_i C_i = (x\;y) (as above)

    Then since disjoint cycles commute
    \sigma^*_1...\sigma^*_n\sigma = \sigma^*_1...\sigma^*_n C_1 ... C_n = \sigma^*_1 C_1...\sigma^*_n C_n = (x\;y) ... (x\;y) = (x\;y)^n.

    Thus \sigma^*_1 ...\sigma^*_n \sigma = \begin{cases} \iota, & if\;\; n\;even \\ (x\;y) & if\;\;n\;odd \end{cases}.

    Therefore to get back to the identity we compose \sigma^*_1 ... \sigma^*_n\sigma with (x\;y) whenever n is odd and leave like it is when n is even. We can do this because we have not yet used (x\;y).

    This completes the argument, or to quote “Sweet” Clyde Dixon: “Q to the E to the D”.



    Playing around with the same idea I actually found one can generalize this statement to machines that swap p-tuples of bodies in a cyclic shift with p prime, disallowing the same p-tuple to ever swap again (in any order), using at most p extra bodies.


    Generalized Futurama Theorem:

    Let p \geq 3 be a prime number and let \Omega be a finite set s.t |\Omega| \geq p.

    Then any permutation \sigma of \Omega can be reduced to the identity by joint composition with distinct p-cycles in \Omega \bigcup \{x_1,...,x_p\}, not exclusively contained in \Omega, where \Omega \bigcap \{x_1,...,x_p\} = \emptyset.


    Like in the previous argument we first want to prove the statement for cycles, which we know are building blocks for general permutations.

    Let \sigma = (a_1\;...\;a_m) be an m-cycle in \Omega

    \tau := (a_m\;a_{m-1}\;x_{p-2}\;...\;x_1)...(a_2\;a_1\;x_{p-2}\;...\;x_1)(a_1\;x_{p-1}\;x_{p-2}\;...\;x_1)\sigma
    fixes every element in \{a_1,...a_{m-1}\}.

    Remaining elements a_m,x_1,...,x_{p-1} are permuted among themselves, with the restriction that a_m \rightarrow x_{p-1}.
    By rearranging the order of x_{p-2},...,x_1 in the leftmost cycle of \tau if necessary we may assume \tau reduces to a p-cycle.

    We have one unused element left, namely x_p.
    We may use x_p to fix a_m in the p-cycle resulting from \tau to get a cycle solely in terms of x_1,...,x_p.

    Suppose \tau = (a_m\;x_{p-1}\;y_{p-2}\;...\;y_1) where y_{p-2},...,y_1 is some permutation of x_{p-2},...,x_1.

    Define \sigma^* := (x_p\;a_m\;y_{p-2}\;y_{p-3}\;...\;y_1)(x_{p-1}\;x_p\;a_m\;y_{p-3}\;...\;y_1)\tau

    Then one can verify that \sigma^* fixes a_m and permutes x_p,x_{p-1},y_{p-2},...,y_1 in a p-cycle.

    Thus \sigma^* is some p-cycle in x_1,...,x_p.
    Therefore \sigma^* = (x_1\;...\;x_p)^j for some j \in \mathbb{Z} since p is prime.

    Now turning our attention to general permutations \sigma suppose \sigma = C_1 C_2...C_n where C_1,...,C_n are disjoint cycles.

    For each C_i let \sigma^*_i be the permutation s.t \sigma^*_i C_i = (x_1\;...\;x_p)^{j_i} for some j_i \in \mathbb{Z} (as above).

    Since disjoint cycles commute
    \sigma^*_1...\sigma^*_n\sigma = \sigma^*_1...\sigma^*_n C_1 C_2...C_n = \sigma^*_1 C_1 \sigma^*_2 C_2 ... \sigma^*_n C_n =
    (x_1\;...\;x_p)^{j_1}...(x_1\;...\;x_p)^{j_n} = (x_1\;...\;x_p)^{\sum_{i=1}^n j_i} = (x_1\;...\;x_p)^{(\sum_{i=1}^n j_i)\; (mod\;p)}.

    Since we have yet to use the p-tuple [x_1,...,x_p] we may compose \sigma^*_1...\sigma^*_n\sigma with
    the p-cycle (x_1\;...\;x_p)^{-(\sum_{i=1}^n j_i)\; (mod\;p)} to turn the permutation back to the identity
    (unless \sum_{i=1}^n j_i \equiv 0\;(mod\;p) in which case we need to do nothing).


    Below are some exercises related to this post, feel free to try them:

    Exercise 1 (The Swap Puzzle – Easy):

    Consider a puzzle where the numbers \{ 1,...,n \} are scrambled in some way into n different boxes.
    The only legal move is to swap numbers between two boxes.
    The puzzle is solved when the numbers are arranged from 1 to n in increasing order.


    i) Show that the puzzle can always be solved.

    ii) Deduce that any permutation may be written as a composition of transpositions.


    Exercise 2 (A Shorter Swapping Sequence – Intermediate):

    Suppose the Futurama characters swap according to the permutation (4\;5)(8\;9)(1\;2)(3\;9)(6\;5)(3\;7)(3\;6) = (1\;2)(3\;4\;5\;6\;7\;8\;9).

    The method outlined in the Futurama theorem would yield the inversion sequence

    This requires 13 swaps and 2 extra bodies.

    i) Show that this permuation can be brought back to the identity using only 9 swaps and no extra bodies.

    ii) Find a sequence of transpositions that cannot be undone unless one uses 2 extra bodies.

    Posted in Algebra, Combinatorics, Uncategorized | Tagged , , , , , , , , , | Leave a comment

    The Laser Gun


    I was casually browsing YouTube for nice puzzles (like one does) and stumbled upon this amazing point-line geometry puzzle that I had never seen before, so I couldn’t resist writing it up.

    The full video introduces a series of nice puzzles designed to challenge common intuition and this was one of them.

    You can see the introduction to the problem I am about to discuss in the clip below:

    Fortunately (or unfortunately depending on your preference) the presenter leaves out the full explanation, although he does give the answer and some good hints on where to begin for the proof.
    I thought I’d fill in the gaps to complete the argument of this excellent puzzle.

    In case you did not want to bother watching, here is the problem statement from the video:

    Problem (The Laser Gun):

    You are standing at a fixed point in a large, rectangular room with mirrored walls.


    Elsewhere in the room is your enemy(red), with a laser gun.

    Your only defense is that you may summon bodyguards and position them in the room so as to block all possible shots.

    How many bodyguards do you need?

    (I should remark that all entities in this problem are considered to be points, so they have no “thickness” although it appears that way in the picture).


    As the presenter mentions, you could clearly do it with an uncountable continuum of points.
    A circle is such a set and no matter how close the enemy is to us, we can center ourselves in a circle of bodyguards with radius half the distance between ourselves and our enemy.
    (If you know some Topology you trivially recognize that this comes from the fact that \mathbb{R}^2 with the Euclidean topology is Hausdorff, and in particular a Kolmogorov space).

    The presenter however tells us that only 16 bodyguards are necessary and sufficient to block all possible shots!
    What is counter-intuitive about this problem is that there are infinitely many dense trajectories in the rectangle, meaning we can get hit from an infinite number of angles, and yet the claim is that only 16 bodyguards suffice to block all shots – amazing, no?

    In case you wish to attempt the problem yourself here is the customary SPOILER ALERT!

    Solution (The Laser Gun):

    On a first look this problem seems increasingly difficult to analyze, how do you understand if an arbitrary trajectory is ever going to hit you or not?

    The key to this problem was revealed in the video by the presenter.
    The presenter mentions it is not too hard convincing yourself that only a countably infinite number of body guards are needed by considering reflections of the rectangle in the plane.

    So what does he mean by this?
    A countable infinity is in some sense smaller than an uncountable infinity (think of \mathbb{Z} vs \mathbb{R}).
    This would be good because it means there are only finitely many trajectories that can hit us with the property that it may only bounce n times.
    According to the presenter one can see this by looking at reflections of the rectangle.
    Following his suggestion we can see that instead of bouncing, we can think of the laser as continuing straight through the mirror into another room which is the mirror image of the current room.
    Each wall crossed by the laser represents a bounce in the actual trajectory.
    See picture below:


    This completely removes the complexity of having piecewise linear trajectories (i.e bouncing) since we can just look at straight lines in the infinite plane. For the enemy to hit us, he now needs to hit one of our countably infinite reflected copies in any of the virtually mirrored rooms using a straight laser shot.
    See picture below:


    This is good, but there are still infinitely many target points.
    It turns out this is fine, because one guard can block an infinite number of trajectories towards a target.
    Why? Because not only are we reflected in each mirrored room, so are our bodyguards.
    It therefore seems conceivable that a finite number might do given that there are only finitely many relative positions of ourselves in the rectangle across all reflected rooms (inspect above diagram and you will see).

    Let’s add some mathematical precision to this argument.

    Suppose the rectangle has dimensions a,b and the origin is set at the lower left corner.

    It follows that our reflections appear at the following points in the plane: (ma \pm x_1, nb \pm y_1) where m,n \in \mathbb{Z}

    The equations of the straight lines between the enemy at (x_2,y_2) and our reflections are given by (y-y_2) = \frac{(nb \pm y_1) - y_2}{(ma \pm x_1) - x_2}(x - x_2) in two point form.

    We now need to choose points in the rectangle in such a way that the points along with their infinite number of reflections are situated on each of the above lines.

    The laser beam directly hitting us must pass through the mid-line between ourselves and our enemy so we may initially consider x = \frac{x_1+x_2}{2}.
    Plugging this into the beams line equation (i.e setting m,n=0) we get y = \frac{y_1+y_2}{2}.
    The point ( \frac{x_1+x_2}{2},  \frac{y_1+y_2}{2}) lies in the room so we may place a bodyguard there.
    Incidentally one sees that all points of the form (ma + \frac{x_1+x_2}{2}, nb + \frac{y_1+y_2}{2}) lie on corresponding lines (y-y_2) = \frac{(nb + y_1) - y_2}{(ma + x_1) - x_2}(x - x_2).
    However only targets of the form (3ma + x_1, 2nb + y_1) are situated in the line of fire protected by this guard (or some reflection of it).

    Proceeding in similar fashion using symmetry, we can list all points covering required targets as:
    (x,y) = ((ma \pm \frac{x_1}{2}) + \frac{x_2}{2}, (nb \pm \frac{y_1}{2}) + \frac{y_2}{2} )

    Assuming x_1 < x_2, y_1 < y_2 it is enough to place guards at below 16 points since these are the points of the form above which have coordinates in the actual room (with remaining reflected points being collinear with one of them).
    (\frac{x_2}{2} \pm \frac{x_1}{2}, \frac{y_2}{2} \pm \frac{y_1}{2} )
    (\frac{x_2}{2} \pm \frac{x_1}{2}, b - \frac{y_2}{2} \pm \frac{y_1}{2} )
    (a - \frac{x_2}{2} \pm \frac{x_1}{2}, \frac{y_2}{2} \pm \frac{y_1}{2} )
    (a - \frac{x_2}{2} \pm \frac{x_1}{2}, b - \frac{y_2}{2} \pm \frac{y_1}{2} )

    The other cases differ only by switching the roles of x_1,x_2 and y_1,y_2.



    It might be interesting to look at other shapes and try to answer the same question.

    Here is an interesting example I came to think of:

    The Laser Gun – Ellipse Version:

    By definition an ellipse is the locus of points such that the sum of the distances to two fixed points (the foci) is a constant.


    It follows by miracle of geometry that any trajectory passing through one of the foci must reflect into the other focus.
    (btw if you want to construct the worlds easiest mini-golf course this may be it, you virtually can’t miss provided your putt has appropriate force)

    Thus if the enemy is positioned at one of the foci and you on the other, then the enemy can shoot in any given direction he wishes and still be able to hit you. Therefore an uncountably infinite number of bodyguards are needed in this case to protect you. If instead your position is anywhere else and the enemy is at the focus, then at most two bodyguards are needed. All you have to do is position one bodyguard to block the direct shot and place the second bodyguard at the other focus to block every reflection.



    Perhaps even more interesting would be to look at other n-sided polygons.
    In fact this could make an interesting mini-project.

    Clearly there is something funny going on when angles are not rational multiples of 2\pi.
    I will let you work out what it is.


    It may therefore be easiest to start with regular n-gons.
    For example in a room shaped like an isosceles triangle we would get a picture that looks something like the below:


    Perhaps it may be tempting to draw some kind of conclusion from this, but inaccurate pictures have fooled many!
    This needs a mathematical argument.

    Posted in Geometry, Uncategorized | Tagged , , , , , , , , | 1 Comment

    A Fair Coin, An Unfair Game


    Suppose you have an unbiased coin (i.e a coin with equal probability of landing heads or tails) but would like to construct an outcome of biased probability p, how would you do it?

    I remember this question (or something along these lines) being asked in a hiring interview where I used to work, using p = \frac{1}{3}.

    I’m not entirely sure what the expected answer was, but I believe they wanted candidates to answer something like “No, it is not possible because…”.

    Presumably people would have argued that the closest thing you can get to p = \frac{1}{3} is to flip the coin 2 times:
    If you hit Heads two times then the outcome is positive,
    If you hit Heads and Tails (in either order) then the outcome is negative,
    If you hit Tails two times then you repeat the procedure,
    arguing that this procedure could repeat itself indefinitely so that it would be impossible to construct this outcome because you might be flipping the coin forever.
    Although this does not exclude the possibility of a different working strategy it would probably have demonstrated enough analytical thinking for the interview exercise given the role the person was interviewing for.

    I think this raises an interesting question because it sort of comes back to a popular philosophical debate about the essence of probability. What do we mean by “impossible” in this case? How likely is it that the procedure will continue forever?

    A quick check will reveal that the probability of the procedure continuing up until the (2n)^{th} coin toss is (\frac{1}{4})^n. So the probability that the coin is tossed forever is lim_{k \rightarrow \infty} (\frac{1}{4})^k = 0.

    So the question is, what is the meaning of probability 0?
    Does it mean the procedure must stop eventually, in which case the expected answer should be “Yes, it is possible because…”? Or can an event of probability 0 actually happen? Must an event of probability 1 happen?

    Mathematicians have assigned a special (and wonderfully vague) term to describe this phenomenon. They would say here that the procedure will almost surely not continue forever.

    Therefore a more accurate answer to the question would be “Yes, one can construct this outcome almost surely” (i.e with probability 1).

    To understand the difference between “surely never” and “with probability 0” the standard metaphor is to consider a dart board. What is the probability of hitting any particular point with a dart?
    In this continuous setting we are asking what the probability is of hitting any one out of infinitely many points. The probability of that happening is zero of course. On the other hand the dart must hit the board somewhere (assuming you don’t miss). The point landed on by the dart had probability zero of being hit, and yet it just happened. So in this case hitting any particular point will almost surely never happen. On the other hand if something surely never happens, then it has the intuitive meaning one would associate with the word ‘never’ i.e that the event is not in the sample space so we can’t even begin to assign a probability to it (impossible in other words).

    Interestingly one can use a fair coin to construct any outcome of probability p with probability 1.

    Problem (The Car Conundrum):

    Suppose you and a friend play the lottery together and manage to win a car.
    You paid a fraction p > \frac{1}{2} of the lottery ticket and your friend the rest (i.e 1-p).
    Both of you have a claim on the car but only one of you can own it and it needs to be declared on the spot.
    You only have a regular unbiased coin in your pocket to aid you.

    How do you settle this fairly?


    To settle this fairly you should play an unfair game, biased in proportion to the amount each person contributed to the purchase of the winning lottery ticket.

    So this is exactly the problem we have been discussing.

    Let I(n) = \begin{cases} 1, & if\;\;n^{th}\;toss\;is\;a\;Head \\ 0 & if \;\;n^{th}\;toss\;is\;a\;Tail \end{cases}

    Write p in binary decimal form say, p = 0.d_1d_2d_3... = \sum_{n=1}^{\infty} d_n2^{-n} where d_i \in \{0,1\} \forall i.

    Now keep on tossing the coin as long as I(n) = d_n.

    Suppose for the first time, on the N^{th} toss, I(N) \neq d_N.

    Declare yourself a winner if d_N = 1 and your friend a winner if d_N = 0.

    The probability that the procedure is stopped after n tosses is clearly (\frac{1}{2})^n.

    Thus the probability that the procedure will stop after a finite number tosses is:

    \mathbb{P}( N < \infty ) = \sum_{n=1}^{\infty} (\frac{1}{2})^n = -1 + \sum_{n=0}^{\infty} (\frac{1}{2})^n = -1 + \frac{1}{1-\frac{1}{2}} = -1 + 2 = 1
    (note that the last sum is an infinite geometric series).

    Finally since you can only win if d_n = 1 we get \mathbb{P}( You\;Win ) = \sum_{n=1}^{\infty} d_n\mathbb{P}(N = n) = \sum_{n=1}^{\infty} d_n(\frac{1}{2})^n = p as required.


    Exercise (Surely or Almost Surely? – Easy):

    Determine whether the following events happen surely or almost surely.

    i) Selecting a number uniformly at random in [0,1] and not getting \frac{1}{2}.

    ii) Selecting a number uniformly at random in [0,1] and not getting 2.

    iii) (Infinite Monkey Theorem)
    A monkey randomly typing the complete works of Shakespeare given infinite amount of time.


    Posted in Probability, Uncategorized | Tagged , , , , , , , | 2 Comments

    Fermat’s Last Theorem (mod p)



    (Picture from the Simpsons episode “Treehouse of Horror VI” where Homer enters the world of 3D.
    Try entering the numbers in your hand calculator)



    The statement of the perhaps most famous mathematics problem over the past few centuries should not have slipped past many people with some interest in mathematics.
    I am of course talking about Fermat’s Last Theorem (FLT) which was finally resolved by Andrew Wiles in 1995 in what is considered one of the most remarkable mathematical achievements of modern day.
    FLT is one of those problems whose statement is so often understood by anybody on the street, but is very difficult to prove. The amount of machinery that lies behind Wiles proof is astonishing (more so in algebraic geometry than pure number theory).

    As a reminder I will state FLT and then begin discussing the contents of this post.

    Fermat’s Last Theorem:

    Let n \in \mathbb{Z},\;n > 2.Then there exists no x,y,z \in \mathbb{Z},\;x,y,z \neq 0 s.t x^n + y^n = z^n.

    In this post I wanted to treat a related question.
    Is FLT true over finite fields of characteristic p?
    That is, can we refute the existence of non-trivial integer solutions to x^n + y^n = z^n if we work (mod\;p) where p is a prime number?

    The answer is No.
    You can probably immediately spot counterexamples such as 2^3 + 2^3 \equiv 1^3 (mod\;3).

    More remarkably however, given n \in \mathbb{Z},\;n \geq 1 there exists non-trivial solutions to x^n + y^n = z^n (mod\;p) for every prime p sufficiently large.
    Schur came up with this result while trying to prove FLT.
    This result probably gives some idea as to why FLT was so hard to prove since it shows one cannot simply consider congruences like one often does when working with other types of Diophantine equations.

    Another interesting fact about this result is that it can be proved using both Fourier Analysis and Ramsey Theory. This may seem rather surprising since Ramsey Theory is combinatorics (a field dealing with countable discrete structures) whereas Fourier Analysis revolves around function approximation using continuous trigonometric functions, and both are used here to prove a number theoretic result. This shows just how differently mathematics problems can be approached if one can find a suitable phrasing in whichever language is being used.

    I would like to focus on the latter proof using Ramsey Theory.
    So what is Ramsey Theory?
    The slogan for Ramsey Theory is basically that any sufficiently large structure, no matter how disordered, contains some element of order. For graph structures one usually likes to look at edge colourings and say something about specific subgraphs where all edges are of the same colour.

    Let’s make an important definition to make this more precise:

    Definition (Ramsey Number):

    A graph on n vertices is complete (denoted K_n) if every pair of vertices is connected by a unique edge.

    Complete Graph

    Let R(r,s) denote the smallest number such that for any red/blue edge-colouring of K_{R(r,s)} there exists either a monochromatic red subgraph K_r or a monochromatic blue subgraph K_s.

    The numbers R(r,s) are called Ramsey Numbers.


    We may also define Generalized Ramsey Numbers R(r_1,...,r_k) as the smallest number such that any k-colouring of K_{R(r_1,...,r_k)} contains at least one monochromatic subgraph K_{r_i} of colour i for some 1 \leq i \leq k

    I have omitted one crucial detail: How do we know R(r,s) is well-defined?
    This is exactly what Ramsey’s Theorem asserts, that this number exists in the first place.
    The proof is not actually very difficult, one proceeds by proving an upper bound on R(r,s). One can then show well-definedness of the generalized Ramsey number by proving the recursive inequality R(r_1,...,r_k) \leq R(r_1,...,r_{k-2}, R(r_{k-1},r_k)) and using the well-definedness of R(r_{k-1},r_k) as part of an inductive argument.
    I will however omit the details of the proof and move on to the main result of this post.
    Before we get there we need an intermediate theorem which seemingly has little to do with our goal:

    Theorem (Schur):

    Let k > 1.

    Then \exists N \in \mathbb{Z} s.t n \geq N \Longrightarrow \forall k-colourings\; of\;\{1,...,n\},\;\exists\; x,y,z \in \{1,...,n\} of same colour s.t x+y=z



    A k-colouring of \{1,...,n\} is the same as a function C:\{1,...,n\} \rightarrow \{1,...k\}.

    Take n \geq R(3,...,3)\; (k\;times).

    Then by Ramsey’s Theorem we are guaranteed that any k-colouring of K_n contains a monochromatic triangle.

    Define a k-colouring on the edges of K_n by e = (i,j) \mapsto C(|i-j|).

    The existence of a monochromatic triangle in K_n then means \exists a,b,c \in \{1,...,n\} s.t C(|a-b|) = C(|b-c|) = C(|c-a|)

    W.l.o.g assume a < b < c and put x = b-a,\; y = c-b, z = c-a \in \{1,...,n\}.

    Then x + y = z as desired where C(x) = C(y) = C(z).


    Main Theorem:

    Let n \in \mathbb{Z},\; n > 1.

    Then there exists a prime p_0 s.t for every prime p \geq p_0

    x^n + y^n \equiv z^n (mod\;p) has a non-trivial integer solution i.e a solution where x,y,x \not\equiv 0 (mod\;p)


    The proof depends on a key fact which is much easier to understand if you know some Group Theory.

    The fact is that every integer in \{1,...p-1\} can be written as some (multiplicative) power of a fixed integer in the same set (mod\;p).
    [in other words \mathbb{Z}_p^* is a cyclic group]

    To informally look at this another way (without directly using notions from Group Theory), define A_n := \{ a^n\;(mod\;p): a \in \{ 1,...,p-1\} \}.

    The reason we want p to be a prime instead of just any integer is that we need to make sure we have no zero divisors i.e we need to ensure there exists no non-zero a,b s.t ab = 0 (mod\;p).
    One of the reasons for this is that we would like to make the definition below:

    Define an equivalence relation \sim on \{ 1,...,p-1\} by:
    x \sim y \Longleftrightarrow x \equiv yt \;(mod\;p) for some t \in A_n.
    (You can check this is well defined for p prime)

    The so called equivalence classes are defined as a_iA_n := \{ a_it\;(mod\;p) : t \in A_n \} for some representatives a_1,...,a_k \in \{ 1,...,p-1\}

    A standard fact from set theory tells us that equivalence classes on a set form a partition
    \{ 1,...,p-1\} = a_1A_n \cup ... \cup a_kA_n.
    After all, every integer in \{ 1,...,p-1\} must belong to some equivalence class and you can check that they do not intersect.

    Now we can start colouring the set \{ 1,...,p-1\}.

    Define the colouring C:\{ 1,...,p-1\} \rightarrow \{1,...k\} by C(x) = i iff x \in a_iA_n.

    By Schur’s Theorem \exists N \in \mathbb{Z} s.t p \geq N \Longrightarrow \exists a,b,c \in \{ 1,...,p-1\} s.t a+b \equiv c\;(mod\;p) with C(a)=C(b)=C(c).

    In other words we can find non-trivial a,b,c \in \{ 1,...,p-1\} s.t a,b,c \in a_iA_n for some i so that a = a_ix^n, b=a_iy^n,c=a_iz^n for some x,y,z \in \{ 1,...,p-1\} and a_ix^n+a_iy^n \equiv a_iz^n\; (mod\;p)

    Since a_i and p are co-prime we may divide away a_i to get x^n+y^n \equiv z^n\; (mod\;p).


    Exercise 1( R(3,3) -Easy ):

    Prove that R(3,3) = 6.

    Hint: Think of the pigeonhole principle for the upper bound

    (I was actually asked to prove this myself in one of my undergraduate admissions interviews, though it was stated differently in terms of friends and strangers instead of as a Ramsey Number)


    Exercise 2( R(3,3,3) -Hard ):

    Prove that R(3,3,3) = 17.

    (Note: Proving this was one of the questions in a past IMO competition, again stated very differently).

    Posted in Combinatorics, Number Theory, Uncategorized | Tagged , , , , , , | 2 Comments

    The Resistance


    Some time ago I was introduced to this board game called The Resistance.

    I very rarely play board games, so I am anything but an expert.
    My impression however of non-mainstream fantasy-themed board games is that they are usually very time consuming and quite difficult to grasp (and played only by people in long leather coats).

    On the contrary the designer behind this game might be on to something.
    The Resistance has many of the common features of a great mainstream board game.
    The rules are easy to understand, games are short, very social, there is no player elimination, involves deduction, deceit and it can probably be converted into a drinking game…. you name it.

    So how does the game work?


    The game can be played by any number of people between 5 and 10 (at least this is the recommendation).
    There are two opposing sides: The Resistance(the good guys), and the Spies(the bad guys).
    The sides are determined by cards that are shuffled and dealt among the players.
    The number of members on each side is determined according to a table you can find here.
    The game is set up so that spies know who each other are, but the resistance members don’t.
    It is in the interest of the resistance to try and identify who is a spy and who is not (of course there can be disagreement which is the basis of the whole game).
    The game consists of up to 5 rounds, called missions.
    Before each mission the leader(a rotating role) chooses a team, after which there is a round of voting to determine whether the team should be accepted for the mission or not.
    The majority vote wins (in case of a tie the team is rejected).
    The size of the team in each mission is decided by a table you can find here.
    If the team is rejected then the next leader chooses a new team and another round of voting takes place.
    There can be no more than 5 rejections over the course of the game, otherwise the spies win.
    If the team is accepted then each team member can decide whether to succeed or fail the mission.
    These decisions are recorded on cards that are shuffled and then shown (so that it is impossible to directly link a decision back to any player – deducing who is a spy is in essence what the game is about).
    If any of the cards have recorded a Fail, then the mission as a whole has failed (i.e Fail has veto).
    The Resistance need 3 successful missions to win the game, otherwise the spies win (best of 5 missions).
    Of course members of the Resistance have no reason to fail any mission, so only spies can fail missions.

    If you are still unclear about the rules you can watch a video here.
    The video also shows what the game pieces physically look like if you are interested.

    Besides the rules I wanted to mention some tactical remarks so you can get a feel for how the game is played:

  • Talking a lot is part of the game. If you are a spy you need to convince the resistance that you are not a spy and do everything you can to cause confusion in the game. This could take the form of false accusations, coming up with deceiving arguments and other sorts of manipulation to pretend you are an ally of the opposite camp.

  • Being in the resistance can be quite difficult. It is often easier for the individual members of the resistance to gather information for themselves as to who is a spy, since they can eliminate themselves from the equation. However any such information is not easily communicated to the rest of the resistance members, why should they trust you?

  • As part of being in the resistance you may want to carefully observe why some players tend to reject teams with certain members (or the absence of certain members). This is a source of information which I rarely used myself while playing as resistance. Of course the spies want to ensure that they are part of as many missions as possible so that they are in control. Therefore certain teams should look less attractive to them.

  • Being a spy you do not necessarily want to fail every mission you are a part of. Particularly if there are more than one spy in a mission it can be devastating to reveal this fact by for example double failing. If you fail a mission with a small team you tend to give away more information as a spy since the resistance can narrow down the search to fewer people. As a spy you can also choose not to fail a mission to gain trust in further missions and then sit back and watch the confusion as you fail subsequent missions for unapparent reasons. On the flip side by not failing you may give away some game equity in case the mission succeeds. Therefore any such move needs to be weighed carefully as a tradeoff.

    While playing it became obvious that the game seems heavily biased in favour of the spies (or we may have been playing it wrong). Out of the 5 or 6 times that the game was played I believe the resistance won only once, and that was because of a glaring mistake on part of the spies. I remember afterwards thinking whether there exists a better strategy the resistance could employ to increase their chances of winning?
    Is there even a strategy the spies can implement to win with certainty?

    I don’t know if any game theorist have ever taken a serious look at this game before, but I’ve at least managed to find a strategy that gives the resistance a guaranteed chance of winning with probability \frac{4}{15} \approx 27\% in a game of 6 people and a winning rate of \frac{3}{10} = 30\% in a game of 5 people.
    I do not recommend playing this strategy unless you want to kill all the fun in the game.
    I merely want to establish the existence of a strategy that attempts to answer the questions posed above in a neutral setting where no information other than what is on the board is used. There is also no claim of optimality for this strategy (in fact there is a chance you may be better off using social tells to aid you).

    A Strategy For The Resistance:

    The main idea behind the strategy is that the resistance can pre-adopt a decision tree deciding what teams should be sent on each mission depending on the outcome of previous missions.

    By fixing a decision tree no team can ever get rejected since the resistance are in majority.
    Moreover if a spy down-votes a team he has immediately revealed himself to be a spy since members of the resistance have no reason to down-vote.
    Therefore all votes must be positive.
    This makes team voting redundant.

    Since the resistance members are initially unaware of who is on their side the decision tree must be made public information i.e the spies know exactly how the teams are selected for each mission.

    We may therefore assume the spies choose the path that maximizes their gain in the decision tree.

    So we have translated the game into a situation where the resistance come up with the best possible decision tree and spies pick the worst possible path for the resistance.

    Of course the question is how often the spies can find a path that leads to at least 3 failed missions considering all possible spy locations around the table?

    The following decision tree can be used by the resistance to achieve certain win in 4 out of \left( \begin{array}{c} 6 \\ 2 \end{array} \right) = 15 possible spy configurations in a 6 player game.
    The tree is by no means unique.
    Here a filled black circle represents a player being selected to go on a mission.
    According to the game table the number of people required on each of the 5 missions is 2,3,4,3,4 for 6 players.
    It should also be noted that a “Fail” branch may only be taken if there is a spy in the team.
    Hence the spies may not always be able to reach 3 failed missions.



    Strategy Analysis:

    Let us analyze the game in which there are 6 players.
    The number of distinct decision trees for this problem is given by:

    \left( \begin{array}{c} 6 \\ 2 \end{array} \right)^{1}\left( \begin{array}{c} 6 \\ 3 \end{array} \right)^{2}\left( \begin{array}{c} 6 \\ 4 \end{array} \right)^{4}\left( \begin{array}{c} 6 \\ 3 \end{array} \right)^{8}\left( \begin{array}{c} 6 \\ 4 \end{array} \right)^{16} \approx 5.10^{37}

    Like in so many combinatorial games this is impossible to brute force.
    That is, finding an optimal tree by evaluating the win/loss distribution across every possible decision tree.

    In lack of better ideas, lets try to grow a tree on best effort and stop when things become impossible.
    We want to accommodate for as many spy locations as possible, so let’s start by considering two spies at 1,2 and make it very hard for them by not giving them any choice opportunities.
    We can do this by excluding them from the team three missions in a row.
    This way the resistance win all three missions, and hence the game.


    Now consider spies at location 1,3. We want to block all possible bad paths for the resistance by choosing new nodes appropriately. Given the current tree, the two first missions will automatically succeed since they do not include any spy. Once we reach (3) there will be a spy in the team but he cannot afford to succeed the mission without losing the game so he is forced to fail the mission. We can now add a node appropriately on the Fail-branch of (3) to make the next mission go in favour of the resistance – and hence the game.


    Next lets consider spies at location 1,4.
    Now things are getting slightly more constrained for us, but it is still possible to make the spies lose.
    We have two paths to consider given the current tree.
    If spies choose to fail the mission in (2) then we can make sure the resistance win the next two missions by choosing nodes appropriately.
    If spies choose to succeed (2) then they cannot afford to succeed (3) without losing the game.
    If they fail (3) then they will lose (4) and hence the game, so there is nothing more to do for this case.


    We can still do more.
    Consider spies in locations 2,3
    The first two missions will automatically succeed given current tree.
    In (3) we have our first spy in the team, but he cannot afford to lose the mission without losing the game.
    Moving on to (4) we have another spy in the team, and likewise he cannot afford to lose the mission so he needs to fail it. We can now add in another node to the fail branch of (4) to win the next mission for the resistance.


    It is now impossible to continue.
    Any other spy configuration can follow the path (1) \rightarrow (2) \rightarrow (3) \rightarrow (4) \rightarrow (7) to fail 3 missions.
    Thus remaining nodes are redundant and can be chosen arbitrarily.

    This does not constitute a proof of optimality (within the strategy) but I suppose it might be possible to formalize an argument along these lines.
    That 4 out of 15 could be optimal for any decision tree is also supported by the heuristic search algorithm I implemented to evolve a best possible solution tree (see Genetic Algorithms for more details).
    The tree generated by the algorithm can be seen in the full decision tree depicted above.
    Can you spot which spy configurations are the winning ones for the resistance?

    The argument for 5 players work in a similar fashion, I will leave this as an exercise.

    Out of accident I noticed that if we change the rules so that the spies must win 4 out of 5 missions to win the game then the strategy described in this post yields a completely fair game in the case of 5 players, and a near fair game (46.66667 \% for resistance) in a 6 player game.

    The game we have considered in this post is the absolute vanilla version of The Resistance.
    There are many other interesting variations of this game.
    Particularly the version with a “Merlin”, who is part of the resistance and know who the spies are.
    Spies are unaware of who the Merlin is and are given a chance at the end of the game to guess who it is.
    If they guess correctly then the spies win the game, regardless of whether the resistance have won more missions.
    So the Merlin character is under very high pressure during the game because on one hand he needs to communicate who the spies are to the other resistance members, but on the other hand this communication cannot be too obvious because then he gives the game away.
    It might be interesting to analyze how much this changes the winning probabilities for either team.
    Regardless of what happened during the game the spies have a 25 \% chance at taking home the game by pure luck (in a 6 player game) which intuitively seems unfair.

    Posted in Game Theory, Uncategorized | Tagged , , , , , | 3 Comments

    Not That Good, Will Hunting

    Many of you have probably seen or heard of the movie Good Will Hunting, starring eminent actors Stellan Skarsgård, Matt Damon and Robin Williams.
    I am not going to give a long film review though I will briefly need to set context for the problem.

    Will Hunting (played by Matt Damon) is a janitor at MIT who supposedly sits on extreme mathematical talent. One day Will Hunting solves a graph theory problem on a board in the hall posed by Professor Gerald Lambeau (Stellan Skarsgård) as a serious challenge to his students after somebody mysteriously (Will Hunting of course) had solved the initial problem set by the professor.
    The professor announces to the class that it recently required him and his colleagues over 2 years to prove the result, whereas Will Hunting seemingly writes down the answer in a few minutes at the end of his working shift.


    Of course Will Huntings mathematical ability is blown out of proportion in the movie.
    The actual problem itself is hardly a complicated research problem these days and was stated as follows:


    Problem (Will Hunting – Irreducible Trees):
    How many (non-isomorphic) homeomorphically irreducible trees are there on 10 vertices?

    The graph theory lingo used in the statement probably requires some explanation.
    First of all if a vertex v is incident to d edges, we say that the degree of v is d.
    A tree is a special kind of graph which is connected and has no cycles.

    A tree is homeomorphically irreducible if it has no vertex of degree 2.

    We are looking for non-isomorphic instances of homeomorphically irreducible trees.
    That is, trees which are essentially different in the sense that there is no bijective mapping f:V_1 \rightarrow V_2 between the vertices of the two graphs that preserve the edge structure i.e (v,w) is an edge in the first graph iff (f(v),f(w)) is an edge in the second graph.
    So in plain language two trees are non-isomorphic if one cannot get from one tree to the other by rotating or moving edges around in such a way that the same vertices remain adjacent.


    The answer is 10.

    Although Will identifies 8 of the 10 trees before he is chased away, he doesn’t actually answer the real question which is why the trees he found are non-isomorphic?

    In fact the more general question of how many homeomorphically irreducible trees there are on n vertices has been answered in terms of the coefficients of a generating function, in a paper entitled “The number of homeomorphically irreducible trees, and other species” by Harary and Prins published in Acta Mathematica 1959.

    The first few terms of that function is given by x + x^2 + x^4 + x^5 + 2x^6 + 2x^7 + 4x^8 + 5x^9 + 10x^{10} + 14x^{11} + 26x^{12} + O(x^{13}).

    I was thinking instead of proving the statement directly using a more elementary case argument on the vertex degrees. 10 vertices is small enough to make this feasible by hand.

    Any tree on n vertices has n-1 edges (Exercise 1).
    Thus there are 9 edges in a tree of 10 vertices, and since each edge by definition is incident to two vertices the total sum of all degrees must be 9.2 = 18 (The general result is known as the Handshaking Lemma).
    Let d_i denote the degree of vertex v_i for i = 1,...,10.
    W.l.o.g assume d_1 \geq d_2 \geq ... \geq d_{10}.
    This is btw called a degree sequence.
    Since a tree is connected clearly d_i \geq 1 and so d_i \leq 9 \;\; \forall i.

    What we need is a systematic way of distinguishing non-isomorphic trees from each other.
    Degree sequences will lead us in the right direction, for if two graphs have different degree sequences then they cannot be isomorphic i.e degree sequence is an isomorphic graph invariant (Exercise 2).
    Note that the converse is not true i.e same degree sequence on two graphs does not imply isomorphism (we will see examples below).

    Fortunately there are other graph invariants we can throw at the tree until we are able to fully distinguish it from the rest. As one would expect isomorphisms preserve distance between nodes in a tree, this is an immediate consequence of the definition of isomorphism (see Exercise 3). In particular since isomorphisms preserve the degree of a vertex, the distances between vertices of the same degree must remain unchanged in the isomorphic graph (we will use this fact later).

    Let’s now find the trees by systematically enumerating the degree sequences using the information we have:

  • d_1 = 9
  • The only possible degree sequence is 9 \geq 1 \geq 1 ... \geq 1 since d_i \geq 1\;\; \forall i.
    There is only one tree up to isomorphism with this degree sequence.
    See tree below:

  • d_1 = 8
  • Clearly this case yields no homeomorphically irreducible trees since again, given that that d_i \geq 1\;\; \forall i the only possible degree sequence is 8 \geq 2 \geq 1 ... \geq 1.
    This means there must be a vertex of degree 2 which is forbidden.

  • d_1 = 7
  • The possible degree sequences are:
    7 \geq 3 \geq 1 \geq ... \geq 1
    7 \geq 2 \geq 2 \geq 1 \geq ... \geq 1
    Only the the first leads to a homeomorphically irreducible tree, and there is clearly only one tree with this degree sequence up to isomorphism.
    See tree below:

  • d_1 = 6
  • The possible degree sequences are:
    6 \geq 4 \geq 1 ... \geq 1
    6 \geq 3 \geq 2 \geq 1 \geq ... \geq 1
    6 \geq 2 \geq 2 \geq 2 \geq 1 \geq ... \geq 1
    Only the the first leads to a homeomorphically irreducible tree, and there is clearly only one tree with that degree sequence up to isomorphism.
    See tree below:

  • d_1 = 5
  • The possible degree sequences are:
    5 \geq 5 \geq 1 ... \geq 1
    5 \geq 4 \geq 2 \geq 1 \geq ... \geq 1
    5 \geq 3 \geq 3 \geq 1 \geq 1 \geq ... \geq 1
    5 \geq 3 \geq 2 \geq 2 \geq 1 \geq ... \geq 1
    5 \geq 2 \geq 2 \geq 2 \geq 2 \geq 1 \geq ... \geq 1

    Only the first and third degree sequences are free from a degree 2 vertex.
    The first degree sequence clearly leads to one tree up to isomorphism:

    For the third degree sequence there are five possibilities since there are 3 non-leaf vertices two of which have the same degree:
    (1) v_2, v_3 are children of v_1
    (2) v_2 is a child of v_1 and a parent of v_3
    (3) v_1, v_2 are children of v_3
    (4) v_1 is a child of v_3 and a parent of v_2 or vice versa (5)

    Inspecting these cases we see that we get two non-isomorphic trees:
    (4) can be moved around to match (1) showing they are isomorphic (how?)
    (3) and (5) can be moved around to match (2) showing they are isomorphic (how?)
    (1) and (2) are not isomorphic since the distance between the degree 3 vertices is 2 in (1) and 1 in (2).
    See trees below:

  • d_1 = 4
  • The possible degree sequences are:
    4 \geq 4 \geq 3 \geq 1 \geq ... \geq 1
    4 \geq 4 \geq 2 \geq 2 \geq 1 \geq... \geq 1
    4 \geq 3 \geq 3 \geq 2 \geq 1 \geq ... \geq 1
    4 \geq 2 \geq 2 \geq 2 \geq 2 \geq 2 \geq 1 \geq ... \geq 1

    Only the first degree sequence leads to a homeomorphically irreducible tree.
    We have five possibilities to consider for this degree sequence since there are 3 non-leaf vertices, two of which have the same degree:
    (1) v_2, v_3 are children of v_1
    (2) v_2 is a child of v_1 and a parent of v_3 or vice versa (3).
    (4) v_1, v_2 are children of v_3
    (5) v_1 is a child of v_3 and a parent of v_2 (the reverse case is equivalent since v_1 and v_2 have same degree)

    Inspecting these cases we see that we get two non-isomorphic trees:
    (2) and (5) can be moved around to match (1) showing all three are isomorphic.
    (4) can be moved around to match (3) showing they are isomorphic.
    (2) is not isomorphic to (3) because the distance between the degree 4 nodes in in (2) is 1 whereas the distance between the degree 4 nodes in (3) is 2.
    See trees below:

  • d_1 = 3
  • The possible degree sequences are:
    3 \geq 3 \geq 3 \geq 3 \geq 1 \geq ... \geq 1
    3 \geq 3 \geq 3 \geq 2 \geq 2 \geq 1 \geq... \geq 1
    3 \geq 3 \geq 2 \geq 2 \geq 2 \geq 2 \geq 1 \geq ... \geq 1
    3 \geq 2 \geq 2 \geq 2 \geq 2 \geq 2 \geq 2 \geq 1 \geq ... \geq 1
    Only the first degree sequence leads to a homeomorphically irreducible tree.
    There are 4 different cases to consider for this degree sequence given all 4 non-leaves have degree 3:
    (1) v_2,v_3,v_4 are children of v_1
    (2) v_2 is a child of v_1 and v_3,v_4 are children of v_2
    (3) v_2,v_3 is a child of v_1 and v_4 is a child of v_2
    (4) v_2 is a child of v_1 and v_3 a child of v_2 and v_4 a child of v_3

    Inspecting these cases we see that we get two non-isomorphic trees:
    (1) can be moved around to match (2) showing they are isomorphic.
    (3) can be moved around to match (4) showing they are isomorphic.
    (1) and (3) are not isomorphic since (3) has a path of length 5 whereas (1) does not (isomorphisms are supposed to preserve distances).
    See trees below:

  • d_1 = 2
  • This case is degenerate by default since there has to be a vertex of degree 2.

  • d_1 = 1
  • This case too is degenerate since there are no trees on 10 vertices where the maximum degree of a vertex is 1.

    Hence we have exactly 1+0+1+1+3+2+2+0+0 = 10 homeomorphically irreducible trees on 10 vertices up to isomorphism.


    There were two more problems featuring in this film.
    Perhaps we will come back and complete the rest of Will Huntings homework in a future post.
    The problems are not terribly interesting, they’ve just received some popular fame because they appeared in the movie and turned out not to be as difficult as they were portrayed (don’t get fooled by Hollywood in other words).
    Nevertheless this is a great movie and it is fantastic to see that they used “actual” mathematics instead of dwelling over say, Fibonacci numbers or the number \pi
    (contrary to popular belief mathematicians don’t spend their careers looking for Fibonacci patterns in nature or memorizing decimals of \pi… ok, perhaps some do).

    Below are the graph theory exercises I have been referring to:

    Exercise 1 (Tree Edges – Easy):
    Prove that any tree on n vertices has n-1 edges.

    (Hint: Remove a leaf and argue by induction)


    Exercise 2 (Degree Sequence – Easy):
    Prove that isomorphic graphs have the same degree sequence.


    Exercise 3 (Node Distance – Easy):
    i) Prove that there is a unique path between any two vertices in a tree.

    The length of the path between two vertices v_1, v_2 is called the distance between v_1 and v_2.

    ii) Prove that the distance between any two vertices in a tree is preserved under isomorphism.

    Posted in Combinatorics, Uncategorized | Tagged , , , , , , | 1 Comment

    Conditional Convergence

    In this post I wanted to discuss an unintuitive fact related to infinite series which often leads to confusion among early students of Real Analysis.

    Even if you have not seen the formal definition of a series before you may have seen examples of them, such as the harmonic series \sum_{n=1}^{\infty} \frac{1}{n} or the infinite geometric series \sum_{n=0}^{\infty} a^n, perhaps calling them “infinite sums”.

    The notation using the greek letter sigma can be quite misleading, since series do not always behave as nicely as one might initially expect. We will discuss what strange things can happen if one is not careful.

    Formally, given an infinite sequence (a_n), a series is nothing but another special sequence (s_n) where s_n := a_0 + ... + a_n \;\;\;\forall n\in\mathbb{N}.
    This sequence is denoted \sum_{n=0}^{\infty} a_n and we call the s_n partial sums of the series.

    Apart from obvious issues, such as divergence to infinity (i.e the fact that the sum can grow uncontrollably large in either direction), there are other more subtle issues where series may behave differently from honest sums.

    Consider a permutation f:\mathbb{N} \rightarrow \mathbb{N} and a convergent series \sum_{n=0}^{\infty} a_n.
    We may ask: Does \sum_{n=0}^{\infty} a_n converge to the same value as \sum_{n=0}^{\infty} a_{f(n)}\;?
    Or to give a more concrete informal example:
    Is 1-\frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \frac{1}{7} - ... +\frac{1}{n} - ... the same as (1 + \frac{1}{3}) - \frac{1}{2} + (\frac{1}{5} + \frac{1}{7}) - \frac{1}{4} + ... - \frac{1}{2n} + (\frac{1}{2n+3} + \frac{1}{2n+5}) - \frac{1}{2n+2} + ... \; ?

    Perhaps surprisingly the answer is No in general.
    The example I gave above was a rearrangement of the alternating harmonic series.
    It is a well known fact that this series converges to ln 2 and one can show that the rearrangement converges to something different, namely \frac{3}{2}ln 2.
    The diagram below shows the numeric value of the first 30 partial sums of either series.


    It seems like our intuition about series as ordinary sums break down here.
    In fact as we will show in the rest of this post, matters are even worse.
    It turns out that you can rearrange this series (and other series like it) to converge to any real number you like! If you wish you can even make it diverge to infinity.

    The defining property of series that converge in this way but are sensitive to rearrangement is called Conditional Convergence.
    Usually this property is stated in another way, as a series which is convergent but not Absolutely Convergent.
    A series \sum_{n=0}^{\infty} a_n is Absolutely Convergent if \sum_{n=0}^{\infty} |a_n| converges.
    The true meaning of this property is somewhat hidden behind this definition.

    As it happens,
    convergent series which are insensitive to rearrangement are precisely those which are absolutely convergent.

    Let’s prove this statement in one direction:

    Proposition 1:

    Let \sum_{n=0}^{\infty} a_n be an absolutely convergent series, converging to L.

    Then \sum_{n=0}^{\infty} a_{f(n)} also converges to L for every permutation f:\mathbb{N} \rightarrow \mathbb{N}.

    Informal Proof:
    Let s_n = \sum_{k=0}^{n} a_k and r_n = \sum_{k=0}^{n} a_{f(k)}.

    To show convergence our goal is to show that the quantity |r_n - L| can be made arbitrarily small for sufficiently large n, that is to say |r_n - L| \rightarrow 0 as n \rightarrow \infty.

    Since (s_n) converges to L we know that |s_n - L| \rightarrow 0 as n \rightarrow \infty.
    To understand if (r_n) also converges to L we should suggestively compare (r_n) with (s_n). For if (r_n) and (s_n) get very close to each other, knowing that (s_n) gets very close to L we ought to see that (r_n) gets very close to L as well for sufficiently large n.

    This type of idea in analysis is usually expressed via the triangle inequality:

    |r_n - L| = |r_n - L + s_n - s_n| \leq |r_n - s_n| + |s_n - L|\;\;\;(*)

    We know how to manage the quantity |s_n - L|.
    We want to know what happens to |r_n - s_n| as n grows large.

    Consider a fixed N \in \mathbb{N}.
    Since f:\mathbb{N} \rightarrow \mathbb{N} is a bijection (permutations are by definition bijections) \exists n_0,...,n_N \in \mathbb{N} s.t f(n_0) = 0, ..., f(n_N) = N.

    Let M := max\{n_0,...,n_N\} \geq N and consider the set of subscripts D := \{1,...,M\} \setminus \{n_0,...,n_N\} that do not belong to \{1,...,N\} after applying f.

    |r_M - s_N| = |\sum_{k=0}^M a_{f(k)} - \sum_{k=0}^N a_{f(n_k)} | = |\sum_{k \in D} a_{f(k)}| \leq^{(\bigtriangleup-ineq)} \sum_{k \in D} |a_{f(k)}| \leq^{(more\;positive\;terms)} \sum_{k=N+1}^{\infty} |a_k|

    This is where the hypothesis about absolute convergence kicks in.
    Absolute convergence implies \sum_{k=N+1}^{\infty} |a_k| \rightarrow 0 as N \rightarrow \infty.
    Thus since M \rightarrow \infty \; as \; N \rightarrow \infty we find that |r_n - s_n| \rightarrow 0 as n \rightarrow \infty.

    In combination with |s_n - L| \rightarrow 0 as n \rightarrow \infty we get from (*) that |r_n - L| \rightarrow 0 as n \rightarrow \infty.

    This shows that \sum_{n=0}^{\infty} a_{f(n)} converges to L.

    To prove the converse we need to show that all convergent series that are not absolutely convergent (i.e conditionally convergent) are non-invariant under rearrangement.

    From now on we will assume \sum a_n is conditionally convergent.

    As I hinted initially something stronger holds true for conditionally convergent series.
    Not only are the limits of these series sensitive to rearrangement, these series can be rearranged to converge to any real number. This result is commonly known as Riemann’s rearrangement theorem.

    The idea behind the proof is essentially to construct a rearrangement by indefinitely adding positive and negative terms from the original series in order to oscillate around the chosen real value, and show that these oscillations get smaller and smaller.

    Define a_n^+ = \begin{cases} a_n \;\; if\;\; a_n > 0 \\ 0 \;\;\;\;otherwise \end{cases} and a_n^- = \begin{cases} a_n \;\; if\;\; a_n < 0 \\ 0 \;\;\;\;otherwise \end{cases}

    There are a few technical issues that need to be resolved before we can claim the construction idea is well defined.

    1. Are there infinitely many non-zero a_n^+, a_n^- so we can indefinitely add positive and negative terms?
    2. Is \sum a_n^+ unbounded above and \sum a_n^- unbounded below so we have a chance at getting close to any real value?
    3. Does a_n^+, a_n^- both converge to 0 so that we can make oscillations arbitrarily small?

    Answer to all 3 questions need to be Yes for this construction to work.

    To see (1) we could argue by contradiction and e.g assume there are only finitely many non-zero a_n^- (the a_n^+ case works similarly). In fact let’s assume something even weaker and just suppose that \sum a_n^- is convergent.

    Then \sum |a_n| = \sum (a_{n}^{+} - a_{n}^{-})=\sum (a_{n}^{+} + a_{n}^{-})-2\sum a_n^-=\sum a_n -2\sum a_n^-

    By assumption both series on the right hand side converge, implying \sum |a_n| is convergent. But we assumed \sum a_n was not absolutely convergent so this is a contradiction. Therefore \sum a_n^- cannot converge and in particular this means the number of negative terms are infinite.

    From this we can also deduce (2) because \sum a_n^- is monotonically decreasing (being a series of non-positive terms) and if it was bounded below then a  standard result from real analysis (Monotone Convergence Theorem) says that \sum a_n^- needs to converge which we know it cannot. Hence \sum a_n^- cannot be bounded below.

    A symmetrical argument establishes (1) and (2) for \sum a_n^+.

    (3) follows directly from the fact that terms of convergent series necessarily tend to zero.
    To see this just write a_n = \sum_{k=0}^n a_k -\sum_{k=0}^{n-1} a_k and notice that the right hand side converges to zero as n \rightarrow \infty since both series converge to the same value.

    Hence we can answer all 3 questions in the affirmative.

    Now let’s sketch a proof of Riemann’s Rearrangement Theorem


    Riemann Rearrangement Theorem:
    Let \sum_{k=0}^{\infty} a_k be a conditionally convergent series and let L \in \mathbb{R}.

    Then there exists a permutation f:\mathbb{N} \rightarrow \mathbb{N} s.t \sum_{k=0}^{\infty} a_{f(k)} converges to L.

    Proof (Sketch):
    The proof is constructive (the best kind).

    Let L \in \mathbb{R}.

    By (2):

    since \sum_{k=0}^{\infty} a_k^+ is unbounded above we can choose the smallest n_1 \in \mathbb{N} s.t
    a_0^+...+a_{n_1}^+ > L.

    since \sum_{k=0}^{\infty} a_k^- is unbounded below we can choose the smallest m_1 \in \mathbb{N} s.t
    a_0^+...+a_{n_1}^++a_0^-...+a_{m_1}^- < L.

    We can continue to choose the smallest n_{2}\; \textgreater \;n_{1} s.t a_0^+...+a_{n_1}^++a_0^-...+a_{m_1}^-+a_{n_1+1}^+...+a_{n_2}^+ \; \textgreater \; L etc..

    Inductively if we denote P_i := a_{n_{i-1}+1}^+...+a_{n_i}^+ and N_i := a_{m_{i-1}+1}^-...+a_{m_i}^-
    we obtain a rearrangement P_1 + N_1 + ... + P_k + N_k + ... of \sum_{k=0}^{\infty} a_k s.t
    P_1 + N_1 + ... + P_k \; \textgreater \; L and P_1 + N_1 + ... + P_k + N_k \; \textless \; L \forall k \in \mathbb{N}.

    We can make this definition since we have an infinite supply of positive and negative terms by (1).

    An arbitrary partial sum in the rearrangement has one of below two forms:
    s_{n} = P_1 + N_1 + ... + P_k + N_k + a_{n_k+1}^+ + ... + a_{n_k+r}^+ or
    s_{n} = P_1 + N_1 + ... + P_k + a_{m_{k-1}+1}^- + ... + a_{m_{k-1}+r}^-.

    We want to show |s_n - L| \rightarrow 0 as n \rightarrow \infty.

    We clearly have P_1+N_1+...+P_k+N_k - L \leq s_n - L \leq P_1+N_1+...+P_k - L.

    By construction,
    P_1+N_1+...+P_k+N_k - a_{m_k}^- \geq L and
    P_1+N_1+...+P_k - a_{n_k}^+ \leq L.

    Therefore a_{m_k}^- \leq s_n - L \leq a_{n_k}^+

    By (3) we know that a_{n_k}^+, a_{m_k}^- \rightarrow 0 as k \rightarrow \infty so s_n -L is being squeezed between two sequences converging to zero and is therefore forced to converge to zero as well by the so called Sandwich Theorem.

    This shows that the rearranged series converges to L.
    Since L was arbitrary the result follows.


    To put back some intuition behind this phenomenon the proof tells us that conditionally convergent series are a result of two divergent series \sum a_k^+, \sum a_k^- cancelling out each other to allow for convergence.
    In the cancelling race between the two series we can give one series an “infinite head start” by pushing terms from the other series further away into infinity. This way we can accumulate terms quicker in the partial sums from one of the series, at a pace we set, and thus have it converge to anything we like. I am being purposely vague here, but hopefully this makes some intuitive sense.

    In a way one could think of the terms in a conditionally convergent series as being infinite streams of positive revenue and negative expenditure. By rearranging expenditures ahead in time we could alter our book balance (and in theory make infinite profit).
    Funny enough this is the basic idea behind Ponzi schemes.
    Of course in reality Ponzi schemes do not generate infinite profit.
    It inevitably falls apart because the criminal cannot find sufficiently many revenue sources to sustain the expenditure commitments of the scheme, so when the scheme finally starts to collapse the criminal has to run away with the money.
    The basic fallacy in applying this idea to real life is that things are usually not infinite even though people sometimes behave as if they are. Economic bubbles are a good real world example which behave very similar to Ponzi schemes but on a much larger scale. Eventually money is solely being made at expense of new investors entering the market, the profit being completely detached from the markets intrinsic value. Of course the number of investors nor their money is infinite, so sooner or later people are bound to lose money as the bubble implodes due to lack of new buying market participants (i.e there are only sellers left).

    Exercise 1( Alternating Harmonic Series – Intermediate):
    Prove that \sum_{n=1}^{\infty} (-1)^{n+1}\frac{1}{n} = ln(2).

    (Hint: Look at the partial sums s_{2n} and express it as a Riemann sum like in the previous post YAPP)


    Exercise 2( Conditional Divergence – Easy):
    Prove that every conditionally convergent series can be rearranged to diverge to infinity.


    Exercise 3( Huh? – Easy):

    What is the fallacy in below argument?

    0 = 0 + 0 + 0 + 0 + .... =
    (1-1) + (1-1) + (1-1) + (1-1) + .... =
    1 + (-1+1) + (-1+1) + (-1+1) + (-1+1) + .... =
    1 + 0 + 0 + 0 + 0 + .... = 1

    Posted in Analysis, Uncategorized | Tagged , , , , , , , , , , , | 1 Comment