## Lowest Unique Bid Auctions

A lowest unique bid auction operates under the following three main rules:

• Whoever bids the lowest unique positive amount wins.
• If there is no unique bid then no one wins.
• No participant can see bids placed by other participants.
•

As it turns out there are real sites out there that offer this type of auction, see e.g bidbudgie. Their income is usually generated by attracting lots of bidders and placing a charge for making bids. This way they are able to give away iPhones, Xboxes, Laptops and other expensive items for a tiny nominal amount and also make a profit provided enough people participate. In an article I read somewhere a gold bullion worth more than 1,000£ was sold away for only 1p because no one else bid that amount thinking it was “too obvious” (perhaps one of the best marketing gimmicks the auction company could ever wish for).

A natural question is: Are there any good strategies for participating in this auction?

On the face of it, this doesn’t look vastly different from any old lottery – you are paying a fee to choose a number on which you may or may not win.
One obvious difference is that this game has no proclaimed randomizing device such as would be found in a lottery to select between numbers, or as is the function of a deck in a card game or a die in a dice game. Instead chance is introduced exclusively via uncertainty about bids of other participants (i.e imperfect information). Therefore it seems more natural to take a game theoretic view rather than a purely probabilistic one since participants would want to take into account the strategy of others and adjust their choice accordingly. So instead of assuming a bidding distribution to model the situation we should be looking to understand what the distribution should be given some equilibrium state where everyone is reluctant to change their strategy because it would decrease their expected payoff.

Let’s make a game theoretic definition that distinguishes between two different kinds of strategies.

A pure strategy defines a specific move or action that a player will follow in every possible attainable situation in a game. Such moves may not be random, or drawn from a distribution, as in the case of mixed strategies (gametheory.net)

A pure strategy in this game is very simple since there is only one attainable situation in the game, and that is the choice of a unique fixed number $k$.
[Alternatively one can view this as a mixed strategy choosing $k$ with probability 100%]
To simplify the discussion a bit let us make the assumption that only one bid per person is allowed so that we may assume bids are placed independently of each other. We will discuss the general case briefly at the end of this post.

Let us first consider whether there can be a “magic number” which you could always select beforehand to maximize your chances of winning the auction.
In other words we are asking whether an optimal pure strategy exists.
Suppose an optimal pure strategy did exist.
Then every other player would want to adopt the same strategy since it is optimal.
But that means they will all choose the same number and are therefore guaranteed to lose the auction.
So any strategy choosing a different number would be better, and so the original strategy couldn’t have been optimal since we have found a better strategy.
Hence there can be no optimal pure strategy.

So having ruled out the existence of good pure strategies we turn our attention to mixed strategies. These strategies make choices from random distributions, i.e consist of some convex combination of pure strategies.

We want a strategy that somehow describes how rational participants should play this auction. One such definition is that of a Nash Equilibrium.
Informally a set of strategies is a Nash equilibrium if no player can do any better by unilaterally changing his or her strategy.

Let’s digress for a bit to discuss this concept and its potential weaknesses.

Game Theory Digression:

You may have seen the movie A Beautiful Mind featuring actor Russel Crowe which is a portrayal of the life of Nobel Laureate John Nash who invented this equilibrium.
In the movie it is (wrongly) illustrated by how a group of guys have nothing to gain by all hitting on the prettiest woman in the bar. If they all go for the prettiest woman then they would all supposedly block each other.
Thus no one gets lucky with the pretty woman nor with any of the friends she came with since no one likes to be deemed second choice. But if they all go straight for her less attractive friends then they would all stand a better chance at achieving their objective (only a guy could have invented this analogy:-).
However the Nash Equilibrium says the complete opposite in this case, that everyone should go for the pretty woman (assuming pure strategies). Whoever deviates from the strategy where no one goes for the pretty woman would actually improve, which cannot happen in an equilibrium state. The only strategy from which no one can improve is the strategy where everyone hits on the prettiest woman and presumably get rejected, despite mutual cooperation yielding a better outcome as is the situation described in the movie. Of course there are a whole series of unfounded social assumptions behind this analogy, like the fact that the women at the bar would automatically settle for the first (and only) guy that approaches them etc…

The situation described in the movie may possibly be viewed as a slightly twisted, more generalized version of the two player prisoner dilemma which is a classical problem in Game Theory. This game was actually played by the two final contestants in a British game show called ‘Golden Balls’ (< Insert immature joke >).
See below clips for interesting extracts from two different episodes.

In the second clip the contestant is essentially betting that greed is a stronger incentive than retaliation.
By announcing his intentions to steal and then offering to split the money afterwards, he is protecting himself from getting backstabbed i.e the burden of trust has been transferred to the other contestant. The second contestant has no reason to disbelieve the stealing intentions since if the first contestant did the opposite and decided not to steal, that would only be to the benefit of the second contestant. The second contestant is thus presented with a choice of not winning anything, or potentially winning half by trusting that the first contestant will split the money after the show.
In effect the first contestant wants to switch from playing the prisoners dilemma to playing the ultimatum game where the other contestant has every incentive to accept any positive amount from a game theoretic standpoint, yet alone a 50/50 offer.
Of course the second contestant doesn’t have to accept playing this game.
If he wanted to get out of his currently dominated position, he would have had at least two rebuttals.
One is to immediately turn the game into deadlock by offering the same ultimatum back.
The second more bolder move would be to instead play the familiar chicken game by undercutting the previous offer by e.g saying, “I am now fully committed to steal as well and will give you only 40% of the money if you cooperate, are you prepared to continue playing or do you want to cut your losses?”. Note that this game could easily deteriorate as each of the contestants offer smaller and smaller cuts of the jackpot, until they are both left with nothing and forced to steal.
Things would be much easier if the oral agreement of splitting the money after the show was legally binding, but I reckon it can’t be in the context of the game show.

So now back to our auction.
To simplify discussion again we shall only be looking for symmetric mixed strategy Nash equilibria i.e an equilibrium where everyone chooses the same strategy in such a way that anyone deviating from it would be worse off. We shall also require that only a countable number of bids are allowed, otherwise the auction gets ridiculous.
Online the convention seems to be that the smallest increment allowed is 1 cent (or 1p if it is a UK auction).

Finding the symmetric mixed strategy Nash equilibrium:

Suppose $S = (S_1,S_2,S_3,S_4,.....)$ is a mixed strategy where $S_i$ represents the probability of bidding $i$ cents.

Then $S_i \geq 0\;\; \forall i$ and $\sum \limits_{i=1}^{\infty} S_i = 1$.

Suppose you plus $n$ other people are participating in a lowest unique bid auction where everyone is using the same equilibrium strategy $S$.

Denote by $\mathbb{P}(win|j)$ your probability of winning by bidding $j$ cents.
Qualitatively $\mathbb{P}(win|j)$ is given by the probability that there is no unique person bidding less than $j$ cents and no other person bidding exactly $j$ cents.

By the partitioning rule $\mathbb{P}(win|S) = \sum \limits_{j=1}^{\infty} S_j \; \mathbb{P}(win|j$).

If $\exists j'$ s.t $\mathbb{P}(win|S) < \mathbb{P}(win|j')\;$ then adjusting the distribution so that $S_{j'} = 1$ is an even better strategy than $S$ but that is not allowed since $S$ is assumed to be a Nash Equilibrium.

Therefore $\mathbb{P}(win|S) \geq \mathbb{P}(win|j)\; \forall j$.

Let $j^*$ be such that $\mathbb{P}(win|j^*) = \max \limits_{j}\; \mathbb{P}(win|j)$.

Since $0 \leq S_j \leq 1\; \forall j$ it follows that $\mathbb{P}(win|S) \leq \mathbb{P}(win|j^*)$.
If $\exists j'$ s.t $S_{j'} > 0$ and $\mathbb{P}(win|j') < \mathbb{P}(win|j^{*})$ then adjusting the distribution by moving the positive weight over from $S_{j'}$ to $S_{j^{*}}$ would yield yet another improvement upon $S$ which cannot happen by Nash equilibrium assumption.

Therefore we must have $\mathbb{P}(win|j) = \mathbb{P}(win|j^{*}) \; \forall j$.

We have left to find an explicit expression for $\mathbb{P}(win|j)$.
As we said earlier $\mathbb{P}(win|j)$ is given by the probability that there is no unique person bidding less than $j$ cents and no other person bidding exactly $j$ cents.

There are two parts to work out.
One is the probability that $n-k$ people will choose a number greater than $j$ which is easy to find, it is just equal to $(1- \sum \limits_{i=1}^{j} S_i)^{n-k}$.
The second is that remaining $k$ people will choose a number less than $j$ such that there is no unique bidder.
This is somewhat more awkward since we first need to look at all possible combinations of $k$ people choosing a number less than $j$, a thing which is generated by $\left ( \sum \limits_{i=1}^{j-1} S_i \right )^k$, and then remove all combinations with unique bidders (i.e terms with $S_i$‘s of power 1), which we obtain via $\sum \limits_{i=1}^{j-1} S_i\left ( \sum \limits_{ r = 1, r \neq i}^{j-1} S_r \right )^{k-1}$.

In all we obtain the following rather nasty looking expression
$\mathbb{P}(win|j) = (1- \sum \limits_{i=1}^{j} S_i)^{n} + \sum \limits_{k=1}^{n} \left( \begin{array}{c} n \\ k \end{array} \right) \left [ (1- \sum \limits_{i=1}^{j} S_i)^{n-k} \left ( \left ( \sum \limits_{i=1}^{j-1} S_i \right )^k - \sum \limits_{i=1}^{j-1} S_i\left ( \sum \limits_{ r = 1, r \neq i}^{j-1} S_r \right )^{k-1} \right ) \right ]$

So to find the equilibrium strategy $S = (S_1,S_2,S_3,S_4,.....)$ we should solve the infinite non-linear equation system:
$\begin{cases} \sum \limits_{i=1}^{\infty} S_i = 1 \\ S_i \geq 0\;\; \forall i \\ \mathbb{P}(win|i) = \mathbb{P}(win|j)\;\; \forall i,j \end{cases}$

In the minimal non-trivial auction with yourself and two other participants (i.e $n=2$) we would get
$\mathbb{P}(win|j) = \left (1 - \sum \limits_{i=1}^j S_i \right )^2 + \sum \limits_{i=1}^{j-1} S_i^2$.

This means that either both other participants choose any number greater than $j$ or both choose the same number less than $j$. We can use Wolfram Alpha to compute an approximate solution to a finite truncation of this system which gives: $S_1 \approx 0.457784, S_2 \approx 0.251643, S_3 \approx 0.145286, S_4 < 0.145286$.
Unfortunately 4 variables is as far as you get before you run out of allotted (free) compute resources on the site. I should probably need to use Mathematica, Maple or some other math package to take a serious stab at computing a good distribution. Provided these numbers are ballpark accurate we could draw the conclusion that you should choose the least possible bid a little less than half the time and the next least bid approximately a quarter of the time in a rational auction with three people. Computing further points on the distribution I’m guessing there exists a tiny positive probability of even choosing one million in this equilibrium strategy which may seem rather surprising, both from a utility perspective and from an auction winning perspective. However on the former point we have not accounted for actually paying the bid in case we win, we are only concerned with winning the auction.

Within most lowest unique bid auctions in the real world you are allowed to place multiple bids. Perhaps it might be interesting to discuss some potential heuristics for participating in such an auction.

On the auction site I mentioned earlier (bidbudgie.co.uk) they auction MacBook Pro Laptops and charge 3£ per bid. They claim the laptop is worth 1,889£ (according to RRP). That means they need to attract at least 630 bids to make a profit assuming 1,889£ is what they paid for the laptop (which is probably not entirely true). Looking at it’s current status on the site at the time of writing, I see there are 234 bids already from 57 unique bidders after 37% of the auction time has elapsed.

The site lets you know your rank in the auction if your bid is unique (meaning if you still have a chance at winning). The site also gives you the opportunity to buy information such as whether your bid is too high or too low in relation to the current winning bid, which costs 20p.
This means you can locate the current winning bid using binary search for under 10x(3£ + 0.2£) assuming winning bid is under 10.24£.

Looking at the tips & tricks page on the site I can see that they have a recommended strategy which is to place bids according to some arithmetic progression, say 0.13p, 0.23p, 0.33p, … , 1.13£. Then starting from your lowest unique bid you have an idea in what range the winner is currently in. You can then for example place multiple bids from 1p up to your lowest unique bid to eliminate all other players making you the current winner. Of course if you are the only person employing this strategy it may not be very expensive in comparison to the prize. However if everyone is doing this (which presumably is what the site wants) then playing this game can get very expensive and you might get carried away.

Let’s look at another crude strategy.
Suppose you are willing to invest 300£ in buying this laptop which would still save you 1,589£ if you win.
Moreover let’s say you pick all bids from 1p to 1£.
How likely is it that you will win this auction if there are 1000 bids placed on the laptop and everyone except you bids no more than once?

In a paper entitled Equilibrium strategy and population-size eﬀects in lowest unique bid auctions [Pigolotti et al (2012)] the effects of the number of participants is investigated on the symmetric Nash Equilibrium. The paper establishes that provided the number of bidders in the auction varies according to a Poisson distribution then the number of bids on each number is also Poisson in a symmetric Nash Equilibrium. When compared to real data it was found that in the region of 200 bidders this model approximates the empirical distribution fairly well and that the winning chances are evenly spread among all numbers. This means the auction behaves like a lottery for this region. For a larger number of participants they found the data begins to deviate from the model and starts following an exponential distribution because players fail to adapt to the optimal strategy and so the winning chances are highly dependent on the number of participants. On page 4 they display a very interesting log distribution contrasting the empirical winning number from theoretical.

Reading the graph it appears the winning amount should be somewhere around 80p to 1£ with 1000 participants, both from a theoretical standpoint and typically according to the empirical data (if I am interpreting it correctly). So placing a series of bids in that region for the laptop may not be such a terrible investment in an auction with a moderate number of participants where you are the most aggressive bidder.

## Dynamic Programming

In computer science Dynamic Programming is a powerful algorithmic paradigm that can help solve seemingly intractable problems in viable ways.

The rather confusing name ‘Dynamic Programming’ originates from quite an amusing story.
First of all the word ‘Programming’ comes from a time when programmers weren’t programming computers, but were rather planners for logistic purposes (and such).
It is said that Richard Bellman invented the term as some sort of euphemism to avoid pissing off his military boss who hated the word ‘research’ and would turn violent if anyone used it in his presence. So to hide the mathematical character of his work he needed a term that wouldn’t get him into trouble for working on the planning of optimal multi-stage processes, so ‘Dynamic Programming’ it was.

In many algorithmic problems it is common to split a problem down into smaller sub-problems, solve each sub-problem and then re-combine solutions to a full solution for the original problem. The paradigm I just described is commonly referred to as Divide-and-Conquer. The methods to achieve this can vary, but very often some naive form of recursion can be thought out with relative ease. Unfortunately such solutions may lead to an exponential explosion of sub-problems if you allow the recursion to branch.
Certain problems however have the property that generated sub-problems begin to repeat themselves in an overlapping manner. This means the recursive solution does a lot of unnecessary work since it would be possible to avoid re-solving the same problem over and over again if the solution was saved the first time it was encountered and subsequently looked up in a table.

Dynamic Programming algorithms exploit this overlapping property in the way described above to create more efficient solutions.
For these reasons Dynamic Programming solutions can often bring down the runtime of a naive exponential time algorithm to polynomial time. For all practical purposes this can mean the difference between usable and not usable.

A basic example is that of computing the $n^{th}$ term of the Fibonacci sequence.

Problem 1 (Computing The Fibonacci Sequence)

The Fibonacci Sequence is defined via the recursion $FIB(n) = FIB(n-1) + FIB(n-2)$ where $FIB(1) = FIB(2) = 1$.

The recursive algorithm is straightforward and presented below in pseudo language.

$FIB(n)$
1. If $n \leq 2$
2. $return(1)$
3. Else
4. $return( FIB(n-1) + FIB(n-2) )$

As you can see there are going to be many overlapping subproblems when the recursion tree is expanded.
If we let $T(n)$ represent the number of additions we have to perform to compute $FIB(n)$ we get that $T(n) = T(n-1) + T(n-2) + 1$. This is a so called non-homogeneous linear difference equation which can be solved for $T(n)$ using methods I outlined in this post (note that so could $FIB(n)$ if we wanted to).
Most importantly however is the realization that $T(n)$ grows exponentially, which can readily be seen from the picture above.

If we save solutions to old subproblems as opposed to recomputing them, it seems like we should be able to bring down the number of additions drastically. Therefore instead of recursively finding a solution top down, we construct a solution bottom up, referring back to smaller sub-problems we solved along the way.
The bottom up approach is generally what characterizes Dynamic Programming algorithms.

Here is pseudo code for a Dynamic Programming solution to the Fibonacci-Recurrence

$FIB(n)$
1. $F[1] \leftarrow 1$
2. $F[2] \leftarrow 1$
3. For $i \leftarrow 3$ to $n$
4. $F[i] \leftarrow F[i-1] + F[i-2]$
5. $return( F[n] )$

Now our sub-problem graph looks much smaller:

In fact it is easy to see that only $O(n)$ number of additions are necessary using this algorithm.

Worth noting is that it would have also been possible to pass up solutions to smaller sub-problems in a
top down recursion. This technique is commonly known as memoization (no spelling error).
Each recursive call leaves a ‘memo’ for its parent caller with intermediate results so it can avoid expanding unnecessary branches. The main difference is that memoization is lazy which roughly means it won’t do any evaluation until it must. This can be problematic in practice because of stack depth limits. Dynamic Programming gives you a little bit more freedom as to which order sub-problems should be computed although memoization often feels more “natural” to people which is why some people prefer it.

One of the hallmark applications of Dynamic Programming is in solving optimization problems.
The problem should, apart from having overlapping sub-problems also be re-combinable. That is to say, optimal solutions to smaller problems make up optimal solutions to bigger problems. An optimization problem with this property is said to exhibit optimal substructure. These two properties together are pre-requisites for Dynamic Programming solutions.

Problem 2 (Longest and Shortest Paths)

Not all problems exhibit optimal substructure.
Consider the problem of finding the longest simple path in a directed graph with positive weights.

On the other hand the shortest simple path problem in a directed graph with positive weights has got optimal substructure.

The reason this argument doesn’t work for the Longest Simple Path problem is that concatenating two longest sub-paths may result in loops which are not allowed since paths have to be simple. On the other hand a path that has a loop can never be shortest (removing the loop yields a shorter path).

There are many algorithms for computing shortest paths in a directed graph.
Perhaps among the most well-known is Dijkstra’s Algorithm which only works for graphs with positive weights.
This algorithm exploits the Greedy Choice Property of the problem (which is for a different discussion).
The Dynamic Programming solution works by solving sub-problems of finding the lengths of all shortest paths from a source $s$ to any given node $v$ using at most $i$ intermediate nodes.
The algorithm can also be easily modified to output the actual path by maintaining a successor for every node.

This is what the Dynamic Programming algorithm looks like in pseudo language:

$SHORTEST-PATH(G,n,s,d)$
1. ForEach $node\; v \in V[G]$
2. $D[v][0] \leftarrow \infty$
3. $D[s][0] \leftarrow 0$
4.
5. For $i \leftarrow 1$ to $n-1\;\; \{$
6. ForEach $v \in V[G]$
7. $D[i][v] \leftarrow D[i-1][v]$
8. ForEach $(v,v') \in E[G]$
9. $D[i][v] \leftarrow min\{ D[i][v], D[i-1][v] + d(v,v') \}$
10. $\}$
11. $return( D )$

It’s runtime is $O(|V[G]||E[G]|)$ and its memory requirements $O(|V[G]|^2)$.

There is an improved version of the above code that’s called Bellman-Ford algorithm.
The worst-case runtime complexity of this algorithm is still $O(|V[G]||E[G]|)$ but runs substantially faster in practice. Its memory requirement is reduced to $O(|V[G]| + |E[G]| )$.
Like the above algorithm it can also handle negative weights as long as there is no negative cycle in the graph.

Also worth mentioning is the Floyd-Warshall algorithm which computes the length of the shortest path between every pair of nodes in a graph. This is also a Dynamic Programming algorithm which successively computes the length of the shortest path between nodes $i$ and $j$ given that the path only uses nodes from $\{1,...,k\}$.
The input is a weighted $n \times n$ adjacency matrix $G$.

$FLOYD-WARSHALL(G)$
1. $n \leftarrow \#Rows[G]$
2. For $i \leftarrow 1$ to $n$
3. For $j \leftarrow 1$ to $n$
4. $D[i][j][0] \leftarrow G[i][j]$
5.
6. For $k \leftarrow 1$ to $n$
7. For $i \leftarrow 1$ to $n$
8. For $j \leftarrow 1$ to $n$
9. $D[i][j][k] \leftarrow min\{ D[i][j][k-1], D[i][k][k-1] + D[k][j][k-1] \}$
10.
11. $return(\;(D[i][j])[n]\;)$

Both runtime and memory are clearly $O(n^3)$ above.
However memory can be reduced to $O(n^2)$ by removing the dimension for $k$ from $D$ (why?)
The algorithm can also be modified to output the shortest paths and not just their length (how?)

In the remainder of this post we will look at various other instances of Dynamic Programming problems.

Problem 3 ( The Hiring Problem )

Suppose you are a company manager and have just received an augmented budget to recruit new people into your team to compensate for the increased work load.

The company you work for is popular so there are an abundant number of people applying with all kinds of backgrounds and experience. It is expected that candidates with certain types of profile will provide more ‘value’ (a subjective integer metric) than others, but at the same time eat up more/less of the budget.

You have $n$ different types of profile categories, an array $v$ of the expected value they will provide and an array $c$ of how much they typically cost (in thousands of dollars).
Your additional budget is $B$ thousand dollars and you may choose any number of candidates from each profile category (as long as budget is not breached).

You are tasked to come up with a set of new hires that will provide the most value to the team.

Give an algorithm to compute the optimal hires and their total value added.

Note: This problem is more commonly referred to as the Integer Knapsack Problem with the popular metaphor being what value items a burglar should put in his knapsack if he is only capable of carrying a certain maximum weight.

Solution

The mathematical formulation of this problem can be expressed via the following Integer Program

Integer Knapsack Problem

Maximize $\sum \limits_{j=1}^n x_jv[j]$

subject to:

$\sum \limits_{j=1}^n x_jt[j] \leq B$

$x_j \geq 0, \;\;\;\;j \in \{ 1,...,n\}$

$x_j \in \mathbb{Z}, \;\;\;\;j \in \{ 1,...,n\}$

Let us check first whether this problem even needs a Dynamic Programming solution.
Couldn’t we just be greedy and choose the profiles that maximize value per cost?
The answer is no in general:

$REC-KNAPSACK(c,v,B)$
1. $return( max_j\{ \; c[j] \leq B \; ? \; REC-KNAPSACK(c,v,B-c[j]) + v[j] : 0\;\} )$

This algorithm has exponential worst case runtime complexity!

Let’s investigate a Dynamic Programming solution.
What are the sub-problems?

Define $D[k] :=$ Maximum value given budget $k$

Clearly if the optimal solution to $D[k]$ involves hiring a profile $j$ candidate then removing this hire leaves us with the problem of finding the optimal solution to $D[k-c[j]]$.
Since the optimal hire has to be one of the profiles which can yet fit into the budget we should look for $max_j\{ D[k-c[j]] + v[j] : c[j] \leq k \}$ similarly to how we did for the recursive solution.
However this time we also remember the optimal solutions to the smaller sub-problems.

In a sense if we look at the budgets as nodes and let the costs describe transitions between the nodes weighted by the profile value then this problem is equivalent to finding the longest path in a directed graph with positive weights.
But did we not say earlier that this problem hasn’t got the optimal substructure property?
The difference here is that the graph is acyclic and this eliminates the problem described earlier with concatenation of longest simple sub-paths.
Hence when we are dealing with directed acyclic graphs(DAGs) we do in fact always have optimal sub-structure.
A lot (if not all) Dynamic Programming problems related to optimization can be reduced to the problem of finding the longest/shortest path in a DAG so it is well worth remembering how to solve this problem.

Finally we can present the Dynamic Programming algorithm for solving our problem:

$INTEGER-KNAPSACK( c,v,B )$
1. $D[0] \leftarrow 0$
2. For $k \leftarrow 1$ to $B$
3. $D[k] \leftarrow max_j\{ D[k-c[j]] + v[j] : c[j] \leq k \}$
4. $return(D[B])$

To output a team that achieves this maximum you can modify the algorithm as follows:

$INTEGER-KNAPSACK'( c,v,B )$
1. For $k \leftarrow 1$ to $B\;\;\{$
2. $D[k] \leftarrow 0$
3. For $j \leftarrow 1$ to $n$
4. If $c[j] \leq k$ and $D[k] < (D[k-c[j]] + v[j])\;\;\{$
5. $D[k] \leftarrow D[k-c[j]] + v[j]$
6. $T[k] \leftarrow j$
7. $\}$
8. $\}$
9. $return(T)$

Then you can follow pointers back and unfold the optimal solution via $T[B], T[B - T[B]], T[B - (T[B]+T[B-T[B]])],...$.

The algorithm has runtime complexity $O(nB)$ and $O(B)$ memory requirements.

Is that really good?
Well it is better than exponential, but the runtime is unfortunately dependent on the value of the input rather than just the size of it. For that reason we say that the complexity is pseudo-polynomial.
For general $B$ this problem is actually NP-hard.

Now let’s look at a slightly trickier problem.

Problem 4 (The Firing Problem)

This time you are also managing a team but the company you are working for is not doing very well so they need you to fire people in order to maintain a tighter budget for your team.

Given are $n$ people in your team with values $v$ and costs $c$.

Your new budget is $B$ thousand dollars.

Give an algorithm for computing the set of people you should keep in order to maximize value while not breaching budget.

Note: This problem is more commonly referred to as the Binary Knapsack Problem

Solution

This problem looks awfully similar to the previous problem.
There is however one important difference.
You can only keep or fire a person once!

The corresponding integer program is given by

Binary Knapsack Problem

Maximize $\sum \limits_{j=1}^n x_jv[j]$

subject to:

$\sum \limits_{j=1}^n x_jt[j] \leq T$

$x_j \in \{ 0,1 \}, \;\;\;\;j \in \{ 1,...,n\}$

In the previous problem we were able to define sub-problems as

$D[k] :=$ Maximum value given budget $k$

Here this doesn’t work, how do we know if we have already kept an employee or not?
If we have, we cannot keep the employee twice so it seems we need to enlarge our state space to take this into account.

We can make the following new definition:

$D[k][j] :=$ Maximum value given budget $k$ using only employees from $\{ 1,...,j \}$

When we increment the second parameter we have the value of the optimal team using one less employee for the same budget. The new employee either is, or is not part of an optimal team with the given budget.
Whichever it is this employee must be added to a team which is optimal with respect to the other employees before it.
So with a budget of $k$ we can either gain or not gain by adding employee $j$ to the optimal solution for $D[k-c[j]][j-1]$ over not adding employee $j$ to the optimal team using only employees in $\{ 1,...,j-1 \}$.

We thus get the following expression for the optimal value of a subproblem:

$D[k][j] := max\{ D[k-c[j]][j-1] + v[j], D[k][j-1] \}$

Hence we can construct the following Dynamic Programming algorithm:

$BINARY-KNAPSACK(c,v,B)$
1. For $k \leftarrow 1$ to $B$
2. $D[k][0] \leftarrow 0$
3. For $j \leftarrow 1$ to $n$
4. $D[0][j] \leftarrow 0$
5.
6. For $k \leftarrow 1$ to $B$
7. For $j \leftarrow 1$ to $n\;\;\;\{$
8. $T[k][j] \leftarrow 0$
9. If $c[j] \leq k$ and $(D[k-c[j]][j-1] + v[j]) > D[k][j-1] \;\;\{$
10. $D[k][j] \leftarrow D[k-c[j]][j-1] + v[j]$
11. $T[k][j] \leftarrow 1$
12. $\}$
13. Else
14. $D[k][j] \leftarrow D[k][j-1]$
15. $\}$
16. $return( T[B] )$

The optimal team is then found by iterating over $T[B]$ firing every employee $j$ such that $T[B][j] = 0$ and keeping the rest (hopefully there are no bugs).

The algorithm has runtime complexity $O(nB)$ and $O(nB)$ memory requirements.

The time management problem where $n$ tasks are given taking $t[j]$ hours to complete and have value $v[j]$ for $j = 1,...,n$ is also equivalent to the Binary Knapsack problem. The knapsack capacity is given by a global deadline $T$ and the problem is to pick out the tasks that will maximize value before hitting deadline. In general any problem which fits the integer program specification can be solved using the same algorithm.

The final problem along with its many variations is a very intriguing problem which occurs naturally in many real-world situations. Unfortunately its general formulation is very hard to solve (NP-hard in fact).
However if we make some simplifying assumptions it can be solved optimally in polynomial time using a Dynamic Programming approach (though still very unpractical for larger inputs).

Problem 5 (The Parallel Task Scheduling Problem)

Given are $n$ tasks of $k$ different types with multiplicities $m[j]$ and processing times $t[j]$ for $1 \leq j \leq k$

You have a requirement that all tasks need to finish before time $T$.

To achieve this the tasks need to be scheduled to run in parallel on a number of identical machines.

Give an algorithm for computing the minimum number of machines needed.

Solution

This problem is a restricted version of the general 1-dimensional Bin Packing Problem.
In the bin-packing problem the assumption that there can only exist $k$ different item-types is relaxed to allow for any number of item sizes.

This problem can be specified via the following integer program:

Restricted Bin Packing Problem

Minimize $\sum \limits_{i=1}^n y_i$

subject to:

$\sum \limits_{j=1}^k \sum \limits_{l=1}^{m[j]} t[j]x_{_{i},_{f(j,l)}} \leq y_iT,\;\;\;\;i \in \{ 1,...,n\}$

$f(j,l) = l + \sum \limits_{r=1}^{j-1} m[r]$

$\sum \limits_{i=1}^n x_{i,j} = 1, \;\;\;\;j \in \{ 1,...,n\}$

$y_i \in \{ 0,1 \}, \;\;\;\;i \in \{ 1,...,n\}$

$x_{i,j} \in \{ 0,1 \}, \;\;\;\;i \in \{ 1,...,n\},\;j \in \{ 1,...,n\}$

Here $x_{i,j} = 1$ if task $j$ is scheduled for machine $i$ and $y_i = 1$ if machine $i$ is used.
The first constraint says that the processing time for all tasks scheduled on machine $i$ should not exceed $T$ and the third constraint that each task is scheduled on exactly one machine.

It is quite clear that a naive brute-force solution to this problem is intractable.
Effectively we would have to enumerate all partitions of $n$ tasks (of which there are exponentially many), check that none of the subsets of the partition exceed the time limit (in linear time) and then take the minimum over all such feasible partitions.

What about a Dynamic Programming solution?

A natural choice for our sub-problems are

$D[j_1]...[j_k] =$ Minimal number of machines needed with $j_r$ tasks of type $r$

The tricky part is how we should transition from smaller to larger problem instances.
We cannot simply remove a task and ask what the optimal solution is to the sub-problem since the number of required machines could either increase or remain the same through some potential rearrangement, so there is no obvious way to build on top of a smaller optimal solution with a single new task.

The easiest way to get to a bigger instance from a smaller one is by stepping one machine at a time i.e if we know all possible ways of combining tasks into a single machine then the optimal solution to $D[j_1]...[j_k]$ is obtained from the optimal solution to $D[j_1-a_1]...[j_k-a_k]$ by considering all possible task tuples $(a_1,...,a_k)$ s.t $a_1t[1]+...+a_kt[k] \leq T$ fitting into one machine.
So an optimal solution can be obtained by stepping forward with one machine from some optimal solution with one less machine whichever way it may be, we must consider all tuples that could possibly be assigned to that machine since any of the extensions could be optimal.

Define the set $A := \{ (a_1,...,a_k) \in \mathbb{Z}^k: a_1t[1]+...+a_kt[k] \leq T, 0 \leq a_i \leq m[i], i = 1,...,k \}$ of all ways of filling one machine with tasks satisfying the constraint $T$. Then $A$ has size at most $O(n^k)$

We can compute $A$ in $O(n^k)$ time via the following algorithm (which if implemented properly should rather be recursively defined with a parameter k)

$INITIALIZE(t,m,k,T)$
1. $A \leftarrow \emptyset$
2. For $a_1 \leftarrow 0$ to $m[1]$
3. For $a_2 \leftarrow 0$ to $m[2]$
4.
5. For $a_k \leftarrow 0$ to $m[k]$
6. If $a_1t[1]+...+a_kt[k] \leq T\;\;\;\{$
7. $A \leftarrow A \bigcup \{(a_1,...,a_k)\}$
8. $D[a_1]...[a_k] \leftarrow 1$
9. $\}$
10. $return(A,D)$

We can now compute the minimum number of required machines via:

$RESTRICTED-BIN-PACKING(t,m,k,T)$
1. $(A,D) \leftarrow INITIALIZE(t,m,k,T)$
2. $D[0]...[0] \leftarrow 0$
3. For $j_1 \leftarrow 0$ to $m[1]$
4. For $j_2 \leftarrow 0$ to $m[2]$
5.
6. For $j_k \leftarrow 0$ to $m[k]$
7. $D[j_1]...[j_k] \leftarrow \min \limits_{(a_1,...,a_k) \in A} \{ 1 + D[j_1-a_1]...[j_k-a_k] : j_i \geq a_i, i=1,...,k \}$
8. $return(D[m[1]]...[m[k]])$

For each of the at most $O(n^k)$ sub-problems we do at most $O(n^k)$ work by computing the min over all base configurations. Hence worst-case runtime complexity is $O(n^{2k})$ i.e polynomial in the size of the task input for fixed $k$. Space requirement is $O(n^k)$.

The bin packing problem arises naturally in many logistics problems such as the problem of finding the minimum number of lorries required to deliver goods of different sizes or the minimum number of shelves needed in a storage room. It also arises in advertisement problems, where you for instance want to fit a certain number of commercials of different length in a minimum series of 3 minute television breaks. As we have seen above it can also arise in scheduling problems. Wikipedia mentions another interesting variant of the Bin Packing problem where items could occupy less space in comparison to the sum of their individual sizes when packed together. This arises in the problem of minimizing physical servers hosting virtual machines (VMs), where resource sharing may occur on the host. For that reason this variation of the problem is referred to as VM packing.
Needless to say the Bin Packing problem is very important.

Many other nice variations of this problem also exists. For example one may be interested in looking at bins of multiple capacities, higher dimensional extensions of the bins/items, different geometrical shapes of the bins/items or looking at online versions of these problems (i.e where items arrive incrementally) and etc…

We saw that the Bin Packing problem admitted a polynomial time solution when restricted to $k$ distinct types of items. The performance however is still rather dreadful in practice. Often when dealing with difficult problems such as the Bin Packing problem one has to resort to heuristics and approximation algorithms.

One simple such heuristic is the Next-Fit packing algorithm which traverses the items once and checks whether the next item fits in the last bin. If it does, then it puts it there, if it doesn’t then it opens a new bin. This may seem like a rather crude way of going about it, but the algorithm has certain advantages.
For instance it is only linear time, works online and outputs at most twice the optimal number of bins [Exercise 1].

Yet another simple heuristic which is slightly slower [$O(n^2)$ or less depending on implementation] but gives better results is the First-Fit packing algorithm. This algorithm repeatedly traverses the list of items and inserts into the last bin the first item that fits into it. If no item fits, it opens a new bin and goes through the list again until all items are gone.

……

This concludes our discussion about dynamic programming.
There are many other interesting problems which fit this topic. You will see two of them in the exercises below, but I recommend signing up to places like TopCoder and check out their practice rooms for a large variety of other Dynamic Programming problems of varying difficulty.
If you are interested in algorithms in general you may also find it worthwhile to check out Wikipedia’s list of algorithms.
The standard book carried around by all computer scientists you can find here.

Exercise 1 (Next-Fit Bin Packing – Easy):

Prove that the Next-Fit bin packing algorithm never does worse than two times the number of bins in the optimal solution.

Exercise 2 (Maximum Sum Contiguous Subsequence – Easy):

Give an $O(n)$ algorithm for finding the value of the maximum sum contiguous subsequence in an array of integers.

[Think carefully about what the sub-problems should be in a Dynamic Programming solution and how you would transition between them]

Exercise 3 ( Which Meetings To Attend? – Intermediate):

Your day is very busy and your calendar is packed with meetings and various other commitments (all here referred to as “meetings”), some possibly overlapping.

You are given an array $s$ of starting times for each meeting, an array $f$ of corresponding finishing times and an array $w$ of corresponding importance of the meetings.

Since you can only attend one meeting at a time you wish to pick out a subset of non-overlapping meetings which maximizes the total importance.

Give a Dynamic Programming algorithm for computing the optimal meeting schedule.

(Hint: You may wish to do some pre-processing to compute an array which gives you the latest finishing meeting not overlapping with meeting $j$)

Example:

## How To Prove Somethin’

Perhaps the biggest difference between mathematics at school and at university is a change in style into reasoning in more abstract and rigorous ways. As a student of pure mathematics you will for instance spend a great deal of your first year real analysis course untangling definitions and proving facts you may have taken granted for years – in a sense re-learning everything from scratch in a more formal setting.
I myself got to spend my first lecture in real analysis seeing proofs of statements like

(i) $-0 = 0$
(ii) $-(-x) = x \;\; \forall x \in \mathbb{R}$

from the axioms of the real number system, and a few lectures later proving effectively that a continuous function taking positive and negative values must pass through the
x-axis (Intermediate Value Theorem).

I recall that the main difficulty faced in the very beginning by some people (including myself) came down to lack of training in constructing and appreciating formal arguments, the mechanics of it. Although some universities have introductory courses to pure mathematics not all do, and they are usually very short (mine was one week).
Therefore I wanted to write a few lines about what most schools and some universities may not explicitly cover, namely the common high level methods for proving mathematical statements.

I will also remark before continuing that this is not a discussion about proofs in the formal sense of proof theory (a branch of mathematical logic). This is mainly a discussion about proofs in everyday mathematical practice.
Here methods will be outlined to keep in mind when facing mathematical statements of a certain form.

1. Implication $(P \implies Q)$

An implication consists of a Hypothesis $P$ and a Conclusion $Q$ written $P \implies Q$, meaning “If $P$ then $Q$“.

An example of an implication is:

$x^3 + 3x^2 + 3x \geq 0 \implies x \geq 0\;\;\;\;\;(*)$

The truth table of the implication statement is defined as follows:

Method 1.1 (Trivial Proof):

From the truth table you can see that if you are able to show that the conclusion $Q$ is true then $P \implies Q$ is always true regardless of the truth value of the hypothesis $P$.

Consider the following statement: “If I live in London then I live in the universe”.
Since the conclusion is always true it does not matter where I live, the implication is still true.

Note that the notion of a trivial proof usually means something different from the above.
If a mathematician refers to a proof as trivial they usually mean that it is “obvious”/”easy to prove” (in relation to the assumed knowledge of the audience).
Though you should beware of people who do this too much, it is really not a substitute for a proof if it means you can’t reconstruct it.

Method 1.2 (Vacuous Proof):

From the truth table you can also see that if you are able to show that the hypothesis $P$ is false then $P \implies Q$ is always true regardless of the truth value of the conclusion $Q$.

This has the somewhat surprising meaning that a false statement can imply anything!

Consider the following statement “If pigs fly then you are president of USA”.
Pigs don’t fly so the hypothesis is false and so the implication is true.
But clearly you are not president of USA (unless you are Obama reading this in which case you should stop procrastinating and get back to work!) so how does this make sense?
The rationale behind the table is that an implication is defined to be true until proven false.
Who knows what happens if pigs do fly, maybe you will then be president?
We cannot know until there is a flying pig and therefore since we can’t say that the implication is false we must say it is true.

(DISCLAIMER: please don’t try throwing pigs out the window hoping they will fly, I am not responsible).

Method 1.3 (Direct Proof):

This is perhaps the most familiar method.

To show that the implication is true you assume $P$ is true and deduce (by some logical process) that $Q$ is true.

Here is how you can prove $(*)$ using the direct method:

Suppose $x^3 + 3x^2 + 3x \geq 0$.

Then $(x+1)^3 = x^3 + 3x^2 + 3x + 1 \geq 1$ using the hypothesis.

Therefore $x+1 \geq 1$ and so $x \geq 0$.

Hence the implication is proved.

Method 1.4 (Proof by Contrapositive):

To show that $P \implies Q$ you can show the logically equivalent contrapositive statement $\lnot Q \implies \lnot P$
i.e “P then Q” is equivalent to “Not Q then Not P”.
You may check that both implications have the same truth table confirming their equivalence.

So to prove the contrapositive of $P \implies Q$ we assume $Q$ is false and deduce that $P$ must be false.

Here is how you can prove $(*)$ via the contrapositive:

Suppose $x < 0$.

Then $x+1 < 1$ so $(x+1)^3 < 1$.

Expanding we get $x^3 + 3x^2 + 3x + 1 < 1$ and therefore $x^3 + 3x^2 + 3x < 0$.

Hence the contrapositive implication has been proved and so the implication is proved.

In a proof by contradiction you assume the hypothesis $P$ to be true AND assume the conclusion $Q$ to be false and show that it leads to something absurd i.e a logical contradiction.
This logical contradiction could be a violation to the hypothesis (which is assumed to be true) or a contradiction to some other true statement you establish throughout the proof.

Here is how you can prove $(*)$ via proof by contradiction:

Suppose $x^3 + 3x^2 + 3x \geq 0$ and $x < 0$.

Dividing by $x$ (which is negative) reverses the inequality so then $x^2 + 3x + 3 \leq 0$.
By completing the square we get $(x+\frac{3}{2})^2 + \frac{3}{4} \leq 0$.

But this is a contradiction since the left hand side is strictly positive.

Since $\lnot Q$ doesn’t make sense under the true assumption of the hypothesis $P$ we deduce $P \implies Q$.

2. Existence $(\exists x \in S)\; P(x)$

Existence statements require you to show the existence of some mathematical object $x$ in a set $S$ satisfying a certain statement $P$. Often such existence statements are combined with implications to say that under certain hypotheses we can assert the existence of an object satisfying a given statement.

A mathematical object can be anything from a number, matrix or graph to circles, spheres and manifolds in geometry to groups, rings and modules in algebra.

There are essentially two categories of existence proofs, constructive and non-constructive.
I will also mention a third which really belongs to one of the two above but is worth mentioning separately.

Method 2.1 (Constructive Proof):

The constructive method is based upon building a mathematical object directly by means of supplying an algorithm or procedure for the construction of the object.
Such proofs can give a wealth of insight and are therefore often preferred over all other methods.
However constructive proofs are not always easy to find.
Safe to say being destructive (or rather non-constructive) is many times easier than being constructive – true in life, true in mathematics.

Example:

– Prove that there exist a real root to $f(x) = x^3 + 3x^2 + 3x + 1$.

$x^3 + 3x^2 + 3x + 1 = (x+1)^3$ so $f(-1) = 0$ and hence we can exhibit $-1$ as a real root.
Here I have proven that $f(x)$ has a real root by explicitly finding one through trivial factorization, so the proof may be considered constructive.

A somewhat less trivial example is given by the Euclidean Algorithm. Not only does it show the highest common factor of two integers exist, it also tells you how to find them.

Method 2.2 (Non-Constructive Proof):

A non-constructive proof usually means showing that something cannot not exist, and therefore must exist.

This argument crucially relies on a philosophical principle dating back to Aristotle called “Law of the excluded middle” which is the third of the three classical Laws of Thought.

The law says that everything which has a truth value is either True or False, there is no middle ground.
Therefore if we can show that something cannot not exist (by means of contradiction) then the only alternative is that it must exist by the law of the excluded middle.
The law basically implies we are allowed to eliminate double negations.

Both proof by contraposition and proof by contradiction (see above) depend on this law for their validity.

One school of philosophy known as ‘Constructivism’ rejects the law of the excluded middle and therefore rejects this method altogether and all proofs that follow from it (imagine how devastating for mathematics!).

Nevertheless this law is generally accepted by most people throughout mathematics.

The disadvantage with this method is that it doesn’t usually give much insight into how to find the object, it just tells you it must exist. However on the plus-side it is often easier to argue by (and sometimes the only possible argument).

A proof can also be non-constructive if it refers back to a theorem which guarantees existence but in turn gives no constructive answer.

Examples:

– Prove that there exist a real root to $f(x) = x^3 + 3x^2 + 3x + 1$.

I mentioned the Intermediate Value Theorem earlier which implies that a continuous real-valued function taking positive and negative values must cross the x-axis at least once.
Now $f(x)$ being a polynomial is a continuous real-valued function (why?).
Moreover $f(-2) < 0$ and $f(2)\; \textgreater \; 0$.
Hence $f(x)$ has a real root by the Intermediate Value Theorem.

In fact a very similar argument can be applied to show that any odd-degree polynomial always has at least one real root (how?).
This proof is non-constructive because I have not given a method for finding the real root, just referred to a theorem which tells us it has to exist somewhere (in fact somewhere between -2 and 2).

– Prove that some digit in the decimal expansion of $\pi$ has to appear infinitely often.

We know $\pi$ is irrational so its decimal expansion is never ending.
Suppose there didn’t exist a digit occuring infinitely often.
Then each of the digits $0,1,...,9$ appears finitely many times, say at most $n$ times each.
But then there can be at most $10n$ decimals in the decimal expansion of $\pi$ which is a contradiction to its irrationality.
Hence there exist a digit in the decimal expansion of $\pi$ appearing infinitely often, but we don’t know which.

Method 2.3 (Probabilistic Proof):

The Probabilistic Method proceeds by showing that an object with given properties must exist with positive probability. It can then be claimed with certainty that the object has to exist since there is at least one object in the sample space pushing the probability above zero.

The method is non-constructive.

A full example is probably slightly out of scope for this post but perhaps among the simpler examples is a proof of a lower bound on Ramsey Numbers of the form $R(k,k)$ due to Paul Erdos.

3. “For All” $(\forall x \in S)\; P(x)$

‘For all’-statements assert that for all objects $x$ in a set $S$ some statement $P$ is true about x.
In mathematical notation the phrases ‘for all’, ‘for every’ and ‘for each’ are all denoted by $\forall$.

Depending on the set $S$ there are different methods available.

Examples of $\forall$-statements are:
$\forall A,B \subseteq X,\;\;\; (A \cup B)^c = A^c \cap B^c$
$\forall$ perfect n-level binary tree $T$ has $2^{n+1}-1$ nodes
$\forall n \in \mathbb{N},\; \sum \limits_{k = 1}^n k = \frac{n(n+1)}{2}$

Method 3.1 (Direct Proof):

In a direct $\forall$-proof you pick an arbitrary element $x \in S$ and show that $P$ is true for x.
Since the element was arbitrary the statement is true for all $x \in S$.

To give an example we can prove below de Morgan Law in this way:
– Prove that $\forall A,B \subseteq X,\;\;\; (A \cup B)^c = A^c \cap B^c$

Let $A,B \subseteq X$.
We need to show that $\forall x \in (A \cup B)^c$ we have $x \in A^c \cap B^c$ and vice versa.

Let $x \in (A \cup B)^c$.
Then $x \notin A \cup B$ so $x \notin A$ and $x \notin B$ i.e $x \in A^c$ and $x \in B^c$.
Therefore $x \in A^c \cap B^c$.

Conversely let $x \in A^c \cap B^c$.
Then $x \in A^c$ and $x \in B^c$.
Thus $x \notin A$ and $x \notin B$ so $x \notin A \cup B$.
Therefore $x \in (A \cup B)^c$

Hence $(A \cup B)^c = A^c \cap B^c$ since $x$ was arbitrary in both directions.

We deduce that $\forall A,B \subseteq X,\; (A \cup B)^c = A^c \cap B^c$ since $A,B$ were chosen as arbitrary subsets.

Method 3.2 (Proof by Case Exhaustion):

If the set $S$ is finite or can be divided into a number of finite cases then it may be feasible to check every case by hand or with the aid of a computer.
These arguments are usually used as a last resort if no other argument is available.
Alternatively it is intentionally used to structure a proof into more manageable and coherent pieces for mathematical or aesthetic reasons.

The famous Four Colour Theorem was settled by exhausting a set of 1,936 cases using a computer.

Many other theorems throughout mathematics are based on proving the same statement using different arguments for special subsets of $S$

Method 3.3 (Proof by (weak) Induction):

Induction is a method that can be applied to prove statements about sets which are well-ordered.
A well-ordering on a set is a total order with respect to which all non-empty subsets have a smallest element.

$\mathbb{N}$ (with $\leq$) is a prototypical example of a well-ordered set.
$\mathbb{R}$ (with $\leq$) is not well-ordered by for example considering the subset $(0,1) \subseteq \mathbb{R}$.

[A theorem which has historically sparked great controversy among mathematicians is the Well-Ordering Theorem which says that any set admits a well-ordering, even $\mathbb{R}$ (with a different ordering than $\leq$ of course).
Also worth noting is that if you accept the well-ordering theorem you also have to accept the Axiom Of Choice since they can be shown equivalent.
Accepting axiom of choice leads to some unintuitive consequences like the Banach-Tarski Paradox.
Induction on sets with cardinality higher than $\mathbb{N}$ is usually referred to as Transfinite Induction]

Throughout this post we will assume to be working over $\mathbb{N}$ when talking about induction.

Mathematical induction can be thought of as a formal replacement for a recursive argument ending with the phrase “and so on…”.

A proof by induction has essentially got two parts:
i) Prove that P(1) is true (base case)
ii) Assume P(n) is true and prove that P(n+1) is true (inductive step)

It is then easy to see that this enough to show that P(n) is true for all n.
For example if someone specifies an arbitrary number k then you can argue:
P(1) is true, so P(2) is true with n=1.
P(2) is true, so P(3) is true with n=2.
P(3) is true, so P(4) is true with n=3.
.
.
P(k-1) is true so P(k) is true with n=k-1.

Here is an example of a proof by induction:
– Prove that $\forall n \in \mathbb{N},\; \sum \limits_{k = 1}^n k = \frac{n(n+1)}{2}$
The base case is clear since $\sum \limits_{k = 1}^1 k = 1 = \frac{1(1+1)}{2}$.
Now suppose for some $n > 1$ that $\sum \limits_{k = 1}^n k = \frac{n(n+1)}{2}$ (this is the inductive hypothesis).
We need to show the statement is true for $n+1$.
$\sum \limits_{k = 1}^{n+1} k = n+1 + \sum \limits_{k = 1}^{n} k = (n+1) + \frac{n(n+1)}{2} = \frac{2(n+1)+n(n+1)}{2} = \frac{(n+1)(n+2)}{2}$.
Hence the statement is true for all $n$ by induction.

Method 3.4 (Proof by Strong Induction):

This is another flavour of induction which differs from the previous method in that the inductive hypothesis is “strengthened” to assume $P(k)$ holds true $\forall k \leq n$.

Logically however strong induction is equivalent to weak induction but the former may simplify the proof of the inductive step in certain cases.

Here is an example of where strong induction comes in handy:
– Prove that $\forall n \in \mathbb{N} \setminus \{0,1\}$, $n$ may be written as a product of prime numbers
(this is the existence part of the so called Fundamental Theorem Of Arithmetic)

The base case $n=2$ is clear since $2$ is prime.
Suppose now $n>2$ and $\forall k \leq n$ we may write $k$ as a product of primes.
If $n+1$ is a prime then we are done.
If $n+1$ is not a prime then $\exists a,b \in \mathbb{N}$ such that $n+1 = ab$.
Now $a \leq n$ and $b \leq n$ so by inductive hypothesis both $a$ and $b$ may be written as a product of primes.
Hence $n+1$ may be written as a product of primes (being a product of $a$ and $b$).

Method 3.5 (Proof by Minimal Criminal):

A proof by the method of minimal criminal is a blend between induction and an argument by contradiction.

The method has a slightly disturbing logic and works by assuming the existence of a minimal counter example (the minimal criminal) and then prove there is an even smaller counter example hence leading to a contradiction.
Thus there can be no counter example by well-ordering and so the statement is proved for all elements of the set.

The method is also sometimes referred to as proof by Infinite Descent (the idea developed by Fermat).

Proof by minimal criminal is for example used in the classical argument to show that $\sqrt{2}$ is irrational:

Suppose $\sqrt{2}$ is not irrational i.e rational.
Then $\exists p,q \in \mathbb{N}$ with $p$ minimal s.t $\sqrt{2} = \frac{p}{q}$.
Squaring both sides yields $2 = \frac{p^2}{q^2} \implies 2q^2 = p^2$.
Therefore since $2$ divides the left hand side it must also divide the right hand side.
This means $2|p$ and so $p = 2p'$ for some $p' \in \mathbb{N}$.
Thus $2q^2 = 4(p')^2 \implies q^2 = 2(p')^2 \implies 2|q$ so that $q = 2q'$ for some $q' \in \mathbb{N}$.
But then $\sqrt{2} = \frac{p'}{q'}$ where $p' < p$ contradicting the minimality of $p$
Hence $\sqrt{2}$ cannot be rational.

Method 3.6 (Proof by Forward-Backward Induction):

Forward-Backward Induction (sometimes just called Backward Induction) is a less commonly used technique but has uses in various niche circumstances.

The method works in 3 steps.

i) Prove that $P(1)$ is true (base case)
ii) Prove that $P(k)$ holds true $\forall k \in A$ where $A \subseteq S$ is an infinite subset (forward step).
For example you can prove that $P(2^m)$ is true $\forall m \in \mathbb{N}$.
iii) “Fill in the gaps” of the previous step by inducting backward, showing that $P(k) \implies P(k-1)$ (backward step)

Then $P(n)$ holds true for all $n \in S$

Perhaps the most commonly known application of this method is Cauchys proof of the AM-GM inequality.
A proof may be found in the following Wikipedia article

Method 3.7 (Proof by Two-Dimensional Induction):

Induction can be extended to higher dimensions.
Say we are looking to prove some statement $P(m,n)\; \forall m,n$.
There are essentially two ways this can be done, either by folding the two variables into a single variable and performing a regular 1-dimensional induction or it can be proved by doing nested 1-dimensional inductions with respect to each variable.

The method works as follows:

(i) Prove that $P(1,1)$ is true (base case)
(ii) Prove that if $P(m,n)$ is true whenever $m + n = k$ then $P(m,n)$ is true for $m + n = k +1$ (inductive step)

or

(i) Prove that $P(1,1)$ is true (base case)
(ii) Prove that if $P(m,1)$ is true for some $m >1$ then $P(m+1,1)$ is true (outer induction)
(iii) Prove that if $P(m,n)$ is true for some $m,n >1$ then $P(m,n+1)$ is true (inner induction)

Then $P(m,n)$ is true for all $m,n$.

The method can be used for instance to prove Ramsey’s theorem (see below).
Let $R(m,n)$ denote the smallest number such that in any group of $R(m,n)$ people there exists $m$ people who all know each other or $n$ people who all don’t know each other.
It is known that $R(m,n) \leq R(m-1,n)+ R(m,n-1)\; \forall m,n>1\;\;\;(*)$
(this is a lemma to be proved in the full argument).
– Prove that $\forall m,n \in \mathbb{N},\; R(m,n) \leq \left( \begin{array}{c} m+n-2 \\ m-1 \end{array} \right)$.

The base case is clearly true, $R(1,1) = 1 \leq \left( \begin{array}{c} 1+1-2 \\ 1-1 \end{array} \right)$.
Suppose the statement is true for all $m,n$ with $m+n = k$ for some $k > 2$.
We want to show the statement holds for all $m,n$ with $m+n = k+1$.
If $m=1$ or $n=1$ then the statement follows immediately so we may assume $m>1$ and $n > 1$.
Now notice that $k = (m-1) + n = m + (n-1)$ so by inductive hypothesis
$R(m-1,n) \leq \left( \begin{array}{c} (m-1)+n-2 \\ (m-1)-1 \end{array} \right)$ and $R(m,n-1) \leq \left( \begin{array}{c} m+(n-1)-2 \\ m-1 \end{array} \right)$.
Then by $(*)$ and a standard binomial identity we have
$R(m,n) \leq R(m-1,n) + R(m,n-1) \leq$
$\left( \begin{array}{c} m+n-3 \\ m-2 \end{array} \right) + \left( \begin{array}{c} m+n-3 \\ m-1 \end{array} \right) = \left( \begin{array}{c} m+n-2 \\ m-1 \end{array} \right)$

Hence the statement follows by induction on $m+n$.

4. Miscellaneous

Method 4.1 (Combinatorial Proof):

A combinatorial proof revolves around proving some identity by means of finding an explicit bijection or counting the same thing in two ways.

Combinatorial arguments are usually extremely more elegant and illuminating than their (usually) inductive counterpart.

Here is an example of a combinatorial proof:

– Prove that $\sum \limits_{k=0}^n \left( \begin{array}{c} n \\ k \end{array} \right) = 2^n$

On the right hand side we have the cardinality of the power set (i.e the set of all subsets) of $n$ objects.
On the left hand side we are summing over the number of all subsets of size $k$ for $0 \leq k \leq n$
These two are equal and hence the identity follows.
Now try proving this by induction!

More examples can be found in a previous article I wrote on the Orbit-Stabilizer Theorem and the Burnside Lemma.

Method 4.2 (Uniqueness Proof):

Statements which have the word “unique” in them require uniqueness proofs.

Uniqueness can mean different things in different contexts.
In number theory you may be talking about the only number, or the only number modulo n satisfying a given proposition.
In algebra you may be talking about the only algebraic object up to isomorphism (algebraic equality).
In topology you may be talking about the only topological space up to homeomorphism (topological equality).
In general one could prove uniqueness with respect to any class or invariant.

Proofs of uniqueness often goes hand in hand with proofs of existence.
The skeleton for uniqueness proofs are almost always the same.
One assumes non-uniqueness and then shows that the supposedly non-unique objects satisfying the given proposition must in fact be equal.

Here is a trivial (yet demonstrative) example of this procedure:

– Prove that $3x + 1 = 7$ has a unique solution

Suppose there are two numbers $x_1,x_2$ satisfying the above equation.
Then $3x_1 + 1 = 7$ and $3x_2 + 1 = 7$.
Therefore $3x_1 + 1 = 3x_2 + 1 \implies 3x_1 = 3x_2 \implies x_1 = x_2$.
Hence the solution is unique if it exists (this needs to be proved as well, but we can easily see it is 2)

Method 4.3 (Proving Equivalence):

Statements of the form $P \Longleftrightarrow Q$ are called equivalences between statements $P$ and $Q$.

In mathematical texts the double arrow is sometimes replaced with the phrase ‘if and only if‘ or ‘iff‘ for short.

If $P$ is equivalent to $Q$ that means $P$ is both necessary and sufficient for statement $Q$ to be true.

$P \Longleftrightarrow Q$ is logically equivalent to $P \implies Q$ AND $Q \implies P$.

Therefore all methods for proving implications are applicable to equivalences as well.
In particular to prove that $P \Longleftrightarrow Q$ one can show $\lnot P \Longleftrightarrow \lnot Q$

There are of course a myriad of other techniques and proof patterns specific to more narrow mathematical domains, but most of the methods you have seen above are used throughout mathematics.

It may also be worth mentioning how to disprove a mathematical statement.
To disprove a mathematical statement it is enough to produce exactly one counter example.
For instance if you are to disprove that $P(x)$ is true for all $x \in S$ you should not show that $P(x)$ is false for all $x \in S$. It is enough to exhibit one $x \in S$ such that $P(x)$ is false.
It is also important that you prove that your counter example is valid by proving that it satisfies all hypotheses AND does not satisfy the conclusion.

For example here is a false statement:
– “All real numbers are rational”

We can disprove this by exhibiting one number which we prove is real and cannot be rational.
Provided $\sqrt{2}$ exists we proved in Method 3.5 that $\sqrt{2}$ cannot be rational.
A proof that $\sqrt{2}$ exists as a real number can be found here.
These facts together disproves the statement.

How do you know if a statement is true or false?
There is no general answer, if there were mathematics wouldn’t be particularly interesting.
If you are proving exercises in class you may as well trust the problem setter unless they explicitly ask you to determine the true or falsehood of a statement.
In general having found a question that makes sense (which can be hard in itself) I would personally recommend going by your intuition (…whatever that really means). If you think it might be false you may want to start looking for a counter example immediately. If you have no clue then you should start proving the statement. While doing so you might stumble over something which breaks the statement and that can help you construct a counter example by exploiting what makes it fall apart. Otherwise you either succeed in proving the statement or you get stuck somewhere and what to do then is a topic for another discussion.

Finally, if you are interested in reading about some humorous false proof techniques see here. I am sure anyone who have sat in a math class could identify with at least some of these.

Here are some exercises you may wish to try:

Exercise 1 (Everything in the world is grey – Easy):

What is wrong with the following argument that attempts to prove every object in the world is grey?

——————————————————————————————————————–
Let $S$ be the set of all objects in the world.

For the base case, pick a grey T-shirt.
Now assume all subsets of $n$ objects are grey.
Consider a subset $A \subseteq S$ with $n+1$ objects.
Removing the last object from $A$ we can apply the inductive hypothesis to conclude the first $n$ objects from $A$ are all grey.
Removing the first object from $A$ we can apply the inductive hypothesis to conclude the last $n$ objects of $A$ are all grey.
Therefore all $n+1$ objects of $A$ must be grey.

Hence all objects in the world are grey by induction.
———————————————————————————————————————

What is wrong with the argument if we replace “every object is grey” with “every object is of the same colour”?

Exercise 2 (Binomial Identity – Intermediate):

Prove the binomial identity via induction.

$(x+y)^n = \sum \limits_{k = 0}^n \left( \begin{array}{c} n \\ k \end{array} \right) x^k y^{n-k}$.

Can you find a combinatorial proof?

Exercise 3 (Hungarian Mathematical Olympiad (2000) – Intermediate):

Find all positive primes $p$ for which there exists integers $x,y$ and $n$
such that $p^n = x^3+y^3$.

(Hint 1: Find the obvious solutions by inspection and prove there are no more using Minimal Criminal on $n$)

(Hint 2: In the minimal criminal argument prove that $p|x$, $p|y$ and that $n$ has to be at least 3. You can start by factorizing $x^3+y^3$ )

## Orbit-Stabilizer, Burnside and Pólya

Quite frequently Algebra conspires with Combinatorics to produce useful results.
In this post the intention is to discuss algebraic results such as Orbit-Stabilizer Theorem, Burnside’s Lemma and Pólya’s Enumeration Theorem to answer combinatorial questions such as:

“How many essentially different ways are there of colouring the faces of a cube in n colours?”
“How many different k-colour necklaces can be constructed using n beads?”
“How many isomers exist of a particular chemical compound molecule?”

These are all questions that depend on enumerating configurations (or patterns) under some notion of equivalence, e.g rotational symmetry. For example rotating or flipping a coloured necklace doesn’t produce a new necklace, even though the colour of the bead at the same position may have changed as a result of the operation.
Therefore a more generalized treatment is required from more elementary combinatorics questions such as “How many ways can you (statically) colour n distinguishable beads in k colours?”

As a brief note before continuing, this post may ask a little bit more of the reader than is usually the case in this blog.
In particular I will assume some basic familiarity with Groups and Group Actions.

Let’s start by reviewing two fundamental results about group actions, namely the Orbit-Stabilizer theorem and the Burnside Lemma.

Orbit-Stabilizer Theorem:

Let $G$ be a group acting on a set $\Omega$.

For $x \in \Omega$ let $Orb(x) := \{ gx : g \in G \}$ and let $Stab_G(x) := \{ g \in G : gx = x \}$.

Then there is a bijection $\phi: G/Stab_G(x) \rightarrow Orb(x)$.

In particular $|G| = |Orb(x)||Stab_G(x)|$.

Proof:

Define $\phi: G/Stab_G(x) \rightarrow Orb(x)$ via $\phi(gStab_G(x)) = gx$.

To check well-definedness suppose $g_1Stab_G(x) = g_2Stab_G(x)$.

Then $g_2^{-1}g_1 \in Stab_G(x)$ so $\phi(g_1Stab_G(x)) = g_1x = g_2(g_2^{-1}g_1)x = g_2x = \phi(g_2Stab_G(x))$.

Thus $\phi$ is well-defined.

Clearly $\phi$ is a surjection.

If $\phi(g_1Stab_G(x)) = \phi(g_1Stab_G(x))$ then $g_1x = g_2x \Longrightarrow g_2^{-1}g_1x = x \Longrightarrow g_2^{-1}g_1 \in Stab_G(x)$.

Thus $g_1Stab_G(x) = g_2Stab_G(x)$ which shows $\phi$ is also an injection.

Hence $\phi$ is a bijection and so $[G:Stab_G(x)] = |Orb(x)|$ i.e $|G|/|Stab_G(x)| = |Orb(x)|$ and the statement follows.

Suppose $\Omega$ is the set of $6$ faces of a cube.
You can act on these faces using rigid rotations i.e rotations that leave the cube in the same position but with faces permuted.
Such rotations form a group $G$.
Now if you decide to fix one face, then there are $4$ rotations that won’t send the face elsewhere. For example if you fix the bottom face then you can twirl the cube $4$ times at $90^{\circ}$ around a vertical axis while keeping the bottom face in its place. Any other rigid rotation will move the bottom face somewhere else. In other words the stabilizer of the bottom face has order $4$ under the action of $G$. The orbit of the bottom face has size $6$ since there are $6$ faces on the cube and you can send the bottom face to any other face using rigid rotations.
The orbit stabilizer theorem tells us that the order of the rotation group $G$ must be $6.4=24$.
It actually turns out the rotation group of the cube is isomorphic to the symmetric group $S_4$ but we will come back to this later.

As we will see, counting the number of distinct orbits of $\Omega$ under $G$ will be instrumental in answering the type of questions posed in the beginning. The next result called Burnside’s Lemma will help us do just that.

Burnside’s Lemma:

Let $G$ be a finite group acting on a set $\Omega$.

For every $g \in G$ define $Fix_{\Omega}(g) := \{ x \in \Omega : gx = x \}$.

Then $\#Orbits = \frac{1}{|G|} \sum_{g \in G} |Fix_{ \Omega }(g)|$.

Proof:

To prove this we use the classical counting argument of “counting the same thing in two ways”.

The natural thing to count is the set of all fixed pairs, namely $S = \{(g,x) \in G \times \Omega : gx = x\}$.

On one hand by iterating over the elements of $G$ we get $|S| = \sum_{g \in G} |Fix_{ \Omega }(g)|$.
On the other hand by iterating over elements of $\Omega$ we get $|S| = \sum_{x \in \Omega} |Stab_{ G }(x)|$.

Clearly every element of $\Omega$ belongs to exactly one orbit (in fact Orbit membership is an equivalence relation on $\Omega$). Therefore $\Omega = \bigsqcup_{i = 1}^{n} \Omega_i$ where $\Omega_1,...,\Omega_n$ are the distinct orbits in $\Omega$ under $G$.

Moreover by Orbit-Stabilizer theorem we know that $|Stab_{ G }(x)|= \frac{|G|}{|Orb(x)|}\;\;\forall x \in \Omega$.

Thus
$|S| = \sum_{x \in \Omega} |Stab_{ G }(x)| = \sum_{i=1}^n \sum_{x \in \Omega_i} |Stab_{ G }(x)| = \sum_{i=1}^n \sum_{x \in \Omega_i} \frac{|G|}{|Orb(x)|} = \sum_{i=1}^n \sum_{x \in \Omega_i} \frac{|G|}{|\Omega_i|} = \sum_{i=1}^n |G| = n.|G|$.

Hence $n = \frac{1}{|G|} \sum_{g \in G} |Fix_{ \Omega }(g)|$.

This lemma tells us that the number of orbits under an action is the same as the average number of elements fixed by the action. The motivation for counting orbits is quite clear in the context of cube colouring. If there are two colourings belonging to the same orbit then we should only count the colourings once because they are sitting on the same cube and can therefore be rotated to one another. The number of orbits thus represent the number of different colourings under rotational symmetry.

Let’s look at the problem of counting cube colourations more closely now that we have Burnside’s lemma at our disposal, which reduces the problem to finding the number of colourings fixed under a particular rotation.

Problem 1 (Cube Colourings)

How many essentially different n-coloured cubes are there?

Solution

We are set to count the number of orbits under the action of the 24 different rigid rotations (which we are about to enumerate). To use the Burnside Lemma we need to find the number of fixed colourings of each of the 24 rotations.
Fortunately these rotations belong to 3 classes:

If you are interested in viewing an animated depiction of the (13 total) rotation axes I found this excellent video on YouTube.

Suppose now we number the faces as below:

Then we can obtain the following table by examining the cycle decomposition of the face permutations.

Burnside’s Lemma now gives

$\#Orbits = \frac{1}{24}( n^6 + 6n^3 + 3n^4 + 8n^2 + 6n^3)$

For example there are $\frac{1}{24}( 2^6 + 6.2^3 + 3.2^4 + 8.2^2 + 6.2^3 ) = \frac{1}{24}(240) = 10$ ways of colouring the faces of a cube in Red and Black.

Burnside’s Lemma can help us understand in how many ways we can freely colour the faces of a cube, or the beads of a necklace. What if we want to impose some restrictions on the colourings? Say we want to know how many ways we can colour a 6-bead necklace in 3 Red, 2 Black and 1 White bead?
By pushing the Burnside Lemma a bit further we can answer this too. This is where Pólya pattern inventories come in.

Let’s make two important definitions before we proceed:

Definition (Cycle Index)

Let G be a group of symmetries (permutations) acting on $n$ objects.

Let $\sigma \in G$ and suppose $\sigma$ is a product of $k$ disjoint cycles of length $l_i$ for $1 \leq i \leq k$.

For such $\sigma$ define the monomial $M_{\sigma}(x_1,...,x_n) := \prod\limits_{i=0}^k x_{l_i}$ where $x_{l_1},...,x_{l_k}$ are indeterminates.

The Cycle Index of G is then defined via

\begin{aligned} C_G({\bf x}) := \frac{1}{|G|} \sum\limits_{\sigma \in G} M_{\sigma}({\bf x}) \end{aligned} where ${\bf x} = (x_1,...,x_n)$.

For example if $G = S_3$ then
$M_{(1)(2)(3)} = x_1^3$
$M_{(1\;2)(3)} = M_{(1\;3)(2)} = M_{(2\;3)(1)} = x_2x_1$
$M_{(1\;2\;3)} = M_{(1\;3\;2)} = x_3$
Thus $C_G = \frac{1}{6}( x_1^3 + 3x_2x_1 + 2x_3 )$.

Definition (Pattern Inventory)

Let G be a group of symmetries (permutations) acting on $n$ objects.

Let ${\bf v} = (n_1,...,n_k)$ be non-negative integers satisfying $n_1+...+n_k = n$
and let $a_{\bf v}$ represent the number of inequivalent colourings such that colour $y_i$ appears $n_i$ times.

Then the (Pólya) Pattern Inventory with respect to G is defined by

\begin{aligned} P_G(y_1,...,y_k) := \sum \limits_{{\bf v}} a_{\bf v}y_1^{n_1}...y_k^{n_k} \end{aligned}

The Pólya Enumeration Theorem (which we are about to prove) gives us a way of computing the coefficients $a_{\bf v}$ of the pattern inventory with respect to a group.
For example if we would like to calculate the number of inequivalent 5 Red, 4 Green, 1 Blue – colourings of 10 objects under symmetries of G we are interested in the $R^5G^4B$ coefficient $a_{(5,4,1)}$ in the inventory \begin{aligned} P_G(R,G,B) = \sum \limits_{i+j+k = 10} a_{(i,j,k)}R^{i}G^{j}B^{k} \end{aligned}.

Pólya Enumeration Theorem

Let G be a group of symmetries (permutations) acting on a set of n objects and
let $C_G$ denote the cycle index of G.

Then the pattern inventory for G using colours $y_1,...,y_k$ is given by

\begin{aligned} P_G(y_1,...,y_k) = C_G(\sum \limits_{i=1}^k y_i, \sum \limits_{i=1}^k y_i^2, ..., \sum \limits_{i=1}^k y_i^n ) \end{aligned}.

Proof

The following proof of the theorem is due to Richard P. Stanley.

Let ${\bf v} = (n_1,...,n_k)$ be a vector of non-negative integers summing to $n$.

Let $\Omega_{{\bf v}}$ denote the set of colourings of the $n$ objects where colour $y_i$ appears $n_i$ times.

If $\sigma \in G$ fixes a particular colouring then every object permuted by one of its cycles (in its disjoint cycle decomposition) must have the same colour.
For example if I permute the faces of a cube according to $(1\;2\;3\;4)$ (i.e 90 degree rotation around an vertical axis) then faces $1,2,3,4$ must all have the same colour, otherwise the colouring is not fixed under the rotational symmetry.

Therefore $|Fix_{\Omega_{{\bf v}}}(\sigma)|$ is the coefficient of $y_1^{n_1}...y_k^{n_k}$ in $M_{\sigma}( \sum \limits_{i=1}^k y_i,\sum \limits_{i=1}^k y_i^2, ..., \sum \limits_{i=1}^k y_i^n )$

[To see this, note that if $\sigma$ contains an r-cycle then every monomial term in the pattern inventory coming from $\sigma$ should have a colour $y_i$ with exponent at least $r$ so the product $M_{\sigma}$ should generate all such monomials for arbitrary colours $y_i$, hence the factor $\sum \limits_{i=1}^k y_i^r$]

Let $y_{{\bf v}} := y_1^{n_1}...y_k^{n_k}$ where ${\bf v} = (n_1,...,n_k)$.

Running over all permissable vectors ${\bf v}$ we get

$\sum \limits_{{\bf v}} |Fix_{\Omega_{{\bf v}}}(\sigma)| y_{{\bf v}} = M_{\sigma}(\sum \limits_{i=1}^k y_i,\sum \limits_{i=1}^k y_i^2, ..., \sum \limits_{i=1}^k y_i^n)$.

Running both sides over all $\sigma \in G$ and dividing by $|G|$ we get

$LHS = \frac{1}{|G|} \sum \limits_{\sigma \in G} \sum \limits_{{\bf v}} |Fix_{\Omega_{{\bf v}}}(\sigma)| y_{{\bf v}} = \sum \limits_{{\bf v}} (\frac{1}{|G|} \sum \limits_{\sigma \in G} |Fix_{\Omega_{{\bf v}}}(\sigma)|)y_{{\bf v}} =$

$= \sum \limits_{{\bf v}} a_{\bf v}y_{{\bf v}} = P_G( y_1^{n_1},...,y_k^{n_k} )$
where the next to last equality follows from Burnside’s Lemma (or rather a weighted version of it)

$RHS = \frac{1}{|G|} \sum \limits_{\sigma \in G} M_{\sigma}(\sum \limits_{i=1}^k y_i,\sum \limits_{i=1}^k y_i^2, ..., \sum \limits_{i=1}^k y_i^n) = C_G(\sum \limits_{i=1}^k y_i, \sum \limits_{i=1}^k y_i^2, ..., \sum \limits_{i=1}^k y_i^n )$

$LHS = RHS$ and so the result follows.

Now we can finally exploit the machinery we have developed to complete some nice applications.

Problem 2 (Necklaces)

Compute the pattern inventory for the number of n-bead, k-colour necklaces.

How many 6-bead necklaces may be constructed using 3-Red, 2-Green and 1-Blue beads?

Solution

We must first figure out the group of symmetries for a necklace.

I mentioned earlier that the symmetries of a necklace are expressed via rotations and flips and indeed all coloured necklaces obtained from such symmetries should be viewed the same.

If we let $\sigma$ denote a $\frac{2\pi}{n}$ clockwise rotation and $\tau$ denote a flip (i.e reflection) about the y-axis then one can show (geometrically or algebraically) that the full symmetry group is given abstractly by the presentation $\langle \sigma,\tau| \sigma^n = 1, \tau^2=1, \tau \sigma \tau^{-1} = \sigma^{-1}\rangle$.

This group has a name, it is called the Dihedral Group of size 2n (often denoted $D_{2n}$).
As a permutation group it is generated by $\sigma = (1\;2\;...\;n)$ and $\tau = (1\;n)(2\;n-1)...(\frac{n}{2}\;(\frac{n}{2}+1))$ with obvious adjustment for n odd.
You can check that these two generators satisfy the relators of the group presentation.
Moreover it can be checked that any element of $D_{2n}$ can be expressed in the form $\sigma^i \tau^j$ where $0 \leq i < n, 0 \leq j \leq 1$ by using repeated applications of the relation $\tau \sigma \tau^{-1} = \sigma^{-1}$ on a general element.

To apply Pólya Eumeration Theorem we need to compute the cycle index of $D_{2n}$.
To compute this index we need to understand the cycle structure of an arbitrary permutation in $D_{2n}$.

We can split the investigation into two pieces:

1. Rotations with flip (i.e permutations $\sigma^i \tau,\;\; 0 \leq i < n$)
2. Simple rotations (i.e permutations $\sigma^i,\;\; 0 \leq i < n$)

In 1 the symmetries are really realized as simple flips over a different symmetry axis (see picture below).

As you can see from the picture there are two kinds of axes, those which go through opposite edges and those which go through opposite beads. There are equally many of both types when n is even ($\frac{n}{2}$ of both).

In the case where the symmetry axis goes through a pair of opposite edges, a flip entails swapping pairs of beads on the necklace i.e we are dealing with a product of $\frac{n}{2}$ disjoint transpositions.
In the other case where the symmetry axis goes through a pair of opposite beads then we are also swapping pairs of beads, except for the beads lying on the axis which are fixed by the reflection.

The cycle index monomial for n even is therefore given by
$M_{\sigma^i \tau} = x_2^{\frac{n}{2}}$ in the former case and
$M_{\sigma^i \tau} = x_1^2x_2^{\frac{n-2}{2}}$ in the latter.

If n is odd the symmetry axes go through opposite edge and bead as depicted below:

Thus there is only one type of reflectional symmetry swapping pairs of beads while fixing the bead crossed by the symmetry axis.

The cycle index monomial for n odd is therefore given by
$M_{\sigma^i \tau} = x_1x_2^{\frac{n-1}{2}}$

In summary the full cycle index for the permutations belonging under 1 is given by

$\frac{1}{|D_{2n}|} \sum \limits_{\sigma^i \tau \in D_{2n} \setminus \langle \sigma \rangle} M_{\sigma^i \tau}({\bf x}) = \begin{cases} \frac{1}{2n} (\frac{n}{2} x_2^{\frac{n}{2}} + \frac{n}{2} x_1^2x_2^{\frac{n-2}{2}}) & if\;\; n\; even \\ \\ \frac{1}{2n} (n x_1x_2^{\frac{n-1}{2}}) & if \;\;n\; odd \end{cases}$

We have left to classify permutations under 2.
Here we are really asked to compute the cycle index of the cyclic group of order $n$ (i.e the subgroup generated by $\sigma$ alone).

Consider an element $\sigma^i$ where $0 \leq i < n$ and let us examine its disjoint cycle decomposition.
Suppose the shortest cycle in $\sigma^i$ has length $m$ and let $x$ be a number moved by this cycle.
Let $y \neq x$ be a number moved by a different cycle.
We want to show that the cycle to which $y$ belongs have length $m$ also.

Clearly since $\sigma$ is a cycle permutation $\exists r \in \mathbb{N}$ such that $\sigma^r(x) = y$.
Now $(\sigma^{i})^m(y) = \sigma^{im}(\sigma^r(x)) = \sigma^r \sigma^{im}(x) = \sigma^r(x) = y$.
Therefore $y$ belongs to a cycle of $\sigma^i$ of length dividing $m$.
But $m$ was assumed to be minimal and therefore the cycle to which $y$ belongs must have length $m$ as well.

We have thus shown that all cycles in the cycle decomposition of $\sigma^i$ have same length.
Hence if $\sigma^i$ has order $d$ then $\sigma^i$ must be a product of $\frac{n}{d}$ disjoint $d$-cycles.

Let $\phi(d) := |\{ k \in \{1,...,d\} | gcd(k,d) = 1 \}|$ (this is the so called Euler Totient Function)

The number of elements of order $d$ in $\langle \sigma \rangle$ is given by $\phi(d)$.
(Explicitly these are given by $\sigma^{\frac{kn}{d}}$ where $gcd(k,d) = 1$)

Conversely by Langrange’s Theorem the order of every element in $\langle \sigma \rangle$ must divide $n$.

Hence the cycle index of $\langle \sigma \rangle$ is given by

$\frac{1}{|D_{2n}|} \sum \limits_{d|n} \phi(d) x_d^{\frac{n}{d}}$.

Hence the cycle index of $D_{2n}$ is given by

$C_{D_{2n}}({\bf x}) = \begin{cases} \frac{1}{2n} \left(\sum \limits_{d|n} \phi(d) x_d^{\frac{n}{d}} + \frac{n}{2} x_2^{\frac{n}{2}} + \frac{n}{2} x_1^2x_2^{\frac{n-2}{2}} \right) & if\;\; n\; even \\ \\ \frac{1}{2n} \left(\sum \limits_{d|n} \phi(d) x_d^{\frac{n}{d}} + n x_1x_2^{\frac{n-1}{2}}\right) & if \;\;n\; odd \end{cases}$

By Pólya Enumeration Theorem the pattern inventory for $D_{2n}$ is given by

$P_{D_{2n}}({\bf y}) = C_{D_{2n}} ( \sum \limits_{i=1}^k y_i,\sum \limits_{i=1}^k y_i^2, ..., \sum \limits_{i=1}^k y_i^{n} )$

Restricting to the special case where $n=6$ we may use Wolfram Alpha to expand:
$P_{D_{12}}((R,G,B)) =$
$C_{D_{12}}(R+G+B, R^2+G^2+B^2, R^3+G^3+B^3, ..., R^{6}+G^{6}+B^{6})=$
$\frac{1}{12} [ \phi(1)(R+G+B)^6 + \phi(2)(R^2+G^2+B^2)^3 + \phi(3)(R^3+G^3+B^3)^2 +$
$\phi(6)(R^6+G^6+B^6) + 3(R^2+G^2+B^2)^3 + 3(R+G+B)^2(R^2+G^2+B^2)^2 ] =$

$= B^6+B^5 G+B^5 R+3 B^4 G^2+3 B^4 G R+3 B^4 R^2+3 B^3 G^3+6 B^3 G^2 R+6 B^3 G R^2+3 B^3 R^3+3 B^2 G^4+6 B^2 G^3 R+11 B^2 G^2 R^2+6 B^2 G R^3+3 B^2 R^4+B G^5+3 B G^4 R+6 B G^3 R^2+ {\bf 6 B G^2 R^3} +3 B G R^4+B R^5+G^6+G^5 R+3 G^4 R^2+3 G^3 R^3+3 G^2 R^4+G R^5+R^6$

From the pattern inventory expansion we can read off the number of colourings possible for a 6 bead necklace using 3 Red, 2 Green and 1 Blue beads (see bold term).

Hence there are 6 such necklaces.

Pólya counting theory also has applications in chemistry (among several other areas).
Chemists frequently use the enumeration formula to calculate the number of different isomers of a compound molecule.
Here is a simple taste of such an application.

Naphthalene $\left( C_{10}H_8 \right)$ is an organic compound mainly used in pesticides to keep insects away but has also got other niche applications such as for pyrotechnical special effects where it is used for simulated explosions and black smoke. Naphthalene has ten carbon atoms arranged in a double hexagon, binding eight hydrogen atoms (see below).

Problem 3 (Naphthol)

Naphthol is obtained from Naphthalene by replacing one of the Hydrogen atoms with a hydroxyl group (OH).

How many different isomers of Naphthol exists?

Solution

One easily verifies that the double hexagon has two axes of reflective symmetry and also that the product of the reflections represent a 180 degree rotation in the plane. This gives 4 distinct permutations which we can list explicitly given the numbering below:

$\iota = (1)(2)(3)(4)(5)(6)(7)(8)$
$x = (1\;4)(5\;8)(6\;7)(2\;3)$
$y = (1\;8)(2\;7)(3\;6)(4\;5)$
$xy = (1\;5)(2\;6)(3\;7)(4\;8)$

(sorry for the sloppy animation!)

As an abstract group the symmetries have presentation $G = \langle x,y\;|\;x^2=1,y^2=1,xy=yx \rangle \cong C_2 \times C_2$.
$C_2 \times C_2$ acts on the 1 Hydroxyl and 7 Hydrogen bindings (in this context the different bindings and their multiplicity represent our “colour weight”).

We can now compute the cycle index of $C_2 \times C_2$ as follows

$C_{C_2 \times C_2}({\bf x}) = \frac{1}{4}(x_1^8 + 3x_2^4)$.

Therefore with $A$ representing Hydrogen and $B$ representing a Hydroxyl group we get

$P_{C_2 \times C_2}(A,B) = \frac{1}{4}C_{C_2 \times C_2}( A+B, A^2+B^2,...,A^8+B^8 ) =$
$\frac{1}{4}((A+B)^8 + 3(A^2+B^2)^4) =$
$= A^8+{\bf 2 A^7 B} +10 A^6 B^2+14 A^5 B^3+22 A^4 B^4+14 A^3 B^5+10 A^2 B^6+2 A B^7+B^8$

Thus there are two isomers of Naphthol (called 1-Naphthol and 2-Naphthol respectively)

This concludes our discussion of this topic.
Below are some exercises for you to try.

Exercise 1 (Dice -Easy)

How many different 6-sided dice can you build if each number in $\{ 4,5,6 \}$ must appear on exactly two faces?

(Hint: Look carefully at Problem 1 where we effectively calculated the cycle index of the symmetries of a cube – use it )

Exercise 2 (Tetrahedron – Intermediate )

How many different ways are there of colouring the faces of a Tetrahedron in 3 colours under rotational symmetry?

Exercise 3 (Triphenylamine – Intermediate )

Triphenylamine $(C_6H_5)_3N$ is a molecule with 3 carbon rings attached to a central nitrogen atom and 15 hydrogen atoms binding remaining free carbons.

How many isomers can be obtained by replacing 6 Hydrogen atoms with 6 Hydroxyl (OH) groups?

(Tip: You may want to use Wolfram Alpha to expand the pattern inventory)

## Games Of Chance

In this post we’ll discuss games with elements of uncertainty.

Games of chance have a distinctive place in the theory of probability because of its close ties with gambling.
There is however an important distinction to be made.
Games such as Roulette, Craps or ‘Snakes and Ladders’ can be deemed “boring” because they are memoryless i.e your future success has no dependence on how the game was played in the past, only on where you are currently (so they resemble Markov Chains).
In addition these games have no intermediate decision making so in summary we may safely claim there is no skill involved.

Other games such as Poker and Bridge involve uncertainty and imperfect information but are generally considered games of skill partly because it is possible to distinguish between good and bad decisions and choosing between them requires skill.

Of course games like poker depend on so much more than just probabilistic decision making. It also involves things like psychology, emotional control and exploitation of your opponents tendencies. In recent time there has been great dispute in many courts as to whether poker is a game of inherent skill or a game of pure luck. The distinction has great legal (and reputational) implications, since games that are considered gambling are usually only allowed in casinos. For example the federal judge in New York last year ruled that poker is indeed a game of skill, partly because the opposition failed to convincingly explain why so many players who had been identified as winners/losers after a certain number of hands, kept on consistently winning/losing over time. The moral being you would not be able to make such two-sided predictions from a random walk. One of the charts shown to the judge may be viewed in this article.

Below you can see a simple Excel simulation of how the $\approx 2.7\%$ casino edge plays out in a game of Red/Black Roulette-betting over 10,000 spins. The chart shows the 5 best and the 5 worst paths out of an initial population of 100. Each bet is 1$. Here of course everyone will lose money eventually because the game is not fair (expected value is -0.027$/spin with a 1$bet). The point is that you cannot predict future success for say the purple player after observing 1,000 spins, despite the player seemingly beating the game at that point (not knowing the rest of the story). It is not always obvious determining whether a game requires skill, and if so, what strategies are most favourable. Here are a few interesting made-up games that illustrate this. Problem 1 (Game Show): Suppose you are part of a game show and have just received the chance to win a car. There are three doors. Behind two of them is a goat, and behind the remaining door you will find the car. The host lets you choose one door after which he proceeds to open a different door. It turns out that the door opened by the host had a goat behind it. The host now offers you to keep the door you have or switch. What should you do? Solution: This is a bit of a trick question if you have the relevant background on this game. The game is actually not in your favour and requires no skill in the sense that all strategies have equal probability $\frac{1}{2}$ of winning the car. Did you think this was the classic Monty Hall problem? Actually it isn’t because you were never told what the show host was going to do and hence there is no conditional probability involved. The door that was opened turned out to have a goat behind it. In general you need to know the protocol of the host to derive probabilistic information. If you don’t know then the chances are plain and simple. For those of you who were not caught by this and have not heard of Monty Hall before I recommend you read the wikipedia article in the link above. If you know that the host is going to show you a goat after choosing your door then things change. Your chances of winning by switching will then have increased to $\frac{2}{3}$. This version of the game appeared on an actual TV show called Let’s Make a Deal which mainly aired on NBC for many years. I once heard a story (though sounding more like an urban legend) regarding a candidate interviewing for a role at a wall street investment bank, offered to play the same game as above except the prize was different (and no, the interviewer did not use goats). If the candidate won he would immediately advance to the final round for this highly competitive job, otherwise he would have to go home. The candidate having heard about the Monty Hall problem confidently accepted because he did not estimate his chances of advancing to exceed $\frac{2}{3}$. According to the story the interviewer told the candidate to step outside the room while he put a coin under one of three paper cups. When the candidate was called back he chose the wrong cup. The interviewer then revealed the coin under a different cup and told the candidate to go home. The candidate was confused and started arguing that an empty cup was supposed to be revealed. The interviewer merely stated that they were looking for candidates who could pay attention to details while under high pressure… The next two games are related to guessing the colour of the next card in a deck. Let’s investigate if the games are “boring” or whether there exist any good strategies. Problem 2 (Red Or Black? – Version 1): Given is a perfectly shuffled deck of playing cards (26 Black, 26 Red). Each card is revealed to you one by one, and at any point in time you can stop the dealer and bet 1$ on the next card being Red.

You are allowed to bet only once throughout the game and if you don’t bet you are automatically betting 1$on the last card (so you are effectively paying 1$ to play the game).

If you win you get back 2$. What would be your best strategy? Solution: It sounds like you should be able to gain an advantage by waiting for the number of Red cards in the deck to exceed the number of Black cards and then bet, but of course that may never happen and then you will lose in the end. It turns out this game is fair and no matter what strategy you chose you can never gain an advantage but also never lose any advantage. This is actually an immediate consequence of the Martingale Stopping Theorem which (in this context) says that the chance of winning by betting at the very beginning is the same as the chance of winning by betting at any other time (in expectation). In probability theory Martingales are essentially models of fair games. Your winning probability in this game is a discrete-time martingale. There is however a much nicer argument. Suppose you have some arbitrary strategy. Instead of betting on the next card being Red you bet on the last card being Red. At any point during the game the probability of the next card being Red is of course the same as the probability of the last card being Red, so the two versions of the game are equivalent. But this new version is completely uninteresting because you win anyway if the last card is Red regardless of the strategy you chose i.e the last card doesn’t change just because you choose a different strategy. The next game is much more interesting and now the question is instead how much money you can make. Problem 3 (Red Or Black? – Version 2): Again a dealer perfectly shuffles a standard deck of 52 cards with equally many Red and Black cards. You start with 1$.

The dealer turns over one card at a time and gives you the opportunity to bet any fraction of your current worth on the colour of the next card.

The dealer gives you even odds on every bet, meaning he will return you double the bet if you win regardless of the composition of the deck.

Clearly you can wait and bet nothing until the last card when you are guaranteed to win
and go home with 2\$.

The question is if you can do any better, and if so, what is the maximum guaranteed amount you can leave with?

Solution:

Consider the following “suicide” strategy:
Suppose you take for granted that the colours in the deck are distributed in some specific way and bet everything you have that the cards will follow that fixed distribution on every turn.

In the unlikely event that you don’t go broke your fortune will be massive, namely $2^{52}$ dollars.

Despite this, the expected return of the strategy (mathematically speaking) is
$\mathbb{E}(X) = 2^{52} / \left( \begin{array}{c} 52 \\ 26 \end{array} \right) \approx 9.08$ dollars since any of the other remaining distributions will leave you with $0$ dollars.

Suppose you clone yourself into $N := \left( \begin{array}{c} 52 \\ 26 \end{array} \right)$ mini-me’s that each play this game following the same strategy using a different fixed distribution out of the $N$ possible.

Any strategy must amount to splitting up your initial worth between the clones in some way.
To see this suppose your strategy entails betting $d$ dollars on Red on the first turn.
Then you split $0.5+\frac{d}{2}$ dollars equally between the clones that bet on Red and split $0.5-\frac{d}{2}$ dollars equally among the clones betting on Black on the first turn. You then continue distributing the remaining part of your dollar in the same way by adding funding to the surviving clones according to your strategys decision tree.
In the end every clone has it’s own stake of the initial $1$ dollar.

Thus any strategy is composed of some funding allocation of $a_k$ dollars to each clone $k$ where $\sum_{k=1}^{N} a_k = 1$.

The expected return of the strategy is therefore given by $\mathbb{E}(S) = \sum_{k=1}^{N} a_k \mathbb{E}(X_k)$ where $X_k$ is the random variable for the return of clone $k$ starting with $1$ dollar.
Each of the clones have the same expected return so $\mathbb{E}(X_1) = \mathbb{E}(X_2) = ... = \mathbb{E}(X_N) \approx 9.08$.
We therefore get $\mathbb{E}(S) \approx 9.08$.

Hence remarkably it does not matter how the money is initially split between the clones as every such strategy have the same expected value. In particular every such strategy is optimal in expectation.

We can now guarantee ourselves of receiving the expected return by dividing the initial dollar equally among all clones. This way we know that one of them will get the distribution right and hence survive all the way through with total expected return of $9.08$.

Since we can never guarantee anything better than the expected return this is the best possible guaranteed return.

The above argument shows that exactly one of the clone players will have assumed the correct deck colour distribution starting with $1/\left( \begin{array}{c} 52 \\ 26 \end{array} \right)$ dollars.
That players wealth will therefore grow like $1/\left( \begin{array}{c} 52 \\ 26 \end{array} \right), 2^1/\left( \begin{array}{c} 52 \\ 26 \end{array} \right),2^2/\left( \begin{array}{c} 52 \\ 26 \end{array} \right),...,2^{52}/\left( \begin{array}{c} 52 \\ 26 \end{array} \right) \approx 9.08$.

In practise you can only make one bet per card so you would then simply look at the net bet made by all clones and adopt that bet for yourself. That means an aggregated bet of $x$ dollars on Red and $y$ dollars on Black among the clones, amounts for you to a bet of $x-y$ dollars on Red if $x > y$ and $y-x$ dollars on Black if $y > x$.

The last game I wanted to discuss may also seem intuitively surprising.

You may have seen the classic cult movie Highlander from 1986. If you haven’t it’s not that important, it’s mainly about violent Scottish swordsmen who are hunting each other down to gain each others powers.

Problem 4 (There Can Be Only One):

The Macleods and the Frasers are two Highlander clans (and sworn enemies).

They have currently spotted each other somewhere in Glenfinnan on the shores of Loch Shiel in the Highlands of Scottland.

The members of the Macleod clan have strengths $p_1,....,p_n$ and the Frasers have strength $q_1,...,q_m$.

The Highlanders fight one-on-one until death, two people at a time.

If two Highlanders, one with strength $x$ battles a Highlander with strength $y$ then the former wins with probability $\frac{x}{x+y}$ and the latter with probability $\frac{y}{x+y}$.

Once a Highlander cuts off the head of another Highlander the winner must yell “There can be only one!!!” and will then gain the strength of the killed Highlander.

After each match, the Frasers put forward a combatant of their choosing among those still alive in the clan, and the Macleods do the same right behind.

The winning clan is the clan that remains alive with at least one person.

Do the Macleods have a favourable strategy, getting to put forward their candidate last?

Solution:

Interestingly the game is actually fair.

To see this note that each individual combat is fair in the sense that the expected gain or loss of strength in each combat is zero for both clans:
$y\frac{x}{x+y} - x\frac{y}{x+y} = 0 = x\frac{y}{x+y} - y\frac{x}{x+y}$.

Let $p$ be the probability that Macleods win the battle.
If they win they would of course have accumulated all Frasers strength into their clan and if they lose they will have given away their total accumulated strength, leaving them with no strength.
Therefore since Macleods are not expected to gain any strength over the course of the battle we get:
$p.(\sum_{i=0}^n p_i + \sum_{i=0}^m q_i) + (1-p).0 = \sum_{i=0}^n p_i \Longrightarrow p = \frac{ \sum_{i=0}^n p_i}{\sum_{i=0}^n p_i + \sum_{i=0}^m q_i}$

Hence the probability of winning is $p$ for Macleods regardless of lineup.

Why not you try below problems?

Exercise 1 (Three Way Duel – Easy):

Alice, Bob and Carol are participating in a three-way duel, meaning any dueller could be aiming at any of the other two duellers when it is their turn to shoot.

Alice goes first, then Bob and then Carol in that order until there is only one person left who is declared the winner.

Alice hits her target $\frac{1}{3}$ of the time, Bob $\frac{2}{3}$ of the time and Carol never misses.

Does Alice have a best strategy, assuming the remaining players act rational?

Exercise 2(£ Attraction – Intermediate):

This problem is not really a game of strategy, but rather asks how much you would be willing to pay to play the game.

A million british pounds are thrown into one of two piggy banks in the following way:

The piggy banks initially contain $1$ pound each.

If at some point one piggy bank contains $x$ pounds and the other $y$ pounds then the next pound coin will be dropped into the former piggy bank with probability $\frac{x}{x+y}$ and into the latter with probability $\frac{y}{x+y}$.

How much would you intuitively pay for the contents of the piggy bank with the least number of pounds?

Now compute the expected value of that piggy bank.

(Hint: Is the coin distribution really that complicated? Find the distribution and deduce the answer.)

Posted in Game Theory, Probability, Uncategorized | | 1 Comment

## Puzzles With Handicap

If you by any chance read about The Laser Gun puzzle you may have also watched the embedded video featuring the presenter Peter Winkler who is currently a professor in Mathematics and Computer Science at Dartmouth college.
Beside his regular occupation as a professor he is also a keen puzzlist, having published two books with some of his first couple of hundred (or so) favourite puzzles.

I purchased one of his books the other day from Amazon for 20 bucks.
It is called “Mathematical Puzzles: A Connoisseur’s Collection” and contains around 100 challenging puzzles.

As Peter points out puzzles and mathematics are not quite the same thing.
Although puzzles may be constructed based on mathematical principles, puzzles are for our amusement and makes no direct attempt at building theory or solving important problems. Puzzles are so often called puzzles because they are elegant, entertaining and exhibit some non-obvious mathematical truth without necessarily containing a lot of mathematical “meat”. On the other hand many of the problems in the book once existed as unsolved mathematical problems before someone found a sufficiently elegant solution for it to make it into the book, so puzzles can certainly originate from serious work.

I have only looked at a fraction of all the puzzles in the book, but I am already certain I will find many more excellent problems to write about as I get more time to work on them.

In this post I was planning to write about a few algorithmic problems that imposes some sort of handicap, like human memory constraints or communication of secrets in a non-private situation. These are the kind of problems you can tell your friends about at lunch and have a realistic chance at getting solved while “on your feet”.

I was personally able to complete the argument for the first two but was defeated by the last, although found it to be a very interesting problem to think about.
The solutions presented here describe my own thought process, so discussions may be a bit less concise and elegant than the intended solution you can find in the book (apologies in advance).

Problem 1 (Find The Missing Number):

All but one number between $1$ and $100$ is called out exactly once every 10 seconds in some arbitrary order.

How can you find the number which is missing if you have no aid (such as paper and pen)?

Solution:

Clearly if you have memory like an average person it is very difficult to memorize a set of $99$ numbers.
Fortunately there is a better way!

Quite often when information is sent over a “noisy channel”, i.e a line of communication which may mess up or lose data, data integrity is verified using checksums (particularly if the data is numeric).

Checksums are useful because they can cheaply tell you if data has been compromised by comparing some short amalgamated representation of the received data with an expected checksum at the other end.
Conversely however if checksums match, it does not necessarily mean the data is uncompromised since two integrity failures may for instance “cancel out” to create a matching checksum.

In this problem we know precisely how “noisy” the channel is – it will not distort data but erase exactly one number.
Since there is exactly one number missing we may be able to design a checksum which after comparison can allow us to correct for the error.
This way we would only need to maintain one number, namely the checksum.

Quite clearly the most sensible option for the checksum is to take the plain sum of all numbers called out.
Since the missing number is one between $1$ and $100$ we know the checksums are going to differ by most $99$.
We therefore don’t even need to maintain a very large sum.
We may equally well work $(mod\;100)$ since this makes arithmetic mind calculations significantly faster and this is important since we only have $10$ seconds between each number being announced.

The expected checksum is obtained by first summing all numbers between $1$ and $100$.
Most of you are probably familiar with the identity
$\sum_{k = 1}^{n} k = \frac{n(n+1)}{2}$.
If not, you can prove it by induction or solve the Difference Equation $s_n = s_{n-1} + n$.

The expected checksum is therefore $\frac{100.101}{2} = 5050 \equiv 50\;(mod\;100)$.

Once all $99$ numbers have been called out we are left with a checksum between $1$ and $99$.
The expected checksum is $50$ so the missing number is given by the smallest positive integer congruent to $50 - checksum\;(mod\;100)$.

The next problem (slightly reworded from the book) relates to finding the majority vote using limited resources.
According to Peter Winkler this topic arose as a serious problem in computer science not too long ago.
Devising efficient online algorithms with limited resources (i.e algorithms dealing with streams of data) is a major area of research in compute science. Indeed a lot of data in the real world is available incrementally as opposed to being statically accessible offline.
This is an example of an online problem with a very elegant solution.

Problem 2 (Which Restaurant?):

Suppose you are organizing a large lunch takeaway (say for a whole department).

You would like to know which restaurant people would like to order from, so at the end of the large department meeting you stand at the exit and ask everyone which restaurant they prefer, one at a time as they pass.

Your boss thinks that a restaurant should only be selected if it receives a majority of the votes.
Otherwise he/she thinks you should select an arbitrary restaurant.

You have bad memory so you can only remember the name of one restaurant at a time.
At your disposal you also have a counter which can be incremented or decremented.

How can you figure out which restaurant, if any, received the majority of the votes (i.e more than half)?

Solution:

In this problem you have a long stream of information that is made available incrementally, and you have extremely small storage capacity.

It is clear that the last name you store in memory needs to be the majority restaurant, if there is one.

With the available resources there is really only one way to proceed, and that is to increment the counter if you hear the restaurant currently stored in memory and decrement it if you don’t.

The key observation is that once the counter has been decremented to zero, you replace the current name stored in memory with the name you just heard and increment the counter to one.
That way when the majority restaurant is in memory the counter is incremented more often than it is decremented.
Any other restaurant in memory is going to get decremented to zero since they are covered by decrementing votes for the majority restaurant alone.
You are therefore guaranteed to end up with the majority restaurant in memory by the end (if there is one), since it is the only restaurant that can survive with a positive count at the end.
If there is no majority then you end up with some arbitrary restaurant, but that is fine according to the problem statement as long as there is no majority.

Hence you choose the restaurant you last had in memory after everyone has had their say.

The final problem is a stunner and related to how two people who have never met can encrypt common knowledge into a secret over an insecure channel. Think about the implications of this for a moment!
Peter Winkler himself is actually the man behind this fact, and applied this knowledge to the game of bridge in his research. According to some author info I found on the web, his methods are currently illegal for tournament play in most of the western world.

Problem 3 (The Detectives):

Two detectives who have never spoken to each other are investigating a murder separately.

They both have the same list of eight suspects and through hard detective work they have both narrowed down the field to two suspects each.

They have now been told to talk to each other over the phone to compare information, and if their pairs overlap in exactly one suspect, identify the murderer and arrest him. If not then both detectives need to go back to work.

The complicating factor is that the mob has tapped the phone line and can hear everything that is being said in the conversation. They also know the list of eight suspects the detectives are talking about but are not aware of the two main suspects identified by each of the detectives.

If as a result of the conversation, the mob can determine with certainty who the common suspect is then they will kill off that suspect before he is arrested and get a chance to tell on the whole criminal organization.

What conversation can take place between the detectives so that they both know their common suspect if there is one, but the mob is still left in the dark?

Solution:

The obvious problem is that the detectives have never spoken before so somehow they need to manufacture their secret in public using the private information held by them both.

When I first read this problem, the first thing that came into my mind was whether this is any different from any old public-key encryption protocol like RSA or Diffie-Hellman.

Then I figured that the definition of secure depends on the context in which it is used.
Both RSA and Diffie-Hellman impose very difficult practical obstacles for breaking the encryption, like prime factorization of large integers. These schemes are not mathematically secure because given sufficient compute resources one can find trivial brute-force attacks on both protocols. Integer prime factorization is certainly not a mathematical impossibility. In fact it is always made possible by the Fundamental Theorem of Arithmetic.

Here we are looking for some mathematically secure way, in which anyone subscribing to the insecure channel, is unable to unlock the secret with certainty in any way logically conceivable.

Clearly if the first detective gives away both his suspects over the phone the second detective will know the answer but has no way of securely communicating it back, since now the mob has exactly the same information as the first detective, and hence can reach any conclusion made by that detective.

Somehow based on the private information held by both detectives they need to create some ambiguity for the mob.
After a couple of failed attempts of trying to construct certain ambiguity I decided to look up the answer in the book (hey, I couldn’t miss the football game…..excuses).

The intended solution went along these lines:

Since everyone has the same list they may agree to number the suspects from $1$ to $8$.

Suppose the detectives, let’s call them A and B, also agree to organize the possible pairs of suspects into below table as follows:

Let’s assume detective A holds the pair $\{1,2\}$.
Then detective A will tell detective B that his pair is in the first column.

Either detective B finds his own pair in the same column or he does not.
If he does, then he knows both pairs either coincide or are completely disjoint, since pairs in every column are disjoint. Either way there is no point in continuing the conversation so he may just say so.
If he doesn’t find his pair in the same column then either the pairs are disjoint or they intersect in exactly one suspect (if they intersect at all). The latter case needs to be communicated without the mob finding out.

Detective B may now look at the two pairs in the first column that contain either of his numbers.
If he has $\{1,4\}$ for example, these pairs would be $\{1,2\}$ and $\{3,4\}$.
Detective B would then split the eight numbers up into two pieces based on the two pairs as $\{1,2,3,4\}, \{5,6,7,8\}$ and tell detective A that his pair is either in $\{1,2,3,4\}$ or in $\{5,6,7,8\}$.

Detective A can now deduce that if their pairs intersect then detective B‘s pair must lie in $\{1,2,3,4\}$, since his own pair belong to that piece.
However the mob on the other hand only have the information that detective A‘s pair is in the first table column and that detective B‘s pair is in one of the two pieces.

Thus the detectives have constructed their own common secret since they can from now on refer to “The Piece” without the mob possibly knowing which piece they are talking about, $\{1,2,3,4\}$ or $\{5,6,7,8\}$?

This is supposed to be the key idea.
Now they can keep on referring to “The Piece” and exploit this ambiguity to transfer other information.

Detective A can now respond and say to detective B that his pair is either $\{1,2\}$ or $\{5,6\}$.

Since they both know which piece their pairs belong to (if they intersect), detective B now knows that detective A‘s pair must be $\{1,2\}$, whereas the mob is clueless because the conversation thus far would sound exactly the same to them if detective B was holding the pair $\{5,6\}$.

If detective B‘s pair does not intersect with $\{1,2\}$ then there is no point continuing the conversation, so he might as well say their pairs are disjoint.

If detective B notices that their pairs intersect, say he holds the pair $\{1,4\}$ then he tells detective A that his pair is either $\{1,4\}$ or $\{5,8\}$.

Detective A now knows detective B‘s pair to be $\{1,4\}$ since it is the only pair in the relevant piece, and again the mob cannot tell which.

Hence both detectives know which suspect they have in common without the mob knowing for certain.

Here are a couple of more puzzles from the book that I liked on similar theme:

Exercise 1 (Field Soliders – Easy):

An odd number of soldiers are stationed in a field, in such a way that all the pairwise distances are distinct.

Each soldier can only keep an eye on the nearest other soldier.

Prove that at least one soldier is not being watched.

Exercise 2 (Prisoners Problem – Intermediate):

Each of n prisoners will be sent alone into a certain room, infinitely often, but in some arbitrary order determined by their jailer.

The prisoners have a chance to confer in advance, but once the visits begin, their only means of communication will be via a light in the room which they can turn on or off.

Help them design a protocol which will ensure that some prisoner will eventually be able to deduce that everyone has visited the room.

## Difference Equations

In this post we will discuss Difference Equations.

One could say difference equations are the discrete analogue of differential equations.
Indeed the techniques used to solve them very much resembles the approach one takes for solving ordinary differential equations (ODEs).
An example of a difference equation is the Fibonacci sequence $F_n = F_{n-1}+F_{n-2},\; F_0 = F_1 = 1$ which we will learn how to solve later.

Difference equations are often not as extensively treated in undergraduate education as their continuous counterpart.
They are nevertheless important in all sorts of disciplines.
In probability theory difference equations (or “recurrence relations”) often arise when describing probabilistic transitions between a number of discrete states. In science and economy one is often interested in discretizing a differential equation into a difference equation so one can solve it numerically in a computer. There is a whole research area devoted just to approximating solutions to differential equations in this way so I am not going to delve much deeper into this.
Instead we will look at how to solve certain types of difference equations exactly and apply the theory to some nice puzzles, like Tower of Hanoi or finding out the probability of a mouse surviving in a filthy student corridor with a cat (a metaphor for finding the absorption probability of a discrete Markov Chain).

Let’s start off with some theory for the simplest kind of difference equations:

First Order Difference Equations:

The general form of a first order difference equation with constant coefficients is:

$y_{n} + ay_{n-1} = b\;\;\;\;\;\;(*)$
where $a,b$ are a constants

To solve this equation we may first consider what happens in the case where $b = 0$, the so called homogeneous case.

Unwinding the equation iteratively we get $y_{n} = -ay_{n-1} = a^2y_{n-2} = -a^3y_{n-3} = ... = (-a)^ny_0$.
This has the general form $C(-a)^n$ for some constant $C$.

To find the general solution when $b \neq 0$ this complementary solution of the homogeneous equation needs to be translated in some direction. To understand how this should be done we need to find a particular solution.

We could try the simplest possible solution, namely $y_n = k$ for some constant $k$.
Plugging into (*) we get $k+ak=b \Longrightarrow k = \frac{b}{a+1}$, which is well defined as long as $a \neq -1$

If $a = -1$ we may instead try $y_n = kn$ which after insertion into (*) yields
$k(n+1)+akn=b \Longrightarrow k = \frac{b}{n(a+1)+1} = b$.

The general solution to (*) thus becomes
$y_n = \begin{cases} C(-a)^n + \frac{b}{a+1} & if\;\; a \neq -1 \\ C(-a)^n + bn & if \;\;a = -1 \end{cases}$

We can indeed verify this solution is valid since in the former case (say) we have:
$y_{n} + ay_{n-1} = C(-a)^{n} + \frac{b}{a+1} + a(C(-a)^{n-1} + \frac{b}{a+1}) =$
$C(-a)^{n} + \frac{b}{a+1} - (C(-a)^{n} - \frac{ab}{a+1}) = \frac{b}{a+1} + \frac{ab}{a+1} = \frac{b(a+1)}{a+1} = b$.

Notice also the resemblance to ordinary differential equations where we also look for homogeneous and particular solutions in a similar way.

Moreover if we have an initial condition $y_0 = C_0$ then we can determine the general coefficient $C$.

Let’s move on to an application:

Problem 1 (Tower of Hanoi):

The Tower of Hanoi is a puzzle that consists of three vertical poles and $n$ number of discs in different size.
Each disc has a hole in the middle so it can be moved between the poles.
Initially the discs are situated on the middle pole, ordered by size, with the largest disc at the bottom.
The discs are only allowed to be moved in such a way that a larger disc never sits on top of a smaller disc. Other than that discs are allowed to be moved freely between the poles.
The puzzle is solved when every disc either sits all on the left pole or all on the right pole.

What is the minimum number of moves required to solve the puzzle?

Solution:

Let $y_n$ denote the minimum number of moves required to solve the puzzle.

Clearly $y_0 = 0, y_1 = 1, y_2 = 3$.

In order to move the bottom disc to an empty pole we must first move the remaining $n-1$ discs to a different empty pole. By definition this requires $y_{n-1}$ moves.

Once the $n-1$ discs have been moved aside and the bottom disc moved to its finial location, we need to move back the $n-1$ discs on top of the bottom disc to solve the puzzle. This requires another $y_{n-1}$ moves.

In total we therefore need to use $y_n = 2y_{n-1} + 1$ moves.
This describes a first order difference equation.
According to our earlier discussion it has the general solution $y_n = C2^n - 1$.
Our initial condition $y_0 = 0$ yields $0 = C2^0 - 1 \Longrightarrow C = 1$.

Hence the minimum number of moves required to solve the puzzle is $y_n = 2^n -1$.

According to the legend the world will end once monks, fulfilling an ancient prophecy, have finished solving the puzzle for $n=64$.
According to our analysis the minimum number of moves required is $2^{64}-1 \approx 1.84 \times 10^{19}$.
Even if they carry out one move per second it will take them approximately $5.82 \times 10^{11}$ years to complete, so I think we are safe for a while longer.

In order to look at remaining applications we need to go over some theory of second order difference equations:

Second Order Difference Equations:

A Second Order Difference Equation with constant coefficients have general form:

$y_{n} + ay_{n-1} + by_{n-2} = c\;\;\;(*)$
for $n \geq 2$ where $y_0,y_1$ are given and $a,b,c$ are constants

Here too the general solution depends on finding complementary and particular solutions.

Let’s first consider the homogeneous case where $c = 0$.
With understanding of first order difference equations we might believe a solution of the form $y_n = Cx^n$ can work here too. Lets see how that works out if we assume $x \neq 0$:
$y_{n} + ay_{n-1} + by_{n-2} = 0 \Longleftrightarrow Cx^n + aCx^{n-1} + bCx^{n-2} = 0$
$\Longleftrightarrow x^2 + ax + b = 0 \Longleftrightarrow x = -\frac{a}{2} \pm \sqrt{(\frac{a}{2})^2 - b}$.

Excellent! So if we choose $r$ to be a root of the auxiliary equation $x^2 + ax + b = 0$ we get that $y_n = Cr^n$ is a solution to $y_{n} + ay_{n-1} + by_{n-2} = 0$ for any constant $C$.

Ignoring the case where the auxiliary equation has no real roots we have two cases:

i) $x^2 + ax + b = 0$ has two distinct real roots $x_1,x_2$.
You may easily verify that
$y_n = C_1x_1^n + C_2x_2^n$
is a solution to the homogeneous equation for arbitrary constants $C_1,C_2$.

ii) $x^2 + ax + b = 0$ has a repeated real root $x_1$.
You may easily verify that
$y_n = (C_1+nC_2)x_1^n$
is a solution to the homogeneous equation for arbitrary constants $C_1,C_2$.

A particular solution may be found in similar fashion to how they were found for first order difference equations, by trying something simple like $y_n = k$ or $y_n = kn$.

Complementary and particular solutions are finally added together for a general solution to (*).

The constants $C_1,C_2$ can be determined by plugging in values for $y_0,y_1$ and solving the resulting equation system in two unknowns $C_1,C_2$.

Again this bears striking resemblance to solutions of second order ordinary differential equations.

A nice direct application of this is that we may compute a closed form expression for the $n^{th}$ Fibonacci number.

Problem 2 (Fibonacci Number Formula):

The Fibonacci sequence is given by $F_n = F_{n-1}+F_{n-2},\; F_0 = F_1 = 1$.

Solve this difference equation to get a closed formula for the $n^{th}$ Fibonacci number.

Solution:

The Fibonacci recurrence is a homogeneous second order difference equation.
The auxiliary equation is given by:
$x^2-x-1 = 0$.

The roots of the quadratic equation are given by
$x_1 = \frac{1+\sqrt{5}}{2} = \phi$ (Golden Ratio) and
$x_2 = \frac{1-\sqrt{5}}{2} = 1 - x_1 = 1 - \phi$.

This yields the general solution $F_n = C_1\phi^n + C_2(1-\phi)^n$.

Since $F_0 = 1, F_1 = 1$ we can determine $C_1,C_2$ via the equation system

$\begin{cases} 1 = C_1\phi^0+C_2(1-\phi)^0 & \\ 1 = C_1\phi^1 + C_2(1-\phi)^1 & \end{cases}$

Solving for $C_1,C_2$ we get $C_1 = \frac{\phi}{2\phi-1}$ and $C_2 = \frac{\phi-1}{2\phi -1}$.

Hence $F_n = \frac{\phi^{n+1} - (1-\phi)^{n+1}}{2\phi-1}$ where $\phi = \frac{1+\sqrt{5}}{2}$.

It is indeed not obvious that this expression needs to be an integer.

Let’s now turn our attention to a probabilistic application:

Problem 3 (Cat And Mouse):

A mouse finds its way to a filthy student corridor with $N$ rooms as depicted below.

The mouse is small enough to creep through the tiny gap under each door of the corridor.
It initially starts in room $k$ and moves every minute to an adjacent room with equal probability (out of fear for an equally afraid student).

Unfortunately for the mouse there is a cat owner in room $0$.
If the mouse enters room $0$ it will get instantly killed by the eager cat.
If on the other hand, the mouse reaches past the last room (room $N-1$) it will escape out the corridor and survive.

(i) What is the probability that the mouse will survive?

(ii) How long is it expected to take before we get a resolution to the mouse’s fate?

Solution:

(i)
Let $p_k$ denote the probability that the mouse will survive given that it is currently in room $k$.

Since the mouse is equally likely to go left or right we get the following homogeneous second order difference equation:

$p_k = \frac{1}{2}p_{k-1} + \frac{1}{2}p_{k+1}$.

Since the mouse is guaranteed to die if it reaches room $0$ we get $p_0 = 0$.
Similarly since it will escape if it reaches the end on the opposite side of the corridor we get $p_N = 1$.

We can now solve this difference equation using the methods we have been discussing.
The auxiliary equation is given by:
$x = \frac{1}{2} + \frac{1}{2}x^2 \Longleftrightarrow x^2 - 2x + 1 = 0 \Longleftrightarrow (x-1)^2 = 0 \Longleftrightarrow x = 1$.

This time we have a repeated root so the general solution takes the form
$p_k = (C_1+kC_2)1^k = C_1+kC_2$.

The initial condition $p_0 = 0$ implies $C_1 = 0$.
The condition $p_N = 1$ implies $C_2 = \frac{1}{N}$.

Hence $p_k = \frac{k}{N}$.

(ii)
To compute the expected duration of the mouse’s activity in the corridor we may begin by introducing some notation.
Let $D_k$ denote the expected duration until death or escape given that the mouse is currently in room $k$.

Since the mouse stays in each room for one minute before switching to an adjacent room (with equal probability) we obtain the non-homogeneous second order difference equation:

$D_k = \frac{1}{2}D_{k-1} + \frac{1}{2}D_{k+1} + 1$.

Moreover $D_0 = 0 = D_N$ since activity ends at either end state.

We have already seen that the homogeneous equation has general solution $D_k = C_1+kC_2$.

We now need to find a particular solution.
To spare you the trial and error computation, it turns out that both $D_k = C$ and $D_k = Ck$ lead to degenerate solutions, meaning we cannot determine the constant $C$.

The obvious next candidate to try is $D_k = Ck^2$ where $C$ is a constant to be determined.
Inserting yields $Ck^2 = \frac{1}{2}C(k-1)^2 + \frac{1}{2}C(k+1)^2 + 1 \Longrightarrow C = -1$

Therefore the general solution is given by
$D_k = C_1 + kC_2 - k^2$

The initial condition $D_0 = 0$ yields $C_1 = 0$.
The second condition $D_N = 0$ gives $0 = C_1 + NC_2 - N^2 \Longrightarrow 0 = N(C_2-N) \Longrightarrow C_2 = N$

Hence we expect the mouse to run around in the corridor for $k(N-k)$ minutes before either getting killed by the cat or running out of the corridor, provided that the mouse starts in room $k$.

Those of you who have studied elementary probability might have recognized already that the problem above was simply a restatement of the Gamblers Ruin Paradox. Instead of a mouse and a cat we have two opponents each holding $k$ and $N-k$ dollars respectively. The opponents play a fair game (say they flip a fair coin) and if one opponent loses, he has to give away one dollar to the other opponent. They play the game until one of them holds all $N$ dollars. The question is what the probability is of the first opponent losing all his money (i.e the mouse getting eaten by the cat).

The seeming “paradox” here being despite playing a fair game the first player will still lose all his money if he plays a very rich opponent, say a casino. Indeed if we assume for arguments sake that the casino has infinite amount of money, then the probability that the first player loses all his money is $lim_{N \rightarrow \infty} (1 -\frac{k}{N}) = 1$ i.e the game will (almost surely) not continue forever, despite even chance of winning or losing, no matter what finite amount of chips the first player starts with.

Below are some neat exercises for you to try.
They can all be solved by identifying a difference equation and solving it.

Exercise 1 (Stair Climbing – Easy):

In how many different ways can you climb a stair of $n$ steps by taking either one or two steps at a time?

For example you can climb a stair of $4$ steps in any of the following 5 ways:
$(2,2),(2,1,1),(1,2,1),(1,1,2),(1,1,1,1)$

(Hint: Do you recognize the difference equation from somewhere?)

Exercise 2 (Dividing A Pizza – Intermediate):

What is the maximum number of pieces you can get if you divide a pizza using $n$ straight cuts?

Note: pieces need not be of the same size, nor contain any crust

(Hint: What happens if you make a cut crossing all previous cuts, yet not crossing any of the existing intersection points?)