I used to get cross during English lessons at school. This happened for various reasons, but one regular cause was poetry analysis. Specifically, I objected to apparently being asked to say “Ah, yes, alliteration” every time two or more consecutive words began with the same letter or sound, regardless of whether it was reasonable to suppose that the poet was deliberately aiming for an alliterative effect. It seemed to me that since there is only a very limited range of letters available, there were bound to be some patterns that were just due to coincidence. Leaving aside my English lessons, was I right? If I just write words at random, will there necessarily be some patterns in the first letters of those words? Obviously there won’t necessarily be consecutive words with the same first letters, but perhaps there will be other patterns.
Let’s start by making the problem simpler (a favourite strategy of mathematicians!). Let’s say that we have only two letters, A and B, and let’s suppose that we have some long string of these letters. Here are some examples.
What patterns can we see here? Strings 1 and 2 have obvious patterns. What about string 3? Here’s one pattern:
The red letters are evenly spaced. That’s the sort of pattern we found in string 2: evenly spaced As (or Bs). Of course, there were a lot more evenly spaced As in string 2; I had to stop colouring As in string 3 where I did because the next letter at the right spacing is a B. But nonetheless there is some pattern there.
Incidentally, I wanted to choose a third string that looked pretty “random”, but that you would be able to check was “random”, to prove that I hadn’t faked it to have a pattern! So I looked at the first thirty digits after the decimal point in , and wrote A if the digit was odd and B if it was even (at least if I didn’t make a slip along the way!).
Will there always be some sort of pattern of evenly spaced letters, whatever the sequence is? How many evenly spaced letters can we expect to find?
Finding two evenly spaced letters that are the same
Let’s start with a relatively simple observation. Let’s stick to our two letters, A and B, and let’s aim to find two evenly spaced letters that are the same. Well, “two evenly spaced letters” really just means “two letters”. So we want to have two letters that are the same. If we only have two letters in the string, there might not be two letters that are the same (the string could be AB). But if we have three (or more) letters, each A or B, then we must have two the same. Hopefully this is fairly obvious to you, but one can be rather formal and say that it follows from the rather wonderfully named pigeonhole principle.
Finding three evenly spaced letters that are the same
What if we still only have two letters, A and B, but now we want to know that there will be three evenly spaced letters that are the same? We’re going to need enough letters in the string, or there might not be these evenly spaced letters (just as when we looked for two letters that are the same we needed at least three letters in the string). So let’s assume that we have a long string (I think that at least 350 letters will turn out to be enough for this argument), and then hopefully we’ll be able to show that it must contain three evenly spaced letters that are the same.
So far, we know that in any block of three letters there must be two that are the same. How can we use that to help us?
One scenario would be that we have two points that are the same, and then if we hop along to the next letter with the same spacing, it’s the same letter. For example, ABBAABA — the third red letter is conveniently the same as the first two, and so we have three evenly spaced letters that are the same.
But what if that doesn’t happen? That is, we’ve got two letters that we know are the same (say A), but the third in that sequence (whatever the spacing is) is different (so B). I’ll call these As and B “special”, to distinguish them from the other letters in the block. Let’s represent the block by a diagram.
These are three evenly spaced letters in a block of some length.
What would happen if we had two identical blocks like this?
(There might be some letters between the two blocks, but that’s not important.)
We need to measure some distances here. Let’s say the distance between the first two As is (so there’s an A, then other letters, then an A). And let’s say that the distance from the start of the first block to the start of the second is .
Now we want to look for three evenly spaced letters that are the same.
We’re not going to get anything by taking both As in one block. Let’s think about what happens if we take the first special A in the first block and the second special A in the second block. What’s the distance between them? It’s .
What can we say about the letter that’s letters to the right of the second special A in the second block? If it’s an A, then we’ve got three evenly spaced As, right?
What if it’s a B? Then we have three Bs; let’s see what the spacings are.
Ah, the spacings are both — so we have three evenly spaced Bs.
Either way, we win!
Are we finished? Not quite: we imagined that we had two identical blocks, but we didn’t prove that this will always be the case. Let’s do that now.
We can assume that we’re looking at blocks of five letters. Why? Well, in the first three letters of a block of length five, two must be the same, and then if we hop along by the appropriate amount to find the third point, we’ll still be inside the five letters of the block.
How many possible strings of five letters are there? There are two possibilities for the first letter, two for the second, two for the third, two for the fourth, and two for the fifth. So there are possible strings.
There are 32 possible blocks, and we want to guarantee that we have two that are the same. So let’s take 33 blocks, and that will do the job. (This is using the pigeonhole principle, just as we did earlier when we argued that in a block of three letters, two must be the same.) That’s why I said earlier that 350 letters in the string would be enough. We can split up the first letters into blocks of five letters, and find two blocks that are the same. Whichever those blocks are, the third point that we generate will still be in the string; that’s why I allowed more letters.
Summary of the proof
Let’s recap the argument. We split the first half of our long string (of at least 350 letters) of As and Bs into blocks of length five. We have at least thirty-three blocks, so two must be the same. If this repeated block contains three evenly spaced letters that are the same, then we’re done. If not, then we have a pattern like
in these two identical blocks. Then we look at the sequence of three evenly spaced letters starting with the first special A in the first block and the second special A in the second block. (Note that we have so many letters that we really can always find a third letter with the right spacing.) If the third letter in this sequence is an A, then we’ve got three evenly spaced As. If it’s a B, then we check the distances and find that the special Bs in the blocks and this extra B form a sequence of three evenly spaced Bs. Either way, we’re done.
We’ve proved a special case of a theorem of Bartel van der Waerden. Let me tell you the general theorem.
Theorem (van der Waerden) Imagine that we have an alphabet of letters, and fix a positive whole number . If we have a sufficiently long string of letters from this alphabet, then there are evenly spaced letters that are the same.
(Here, “sufficiently long” depends on both and .)
So the theorem tells us that whatever string we have, if it’s long enough then it must contain evenly spaced letters that are the same!
We had and proved the cases and . (The case of is not very interesting!) The proof of the general theorem uses the same ideas that we used above, using the result for smaller values of and to deduce the next step up. (This is a rather vague description of proof by induction.)
By the way, finding what “sufficiently long” above means is an interesting problem. There are various bounds known, but they’re pretty bad. The 350 I gave above was a huge overestimate for this case: the answer is actually 9!
We could ask what happens if we take an infinite string of letters from a fixed (finite) alphabet. You might like to think about whether it’s necessarily possible to find infinitely many evenly spaced letters that are all the same.
I learnt about van der Waerden’s theorem in a lecture course in Cambridge by Imre Leader. I’m afraid that I don’t really know what good books there are on the subject. Feel free to leave recommendations below! The Wikipedia page has some information about what bounds are known — a good example of very large numbers cropping up in actual mathematics!