Archive for the ‘Theorem’ Category

Number Theory: Lecture 23

December 2, 2013

In which we think about how to find a factor of a large number.


Theorem 43: the Steinitz Exchange Lemma

November 7, 2012

There are some results in mathematics that are clearly very deep or interesting or surprising or useful.  There are others that seem rather innocuous at first, but that turn out to be just as deep or interesting or surprising or useful.  It seems to me that the Steinitz Exchange Lemma is definitely in the latter camp.

In order to write about this result, I have to say something about what a vector space is.  Informally, a vector space is a set of objects (vectors) that we can add and that we can multiply by scalars.  And the addition behaves nicely (officially, the set of vectors forms a group under addition), and the multiplication behaves nicely, and the two operations interact nicely.  In essence, everything behaves as we think it ought to.  The scalars come from another set, which has to be a field: we can add, subtract, multiply and divide scalars (except that we can’t divide by 0).  Let me give a couple of my favourite examples of vector spaces, which I hope will help.

One example is \mathbb{R}^n, which is a vector space over the field \mathbb{R} of the reals.  The vectors in this space are n-tuples of real numbers.  For example, when n = 2 this is just the familiar space of vectors in the plane.

Another is \mathbb{F}_p^n, which is a vector space over the field \mathbb{F}_p of integers modulo p (here p is a prime).  Vectors in this space are n-tuples of integers modulo p.
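If you'd like to play with that second example, here's a quick Python sketch (my own, purely for illustration) of the vector space operations in \mathbb{F}_p^n: addition and scalar multiplication are done coordinate by coordinate, reducing mod p each time.

```python
p = 5  # an arbitrary small prime, just for the example

def add(v, w, p=p):
    """Add two vectors in F_p^n coordinate-wise, reducing mod p."""
    return tuple((a + b) % p for a, b in zip(v, w))

def scale(lam, v, p=p):
    """Multiply a vector in F_p^n by the scalar lam, reducing mod p."""
    return tuple((lam * a) % p for a in v)

v = (1, 2, 3)
w = (4, 4, 4)
print(add(v, w))    # (0, 1, 2): each coordinate is the sum mod 5
print(scale(3, v))  # (3, 1, 4): each coordinate is tripled mod 5
```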

In order to get to the main topic of this post, I need to introduce a couple more concepts.  (This is all a good illustration of the flavour of university-level mathematics, where one has to get to grips with a bunch of concepts before being able to put them together in theorems!)

One is the idea of linear independence.  Let’s start with the familiar plane \mathbb{R}^2, and let’s take two non-zero vectors \mathbf{v} and \mathbf{w} in the plane.  (It’s conventional to typeset vectors in bold.)  They might point in completely different directions.  Or they might point in essentially the same direction: we might have \mathbf{w} = \lambda \mathbf{v} for some scalar \lambda.  In the former case we say that they’re linearly independent; in the latter that they’re linearly dependent.

If we take three vectors, then we could have something like \mathbf{u} = \left(\begin{array}{c} 1 \\ 0 \end{array}\right), \mathbf{v} = \left(\begin{array}{c}0 \\ 1 \end{array}\right), \mathbf{w} = \left(\begin{array}{c} 1 \\ 1 \end{array}\right).  In this case, any two of these vectors point in completely different directions, but if we consider all three then they are related: \mathbf{u} + \mathbf{v} = \mathbf{w}.  It isn’t quite that they point in the same direction, but there is a dependence between them: we can write \mathbf{w} in terms of \mathbf{u} and \mathbf{v}.  So any two of them are linearly independent, but the three of them are linearly dependent.

Formally, we say that the vectors \mathbf{v_1}, \dotsc, \mathbf{v_n} are linearly independent if the only way to have \lambda_1 \mathbf{v_1} + \dotsb + \lambda_n \mathbf{v_n} = \mathbf{0} is when \lambda_1 = \dotsb = \lambda_n = 0.  Putting that another way, we can’t write one of them in terms of the others.  (One can extend this definition to a set of infinitely many vectors, but let’s stick to finite collections in this post.)
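In practice, one way to test linear independence is to row-reduce the vectors and count pivots: the vectors are independent exactly when the rank equals the number of vectors. Here's a small Python sketch of that (my own code, using exact rational arithmetic, not any standard library routine), checking the example above.

```python
from fractions import Fraction

def rank(vectors):
    """Row-reduce a list of vectors over the rationals; return the rank."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0])):
        # find a row at or below position r with a nonzero entry in this column
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # clear this column in every other row
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def independent(vectors):
    """Linearly independent iff the rank equals the number of vectors."""
    return rank(vectors) == len(vectors)

u, v, w = (1, 0), (0, 1), (1, 1)
print(independent([u, v]))     # True
print(independent([u, v, w]))  # False, since u + v = w
```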

I said that I needed to introduce a couple more concepts; here’s the second.  Let’s go back to our favourite example of the plane \mathbb{R}^2.  There’s something really special about the vectors \mathbf{u} = \left(\begin{array}{c}1 \\ 0 \end{array}\right) and \mathbf{v} = \left(\begin{array}{c} 0 \\ 1 \end{array}\right): we can write every vector in \mathbb{R}^2 as a linear combination of them.  For example, \left(\begin{array}{c} 3 \\ -7 \end{array}\right) = 3\mathbf{u} - 7 \mathbf{v}, and \left(\begin{array}{c} a \\ b \end{array}\right) = a \mathbf{u} + b \mathbf{v}.  We’re quite used to doing that, because \mathbf{u} and \mathbf{v} are familiar vectors.

Can we do the same sort of thing with \mathbf{u} = \left(\begin{array}{c} 1 \\ 0 \end{array}\right) and \mathbf{w} = \left(\begin{array}{c} 1 \\ 1 \end{array}\right)?  Well, \left(\begin{array}{c} 3 \\ -7 \end{array}\right) = 10 \mathbf{u} - 7 \mathbf{w}, and more generally \left(\begin{array}{c} a \\ b \end{array}\right) = (a-b) \mathbf{u} + b \mathbf{w}.  So yes we can!  But we couldn’t do it if we just had \mathbf{u}, because we can’t possibly write every vector in \mathbb{R}^2 as a multiple of \mathbf{u}.
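Here's a quick sanity check of that identity in Python (a spot check for a few values of a and b, not a proof — though in this case the algebra is easy enough to do in your head too).

```python
u, w = (1, 0), (1, 1)

def combo(a, b):
    """Return (a - b)u + b w, computed coordinate-wise."""
    return tuple((a - b) * x + b * y for x, y in zip(u, w))

# check that (a - b)u + b w really is (a, b) for some sample values
for a, b in [(3, -7), (2, 5), (0, 0), (-1, 4)]:
    assert combo(a, b) == (a, b)
print("identity checked")
```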

We say that the vectors \mathbf{v_1}, \dotsc, \mathbf{v_n} span a vector space if every vector in the space can be written as a linear combination \lambda_1 \mathbf{v_1} + \dotsb + \lambda_n \mathbf{v_n} for some scalars \lambda_1, \dotsc, \lambda_n.  (Again, one can extend this definition to infinite collections of vectors, and again I’m not going to worry about that here.)

Now that we know what the words mean, we can get down to business…

So the question is: given a vector space, how large/small can a collection of linearly independent vectors be, and how large/small can a spanning collection of vectors be?  I strongly encourage you to think about this, perhaps with the help of some examples.



Theorem 42: Fermat’s Last Theorem

October 25, 2012

I’ve previously written about Fermat’s Little Theorem, but today it’s time for his (perhaps more famous) Last Theorem.  Today’s blog post comes in audio form, with help from Melvyn Bragg, Marcus du Sautoy and Samir Siksek.


Theorem 41: Gauss’s lemma

July 31, 2012

This result has a very uninformative name, I’m afraid, and it doesn’t identify the result uniquely, but it is the standard name so I’d better stick with it.

I’m going to write this post on the assumption that you know about Fermat’s little theorem and Euler’s criterion.  Fortunately, those links take you to places on this blog where you can read everything you need to know, and then you can come back here.

Ready?  Are you sitting comfortably?  Then we’ll begin.

Let’s fix an odd prime p, and some number a that is coprime to p.  We’d like to know whether a is a quadratic residue modulo p.  Putting that another way, we’d like to find the value of the Legendre symbol \genfrac{(}{)}{}{}{a}{p}.  Helpfully, Euler’s criterion tells us that \genfrac{(}{)}{}{}{a}{p} \equiv a^{\frac{p-1}{2}} \pmod{p}.  I say “helpfully”, but it rather depends what we’re doing.  Euler’s criterion is great for some things.  For example, it makes it easy to compute \genfrac{(}{)}{}{}{-1}{p}.  But it’s not quite so easy to calculate \genfrac{(}{)}{}{}{2}{p} — we’d need to know 2^{\frac{p-1}{2}} modulo p.
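As an aside, Euler's criterion does make the Legendre symbol very easy to compute by machine, because modular exponentiation is fast.  Here's a short Python sketch (my addition, just to illustrate the criterion):

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p and a coprime to p,
    via Euler's criterion: a^((p-1)/2) mod p is either 1 or p - 1."""
    r = pow(a, (p - 1) // 2, p)  # fast built-in modular exponentiation
    return -1 if r == p - 1 else r

print(legendre(-1, 13))  # 1: -1 is a quadratic residue mod 13
print(legendre(2, 7))    # 1: indeed 3^2 = 9 ≡ 2 (mod 7)
print(legendre(3, 7))    # -1: 3 is not a square mod 7
```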

So let’s think about how we might work out a^{\frac{p-1}{2}} modulo p (obviously in a way that leads to an answer that isn’t just the Legendre symbol again!).  We had a strategy for working out a^{p-1} in one proof of Fermat’s little theorem (Proof 1 here).  Could we adapt that somehow, or use a similar idea?  I encourage you to try it before reading on. (more…)

Theorem 40: Sperner’s lemma about coloured triangles

February 25, 2012

I have a difficulty with this post, namely that I’ve previously written about a result called Sperner’s lemma, and this is a different one!  I hope this won’t cause any confusion.

As the title suggests, this post is about coloured triangles.  That’s not quite accurate, actually, but it was the best I could do in a short title.  Now that I have more space, let me be a bit more precise.

Draw a triangle.  Any triangle.

Subdivide it into little triangles, in any way you like.

Like this:  or like this:  for example.

Now you need three coloured pens.  I’m going to assume that they are blue, red and green.

Colour one vertex of your triangle blue, one red, and one green.

OK so far?

Now you’re going to colour the vertices of all the little triangles.  There are just two rules.

If a vertex is on the edge with red at one end and blue at the other, then you can colour it either red or blue but not green.  And similarly for the other edges: you can use either of the colours at the end of the edge, but not the third.

And you can colour the vertices inside the triangle in any way you choose.   Here’s my example.

All done?

Now I predict that … actually, I’ll wait to make my prediction.  Why don’t you see whether you can guess what my prediction is, and then you can check over the fold.

You might also like to try the Triangle Game on NRICH.


Theorem 39: Euler’s criterion

February 4, 2012

I’ve previously written about Fermat’s Little Theorem, which tells us that if p is a prime and a is not divisible by p, then a^{p-1} \equiv 1 \pmod{p}.  That’s not terribly exciting when p = 2, so let’s concentrate on odd primes p in this post.  Then \frac{p-1}{2} is a whole number, so it makes sense (in the modular arithmetic world) to let x = a^{\frac{p-1}{2}}.  Then Fermat’s Little Theorem tells us that x^2 \equiv 1 \pmod{p}.

How many possibilities for x are there (modulo p, of course), if we know that x^2 \equiv 1 \pmod{p}?

Well, that congruence tells us that (x - 1)(x + 1) \equiv 0 \pmod{p}.  But, in the jargon, “there are no zero-divisors modulo a prime”.  That is, if mn \equiv 0 \pmod{p}, then m \equiv 0 \pmod{p} or n \equiv 0 \pmod{p}.  (Note that we really do need p to be prime here — try some examples!)

So if x is such that x^2 \equiv 1 \pmod{p}, then x \equiv \pm 1 \pmod{p}.  Which is pretty nice.  (Again, we really needed p to be prime here.  Try working modulo 8 instead, and see what happens there!)
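If you'd like to see that contrast concretely, here's a tiny Python check (my own, just for illustration) of the square roots of 1 modulo a prime and modulo 8:

```python
def square_roots_of_one(m):
    """All x modulo m with x^2 ≡ 1 (mod m)."""
    return [x for x in range(m) if (x * x) % m == 1]

print(square_roots_of_one(7))  # [1, 6]: just ±1, since 7 is prime
print(square_roots_of_one(8))  # [1, 3, 5, 7]: four roots — 8 isn't prime!
```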

Back to where we started.  We said that if a is not divisible by the odd prime p, then (a^{\frac{p-1}{2}})^2 \equiv 1 \pmod{p}, so we know that a^{\frac{p-1}{2}} \equiv \pm 1 \pmod{p}.

That narrows down the possibilities a lot.  It’s no good trying to tell me that 19^{50} \equiv 7 \pmod{101} — I simply shan’t believe you (even without doing any calculations).  But tell me that 19^{50} \equiv 1 \pmod{101}, or that 19^{50} \equiv -1 \pmod{101}, and I have a problem — how do I know which is correct? (more…)
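Of course, a computer settles this particular instance instantly — the interesting question is how to decide by pure thought.  For the curious, here's the one-line Python check (fast modular exponentiation again):

```python
# Which of the two possibilities is it for 19^50 modulo 101?
print(pow(19, 50, 101))  # 1, so 19 is a quadratic residue mod 101
```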

Theorem 38: There is a primitive root modulo a prime

October 23, 2011

This is something that I was lecturing on the other day.  There I was talking to third-year Cambridge undergraduates, who know quite a bit of mathematics about groups and the like, and so I presented the topic accordingly.  My challenge in this post is to describe some of the ideas in a way that assumes less technical background, since my aim is wherever possible to make these posts accessible to people who haven’t studied maths at university.  I am going to assume that people know a small amount of modular arithmetic, but that’s OK because I’ve written about it previously.

So let’s begin with numerical examples, since one of the nice features of this sort of number theory is that one can get stuck in and try ideas explicitly quite easily (the objects we’re studying are, after all, just natural numbers).

Let’s work modulo 7 (for no particular reason, I just wanted a smallish prime).  I’m interested in powers.

For example, modulo 7 the powers of 2 are 1, 2, 4, 1, 2, 4, … (dots because it seems to be repeating).

And the powers of 3 are 1, 3, 2, 6, 4, 5, 1, 3, … (oh, once I get back to 1, I can see that it will keep repeating).

And the powers of 4 are 1, 4, 2, 1, 4, ….

Powers of 5: 1, 5, 4, 6, 2, 3, 1, 5, ….

Powers of 6: 1, 6, 1, 6, … (interesting, I got back to 1 very quickly that time).

Lots of interesting things to explore there.
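If you want to generate such lists for other primes and hunt for patterns yourself, a few lines of Python will do it (a quick sketch of my own, not part of the original discussion):

```python
def powers(a, p, count=8):
    """The first few powers a^0, a^1, a^2, ... reduced mod p."""
    return [pow(a, k, p) for k in range(count)]

for a in range(2, 7):
    print(a, powers(a, 7))
# 2 [1, 2, 4, 1, 2, 4, 1, 2]
# 3 [1, 3, 2, 6, 4, 5, 1, 3]
# ... and so on, matching the lists above
```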

One observation is that we always seem to get back to 1 at some point.  But we know from Fermat’s Little Theorem that this should be the case: we know that a^{p-1} \equiv 1 \pmod{p}, so in this case we were bound to get back to 1 after 6 steps (if not sooner).  In this post, I’d like to think about when we get back to 1.  In the case of 6, we got back to 1 very quickly; in the case of 3 it took rather longer.

We have a name for this idea, this number of steps it takes to get back to 1.  It’s called the order.  Officially, the order of the element a modulo p is the least positive integer d such that a^d \equiv 1 \pmod{p}.  So the order of 2 modulo 7 is 3, and the order of 3 modulo 7 is 6.

If you know about Lagrange’s theorem about finite groups, then you’ll know that the order of an element divides the order of the group.  In our case, we’re looking at the multiplicative group of non-zero integers modulo a prime p, which has order p-1.  So the order of a must divide p-1.  (In the example above, you can check that the order of each element divides 6.)  In this post, I want to concentrate on how many elements have order d, for each divisor d of p-1.
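Here's a small Python sketch (my addition) that computes the order of each element modulo 7 and confirms that each order divides 6, as Lagrange's theorem predicts:

```python
def order(a, p):
    """The least positive d with a^d ≡ 1 (mod p), for a coprime to p."""
    d, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        d += 1
    return d

for a in range(1, 7):
    d = order(a, 7)
    assert (7 - 1) % d == 0  # the order divides p - 1
    print(a, d)
# the orders are 1, 3, 6, 3, 6, 2 respectively
```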

At this point you might like to go and try some more examples, to make some conjectures before you read on.


Theorem 37: Sperner’s lemma

October 10, 2010

I’ve written about various theorems in combinatorics so far, but I don’t think any of them have the flavour of today’s theorem.

We’re going to concentrate on the set \{1, 2, \dotsc, n\}, which I might sometimes write as [n] (that’s standard notation in combinatorics).  This set has lots of subsets, such as \{1, 5\}, and [n] itself, and \emptyset (the empty set, which contains no elements at all).  How many such subsets? 

Well, there are n elements, each of which either belongs or does not belong to any given subset.  So there are 2 possibilities for whether a given element belongs to a given subset, so 2^n subsets in total.  Write out the subsets of \{1, 2, 3\}, say, if you don’t believe me!
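If you'd rather have the computer write them out, here's a short Python sketch (my own) listing all the subsets of \{1, 2, 3\} and counting them:

```python
from itertools import combinations

def subsets(n):
    """All subsets of [n] = {1, ..., n}, as tuples, smallest first."""
    elements = range(1, n + 1)
    return [c for k in range(n + 1) for c in combinations(elements, k)]

s = subsets(3)
print(len(s))  # 8 = 2^3, as claimed
print(s)       # from the empty tuple () up to (1, 2, 3)
```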

How can we think about these subsets?  There are two helpful pictures I have in mind; sometimes one is more convenient, sometimes the other.

Here’s the first, which might explain why mathematicians sometimes talk of this object (the family of subsets of [n]) as the cube.  My limitations when it comes to graphics software mean that I’ve written \{\} for \emptyset, the empty set.

The subsets of {1,2,3} displayed as a cube

Hopefully this picture will make it all clear, and I don’t need to say anything.  But there is one thing I want to say.  This picture shows a graph (in the combinatorics sense), in which the vertices are the subsets of [n], and two subsets are joined exactly when they differ by just one element.  By positioning the vertices carefully, I end up with a picture that we all recognise as a cube.  But if you’re struggling to imagine a 15-dimensional cube, don’t worry about it: we can think about it more abstractly, without needing to draw a picture.  (I can draw representations of the 4-, 5- and 6-dimensional cubes if necessary, though — I’ll leave that for you to think about.)

What’s the other way of thinking about this family of subsets?  I’m going to call this the layered model (I don’t know of a standard term for it).  Each layer contains all the subsets of the same size.  So here it is for n=3.  I’ve written in what the various subsets are, but normally I’d just draw the blobs without the subsets.

Layered model

Notice that the top and bottom blobs are the same size because they contain the same number of sets, and similarly the second and third blobs are the same size.  In general, the picture would look something like this.

The layered model

The picture on the left is for n even; the picture on the right is for n odd.  Why the difference?  If n is even, then the largest layer will correspond to subsets of size n/2 (exercise: prove this!).  If n is odd, then there are two largest layers, corresponding to subsets of size \lfloor n/2 \rfloor and \lceil n/2 \rceil (that is, (n-1)/2 and (n+1)/2).  Again, I’ll leave this for you to prove.
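The layer sizes are just binomial coefficients \binom{n}{k}, which you can tabulate quickly — here's a small Python check (mine) of the claims above:

```python
from math import comb

def layer_sizes(n):
    """Size of each layer: the number of k-element subsets of [n]."""
    return [comb(n, k) for k in range(n + 1)]

print(layer_sizes(3))  # [1, 3, 3, 1]: n odd, two equal largest layers
print(layer_sizes(4))  # [1, 4, 6, 4, 1]: n even, one largest layer at k = n/2
```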

Now, what sort of questions can we ask about the family of subsets of [n]?

Here’s one.  The family contains some chains.  Here’s a linguistic analogy.  Have you come across those word puzzles where each answer uses the letters of the previous answer, plus a new letter?  For example, the chain of answers might be: A, AT, PAT, TAPS, PASTE, PLATES, STAPLER, PRATTLES (to give an example off the top of my head).  A chain in the family of subsets uses the same idea, but adding new elements of [n] each time, rather than new letters.  For example, here’s a chain in [4]: the collection \emptyset, \{1\}, \{1,2\}, \{1,2,3\}, \{1,2,3,4\}.  And here’s another: \{2\}, \{2,4\}, \{1,2,3,4\}.

Formally, a chain is a collection of subsets where for each pair of subsets A and B in the chain, either A \subseteq B or B \subseteq A.  So the subsets are nicely nested.    Get the idea?

So what’s the question?  If you’re like me, you’ll have looked at my chain of words above, and wondered whether we could add another word on the end.  (I didn’t try very hard, so it may well be possible.)  And could I have made a longer chain, if I’d started in a different place or chosen a different option?  In fact, what’s the longest possible chain of words like this (bearing in mind that we have 26 letters to choose from)?

Well, we could ask the same question about chains in [n].  (And since we’re only after subsets, rather than valid English words, it might be an easier problem!)  So: what’s the length of the longest chain in [n]?

It’s pretty easy to see that we can always have a chain of length n+1.  I gave an example of such a chain when n=4, and hopefully you can see how this generalises.  (Notice that I’m not claiming this is the only example of a long chain: there are lots!)

Can we ever have a longer chain?

Here’s where the layer picture helps us a lot.  I’m imagining a chain as a line working from top to bottom (or bottom to top!) through the layers.  Could the chain ever have two sets from the same layer?  Err…quick think…no.  (Do you agree?)  So any chain contains at most one subset from each layer.  And how many layers are there?  Err…quick think…n+1, right?  Oh, but that means any chain has length at most n+1, and since we’ve already seen that length n+1 is possible, that must be the answer!
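Here's a quick Python sketch (my own) that builds the obvious chain of length n+1 and verifies the chain property, in case you'd like something to experiment with:

```python
def is_chain(family):
    """A chain: every pair of sets is nested one way or the other."""
    return all(a <= b or b <= a for a in family for b in family)

n = 4
# the chain from earlier: {} ⊂ {1} ⊂ {1,2} ⊂ {1,2,3} ⊂ {1,2,3,4}
chain = [set(range(1, k + 1)) for k in range(n + 1)]
print(is_chain(chain), len(chain))  # True 5, i.e. length n + 1
```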

That didn’t detain us for very long.  (I bet it’s much harder to solve the problem with words!)

Let’s ask a different question, perhaps the opposite question, in a way.  What happens if, instead of insisting that every subset contains or is contained in every other subset, we instead say that no subset can contain or be contained in another?  (I feel confident that one could generate a very, very long list of words like that, so I’m not even going to bother!)  We’ll call the resulting objects antichains.  (Some people call them Sperner families, but I prefer ‘antichain’ because it seems more descriptive.)

Here’s an example in [4]: \{1,2,3\}, \{1,2,4\}, \{2,3,4\}.  No subset contains or is contained in any other, right?  Could we make that antichain longer?  Yes, we could add \{1,3,4\}, but not anything else.  (Check this!)  But could we have made a longer antichain by starting in a different place?  What’s the length of the longest possible antichain?  And what’s the length of the longest possible antichain for general [n]? (more…)
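Checking the antichain condition is easy to automate, which makes it pleasant to experiment with.  Here's a short Python sketch (mine) verifying the examples above:

```python
def is_antichain(family):
    """An antichain: no member is properly contained in another member."""
    return not any(a < b for a in family for b in family)

family = [{1, 2, 3}, {1, 2, 4}, {2, 3, 4}]
print(is_antichain(family))                # True
print(is_antichain(family + [{1, 3, 4}]))  # True: it can be extended
print(is_antichain(family + [{1, 2}]))     # False: {1,2} ⊂ {1,2,3}
```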

Theorem 36: the Cantor set is an uncountable set with zero measure

September 30, 2010

This week’s post is by Laura Irvine, who is just about to start her second year reading Mathematics at Murray Edwards College, Cambridge.  Thanks very much, Laura!

First of all, what is the Cantor set?

To form the Cantor set, start with the closed interval [0,1] (this means 0 and 1 are included in the interval) and remove the middle open third of the interval (i.e. remove (\frac{1}{3}, \frac{2}{3}) where the curved brackets mean the interval is open, so \frac{1}{3} and \frac{2}{3} are not themselves in the interval).  You should be left with two disjoint closed intervals: [0, \frac{1}{3}] and [\frac{2}{3}, 1].  I’m going to call this step the first iteration.

Then do the same thing to each of these intervals: remove the middle third of each to get the new intervals as [0, \frac{1}{9}], [\frac{2}{9}, \frac{1}{3}], [\frac{2}{3}, \frac{7}{9}], [\frac{8}{9}, 1].  Then remove the middle third of each of these intervals.  Keep repeating this process and the Cantor set is the set of all the points in the interval [0,1] that are never removed.  So the first few steps of the process look like this:

The Cantor set

Do you understand how this set is formed?  Can you think of some points that are in the Cantor set?

Well, 0 will never be removed: the first closed interval after the n^{\textrm{th}} iteration would be [0, \frac{1}{3^n}] so 0 will be in the infinite intersection of the first interval of each step.  The other interval endpoints are points in the set too.  It turns out that there are also points that are in the set that aren’t interval endpoints.

An interesting and, in my opinion, rather surprising property of the Cantor set is that it has measure 0, despite being an uncountable set!  The fact it is uncountable means there is no way of writing all the numbers in the Cantor set in a list.

So, intuitively, this is saying that if all the points in the Cantor set were lined up next to each other, the line would have length 0 and yet there are infinitely many points in the set.  How weird is that?!
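If you'd like to see the measure-zero claim emerging numerically, here's a Python sketch (my addition, not part of Laura's post) that carries out a few iterations exactly with rational arithmetic.  The total length remaining after n steps is (2/3)^n, which tends to 0.

```python
from fractions import Fraction

def cantor_step(intervals):
    """Remove the open middle third of each closed interval (a, b) = [a, b]."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out += [(a, a + third), (b - third, b)]
    return out

intervals = [(Fraction(0), Fraction(1))]
for n in range(1, 5):
    intervals = cantor_step(intervals)
    total = sum(b - a for a, b in intervals)
    print(n, len(intervals), total)  # total length after n steps is (2/3)^n
```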


Theorem 35: the best rational approximations come from continued fractions

September 8, 2010

Like Theorem 31, this post is based on a session that I did with some school students.  And as I did there, I encourage you to try the activities for yourself (if you haven’t done them before).  I think that the students enjoyed the session; I hope that you enjoy the post!

Here’s a strange looking thing (if you haven’t seen anything like it before):

1 + \frac{1}{1 + \frac{1}{1 + \frac{1}{1 + \frac{1}{1 + \frac{1}{1 + ...}}}}}

What can you say about it?  Can you compute it?  Does it make sense?

That was an infinite thing (that’s what the dots were supposed to indicate).  Here are the first few finite chunks of it:

1, 1 + \frac{1}{1}, 1 + \frac{1}{1 + \frac{1}{1}}, 1 + \frac{1}{1 + \frac{1}{1 + \frac{1}{1}}}, …

What can you say about them?  Any patterns?  Can you explain any patterns?  Can you predict what the nth one will look like?  I recommend also looking at the differences between consecutive terms in the sequence.  Can you say anything interesting about them?
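If you'd like to generate more of these finite chunks exactly (the better to hunt for patterns), here's a short Python sketch of my own using exact fractions.  It relies on the observation that each chunk is obtained from the previous one by the rule x \mapsto 1 + 1/x, starting from 1 — I'll leave the patterns themselves for you to spot.

```python
from fractions import Fraction

# each finite chunk satisfies x_{k+1} = 1 + 1/x_k, starting from x_1 = 1
x = Fraction(1)
for k in range(1, 7):
    print(k, x, float(x))  # the chunk as an exact fraction and as a decimal
    x = 1 + 1 / x
```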