Archive for the ‘IB Groups, Rings and Modules’ Category

Theorem 43: the Steinitz Exchange Lemma

November 7, 2012

There are some results in mathematics that are clearly very deep or interesting or surprising or useful.  There are others that seem rather innocuous at first, but that turn out to be just as deep or interesting or surprising or useful.  It seems to me that the Steinitz Exchange Lemma is definitely in the latter camp.

In order to write about this result, I have to say something about what a vector space is.  Informally, a vector space is a set of objects (vectors) that we can add and that we can multiply by scalars.  And the addition behaves nicely (officially, the set of vectors forms a group under addition), and the multiplication behaves nicely, and the two operations interact nicely.  In essence, everything behaves as we think it ought to.  The scalars come from another set, which has to be a field: we can add, subtract, multiply and divide scalars (except that we can’t divide by 0).  Let me give a couple of my favourite examples of vector spaces, which I hope will help.

One example is \mathbb{R}^n, which is a vector space over the field \mathbb{R} of the reals.  The vectors in this space are n-tuples of real numbers.  For example, when n = 2 this is just the familiar space of vectors in the plane.

Another is \mathbb{F}_p^n, which is a vector space over the field \mathbb{F}_p of integers modulo p (here p is a prime).  Vectors in this space are n-tuples of integers modulo p.

In order to get to the main topic of this post, I need to introduce a couple more concepts.  (This is all a good illustration of the flavour of university-level mathematics, where one has to get to grips with a bunch of concepts before being able to put them together in theorems!)

One is the idea of linear independence.  Let’s start with the familiar plane \mathbb{R}^2, and let’s take two non-zero vectors \mathbf{v} and \mathbf{w} in the plane.  (It’s conventional to typeset vectors in bold.)  They might point in completely different directions.  Or they might point in essentially the same direction: we might have \mathbf{w} = \lambda \mathbf{v} for some scalar \lambda.  In the former case we say that they’re linearly independent; in the latter that they’re linearly dependent.

If we take three vectors, then we could have something like \mathbf{u} = \left(\begin{array}{c} 1 \\ 0 \end{array}\right), \mathbf{v} = \left(\begin{array}{c}0 \\ 1 \end{array}\right), \mathbf{w} = \left(\begin{array}{c} 1 \\ 1 \end{array}\right).  In this case, any two of these vectors point in completely different directions, but if we consider all three then they are related: \mathbf{u} + \mathbf{v} = \mathbf{w}.  It isn’t quite that they point in the same direction, but there is a dependence between them: we can write \mathbf{w} in terms of \mathbf{u} and \mathbf{v}.  So any two of them are linearly independent, but the three of them are linearly dependent.

Formally, we say that the vectors \mathbf{v_1}, \dotsc, \mathbf{v_n} are linearly independent if the only way to have \lambda_1 \mathbf{v_1} + \dotsb + \lambda_n \mathbf{v_n} = \mathbf{0} is when \lambda_1 = \dotsb = \lambda_n = 0.  Putting that another way, we can’t write one of them in terms of the others.  (One can extend this definition to a set of infinitely many vectors, but let’s stick to finite collections in this post.)

I said that I needed to introduce a couple more concepts; here’s the second.  Let’s go back to our favourite example of the plane \mathbb{R}^2.  There’s something really special about the vectors \mathbf{u} = \left(\begin{array}{c}1 \\ 0 \end{array}\right) and \mathbf{v} = \left(\begin{array}{c} 0 \\ 1 \end{array}\right): we can write every vector in \mathbb{R}^2 as a linear combination of them.  For example, \left(\begin{array}{c} 3 \\ -7 \end{array}\right) = 3\mathbf{u} - 7 \mathbf{v}, and \left(\begin{array}{c} a \\ b \end{array}\right) = a \mathbf{u} + b \mathbf{v}.  We’re quite used to doing that, because \mathbf{u} and \mathbf{v} are familiar vectors.

Can we do the same sort of thing with u = \left(\begin{array}{c} 1 \\ 0 \end{array}\right) and w = \left(\begin{array}{c} 1 \\ 1 \end{array}\right)?  Well, \left(\begin{array}{c} 3 \\ -7 \end{array}\right) = 10 \mathbf{u} - 7 \mathbf{w}, and more generally \left(\begin{array}{c} a \\ b \end{array}\right) = (a-b) \mathbf{u} + b \mathbf{w}.  So yes we can!  But we couldn’t do it if we just had \mathbf{u}, because we can’t possibly write every vector in \mathbb{R}^2 as a multiple of \mathbf{u}.

We say that the vectors \mathbf{v_1}, \dotsc, \mathbf{v_n} span a vector space if every vector in the space can be written as a linear combination \lambda_1 \mathbf{v_1} + \dotsb + \lambda_n \mathbf{v_n} for some scalars \lambda_1, \dotsc, \lambda_n.  (Again, one can extend this definition to infinite collections of vectors, and again I’m not going to worry about that here.)

Now that we know what the words mean, we can get down to business…

So the question is: given a vector space, how large/small can a collection of linearly independent vectors be, and how large/small can a spanning collection of vectors be?  I strongly encourage you to think about this, perhaps with the help of some examples.