Theorem 43: the Steinitz Exchange Lemma

There are some results in mathematics that are clearly very deep or interesting or surprising or useful.  There are others that seem rather innocuous at first, but that turn out to be just as deep or interesting or surprising or useful.  It seems to me that the Steinitz Exchange Lemma is definitely in the latter camp.

In order to write about this result, I have to say something about what a vector space is.  Informally, a vector space is a set of objects (vectors) that we can add and that we can multiply by scalars.  And the addition behaves nicely (officially, the set of vectors forms a group under addition), and the multiplication behaves nicely, and the two operations interact nicely.  In essence, everything behaves as we think it ought to.  The scalars come from another set, which has to be a field: we can add, subtract, multiply and divide scalars (except that we can’t divide by 0).  Let me give a couple of my favourite examples of vector spaces, which I hope will help.

One example is \mathbb{R}^n, which is a vector space over the field \mathbb{R} of the reals.  The vectors in this space are n-tuples of real numbers.  For example, when n = 2 this is just the familiar space of vectors in the plane.
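If you like to see such things in code, here is a tiny Python sketch (using numpy purely as an illustration; the variable names are my own) of two vectors in \mathbb{R}^2 being added and multiplied by a scalar.

    import numpy as np

    v = np.array([1.0, 2.0])   # a vector in R^2
    w = np.array([3.0, -1.0])  # another vector in R^2

    print(v + w)    # vector addition: gives (4, 1)
    print(2.5 * v)  # scalar multiplication: gives (2.5, 5)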

Another is \mathbb{F}_p^n, which is a vector space over the field \mathbb{F}_p of integers modulo p (here p is a prime).  Vectors in this space are n-tuples of integers modulo p.
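Again purely as an illustration (the helper functions here are my own, not from any library), here is how one might add and scale vectors in \mathbb{F}_5^3 in plain Python, taking p = 5 just as an example of a prime.

    p = 5  # an arbitrary prime, chosen just for this example

    def add_mod_p(v, w):
        # Add two vectors componentwise, reducing each entry mod p.
        return tuple((a + b) % p for a, b in zip(v, w))

    def scale_mod_p(lam, v):
        # Multiply a vector by the scalar lam, reducing each entry mod p.
        return tuple((lam * a) % p for a in v)

    print(add_mod_p((1, 4, 2), (3, 3, 4)))  # (4, 2, 1)
    print(scale_mod_p(3, (1, 4, 2)))        # (3, 2, 1)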

In order to get to the main topic of this post, I need to introduce a couple more concepts.  (This is all a good illustration of the flavour of university-level mathematics, where one has to get to grips with a bunch of concepts before being able to put them together in theorems!)

One is the idea of linear independence.  Let’s start with the familiar plane \mathbb{R}^2, and let’s take two non-zero vectors \mathbf{v} and \mathbf{w} in the plane.  (It’s conventional to typeset vectors in bold.)  They might point in completely different directions.  Or they might point in essentially the same direction: we might have \mathbf{w} = \lambda \mathbf{v} for some scalar \lambda.  In the former case we say that they’re linearly independent; in the latter that they’re linearly dependent.

If we take three vectors, then we could have something like \mathbf{u} = \left(\begin{array}{c} 1 \\ 0 \end{array}\right), \mathbf{v} = \left(\begin{array}{c}0 \\ 1 \end{array}\right), \mathbf{w} = \left(\begin{array}{c} 1 \\ 1 \end{array}\right).  In this case, any two of these vectors point in completely different directions, but if we consider all three then they are related: \mathbf{u} + \mathbf{v} = \mathbf{w}.  It isn’t quite that they point in the same direction, but there is a dependence between them: we can write \mathbf{w} in terms of \mathbf{u} and \mathbf{v}.  So any two of them are linearly independent, but the three of them are linearly dependent.

Formally, we say that the vectors \mathbf{v_1}, \dotsc, \mathbf{v_n} are linearly independent if the only way to have \lambda_1 \mathbf{v_1} + \dotsb + \lambda_n \mathbf{v_n} = \mathbf{0} is when \lambda_1 = \dotsb = \lambda_n = 0.  Putting that another way, we can’t write one of them in terms of the others.  (One can extend this definition to a set of infinitely many vectors, but let’s stick to finite collections in this post.)
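If you want to test linear independence by computer, one convenient route is the standard fact that the rank of a matrix equals the largest number of linearly independent rows: put the vectors as the rows of a matrix, and they are linearly independent exactly when the rank equals the number of vectors.  Here is a small sketch of that idea using sympy (the function name is my own), applied to the \mathbf{u}, \mathbf{v}, \mathbf{w} from above.

    from sympy import Matrix

    def are_linearly_independent(vectors):
        # The rows are linearly independent exactly when the rank of the
        # matrix they form equals the number of rows.
        return Matrix(vectors).rank() == len(vectors)

    u, v, w = [1, 0], [0, 1], [1, 1]
    print(are_linearly_independent([u, v]))     # True: u and v are independent
    print(are_linearly_independent([u, v, w]))  # False: u + v = w is a dependence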

I said that I needed to introduce a couple more concepts; here’s the second.  Let’s go back to our favourite example of the plane \mathbb{R}^2.  There’s something really special about the vectors \mathbf{u} = \left(\begin{array}{c}1 \\ 0 \end{array}\right) and \mathbf{v} = \left(\begin{array}{c} 0 \\ 1 \end{array}\right): we can write every vector in \mathbb{R}^2 as a linear combination of them.  For example, \left(\begin{array}{c} 3 \\ -7 \end{array}\right) = 3\mathbf{u} - 7 \mathbf{v}, and \left(\begin{array}{c} a \\ b \end{array}\right) = a \mathbf{u} + b \mathbf{v}.  We’re quite used to doing that, because \mathbf{u} and \mathbf{v} are familiar vectors.

Can we do the same sort of thing with \mathbf{u} = \left(\begin{array}{c} 1 \\ 0 \end{array}\right) and \mathbf{w} = \left(\begin{array}{c} 1 \\ 1 \end{array}\right)?  Well, \left(\begin{array}{c} 3 \\ -7 \end{array}\right) = 10 \mathbf{u} - 7 \mathbf{w}, and more generally \left(\begin{array}{c} a \\ b \end{array}\right) = (a-b) \mathbf{u} + b \mathbf{w}.  So yes we can!  But we couldn’t do it if we just had \mathbf{u}, because we can’t possibly write every vector in \mathbb{R}^2 as a multiple of \mathbf{u}.

We say that the vectors \mathbf{v_1}, \dotsc, \mathbf{v_n} span a vector space if every vector in the space can be written as a linear combination \lambda_1 \mathbf{v_1} + \dotsb + \lambda_n \mathbf{v_n} for some scalars \lambda_1, \dotsc, \lambda_n.  (Again, one can extend this definition to infinite collections of vectors, and again I’m not going to worry about that here.)
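As a quick check on the calculation with \mathbf{u} and \mathbf{w} above, here is a short sympy sketch that recovers the coefficients a - b and b symbolically, by putting \mathbf{u} and \mathbf{w} as the columns of a matrix and inverting it (the set-up is mine, purely for illustration).

    from sympy import Matrix, symbols

    a, b = symbols('a b')

    # The columns are u = (1, 0) and w = (1, 1).
    M = Matrix([[1, 1],
                [0, 1]])

    # Solve M * (lambda_1, lambda_2)^T = (a, b)^T for the coefficients.
    coeffs = M.inv() * Matrix([a, b])
    print(coeffs)  # Matrix([[a - b], [b]]): that is, (a, b) = (a-b)u + b*w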

Now that we know what the words mean, we can get down to business…

So the question is: given a vector space, how large/small can a collection of linearly independent vectors be, and how large/small can a spanning collection of vectors be?  I strongly encourage you to think about this, perhaps with the help of some examples.

Let’s see.  In our favourite example of the plane, \mathbb{R}^2, we had examples of one or two linearly independent vectors (e.g. \mathbf{u} is linearly independent, and also \mathbf{u}, \mathbf{v} are linearly independent), but we didn’t have three linearly independent vectors.

And in the same example we had examples of pairs of vectors that span the space (e.g. \mathbf{u}, \mathbf{v}), but we didn’t have a single vector that would span the space.

And that example is rather typical.

Thinking about it slightly more abstractly, it seems that it ought to be quite easy for a small collection of vectors to be linearly independent, but that it might be rather harder for a large collection of vectors to be linearly independent.  The more vectors we chuck in, the more likely it is that one of the new ones can be written in terms of the old ones.

And it seems that it might be rather easy for a large collection of vectors to span a vector space, but that it might be harder for a small collection of vectors to span the space.  The more vectors we remove from the collection, the more likely it is that we can no longer write every vector in the space using the remaining ones.

So, roughly speaking, we expect collections of linearly independent vectors to be ‘small’, and spanning sets of vectors to be ‘large’.

But how do ‘small’ and ‘large’ compare?  Could we have a collection of linearly independent vectors that’s larger than a spanning set of vectors?  Or could we have a collection of linearly independent vectors that’s the same size as a spanning set of vectors?

This is where the Steinitz Exchange Lemma comes in.  I’ll state it formally, and then try to unravel it in terms of our thinking so far.

Theorem (Steinitz Exchange Lemma): Let \mathbf{v_1}, \dotsc, \mathbf{v_m} be a collection of linearly independent vectors in a vector space V.  Let \mathbf{w_1}, \dotsc, \mathbf{w_n} be a collection of vectors that span V.  Then m \leq n, and moreover, possibly after reordering the \mathbf{w_i}, the set \mathbf{v_1}, \dotsc, \mathbf{v_m}, \mathbf{w_{m+1}}, \dotsc, \mathbf{w_n} spans V.

Let me try to decode that.  There are two assertions here.

The first is that m \leq n.  In words, this says that any linearly independent set is at most as big as any spanning set.  There is no linearly independent set that is bigger than a spanning set.  That matches our intuition that linearly independent sets are ‘small’ and spanning sets are ‘large’.

The second part is that if we take any old bunch of linearly independent vectors, and any old spanning set of vectors, then we can borrow some of the spanning set to extend the linearly independent set into a set that will span the whole vector space.  That’s rather important: without this result, it might have been that there were some linearly independent sets that were so awkward that they could never be extended to span the whole space.  (The business about reordering the \mathbf{w_i} is because it might be that we haven’t numbered them helpfully.  What the result says is that we can take n-m of the \mathbf{w_i} and use them to turn the linearly independent set into a spanning set.  It doesn’t say anything about which \mathbf{w_i} we should use.)
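For those who like to see such a process in code, here is a purely illustrative Python sketch of this borrowing, written with sympy: it trades vectors of the spanning set for the linearly independent vectors one at a time, checking at each step that what remains still spans.  The function names and the rank-based test for spanning are my own choices, and I am assuming for concreteness that we are working in \mathbb{Q}^n with exact rational arithmetic.

    from sympy import Matrix

    def spans(vectors, dim):
        # The vectors span Q^dim exactly when their rank equals dim.
        return Matrix([list(v) for v in vectors]).rank() == dim

    def steinitz_exchange(independent, spanning, dim):
        # Trade members of `spanning` for the members of `independent`,
        # one at a time, keeping a spanning set at every step.
        kept_vs = []
        remaining_ws = list(spanning)
        for v in independent:
            # kept_vs + [v] + remaining_ws certainly still spans; the lemma
            # promises that some remaining w can now be dropped without
            # losing that property.
            for j in range(len(remaining_ws)):
                trial = kept_vs + [v] + remaining_ws[:j] + remaining_ws[j+1:]
                if spans(trial, dim):
                    del remaining_ws[j]
                    break
            kept_vs.append(v)
        return kept_vs + remaining_ws

    # Two linearly independent vectors in Q^3, and the standard spanning set.
    vs = [(1, 1, 0), (0, 1, 1)]
    ws = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    result = steinitz_exchange(vs, ws, 3)
    print(result)            # [(1, 1, 0), (0, 1, 1), (0, 0, 1)]
    print(spans(result, 3))  # True: the extended collection spans Q^3

In this little run, the first two standard spanning vectors get exchanged away and only the third survives, which matches the count in the theorem: the final spanning set uses n - m = 3 - 2 = 1 of the original \mathbf{w_i}.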

The really profound conclusion of the Steinitz Exchange Lemma is that it makes sense to talk about the dimension of a vector space.  You might have been wondering why I wasn’t mentioning the dimension — but we didn’t previously know that it made sense!  We say that a set is a basis if it is both linearly independent and spanning.  (So \{\mathbf{u}, \mathbf{v}\} and \{\mathbf{u}, \mathbf{w}\} are both bases for the plane \mathbb{R}^2, for example.)  We’d like to say that the dimension is the size of a basis, so that the dimension of the plane \mathbb{R}^2 would be 2, for example.  But in order for that definition to make sense, one has to know that every basis has the same size.  It would be a disaster if our vector space was both 7-dimensional and 8-dimensional!  (One also has to know that every vector space has a basis.  But that’s not too hard, at least for vector spaces that have finite spanning sets — and in this post we’re ignoring vector spaces that are larger than that.)

Let’s suppose that we have two bases for a vector space.  Then the Steinitz Exchange Lemma immediately tells us that they have the same size (by taking each in turn as the linearly independent/spanning set in the theorem).  So our notion of ‘dimension’ does make sense.  Phew!

The Steinitz Exchange Lemma also tells us that if we take a vector space with a finite spanning set, and a linearly independent set within that vector space, then we can extend our linearly independent set to a basis for the whole space.  Which is extremely useful.
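For example, in the plane the single vector \mathbf{w} = \left(\begin{array}{c} 1 \\ 1 \end{array}\right) is linearly independent, and the lemma (applied with the spanning set \mathbf{u}, \mathbf{v} from earlier) says that we can borrow one of \mathbf{u} and \mathbf{v} to extend it to a spanning set.  Borrowing \mathbf{v} works: \left(\begin{array}{c} a \\ b \end{array}\right) = a \mathbf{w} + (b - a) \mathbf{v}, so \mathbf{w}, \mathbf{v} is a basis of \mathbb{R}^2.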

One final comment.  We are sometimes interested in vector spaces where we can add, subtract and multiply scalars but where we cannot necessarily divide by scalars.  For example, it might be that the scalars come from the ring of integers, where we cannot divide by 2 (because doing so takes us outside the realm of the integers).  The resulting objects aren’t called vector spaces (because we have to be able to divide by non-zero scalars for them to be vector spaces), but are instead called modules.  It turns out that the proof of the Steinitz Exchange Lemma uses the fact that we can divide by non-zero scalars in a really crucial way, and there is no analogue of the Steinitz Exchange Lemma for modules.  We can’t talk about the dimension of a module.  This has all sorts of exciting and unexpected consequences!
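To see the sort of thing that goes wrong, consider the integers \mathbb{Z} as a module over themselves.  The single element 2 is linearly independent (if 2\lambda = 0 then \lambda = 0), and the single element 1 spans.  If the exchange lemma applied here, then (with m = n = 1) it would tell us that 2 by itself spans \mathbb{Z}.  But it doesn’t, because there is no integer \lambda with 2\lambda = 1: the only way to write 1 in terms of 2 is 1 = \frac{1}{2} \cdot 2, and that is exactly the division by a scalar that we are no longer allowed to do.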

So, although it doesn’t look as exciting or surprising as some results, the Steinitz Exchange Lemma is in fact vital.  Don’t overlook it!

Further reading

Of course, Wikipedia has something to say about the Steinitz Exchange Lemma.  And any book or set of notes on linear algebra must touch on this result somewhere, so find your favourite!

[I am grateful to Gareth Taylor for his helpful comments on an initial version of this post.]

4 Responses to “Theorem 43: the Steinitz Exchange Lemma”

  1. kindageeky / Steve H (@kindageeky) Says:

    Thanks for this post, it really helped me grok the exchange lemma and the relation to basis. Honestly the best explanation I’ve ever seen, very accessible.

  2. Ritvik Agarwal Says:

    Dear Sir,
    I found your post really very helpful.
    I understood the lemma before ending up at this site, but couldn’t feel it in my mind.
    Your post cleared the blurred picture I had made in my mind.

  3. Anthony Fleming Says:

    This was incredibly helpful. Thanks so much.

  4. David Says:

    Great exposition of linear algebra. I’d been trying to teach it to myself with Wikipedia, but I got stuck on the equivalence of row and column rank because the proof they give rests on assuming Steinitz (I’ve now added a note to that effect). Before I knew that the theorem had a name, I was able to come up with my own proof of it, though it was convoluted. I showed that n linearly independent vectors in a space with a basis of size n span the space, by applying specific linear operations. Thus, no independent set can be larger than any basis. Since every vector space has a basis, and every spanning set can be reduced to a basis, this means that the cardinality of a spanning set is always greater than or equal to that of an independent set. But I was surprised and delighted to see the other part about augmenting an independent set with a spanning set. Anyhow, the standard proof by induction makes a lot more sense than mine.
