## Theorem 16: the inclusion-exclusion principle

I’m supervising first-year undergraduates for Probability this term, and found myself discussing the Inclusion-Exclusion Principle with some of them earlier.  It’s a very nice idea.

I grew up in a house with three cats.  At meal times, each had his or her own bowl.  But being cats, they never wanted to eat from their own bowls: the bowl next door always looked nicer (despite the food having come from the same tin).  In how many ways could they all eat from the wrong bowl?

(With apologies to the cats, who were all very beautiful tabbies, and not those colours at all.  My skills at drawing pictures on my computer are limited!)

Here they are with their rightful bowls.  In how many ways can we mix them up so that no cat gets the right bowl?  Well, grey cat had better not have the blue bowl, so it can have the green bowl or the yellow bowl.  Let’s think about the former of those first.  If grey cat has the green bowl, then we’d better make sure that ginger cat has the blue bowl (because it isn’t allowed the yellow one).  And then white cat must have the yellow bowl.  So once we’ve decided that grey cat has the green bowl, there’s only one possibility: this one.

What if grey cat takes a fancy to the yellow bowl?  Then white cat must get the blue bowl (because it’s not allowed the green bowl), and ginger cat must get the blue bowl.  So again just one possibility, namely this one.

So just two ways for each cat to get the wrong bowl, and it was pretty easy to work that out (and be sure that we’d found them all).

What would have happened if there had been more cats in the house (apart from an increase in the mayhem at mealtimes)?  For example, if there had been 10 cats, in how many ways could they all have got the wrong bowls?

I think we need a new strategy.  It’s going to be much harder to list them this time (try it, if you don’t believe me!).  So let’s try to find another way to understand the problem with 3 cats, and hopefully it’ll be more amenable to generalisation.

### A new strategy

We’ve got a whole bunch of ways of matching bowls to cats.  In some of those ways, grey cat gets its own bowl (the blue one), and in some it doesn’t.  Also, in some ways white cat gets its own bowl, and in some it doesn’t.  And finally in some ways ginger cat gets its own bowl, and in some it doesn’t.  We want to count the ways in which grey cat doesn’t get its bowl, and white cat doesn’t get its bowl, and ginger cat doesn’t get its bowl.  We can represent this in a Venn diagram.

This rectangle is supposed to represent all the ways in which we can match bowls to cats.  The set labelled G (for grey) contains those ways in which the grey cat gets the right bowl, and similarly for the other colours.  The yellow and black areas together are supposed to represent the intersection of $O$ (for orange) and $W$, for example, so between them they contain the ways in which both the ginger and the white cats get their correct bowls.  So we want to count the ways in the outside blue region.

How can we do that?  Let’s warm up with a slightly easier problem: just two sets.  Here’s the picture.

Let’s try to find the size of the yellow region.  I’ll write $X$ for the whole rectangle.  So the size of the yellow region is the size of $X$, written $|X|$, minus the size of the bit covered by $A$ and $B$.  Let’s concentrate on that last bit, called the union of $A$ and $B$, written $A \cup B$.

We could think about $|A| + |B|$, the size of $A$ plus the size of $B$.  Is that what we want?  Not quite, because we’ve counted the purple region (the intersection of $A$ and $B$, written $A\cap B$) twice.  So we actually want $|A| + |B| - |A\cap B|$.

Then we can find the size of the yellow region: it’s the size of the whole rectangle minus the size of the region covered by $A$ and $B$.  That is, the size of the yellow region is $|X| - |A \cup B| = |X| - |A| - |B| + |A \cap B|$.

Let’s do a similar thing with three sets, and then we’ll go back to the problem with the cats and check our answer.  Again, I’ll write $X$ for the whole rectangle.  This time, we want to take the size of $X$ and remove the size of $G \cup W \cup O$, the region covered by the three sets in the middle.

What’s the size of that union?  Again, our first guess might be $|G| + |W| + |O|$.  But again that’s not quite right.  We’ve counted the overlaps twice, so we should subtract them, to get $|G| + |W| + |O| - |G \cap W| - |W \cap O| - |O \cap G|$.  Is that right?  Not quite, because we’ve subtracted the central black region (the intersection of all three sets) one too many times.  So the size of the union is actually $|G \cup W \cup O| = |G| + |W| + |O| - |G \cap W| - |W \cap O| - |O \cap G| + |G\cap W \cap O|$.

So the size of the yellow region is $|X| - |G \cup W \cup O| = |X| - |G| - |W| - |O| + |G \cap W| + |W \cap O| + |O \cap G| - |G \cap W \cap O|$.

Let’s check this.  What’s the size of $X$ in our cat problem?  It’s the number of ways of matching bowls to cats.  But that’s quite easy to work out.  There are 3 ways to choose the bowl for grey cat, then 2 for white cat, then 1 for orange cat.  So $|X| = 3\times 2 \times 1 = 6$.  So far so good.

What’s $G$?  It’s the set of ways in which grey cat gets the right bowl.  How many of those are there?  Well, we’ve fixed it so that grey cat gets the right bowl, so then there are just 2 possibilities for white cat, and 1 for ginger cat, so $|G| = 2\times 1 = 2$.  In just the same way, $|W| = 2\times 1 = 2$, and $|O| = 2\times 1 = 2$.

What about the intersection of $G$ and $W$?  It’s the set of ways in which grey and white cats both get the right bowl.  But then ginger cat must get the right bowl too, so there’s only one way for that.  That is, $|G\cap W| = 1$.  In the same way, $|W\cap O| = 1$  and $|O\cap G| = 1$.

Finally, the intersection of all three sets is the set containing the one way in which each of the three cats gets the right bowl.  So $|G \cap W \cap O| = 1$.

Now, let’s compute the expression we hope will give the answer, namely $|X| - |G| - |W| - |O| + |G \cap W| + |W \cap O| + |O \cap G| - |G \cap W \cap O|$.  That gives $6 - 2 - 2 - 2 + 1 + 1 + 1 - 1 = 2$ — which is the answer we were hoping for!

Of course, this is no proof, but it is reassuring (and hopefully helps illustrate the idea).

It turns out that one can prove an analogous result for any number of sets, and it is this that is usually called the Inclusion-Exclusion Principle.  I think I’ll state it here for 3 sets, but I hope it’s reasonably clear how it extends to $n$ sets.  You can see a statement of the general result on the Wikipedia page, for example.

Theorem (the inclusion-exclusion principle) Let $A$, $B$ and $C$ be subsets of a finite set $X$.  Then $|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |B \cap C| - |C \cap A| + |A \cap B \cap C|$.

How could one go about proving this (and, indeed, the corresponding result for $n$ sets)?  One strategy is to use induction, using the result for $n-1$ sets to deduce the result for $n$ sets.  But my favourite approach is to use indicator functions.  I like it because it seems very clean, and seems to capture our idea about why the result ought to be true.  But before I can tell you the argument, I’d better tell you what an indicator function is.  Through all of the rest of the post, I’m going to work with the sets $X$, $A$, $B$ and $C$ in the statement of the principle above.

The indicator function of $A$, written $1_{A}$, records which elements of $X$ are in $A$ and which are not.  Precisely, for an element $x$ of $X$, we say that $1_A(x) = 1$ if $x$ is in $A$, and $1_A(x) = 0$ if $x$ is not in $A$

How can we extract the size of $A$ if we know its indicator function?  Well, we add up $1_A(x)$ for all $x$ in $X$.  That is, $|A| = \sum_{x\in X} 1_A(x)$.  Why?  Well, we count 1 in the sum each time $x$ is in $A$, and 0 each time it isn’t — so we count each element of $A$ exactly once, which must give $|A|$.

It turns out to be quite easy to write down the indicator functions of $A \cap B$ and $A \cup B$ in terms of the indicator functions of $A$ and $B$.

The indicator function of $A \cap B$ is $1_{A \cap B} = 1_A 1_B$ (just the product).  Why?  Well, $1_{A \cap B} = 1$ if and only if $x$ is in $A \cap B$, which happens if and only if $x$ is in both $A$ and $B$, which happens if and only if $1_A(x) = 1$ and $1_B(x) = 1$, which happens if and only if $1_A(x)1_B(x)=1$.

The indicator function of $A \cup B$ is $1_{A \cup B} = 1_A + 1_B - 1_{A\cap B} = 1_A + 1_B - 1_A1_B$.  I’ll let you convince yourself of that — it’s a good exercise (and not too hard).

Now we can find the indicator function of $A \cup B \cup C$.  We have

$1_{A\cup B \cup C} = 1_A + 1_{B\cup C} - 1_A 1_{B\cup C}$

$= 1_{A} + 1_{B} + 1_{C} - 1_{B} 1_{C} - 1_{A}(1_{B} + 1_{C} - 1_{B} 1_{C})$

$=1_{A}+1_{B}+1_{C}-1_{B}1_{C}-1_{A}1_{B}-1_{A}1_{C}+1_{A}1_{B}1_{C}$

$= 1_A + 1_B + 1_C - 1_{A\cap B} - 1_{B\cap C} - 1_{C\cap A} + 1_{A\cap B \cap C}$.

Now that we know its indicator function, we can find the size of $A \cup B \cup C$.  We simply sum $1_{A \cup B \cup C}(x)$ for all $x$ in $X$.  We get

$|A \cup B\cup C| = \sum_{x\in X} 1_{A\cup B\cup C}(x)$

$= \sum_{x\in X}(1_A(x) + 1_B(x) + 1_C(x) - 1_{A\cap B}(x) - 1_{B\cap C}(x) - 1_{C\cap A}(x) + 1_{A \cap B \cap C}(x) )$

$= |A| + |B| + |C| - |A\cap B| - |B\cap C| - |C\cap A| + |A \cap B \cap C|$,

and that’s just what we wanted!

The same idea works in general (for $n$ sets), it just takes a bit more care to write it out.

As I mentioned above, there’s a Wikipedia page about the inclusion-exclusion principle.  I am not sure who first came up with the principle, so I’ll refer you to the Wikipedia page and you can read its assertions for yourself.  The problem of matching bowls to cats in such a way that no cat gets the right bowl is an example of derangements (although that isn’t the standard example!).  No rude remarks about deranged cats or their owners, please…

### 5 Responses to “Theorem 16: the inclusion-exclusion principle”

1. Olof Says:

Another useful property of indicator functions is that the indicator function of the complement $A^c$ of a set $A$ is $1 - 1_A$. Thus

$1 - 1_{A \cup B \cup C} = 1_{(A \cup B \cup C)^c} = 1_{A^c \cap B^c \cap C^c} = 1_{A^c} 1_{B^c} 1_{C^c}$

and one may get the three-set case of inclusion-exclusion from expanding

$1_{A^c} 1_{B^c} 1_{C^c} = (1-1_A)(1-1_B)(1-1_C)$!

(This may or may not make the result any clearer!)

2. theoremoftheweek Says:

Thanks very much for that Olof. I’m now trying to work out why I did what I did, because if someone asked me to prove Inclusion-Exclusion I’d do exactly as you suggest. I think I was trying to motivate the use of indicator functions by looking at the picture, and so got carried away down that path. I’m glad you pointed it out!

3. Theorem 24: the Chinese remainder theorem « Theorem of the week Says:

[…] I’m going to tell you about a pictorial way of thinking about this, but I’ll explain it with equations too (so don’t worry if you don’t like the picture).  Here’s the picture.  (Another test of my drawing skills…) […]

4. Number Theory: Lecture 17 « Theorem of the week Says:

[…] Proposition 52 (Legendre‘s formula): For , we have .  We saw that this follows nicely from the inclusion-exclusion principle. […]

5. Number Theory: Lecture 17 | Theorem of the week Says:

[…] Proposition 52 (Legendre’s formula): Let be a real number.  Then .  Moreover, .  We saw that this was basically the inclusion-exclusion principle. […]