John Travis, Mississippi College
Bayes' Theorem
For a given sample space S, there may be a natural way to partition the space into disjoint sets so that every element of S belongs to exactly one of the sets. For example, one might separate a given set of people into males and females, or divide a pocketful of change into pennies, nickels, dimes, quarters, etc.
For disjoint sets, a Venn diagram is traditionally drawn as a collection of disconnected blobs, one for each set in the partition, inside a box. However, when the sets form a partition, it is often simpler to draw the Venn diagram as a pie chart, with each set in the partition corresponding to one sector of the pie.
Notationally, we will describe our partition of subsets as
$S = B_{1}\cup B_{2}\cup B_{3} \cup \cdots \cup B_{N}$.
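The defining property of a partition (every element of S lies in exactly one $B_{k}$) can be checked mechanically for a small finite sample space. The following is a minimal sketch; the function name `is_partition` and the coin labels are hypothetical and chosen only for illustration.

```python
def is_partition(sample_space, blocks):
    """Return True if every element of sample_space lies in exactly one block
    and no block contains anything outside sample_space."""
    sample_space = set(sample_space)
    blocks = [set(b) for b in blocks]
    # A block with elements outside S cannot be part of a partition of S.
    if any(not b <= sample_space for b in blocks):
        return False
    # Each element must be counted in exactly one block.
    return all(sum(x in b for b in blocks) == 1 for x in sample_space)


# Hypothetical pocket of change, split by denomination.
coins = {"penny1", "penny2", "nickel1", "dime1", "quarter1"}
by_kind = [{"penny1", "penny2"}, {"nickel1"}, {"dime1"}, {"quarter1"}]
overlapping = [{"penny1", "penny2", "nickel1"}, {"nickel1", "dime1", "quarter1"}]

print(is_partition(coins, by_kind))      # blocks are disjoint and cover S
print(is_partition(coins, overlapping))  # "nickel1" appears in two blocks
```

Splitting by denomination passes the check, while the second collection fails because one coin belongs to two blocks.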
From this partitioned sample space, one may want a conditional probability involving some event A. For example, if A is the event that a student selected at random from a particular class fails the course and $S = \text{Males} \cup \text{Females}$, then $P(\text{fails} \mid \text{Male})$ and $P(\text{fails} \mid \text{Female})$ might be well known from historical values. However, given that a student who failed has set up an appointment to meet with the teacher, the likelihood that this student is male (i.e. $P(\text{Male} \mid \text{fails})$) is often harder to quantify directly. Bayes' Theorem is the tool for solving this type of problem.
Derivation of Bayes' Theorem: From the definition of conditional probability, and using the notation developed above:
$P(A \cap B_{k}) = P(A) P(B_{k} \mid A)$ or
$P(A \cap B_{k}) = P(B_{k}) P(A \mid B_{k})$ and by transitivity
$P(A) P(B_{k} \mid A) = P(B_{k}) P(A \mid B_{k})$
Since $\bigcup_{k} B_{k}$ comprises all of S, one may compute P(A) by adding up the probabilities of its parts. Indeed, $A = (A\cap B_{1}) \cup (A\cap B_{2}) \cup (A\cap B_{3}) \cup \cdots \cup (A \cap B_{N})$. In the diagram below, this is illustrated using the probabilities inside the inner circle.
Using the second formula with this partition of A (remember the $B_k$ are all disjoint) yields:
${\bf P(A)} = P(A\cap B_{1}) + P(A\cap B_{2}) + \cdots + P(A \cap B_{N})$
$= P(B_{1})P(A \mid B_{1}) + P(B_{2})P(A \mid B_{2}) + \cdots + P(B_{N})P(A \mid B_{N})$
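The sum for P(A) above is easy to evaluate once the $P(B_{k})$ and $P(A \mid B_{k})$ are known. Below is a minimal sketch using the pass/fail example; the function name `total_probability` and all of the numbers are made up for illustration and do not come from any real class data.

```python
def total_probability(priors, likelihoods):
    """Compute P(A) = sum over k of P(B_k) * P(A | B_k).

    priors[k] is P(B_k); likelihoods[k] is P(A | B_k).
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    # The B_k form a partition, so their probabilities must sum to 1.
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1")
    return sum(p * l for p, l in zip(priors, likelihoods))


# Hypothetical values (assumptions, not data from the text):
priors = [0.6, 0.4]          # P(Male), P(Female)
likelihoods = [0.10, 0.05]   # P(fails | Male), P(fails | Female)

p_fails = total_probability(priors, likelihoods)
print(round(p_fails, 4))     # 0.6*0.10 + 0.4*0.05, about 0.08
```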
On the other hand, using the third formula and solving yields:
$P(B_{k} \mid A) = P(B_{k}) P(A \mid B_{k}) / {\bf P(A)}$
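Replacing the P(A) in the denominator with the bold formulation gives Bayes' Theorem:
$P(B_{k} \mid A) = \dfrac{P(B_{k}) P(A \mid B_{k})}{P(B_{1})P(A \mid B_{1}) + P(B_{2})P(A \mid B_{2}) + \cdots + P(B_{N})P(A \mid B_{N})}$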
Therefore, Bayes' Theorem is very useful when the probabilities $P(B_{k})$ and the conditional probabilities $P(A \mid B_{k})$, $k = 1, \ldots, N$, can be determined, but $P(B_{k} \mid A)$ is not so easy to compute directly.
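As a worked illustration of the theorem, the sketch below computes $P(B_{k} \mid A)$ for every set in the partition at once. The function name `bayes_posteriors` is hypothetical, and the numbers for the pass/fail example are invented assumptions rather than values from the text.

```python
def bayes_posteriors(priors, likelihoods):
    """Return the list of P(B_k | A) for each k.

    priors[k] is P(B_k); likelihoods[k] is P(A | B_k).
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    # Numerators of Bayes' Theorem: P(B_k) * P(A | B_k) = P(A and B_k).
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator: P(A), obtained by summing over the partition.
    p_a = sum(joint)
    if p_a == 0:
        raise ValueError("P(A) is zero, so P(B_k | A) is undefined")
    return [j / p_a for j in joint]


# Hypothetical values (assumptions, not data from the text):
priors = [0.6, 0.4]          # P(Male), P(Female)
likelihoods = [0.10, 0.05]   # P(fails | Male), P(fails | Female)

p_male_given_fails, p_female_given_fails = bayes_posteriors(priors, likelihoods)
print(round(p_male_given_fails, 4))    # 0.06 / 0.08, about 0.75
print(round(p_female_given_fails, 4))  # 0.02 / 0.08, about 0.25
```

Under these assumed numbers, a student who failed is male with probability about 0.75, even though males make up only 0.6 of the class, because the assumed failure rate for males is higher.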

