Joint and conditional probability

Suppose that the outcome can be either of two events, A or B (but never both), with probabilities 0.4 and 0.6 respectively when event X happens. If instead event Y, mutually exclusive with X, occurs, then the probabilities of A and B are split evenly, .5 and .5. These data can be summarized in a Markov matrix:
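\begin{pmatrix} P(A\mid X) & P(A\mid Y) \\ P(B\mid X) & P(B\mid Y) \end{pmatrix} = \begin{pmatrix} .4 & .5 \\ .6 & .5 \end{pmatrix}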

Here, P(A|X) stands for the probability of event A given that X has occurred; in general, P(A|X) denotes the conditional probability of event A under condition X.

Note that each column sums to 1, since its entries represent mutually exclusive and exhaustive events: under the given condition, either A or B must occur.
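Explicitly:

P(A\mid X) + P(B\mid X) = .4 + .6 = 1, \qquad P(A\mid Y) + P(B\mid Y) = .5 + .5 = 1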

Now, suppose that X occurs with probability .8 and Y with probability .2. We multiply the first column by .8 and the second by .2, so that the total probability 1 breaks down into

1 = 1 * .8 + 1 * .2 = (.4 + .6) * .8 + (.5 + .5) * .2 = (.32 + .48) + (.1 + .1)

where the first parenthesis collects the probabilities of the events under X, and .1 + .1 are the probabilities under event Y.

This can again be represented by a matrix:
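\begin{pmatrix} P(A\cap X) & P(A\cap Y) \\ P(B\cap X) & P(B\cap Y) \end{pmatrix} = \begin{pmatrix} .32 & .1 \\ .48 & .1 \end{pmatrix}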

Note that the columns now add up to .8 and .2 respectively, while the whole table adds up to .8 + .2 = 1. We have obtained a 2-dimensional probability distribution: every cell holds the joint probability of a pair of events occurring together, e.g. P(A∩X) = .32. The probability of the conjunction A∩X is smaller than the probability of either component, A|X or X, alone, because each column summed to 1 in the conditional probability table but sums only to P(X) ≤ 1 in the joint distribution table.
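For example, P(A∩X) = P(A|X) * P(X) = .4 * .8 = .32, which is smaller than both P(A|X) = .4 and P(X) = .8.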

This fact, that a column adds up to the marginal probability P(Xi) of its condition, that is, the probability that a randomly drawn event ends up in column i, enables us to recover the conditional probabilities. We just need to divide every entry in column i by P(Xi):
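\begin{pmatrix} .32/.8 & .1/.2 \\ .48/.8 & .1/.2 \end{pmatrix} = \begin{pmatrix} .4 & .5 \\ .6 & .5 \end{pmatrix}

which recovers the conditional probability table we started from.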

The relationship
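P(A\cap X) = P(A\mid X)\, P(X)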

is the basis for the famous w:Bayes' theorem, because we can symmetrically condition the probabilities within the rows on the probabilities of observing the rows:
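P(X\mid A) = \frac{P(A\cap X)}{P(A)} = \frac{P(A\cap X)}{P(A\cap X) + P(A\cap Y)} = \frac{.32}{.32 + .1} \approx .76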

That is, the conditional probability P(X|A) ≈ .76.
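The whole calculation can be reproduced with a short Python sketch. This is only a minimal illustration; the dictionaries and variable names are illustrative choices, and only the numbers come from the text above.

# Conditional probability table P(outcome | condition), columns X and Y
p_cond = {("A", "X"): 0.4, ("B", "X"): 0.6,
          ("A", "Y"): 0.5, ("B", "Y"): 0.5}

# Marginal probabilities of the conditions
p_marg = {"X": 0.8, "Y": 0.2}

# Joint table: P(outcome ∩ condition) = P(outcome | condition) * P(condition)
p_joint = {(o, c): p_cond[(o, c)] * p_marg[c] for (o, c) in p_cond}
print(round(p_joint[("A", "X")], 2))                                  # 0.32

# Each column sums to the marginal probability of its condition,
# and the whole table sums to 1
print(round(sum(p for (_, c), p in p_joint.items() if c == "X"), 2))  # 0.8
print(round(sum(p_joint.values()), 2))                                # 1.0

# Conditional probabilities are recovered by dividing a column by its marginal
print(round(p_joint[("A", "X")] / p_marg["X"], 2))                    # 0.4, P(A|X)

# Bayes' theorem: condition on the row instead, dividing by the row sum P(A)
p_a = p_joint[("A", "X")] + p_joint[("A", "Y")]
print(round(p_joint[("A", "X")] / p_a, 2))                            # 0.76, P(X|A)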