Reasoning about probabilities usually boils down to reasoning about sets, which in turn usually boils down to reasoning about true and false. It is intuitive to think of a set as a map from each thing to either true or false, and to think of a probability distribution as a map from each set to its area.
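To make that picture concrete, here is a minimal Python sketch (the die, the is_even event, and the probability helper are illustrative choices, not from the text): an event is a map from outcomes to True/False, and a distribution assigns each event its total probability, i.e. its area.

```python
# A sample space: each outcome mapped to its probability ("area") -- a fair die.
distribution = {1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6}

# A set (event) as a map from each outcome to True or False.
def is_even(outcome):
    return outcome % 2 == 0

# A probability distribution as a map from each set to its area.
def probability(event, dist):
    return sum(p for outcome, p in dist.items() if event(outcome))

print(probability(is_even, distribution))  # 0.5
```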
A “forgetting map” from one sample space S to another sample space S’ induces a map from probability distributions on S to probability distributions on S’. In particular, if S is a set of pairs (or more generally, tuples), then it is useful to “forget”, say, the first component of each pair by mapping the pair to its second component, which induces a map from joint distributions to marginal distributions. Each component of the tuple is a random variable.
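As a sketch of how a forgetting map pushes a distribution forward (the joint table and the push_forward name are invented for illustration):

```python
# A joint distribution on pairs (x, y): each pair mapped to its probability.
joint = {
    ("rain", "cold"): 0.30,
    ("rain", "warm"): 0.10,
    ("sun",  "cold"): 0.20,
    ("sun",  "warm"): 0.40,
}

def push_forward(dist, forget):
    """The distribution induced on S' by a forgetting map S -> S'."""
    induced = {}
    for outcome, p in dist.items():
        image = forget(outcome)
        induced[image] = induced.get(image, 0.0) + p
    return induced

# Forget the first component: the induced distribution is the marginal of Y.
marginal_y = push_forward(joint, lambda pair: pair[1])
print(marginal_y)  # {'cold': 0.5, 'warm': 0.5} (up to float rounding)
```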
Causal independence means that the random process that determines one random variable X does not make use of the value of another random variable Y. For example, if we write a program that uses a random number generator to simulate an experiment whose outcome includes both X and Y, but the part of the program that yields X does not refer to Y, then X is causally independent of Y. In that case, we could speed up the simulation by running the X part and Y part in parallel, separately, without communication.
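A minimal sketch of such a program (the particular experiment and distributions are made up; only the structure matters): the part that yields X never refers to Y, and each part gets its own random number generator, so the two parts could run in parallel without communication.

```python
import random

def draw_x(rng):
    # The X part of the simulation: never refers to Y.
    return rng.choice(["heads", "tails"])

def draw_y(rng):
    # The Y part of the simulation: never refers to X.
    return rng.randint(1, 6)

def simulate(seed_x, seed_y):
    # Separate generators, no shared state: X is causally independent of Y.
    x = draw_x(random.Random(seed_x))
    y = draw_y(random.Random(seed_y))
    return x, y

print(simulate(0, 1))
```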
Probabilistic independence is best defined by factoring the probability distribution:
P(X,Y) = P(X) P(Y)
or to use less shorthand,
P(X=x ∩ Y=y) = P(X=x) × P(Y=y)
for every value x of X and every value y of Y.
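For example, for two fair coin flips generated independently, P(X=heads ∩ Y=heads) = 1/4 = 1/2 × 1/2 = P(X=heads) × P(Y=heads), and likewise for the other three pairs of values.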
Note that this notion is symmetric between the two random variables X and Y: if X is independent of Y, then Y is independent of X. If P(Y=y) is positive rather than zero for every value y of Y, then this definition of probabilistic independence is equivalent to each of the following two conditions (see the sketch after this list).
- P(X=x | Y=y) = P(X=x) for every value x of X and every value y of Y. Or for short, P(X | y) = P(X).
- P(X=x | Y=y1) = P(X=x | Y=y2) for every value x of X and every pair of values y1, y2 of Y. Or for short, P(X | y1) = P(X | y2).
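Here is a small Python check of the first condition (the joint table is the same illustrative one as in the earlier sketch, deliberately chosen to be dependent so the check fails):

```python
# Checking P(X=x | Y=y) = P(X=x) for every x and y on an illustrative joint table.
joint = {
    ("rain", "cold"): 0.30,
    ("rain", "warm"): 0.10,
    ("sun",  "cold"): 0.20,
    ("sun",  "warm"): 0.40,
}

xs = {x for x, _ in joint}
ys = {y for _, y in joint}

def p_x(x):
    return sum(p for (xx, _), p in joint.items() if xx == x)

def p_y(y):
    return sum(p for (_, yy), p in joint.items() if yy == y)

def p_x_given_y(x, y):
    return joint.get((x, y), 0.0) / p_y(y)

independent = all(
    abs(p_x_given_y(x, y) - p_x(x)) < 1e-12
    for x in xs for y in ys
)
print(independent)  # False: P(rain | cold) = 0.6 but P(rain) = 0.4
```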
Causal independence entails probabilistic independence, but not vice versa, because probabilistic independence can hold “accidentally”. In Bertsekas and Tsitsiklis’s problem 5, loving chocolate and being a genius are not independent in either sense. (In statistics, there is also the notion of zero correlation, which is entailed by, but does not entail, probabilistic independence.)
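A standard illustration of that last point (not from the cited problem): let X be uniform on {−1, 0, 1} and let Y = X². Then X and Y are uncorrelated, since the product XY averages to zero, but they are not probabilistically independent, since learning that Y = 0 forces X = 0.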
Independence among random variables lets us store a probability distribution using more efficient representations and process it using more efficient algorithms.
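As a rough sketch of the saving (the numbers here are just the standard counting argument, not from the text): a joint distribution over ten binary random variables takes 2¹⁰ − 1 = 1023 numbers in general, but only ten if the variables are independent, because every joint probability factors into a product of per-variable probabilities.

```python
import math

# Ten independent binary random variables: store one number per variable,
# P(X_i = 1), instead of the 2**10 - 1 numbers a general joint table needs.
p_one = [0.1, 0.5, 0.3, 0.9, 0.2, 0.5, 0.7, 0.4, 0.6, 0.8]

def joint_probability(bits):
    """P(X_1 = bits[0], ..., X_10 = bits[9]), recovered as a product."""
    return math.prod(p if b else 1 - p for p, b in zip(p_one, bits))

print(joint_probability([1, 0, 0, 1, 1, 0, 1, 0, 1, 1]))
```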