A Bayes net is a directed acyclic graph that describes a class of probability distributions by specifying the causal structure of a world model. Each node of the graph is a random variable, and the edges of the graph specify what a probabilistic sampling procedure must look like: the sampling procedure for each variable may depend only on the values of its parent variables in the graph (that is, on those variables that have edges leading into the sampled variable). To impose such a structure on sampling procedures is to factor the joint probability distribution of all the random variables into the product, over all variables, of the conditional probability distribution of each variable given its parents. As with any joint distribution, this joint distribution then gives rise to marginal distributions when we sum over the variables we don’t care about.
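The factoring and the summing-out described above can be sketched in a few lines of Python. This is only an illustrative sketch: the three-variable network (rain and sprinkler as parents of wet), the made-up probabilities, and the names `parents`, `cpt`, `joint`, and `marginal` are all assumptions introduced here, not anything from the examples in these notes.

```python
from itertools import product

# Hypothetical network: rain -> wet <- sprinkler. All numbers are made up.
parents = {"rain": (), "sprinkler": (), "wet": ("rain", "sprinkler")}

# cpt[var][parent_values] = P(var = True | parents = parent_values)
cpt = {
    "rain": {(): 0.2},
    "sprinkler": {(): 0.1},
    "wet": {
        (False, False): 0.0,
        (False, True): 0.9,
        (True, False): 0.8,
        (True, True): 0.99,
    },
}


def joint(assignment):
    """Joint probability of a full assignment {variable: bool}.

    Computed as the product, over variables, of the conditional
    probability of each variable given the values of its parents.
    """
    p = 1.0
    for var, pars in parents.items():
        p_true = cpt[var][tuple(assignment[q] for q in pars)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


def marginal(query):
    """Marginal probability of a partial assignment.

    Sums the joint over every combination of values of the
    variables that do not appear in `query`.
    """
    hidden = [v for v in parents if v not in query]
    total = 0.0
    for values in product([False, True], repeat=len(hidden)):
        total += joint({**query, **dict(zip(hidden, values))})
    return total
```

For instance, `marginal({"wet": True})` sums the joint over the four combinations of rain and sprinkler. Brute-force enumeration like this grows exponentially with the number of hidden variables, so it is only meant for toy networks.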
Examples: Kersten and Yuille’s Figure 4; Bertsekas and Tsitsiklis’s Problem 27; from meaning to facial expression and verbal expression (or is it from meaning and verbal expression to facial expression, when an actor subverts the script of a play?).
The structure of a Bayes net, even without any associated conditional probabilities, already guarantees that some random variables are independent, by virtue of the factoring described above. More precisely, two variables are guaranteed to be independent if every (undirected) path connecting the two variables contains a collider. A collider is a node in the middle of the path whose two neighboring edges on the path both point into it. In class I tried to give a more general condition for when two variables are guaranteed to be conditionally independent given a set of variables, but the condition I gave was wrong, as Chia-Chien pointed out to me afterwards! The correct condition, termed d-separation, is as follows: two variables are guaranteed to be conditionally independent given a set of variables Z if every (undirected) path connecting the two variables contains either a non-collider that is in Z or a collider that has no descendant in Z. A descendant of a node c is a node that can be reached from c by following zero or more edges in the direction they point; in particular, every node is a descendant of itself.
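The d-separation condition can be checked mechanically by enumerating paths, exactly as the definition reads. The following is a minimal sketch under some assumptions: the graph is given as a list of `(parent, child)` pairs, the two query variables are distinct and not in Z, and the function names (`descendants`, `undirected_paths`, `d_separated`) are made up for this illustration. Enumerating every path is exponential in the worst case, so this is meant for small graphs only and is not how efficient implementations work.

```python
def descendants(children, node):
    """All nodes reachable from `node` along directed edges, including `node`."""
    seen = {node}
    stack = [node]
    while stack:
        cur = stack.pop()
        for nxt in children.get(cur, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def undirected_paths(edges, start, goal):
    """Yield every simple path from `start` to `goal`, ignoring edge direction."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    def extend(path):
        last = path[-1]
        if last == goal:
            yield list(path)
            return
        for nxt in neighbors.get(last, ()):
            if nxt not in path:
                path.append(nxt)
                yield from extend(path)
                path.pop()

    yield from extend([start])


def d_separated(edges, x, y, z):
    """True if every undirected path from x to y is blocked given the set z.

    A path is blocked if it contains a non-collider that is in z, or a
    collider none of whose descendants (itself included) is in z.
    """
    edge_set = set(edges)
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    z = set(z)

    for path in undirected_paths(edges, x, y):
        blocked = False
        # Look at each interior node together with its two path neighbors.
        for prev, mid, nxt in zip(path, path[1:], path[2:]):
            is_collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
            if is_collider:
                if not (descendants(children, mid) & z):
                    blocked = True
            elif mid in z:
                blocked = True
            if blocked:
                break
        if not blocked:
            return False
    return True
```

As a usage example, take the hypothetical graph `[("A", "C"), ("B", "C"), ("C", "D")]`, in which C is a collider on the only path between A and B. Then `d_separated(edges, "A", "B", [])` holds, but conditioning on C or on its descendant D unblocks the path, so the guarantee is lost.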