The structure of a project using Monte Carlo simulation: from so naïve it’s impractical to so non-naïve it’s deterministic. Examples in MATLAB:
-
The code in pie.m computes P(x > ½ | x² + y² < 1) under the uniform distribution over x and y. In words, it computes the area of the rightmost quarter of a pie as a portion of the total area of the pie. Whether a point lies in the rightmost quarter of the pie is defined by its horizontal location.
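As a sketch of that computation in Python (pie.m itself is MATLAB; the sampling scheme here assumes x and y are drawn independently and uniformly from [−1, 1], which covers the whole pie):

```python
import random

def pie(n=100_000, seed=0):
    """Monte Carlo estimate of P(x > 1/2 | x^2 + y^2 < 1)."""
    rng = random.Random(seed)
    in_pie = in_quarter = 0
    for _ in range(n):
        # Assumed sampling scheme: uniform over the square [-1, 1]^2;
        # conditioning on x^2 + y^2 < 1 discards points outside the pie.
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y < 1:
            in_pie += 1
            if x > 0.5:        # the "rightmost quarter", by horizontal location
                in_quarter += 1
    return in_quarter / in_pie
```

The estimate should hover near the exact answer (π/3 − √3/4)/π ≈ 0.196, the area of the circular segment right of x = ½ as a fraction of the pie.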
-
The code in either.m computes P(fruit | color = 1), a guess as to whether we are looking at an apple or an orange. Here “fruit” is either “apple” or “orange”, with the prior probabilities P(fruit = apple) = .7 and P(fruit = orange) = .3. Furthermore, the color of each kind of fruit is normally distributed: p(color | fruit = apple) is the normal distribution with mean 3 and standard deviation 2, and p(color | fruit = orange) is the normal distribution with mean 0 and standard deviation 1.
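One way to sketch this in Python (either.m itself is MATLAB) is likelihood weighting: draw each sample's fruit from the prior, then weight it by the density of the observed color under that fruit. The function names and sampling details below are illustrative, not a transcription of either.m:

```python
import math
import random

def normpdf(x, mu, sigma):
    """Density of the normal distribution with mean mu, std. dev. sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def either(color=1.0, n=100_000, seed=0):
    """Estimate P(fruit | color) by likelihood weighting."""
    rng = random.Random(seed)
    weight = {"apple": 0.0, "orange": 0.0}
    for _ in range(n):
        # Draw the fruit from its prior: apple with .7, orange with .3.
        fruit = "apple" if rng.random() < 0.7 else "orange"
        # Weight the sample by p(color | fruit).
        mu, sigma = (3, 2) if fruit == "apple" else (0, 1)
        weight[fruit] += normpdf(color, mu, sigma)
    total = weight["apple"] + weight["orange"]
    return {f: w / total for f, w in weight.items()}
```

For color = 1 the posterior comes out close to even: the apple prior is larger, but color 1 is more typical of an orange, so the estimate lands near P(apple | color = 1) ≈ .54.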
-
The code in steps.m computes p(actual | measured = 2), where “measured” is the sum of two independent zero-mean normal random variables: “actual” (with standard deviation 4) and “offset” (with standard deviation 3). In the most naïve way of computing this conditional distribution in the code, we compute p(actual | 1.9 < measured < 2.1).
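A Python sketch of that most naïve interval-conditioning approach (steps.m itself is MATLAB; the independence of “actual” and “offset”, and the summary by a posterior mean, are assumptions of this sketch):

```python
import random

def steps(n=200_000, seed=0):
    """Naive rejection estimate of the mean of p(actual | 1.9 < measured < 2.1)."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        # Sample from the generative model...
        actual = rng.gauss(0, 4)
        offset = rng.gauss(0, 3)
        # ...and keep only draws whose measurement lands in the window.
        if 1.9 < actual + offset < 2.1:
            kept.append(actual)
    return sum(kept) / len(kept)
```

This is wasteful: only about 1.5% of draws survive the window. For comparison, the exact posterior is normal with mean 2·16/25 = 1.28 and standard deviation √(16·9/25) = 2.4, so the sample mean should land near 1.28.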
An excellent tutorial on Kalman filtering, a form of cue integration, is Chapter 1 of Stochastic Models, Estimation, and Control, Volume 1, by Peter Maybeck (1979).
-
The code in traffic.m crudely models the traffic at a signalized intersection as a hidden Markov model: the state is just the current color of the traffic light (0 for red, 1 for yellow, 2 for green), and the observation is whether there is traffic (0 for no traffic, 1 for traffic). At each time step, the state stays the same with probability .7 and cycles to the next color with probability .3. Also at each time step, the probability that there is traffic is .1 if the light is red, .5 if the light is yellow, and .9 if the light is green.
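A Python sketch of the generative model plus the most naïve conditioning (simulate whole trajectories and keep only those whose observations match exactly); traffic.m itself is MATLAB, and the uniform distribution over the initial color is an assumption not stated above:

```python
import random

TRANS_STAY = 0.7                      # probability the light keeps its color
P_TRAFFIC = {0: 0.1, 1: 0.5, 2: 0.9}  # P(traffic | red, yellow, green)

def simulate(T, rng):
    """Sample one length-T trajectory of states and observations."""
    states, obs = [], []
    state = rng.randrange(3)          # assumed uniform initial color
    for _ in range(T):
        states.append(state)
        obs.append(1 if rng.random() < P_TRAFFIC[state] else 0)
        if rng.random() >= TRANS_STAY:
            state = (state + 1) % 3   # cycle to the next color
    return states, obs

def naive_posterior(observed, n=200_000, seed=0):
    """P(final state | observations) by matching whole trajectories."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(n):
        states, obs = simulate(len(observed), rng)
        if obs == observed:           # reject any mismatching trajectory
            counts[states[-1]] += 1
    total = sum(counts)
    return [c / total for c in counts]
```

After three steps of observed traffic, for example, the light is most likely green: naive_posterior([1, 1, 1]) puts roughly .84 of the mass on state 2. The cost of exact matching grows badly as the observation sequence lengthens, since ever fewer trajectories survive.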
To compute the conditional probability distribution of the final state given a sequence of observations, the least naïve way is to forget about old states—that is, merge samples that differ only in their old states—once the samples’ weights have been scaled down to account for the effect of old states on the samples’ probabilities. This strategy exemplifies a general pattern: even when our model generates a conceptually exponential number of possibilities, we can take advantage of its (lack of) dependencies to reuse computation and perform efficient inference.
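The merge-old-states strategy is the forward algorithm for hidden Markov models. A Python sketch for the traffic model above (again assuming a uniform initial color) keeps just one weight per current state, so the work is linear in the number of observations rather than exponential:

```python
def forward_posterior(observed):
    """P(final state | observations) by the forward algorithm."""
    P_TRAFFIC = {0: 0.1, 1: 0.5, 2: 0.9}  # P(traffic | red, yellow, green)
    weights = [1 / 3] * 3                 # assumed uniform initial color
    for t, o in enumerate(observed):
        if t > 0:
            # Transition: stay with probability .7, or arrive by cycling
            # from the previous color with probability .3. Trajectories
            # that differ only in old states are merged here.
            weights = [0.7 * weights[s] + 0.3 * weights[(s - 1) % 3]
                       for s in range(3)]
        # Scale each weight by the probability of the current observation.
        weights = [w * (P_TRAFFIC[s] if o == 1 else 1 - P_TRAFFIC[s])
                   for s, w in enumerate(weights)]
    total = sum(weights)
    return [w / total for w in weights]
```

For observed = [1, 1, 1] this returns about [.040, .122, .839], agreeing with the naïve trajectory-matching estimate while touching only three weights per time step.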
(We write P(…) and p(…) above for probabilities and probability densities, respectively.)