16:198:504, Spring 2007
Classes: Wed 10:20am–1:20pm, RuCCS Playroom (Psych A139)
Recitations: Tue 11am–noon, CoRE 305
- Doug DeCarlo (decarlo at cs), office hours Wed 2–3pm in CoRE 310
- John Massari (jmassari at eden), office hours Tue 9:45–10:45am in CoRE 305
- Chung-chieh Shan (ccshan at cs), office hours Thu 3:30–5pm in CoRE 306
- Matthew Stone (mdstone at cs), office hours to be decided
Also on this Web site:
I’m sorry, I changed my mind: let’s simply have 15 minutes of presentation time per final project, including about 5 minutes for questions. Not every person involved in a project has to speak during the presentation.
To prepare for Tuesday’s recitation, please take a look at a problem set on Bayes-net learning from a course on artificial intelligence.
If you haven’t already, please start to think about what to do for your final project and with whom. (Individual projects are fine, but group projects are also encouraged.) Below are two things you can do; they are not mutually exclusive:
Write and explain a program to model a stochastic process, probably a cognitive process such as perception or action. The model should preferably be causal: running the model should simulate the process.
Experiment with a computation that consumes or produces a model, such as probabilistic inference. Explore the consequences of the model and the behavior of the computation, perhaps by visualizing them in action. Discuss how well they match your intuition and physical experiments.
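To make the first option concrete, here is a minimal sketch of a causal (generative) model of noisy perception. It is plain Python rather than the Scheme we use in class, and the three candidate locations and the noise level are made-up numbers: running the model forward simulates the process, which is what "causal" asks for.

```python
import random

# A minimal causal model of noisy perception (hypothetical example):
# nature picks a true location, and the observer perceives it with
# Gaussian noise. Running the model forward simulates the process.

def true_location():
    """Nature's choice: the object is at one of three positions."""
    return random.choice([0.0, 1.0, 2.0])

def perceive(location, noise=0.3):
    """The observer's percept: the true location plus Gaussian noise."""
    return random.gauss(location, noise)

def run_model():
    loc = true_location()
    percept = perceive(loc)
    return loc, percept

if __name__ == "__main__":
    for _ in range(3):
        print(run_model())
```

A model like this is "causal" in the sense of the paragraph above: each call to `run_model` is one simulated episode of the process, and inference (guessing `loc` from `percept`) runs the causal story in reverse.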
The last day of class, Wednesday April 25, will feature a presentation for each project. Please tell me ahead of time how much time you need for your presentation. You also need to write a paper to describe your project. How about making the paper due on Wednesday May 2? Both the project and the paper should target your classmates as their audience and be as short as possible but no shorter.
By the way, you can find more details on Bayesian learning in Heckerman’s tutorial.
The games we analyzed in class today are defined in games.scm.
If you feel uncomfortable with continuous probability distributions or would like more details on the math we use, you might find the following tutorials helpful.
- Paola Sebastiani’s tutorial on probability theory covers the basics of probability distributions discrete and continuous, joint and marginal.
- Andrew Moore’s tutorials on probabilities and probability density functions are in slide format and geared towards machine learning.
- H. Jerome Keisler’s Elementary Calculus: An Approach Using Infinitesimals details integration and differentiation using the idea of infinitesimals that I alluded to in class. (Start with Section 4.1.)
Update: Will Starr points out other good introductions to probability:
- Introduction to Probability by Dimitri Bertsekas and John Tsitsiklis
- Introduction to Probability by Charles Grinstead and J. Laurie Snell
- Reasoning about Uncertainty by Joseph Halpern
- Some MIT lecture notes
Again, please let me know if anything is unclear.
I apologize that the implementation of markov.scm was incorrect because I simplified it too much. It’s fixed now.
Huffman’s original paper on line labeling and drawing interpretation is: David A. Huffman, 1971. Impossible objects as nonsense sentences. In Machine intelligence 6: Proceedings of the 6th annual machine intelligence workshop (1970), ed. Bernard Meltzer and Donald Michie, 295–323. Edinburgh: Edinburgh University Press.
You’ll find a well-written summary in Section 4.1 of Nalwa’s A guided tour of computer vision.
There is no homework to read or submit this week. However, I highly recommend that you experiment with markov.scm, which implements probabilistic inference using hidden Markov models.
The model is defined in hmm. Please change it to describe your favorite process: office work, politics, baseball, whatever. The Scheme functions best-path and path-best perform two different kinds of inference. The best-path function finds the most likely sequence of states, which is appropriate when the utility function gives no partial credit for even a slightly incorrect state sequence. The path-best function finds the sequence of most likely states, which is appropriate when the utility function gives partial credit for each correct state in the inferred sequence.
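To illustrate the distinction, here is a sketch of the two inferences in plain Python (the course code is in Scheme). The `viterbi` function plays the role of best-path (the most likely state *sequence*) and `marginals` plays the role of path-best (the sequence of individually most likely *states*, via forward-backward). The rain/umbrella model is made up for illustration.

```python
# A tiny hypothetical HMM: hidden weather states, observed umbrella use.
states = ["rain", "sun"]
init   = {"rain": 0.5, "sun": 0.5}
trans  = {"rain": {"rain": 0.7, "sun": 0.3},
          "sun":  {"rain": 0.3, "sun": 0.7}}
emit   = {"rain": {"umbrella": 0.9, "none": 0.1},
          "sun":  {"umbrella": 0.2, "none": 0.8}}

def viterbi(obs):
    """Most likely sequence of states given the observations."""
    prob = {s: init[s] * emit[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        new, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: prob[r] * trans[r][s])
            ptr[s] = prev
            new[s] = prob[prev] * trans[prev][s] * emit[s][o]
        prob = new
        back.append(ptr)
    last = max(states, key=lambda s: prob[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

def marginals(obs):
    """Sequence of individually most likely states (forward-backward)."""
    n = len(obs)
    fwd = [{s: init[s] * emit[s][obs[0]] for s in states}]
    for o in obs[1:]:
        fwd.append({s: sum(fwd[-1][r] * trans[r][s] for r in states)
                       * emit[s][o] for s in states})
    bwd = [{s: 1.0 for s in states}]
    for o in reversed(obs[1:]):
        bwd.insert(0, {s: sum(trans[s][r] * emit[r][o] * bwd[0][r]
                              for r in states) for s in states})
    return [max(states, key=lambda s: fwd[t][s] * bwd[t][s])
            for t in range(n)]
```

On many observation sequences the two answers coincide, but they can differ: the most likely whole path need not pass through the most likely state at every step.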
Here are some questions to ask:
Write a Scheme function to sample a sequence of states and observations from the probability distribution specified by hmm.
Using the function suggested above, implement inefficient (exponential-time) algorithms for best-path and path-best that nevertheless return the correct answer.
How does the output of best-path and path-best match or mismatch your expectations as you vary the model and the sequence of observations?
Can changing a single observation affect all inferred states? Can best-path and path-best ever infer two different state sequences?
What is the time and space complexity of best-path and path-best in terms of the number of states in the model, the number of observations possible in each state, and the duration of the observation sequence?
The remaining three questions address how the implementation fails to scale to longer sequences.
The probability of a sequence of observations or states decreases roughly exponentially in the duration of the sequence. For example, doubling the duration roughly squares the probability. Confirm this statement with a concrete example.
Because the probability gets so small so quickly as the duration increases, the computer may approximate the probability of every state sequence by zero, which makes it impossible to check which sequence is most likely. Demonstrate this risk with a concrete example.
To avoid this risk, in practice we represent probabilities by their logarithms. Change the code to do so.
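The scaling problem in the last three questions can be seen in a few lines of Python (the per-step probability 0.1 here is a made-up number, not taken from markov.scm):

```python
import math

# Underflow sketch: with per-step probability 0.1, an n-step sequence
# has probability 0.1**n, which falls below the smallest positive
# double (about 5e-324) once n exceeds roughly 323, while its log
# stays a perfectly ordinary number.

p_step = 0.1

def seq_prob(n):
    """Probability of one particular n-step sequence."""
    return p_step ** n

def seq_log_prob(n):
    """The same quantity in log space: n * log(p_step)."""
    return n * math.log(p_step)

print(seq_prob(10))       # about 1e-10: still representable
print(seq_prob(400))      # underflows to exactly 0.0
print(seq_log_prob(400))  # about -921: no trouble at all
```

Note also that `seq_prob(20)` equals `seq_prob(10) ** 2` up to rounding, which is the "doubling the duration squares the probability" claim above.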
Please tell me about any discoveries or problems you encounter!
This week we talked about parsing using hidden Markov models and context-free grammars. Below are some pointers to written introductions.
Charniak’s textbook explains what hidden Markov models are in Section 3.2, what probabilistic context-free grammars are in Section 5.1, why probabilistic context-free grammars generalize hidden Markov models in Section 6.1, and inference algorithms for these language models in Sections 4.1, 4.2, and 6.2.
I’m still working on a way for you to play with these models hands-on. Please stay tuned.
The context-sensitivity of Swiss German was proven in: Stuart M. Shieber. 1985. Evidence against the context-freeness of natural language. Linguistics and Philosophy 8:333-343.
I apologize that I didn’t explain the perspective
(gui-points.scm) homework clearly! My intention was
for you to create two 3-D objects that look the same from one point
of view but whose differences are revealed when the viewpoint
changes. The cube and truncated pyramid are such a pair of 3-D
objects: they look the same from the front but different from other
places. You should be able to make another pair of 3-D objects
whose 2-D projections behave the same way.
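Here is the geometric idea in plain Python (not gui-points.scm): with the eye at the origin, a point (x, y, z) projects to (x/z, y/z), so any 3-D point can slide along its ray from the eye without changing its projection. The particular coordinates below are made up for illustration.

```python
# Two different 3-D objects with identical 2-D perspective projections
# from one viewpoint (the eye at the origin).

def project(point):
    """Perspective projection: (x, y, z) -> (x/z, y/z)."""
    x, y, z = point
    return (x / z, y / z)

# A cube: front face at z = 2, back face at z = 3.
cube = [(sx, sy, 2) for sx in (-1, 1) for sy in (-1, 1)] + \
       [(sx, sy, 3) for sx in (-1, 1) for sy in (-1, 1)]

# A truncated pyramid: the same front face, but a back face twice as
# large pushed out to z = 6. Each back vertex lies on the same ray
# from the eye as the corresponding cube vertex, so from this
# viewpoint the two objects are indistinguishable.
pyramid = [(sx, sy, 2) for sx in (-1, 1) for sy in (-1, 1)] + \
          [(2 * sx, 2 * sy, 6) for sx in (-1, 1) for sy in (-1, 1)]

assert [project(p) for p in cube] == [project(p) for p in pyramid]
```

Moving the eye breaks the coincidence: from any other viewpoint the back vertices no longer share rays, and the two objects look different, which is exactly the illusion the homework asks for.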
Please feel free to send me—or not send me—more changes to
gui-points.scm as you play with it.
The word “hidden” in the phrase “hidden Markov model” means that the underlying states are not directly observed. For more written details, check out: Lawrence R. Rabiner, 1989. An introduction to hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2):257–286.
In class we played with a decoder for abbreviated and numerically encoded text, which uses hidden Markov models to find a most likely underlying character sequence. The details are described in: Stuart M. Shieber and Rani Nelken, to appear. Abbreviated text input using language modeling. Journal of Natural Language Engineering.
Another application of hidden Markov models is to harmonize chorales in the style of Bach. There the observations are the melody (soprano) notes and the hidden states are the other voices and harmonic labels such as “tonic” and “dominant”. (See also elsewhere.)
Dana also pointed out Escher for real, which realizes Escher’s paintings (both the plausible and implausible ones) physically.
For homework this week, please change the definitions of
lines in gui-points.scm
to make your own depth illusion (or other visual illusion). To run
the program in DrScheme, choose the language “Graphical (MrEd,
includes MzScheme)” under “Language”, “Choose Language”, then
“PLT”. To change the eye position (viewpoint), click or drag with
the left mouse button. To switch to noncube-vertices, click the
middle mouse button. To switch back, click the right mouse button.
Please email your work to me or let me know of any problem you encounter. Have fun!
For this week, please read:
- Chapter 1 of: Peter Maybeck, 1979. Stochastic models, estimation, and control (volume 1).
- At least Sections 2.1.1 and 7.1 of: Vishvjit S. Nalwa, 1993. A guided tour of computer vision.
You may be interested in two activities this summer:
- Probabilistic Models of Cognition: The Mathematics of Mind, July 9–27 (UCLA)
- Formal Epistemology Workshop, May 31–June 3 (CMU)
Please read this week: Hermann von Helmholtz, 1896. Concerning the perceptions in general. (Translated from German, 1925; reprinted with foreword, 2001)
The examples of generative models I showed in class are:
- The Monty Hall problem as two interacting processes (including solving a contestant process to yield a cheating host): in Scheme and in Haskell.
- Generative models of English: uni/bi/tri/quad-gram models based on AP news (not online, sorry) and a trivial probabilistic context-free grammar in Scheme.
- Using a generative model of Boolean formulas (circuits) to learn from examples (based on code from last semester).
- SCIgen, an automatic CS paper generator.
- Context Free Design Grammar for pictures.
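For those who prefer to read the Monty Hall example outside Scheme or Haskell, here is a sketch of the same generative story in Python (the door-numbering and trial count are my own choices): nature hides the prize, the contestant picks, the host opens a losing unpicked door, and the contestant either stays or switches.

```python
import random

# The Monty Hall problem as a generative process (hypothetical sketch;
# the class versions are in Scheme and Haskell).

def monty_hall(switch, rng=random):
    doors = [0, 1, 2]
    prize = rng.choice(doors)                 # nature hides the prize
    pick = rng.choice(doors)                  # contestant's first pick
    # The host opens a door that is neither the pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:                                # contestant switches to the
        pick = next(d for d in doors          # remaining closed door
                    if d != pick and d != opened)
    return pick == prize

def win_rate(switch, trials=10000, seed=0):
    """Estimate the winning probability by forward simulation."""
    rng = random.Random(seed)
    return sum(monty_hall(switch, rng) for _ in range(trials)) / trials
```

Forward simulation recovers the classic answer: switching wins about 2/3 of the time, staying about 1/3.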
My screw-up this week was to confuse two ways to combine uncertain information.
- On one hand, if I make two independent observations of the same table’s location, then I can combine them to be more sure.
- On the other hand, if I observed once where the table was and once how far I moved, and these two random variables are independent, then I can combine them to track where the table is now. However, I would be less sure about the table’s current location than its previous location, because the uncertainties from my two observations accumulate.
I’ll explain all this more (correctly) next week. For now the main point is that combining multiple normal distributions (in either way above) yields simply another normal distribution, so an agent does not need more and more space as it gathers successive observations and updates its beliefs.
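For one-dimensional normal distributions the two combinations above come out as two standard formulas, sketched here in Python (the numbers in the comments are made up). Fusing two independent observations of the same quantity multiplies the densities, giving a precision-weighted mean and a *smaller* variance; predicting after an independent move adds the means and adds the variances, so uncertainty *grows*.

```python
# Two ways to combine 1-D Gaussians N(mean, variance).

def fuse(m1, v1, m2, v2):
    """Combine two independent observations of the SAME quantity.
    Precisions (1/variance) add, so the result is more certain."""
    v = 1.0 / (1.0 / v1 + 1.0 / v2)
    m = v * (m1 / v1 + m2 / v2)
    return m, v

def predict(m_loc, v_loc, m_move, v_move):
    """Combine a location estimate with an independent movement
    estimate to track the new location. Variances add, so the
    result is less certain."""
    return m_loc + m_move, v_loc + v_move
```

For example, `fuse(0.0, 1.0, 2.0, 1.0)` gives mean 1.0 with variance 0.5 (halved), while `predict(0.0, 1.0, 2.0, 1.0)` gives mean 2.0 with variance 2.0 (doubled). In both cases the result is again a single normal distribution, which is why the agent's belief never needs more storage.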
Please note the updated information on recitations and office hours. The first recitation will be on Tuesday January 23 and preceded by John’s office hours in CoRE 305. Also, you can contact everyone in the course by editing this page or by sending email to comp-model at rams dot rutgers dot edu.
Again, if you are unable to register due to the course being “closed” or for any other reason, please send me your RUID number and your full name. I’ll make sure you get registered. Thanks!
In the first class, someone broached the question of how probability distributions relate to quantum mechanics. Scott Aaronson argues from the perspective of a computer scientist that quantum mechanics simply generalizes probability distributions to negative (and complex) probabilities.
Here’s this week’s homework:
Read Section 3 of: Matthew Stone, 2003. Agents in the real world: Computational models in artificial intelligence and cognitive science.
Use the Scheme code for decision models to make a decision model of your own. Play with it using the interpreters, including solve. Email me the code for your decision model and describe your explorations.
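To show the shape of such a model, here is a sketch in plain Python (the course code is in Scheme, and this "umbrella" model with its probabilities and utilities is invented for illustration). Each action leads to chance outcomes, each with a probability and a utility; solving means picking the action with the highest expected utility.

```python
# A tiny hypothetical decision model: should I take an umbrella?
# Each action maps to a list of (probability, utility) outcomes.

model = {
    "take umbrella":  [(0.3, 60), (0.7, 80)],   # (rain, dry)
    "leave umbrella": [(0.3, 0),  (0.7, 100)],  # (rain, dry)
}

def expected_utility(outcomes):
    """Probability-weighted average utility of an action's outcomes."""
    return sum(p * u for p, u in outcomes)

def solve(model):
    """Return the action that maximizes expected utility."""
    return max(model, key=lambda a: expected_utility(model[a]))
```

Here `solve(model)` returns "take umbrella", since its expected utility (74) beats leaving it behind (70) under these made-up numbers.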
Sorry I was confused today about when class ends, and so didn’t get to an extended language for modeling decision processes, one that separates nature’s choice of the value of a random variable from the agent’s discovery of that value. If you’re interested, you can see Scheme and Haskell implementations of this extended language and the Monty Hall problem.