cs504: Computational Modeling

Links 友頁連結: cs504

2009-01-18 23:57

16:198:504, Spring 2007
Classes: Wed 10:20am–1:20pm, RuCCS Playroom (Psych A139)
Recitations: Tue 11am–noon, CoRE 305

Instruction team:

Doug DeCarlo (decarlo at cs), office hours Wed 2–3pm in CoRE 310
John Massari (jmassari at eden), office hours Tue 9:45–10:45am in CoRE 305
Chung-chieh Shan (ccshan at cs), office hours Thu 3:30–5pm in CoRE 306
Matthew Stone (mdstone at cs), office hours to be decided

Also on this Web site:

Evolving outline
Programming resources

I’m sorry, I changed my mind: let’s simply have 15 minutes of presentation time per final project, including about 5 minutes for questions. Not every person involved in a project has to speak during the presentation.

At the beginning of class today, I showed a cool movie of the Kaye effect. We can also aspire to mind the gap in our presentations.

By the way, a good introduction to Monte Carlo methods is David MacKay’s. A good application of them is robot localization.

To prepare for Tuesday’s recitation, please take a look at a problem set on Bayes-net learning from a course on artificial intelligence.

If you haven’t already, please start to think about what to do for your final project and with whom. (Individual projects are fine, but group projects are also encouraged.) Below are two things you can do; they are not mutually exclusive:

Write and explain a program to model a stochastic process, probably a cognitive process such as perception or action. The model should preferably be causal: running the model should simulate the process.
Experiment with a computation that consumes or produces a model, such as probabilistic inference. Explore the consequence of the model and the behavior of the computation, perhaps by visualizing them in action. Discuss how well they match your intuition and physical experiments.

On the last day of class, Wednesday April 25, will be a presentation for each project. Please tell me ahead of time how much time you need for your presentation. You also need to write a paper to describe your project. How about making the paper due on Wednesday May 2? Both the project and the paper should target your classmates as their audience and be as short as possible but no shorter.

By the way, you can find more details on Bayesian learning in Heckerman’s tutorial.

The games we analyzed in the class today are defined in games.scm.

If you feel uncomfortable with continuous probability distributions or would like more details on the math we use, you might find the following tutorials helpful.

Paola Sebastiani’s tutorial on probability theory covers the basics of probability distributions discrete and concrete, joint and marginal.
Andrew Moore’s tutorials on probabilities and probability density functions are in slide format and geared towards machine learning.
H. Jerome Keisler’s Elementary Calculus: An Approach Using Infinitesimals details integration and differentiation using the idea of infinitesimals that I alluded to in class. (Start with Section 4.1.)

There are lots of math and other books online!

Update: Will Starr points out other good introductions to probability:

Introduction to Probability by Dimitri Bertsekas and John Tsitsiklis
Introduction to Probability by Charles Grinstead and J. Laurie Snell
Reasoning about Uncertainty by Joseph Halpern
Some MIT lecture notes

Thanks!

The midterm is now posted.

Again, please let me know if anything is unclear.

I apologize that the implementation of best-path in markov.scm was incorrect because I simplified it too much. It’s fixed now.

Huffman’s original paper on line labeling and drawing interpretation is: David A. Huffman, 1971. Impossible objects as nonsense sentences. In Machine intelligence 6: Proceedings of the 6th annual machine intelligence workshop (1970), ed. Bernard Meltzer and Donald Michie, 295–323. Edinburgh: Edinburgh University Press.

You’ll find a well-written summary in Section 4.1 of Nalwa’s A guided tour of computer vision.

There is no homework to read or submit this week. However, I highly recommend that you experiment with markov.scm, which implements probabilistic inference using hidden Markov models.

The model is defined in hmm. Please change it to describe your favorite process: office work, politics, baseball, whatever. The Scheme functions best-path and path-best perform two different kinds of inference. The best-path function finds the most likely sequence of states, which is appropriate when the utility function gives no partial credit for even a slightly incorrect state sequence. The path-best function finds the sequence of most likely states, which is appropriate when the utility function gives partial credit for each correct state in the inferred sequence.

Here are some questions to ask:

Write a Scheme function to sample a sequence of states and observations from the probability distribution specified by hmm.
Using the function suggested above, implement inefficient (exponential-time) algorithms for best-path and path-best that nevertheless return the correct answer.
How does the output of best-path and path-best match or mismatch your expectations as you vary the model and the sequence of observations?
Can changing a single observation affect all inferred states?
Do best-path and path-best ever infer two different state sequences?
What is the time and space complexity of best-path and path-best in terms of the number of states in the model, the number of observations possible in each state, and the duration of the observation sequence?

The remaining three questions address how the implementation fails to scale to longer sequences.

The probability of a sequence of observations or states is roughly inverse-exponential in the duration of the sequence. For example, doubling the duration about squares the probability. Confirm this statement with a concrete example.
Because the probability gets so small so quickly as the duration increases, the computer may approximate the probability of every state sequence by zero, which makes it impossible to check which sequence is most likely. Demonstrate this risk with a concrete example.
To avoid this risk, we represent probabilities using their log in practice. Change the code to do so.

Please tell me about any discoveries or problems you encounter!

This week we talked about parsing using hidden Markov models and context-free grammars. Below are some pointers to written introductions.

Wikipedia explains Context-free grammars.
Charniak’s textbook explains what hidden Markov models are in Section 3.2, what probabilistic context-free grammars in Section 5.1, why probabilistic context-free grammars generalize hidden Markov models in Section 6.1, and inference algorithms for these language models in Sections 4.1, 4.2, and 6.2.

I’m still working on a way for you to play with these models hands-on. Please stay tuned.

The context-sensitivity of Swiss German was proven in: Stuart M. Shieber. 1985. Evidence against the context-freeness of natural language. Linguistics and Philosophy 8:333-343.

I apologize that I didn’t explain the perspective (gui-points.scm) homework clearly! My intention was for you to create two 3-D objects that look the same from one point of view but whose differences are revealed when the viewpoint changes. The cube and truncated pyramid are such a pair of 3-D objects: they look the same from the front but different from other places. You should be able to make another pair of 3-D objects whose 2-D projections behave the same way.

Please feel free to send me—or not send me—more changes to gui-points.scm as you play with it.

The word “hidden” in the phrase “hidden Markov model” means that the underlying states are not directly observed. For more written details, check out: Lawrence R. Rabiner, 1989. An introduction to hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2):257–286.

In class we played with a decoder for abbreviated and numerically encoded text, which uses hidden Markov models to find a most likely underlying character sequence. The details are described in: Stuart M. Shieber and Rani Nelken, to appear. Abbreviated text input using language modeling. Journal of Natural Language Engineering.

Another application of hidden Markov models is to harmonize chorales in the style of Bach. There the observations are the melody (soprano) notes and the hidden states are the other voices and harmonic labels such as “tonic” and “dominant”. (See also elsewhere.)

Doug mentioned that Meisel Gallery in Soho (141 Prince St, New York, near West Broadway) is showing Patrick Hughes’s work until March 31 (closed on Sundays).

Today we saw more visual illusions in class: gelatinous ellipses and the haze illusion.

Dana also pointed out Escher for real, which realizes Escher’s paintings (both the plausible and implausible ones) physically.

For homework this week, please change the definitions of cube-vertices, noncube-vertices, and lines in gui-points.scm to make your own depth illusion (or other visual illusion). To run the program in DrScheme, choose the language “Graphical (MrEd, includes MzScheme)” under “Language”, “Choose Language”, then “PLT”. To change the eye position (viewpoint), click or drag with the left mouse button. To switch from cube-vertices to noncube-vertices, click the middle mouse button. To switch back to cube-vertices from noncube-vertices, click the right mouse button.

Please email your work to me or let me know of any problem you encounter. Have fun!

You may be interested in two activities this summer:

Probabilistic Models of Cognition: The Mathematics of Mind, July 9–27 (UCLA)
Formal Epistemology Workshop, May 31–June 3 (CMU)

For this week, please read:

Chapter 1 of: Peter Maybeck, 1979. Stochastic models, estimation, and control (volume 1).
At least Sections 2.1.1 and 7.1 of: Vishvjit S. Nalwa, 1993. A guided tour of computer vision.

Check out a collection of visual illusions, especially the rotating face mask. Next week we’ll see physical objects from Thomas Papathomas and Patrick Hughes demonstrating such depth illusions.

Please read this week: Hermann von Helmholtz, 1896. Concerning the perceptions in general. (Translated from German, 1925; reprinted with foreword, 2001)

The examples of generative models I showed in class are:

The Monty Hall problem as two interacting processes (including solving a contestant process to yield a cheating host): in Scheme and in Haskell.
Generative models of English: uni/bi/tri/quad-gram models based on AP news (not online, sorry) and a trivial probabilistic context-free grammar in Scheme.
Using a generative model of Boolean formulas (circuits) to learn from examples (based on code from last semester).
SCIgen, an automatic CS paper generator.
Context Free Design Grammar for pictures.

My screw-up this week was to confuse two ways to combine uncertain information.

On one hand, if I make two independent observations of the same table’s location, then I can combine them to be more sure.
On the other hand, if I observed once where the table was and once how far I moved, and these two random variables are independent, then I can combine them to track where the table is now. However, I would be less sure about the table’s current location than its previous location, because the uncertainties from my two observations accumulate.

I’ll explain all this more (correctly) next week. For now the main point is that combining multiple normal distributions (in either way above) yields simply another normal distribution, so an agent does not need more and more space as it gather successive observations and updates its belief.

Please note the updated information on recitations and office hours. The first recitation will be on Tuesday January 23 and preceded by John’s office hours in CoRE 305. Also, you can contact everyone in the course by editing this page or by sending email to comp-model at rams dot rutgers dot edu.

Again, if you are unable to register due to the course being “closed” or for any other reason, please send me your RUID number and your full name. I’ll make sure you get registered. Thanks!

In the first class the question was broached how probability distributions relate to quantum mechanics. Scott Aaronson argues from the perspective of a computer scientist that quantum mechanics simply generalize probability distributions to negative (and complex) probabilities.

Here’s this week’s homework:

Read Section 3 of: Matthew Stone, 2003. Agents in the real world: Computational models in artificial intelligence and cognitive science.
Use the Scheme code for decision models to make a decision model of your own. Play with it using the interpreters interact, execute, utility, and solve. Email me the code for your decision model and describe your explorations.

Sorry I was confused today about when class ends and didn’t go over an extended language for modeling decision processes that separates nature’s choice of the value of a random variable and the agent’s discovery of that value. If you’re interested, you can see Scheme and Haskell implementations of this extended language and the Monty Hall problem.