Proper Treatment 正當作法 / cs530: Principles of artificial intelligence

16:198:530, Fall 2007
Classes: Mon 3:20–6:20pm, CoRE 301

This course introduces the following principles of artificial intelligence.

A concomitant goal is for you to learn how to work with other people, especially to talk to them without ESP.

Your instructors are:

Recommended textbooks:

Everyone is required to:

Grading will be based on:

From the grader

Actual schedule

Problems and solutions


Before writing a program, first try to pin down what problem it is supposed to solve and when one solution is better than another. This initial step is harder in AI but also more helpful, because evaluating a solution can involve chance, interaction, and human judgment. For example, what is a good strategy for rock paper scissors? The formal language of types provides guidance. (What about for parsing or translating text?)
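To make the rock-paper-scissors example concrete, a few type signatures already pin down what a strategy is and when one is better than another, before any strategy is written. The following sketch is illustrative (the names, the scoring convention, and the two toy strategies are assumptions, not course code):

```haskell
data Move = Rock | Paper | Scissors deriving (Eq, Show)

-- A strategy sees the opponent's past moves (most recent first)
-- and picks the next move.  This type IS the problem statement.
type Strategy = [Move] -> Move

beats :: Move -> Move
beats Rock     = Scissors
beats Paper    = Rock
beats Scissors = Paper

-- Score of the first move against the second: win 1, lose -1, tie 0.
score :: Move -> Move -> Int
score a b | a == b       = 0
          | beats a == b = 1
          | otherwise    = -1

always :: Move -> Strategy
always m _ = m

copycat :: Strategy
copycat []    = Rock          -- arbitrary opening move
copycat (m:_) = m             -- repeat the opponent's last move

-- One solution is better than another if it totals a higher score:
-- play n rounds and sum the first player's score.
play :: Int -> Strategy -> Strategy -> Int
play n s1 s2 = go n [] [] 0
  where
    go 0 _  _  acc = acc
    go k h1 h2 acc =
      let m1 = s1 h2            -- each player sees the other's history
          m2 = s2 h1
      in go (k - 1) (m1 : h1) (m2 : h2) (acc + score m1 m2)

main :: IO ()
main = print (play 3 (always Paper) (always Rock))
```

Note that the type already rules out cheating: a Strategy cannot see the opponent's current move, only the history.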

Code in class:



One simple way to view parsing is as a function from String to Bool, that is, a binary classifier of strings. As with most tasks in which human performance constitutes the ground truth, it is worth establishing an upper bound on machine performance (the annotator agreement rate) and a lower bound on machine performance (the baseline) before trying to write a parser. The performance of a binary classifier is a tradeoff between recall and precision.
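The two measures fall out of three counts. A minimal sketch (the gold labels below are made up for illustration): precision is correct yeses over all yeses, recall is correct yeses over all items that should be yes.

```haskell
-- Evaluate a binary classifier against gold-labeled test items.
precisionRecall :: (a -> Bool)  -- classifier under test
                -> [(a, Bool)]  -- test items paired with gold labels
                -> (Double, Double)
precisionRecall classify gold = (tp / predicted, tp / relevant)
  where
    count p   = fromIntegral (length (filter p gold))
    tp        = count (\(x, y) -> classify x && y)   -- true positives
    predicted = count (\(x, _) -> classify x)        -- classifier says yes
    relevant  = count snd                            -- gold says yes

main :: IO ()
main = do
  let gold = [(1, True), (2, True), (3, False), (4, False)] :: [(Int, Bool)]
  print (precisionRecall even gold)          -- a middling classifier
  print (precisionRecall (const True) gold)  -- the say-yes baseline
```

The say-yes baseline shows why both numbers are needed: it gets perfect recall with mediocre precision.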

QuickCheck is a Haskell tool that demonstrates how types can guide testing.
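The idea can be seen without the library: a property's argument type tells us what inputs to generate. A toy stand-in (QuickCheck proper generates random inputs from the type via its Arbitrary class; this hand-rolled stock of small lists is only a sketch of the same idea):

```haskell
-- Check a property on a stock of inputs.  The stock is chosen
-- because the property's type is [Int] -> Bool.
checkOn :: [a] -> (a -> Bool) -> Bool
checkOn inputs prop = all prop inputs

-- Small Int lists standing in for randomly generated test cases.
smallIntLists :: [[Int]]
smallIntLists = [ take k (cycle [0, 1, -2]) | k <- [0 .. 5] ]

prop_revRev :: [Int] -> Bool
prop_revRev xs = reverse (reverse xs) == xs

main :: IO ()
main = print (checkOn smallIntLists prop_revRev)
```

With the real library this whole file shrinks to `quickCheck prop_revRev`, with the Arbitrary instance for [Int] playing the role of smallIntLists.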

Complexity and representation


Often a problem is easier to state using one representation but easier to solve using another. For example, brute-force generate-and-test is usually too slow to solve search and optimization problems, so we represent partial solutions in order to test before generating. To implement these ideas, it is useful to think about streams of solutions wholesale and to compose generation and testing separately. One instance of this approach is backtracking search (n-queens).
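A minimal version of the n-queens formulation (the standard stream-based idiom, not necessarily the class code): partial solutions are lists of column choices, and each placement is tested before any extension of it is generated.

```haskell
-- queens n lists all ways to place n non-attacking queens, one per
-- row, represented as a list of column numbers (row by row).
queens :: Int -> [[Int]]
queens n = go n
  where
    go 0 = [[]]                       -- one empty partial solution
    go k = [ col : rest
           | rest <- go (k - 1)       -- generate shorter solutions...
           , col  <- [1 .. n]
           , safe col rest ]          -- ...and test before extending
    -- col is safe if no earlier queen shares its column or diagonal.
    safe col rest = and [ col /= c && abs (col - c) /= d
                        | (d, c) <- zip [1 ..] rest ]

main :: IO ()
main = print (length (queens 6))
```

Because the test prunes partial solutions, whole subtrees of the brute-force space are never generated at all.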

Code in class:



Don’t just solve board puzzles! By representing not just states but also actions as data, we can convert many planning problems into board problems such as finding a shortest path in a graph. Expressing concrete problems from real life as abstract problems in a well-designed language makes it easier for researchers to care about and take advantage of each other’s work. For example, we can translate Sudoku puzzles into SAT problems, then solve them using a fast SAT solver.
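To make the translation concrete, here is a sketch of the first two clause families in a Sudoku-to-SAT reduction, using DIMACS-style integer literals (the variable numbering is an assumption for illustration, not the encoding used in class):

```haskell
type Lit = Int   -- positive literal = variable, negative = negation

-- var r c d is the proposition "cell (r,c) holds digit d" (1-based).
var :: Int -> Int -> Int -> Lit
var r c d = 81 * (r - 1) + 9 * (c - 1) + d

-- Each cell holds at least one digit: one 9-literal clause per cell.
atLeastOne :: [[Lit]]
atLeastOne = [ [ var r c d | d <- [1 .. 9] ]
             | r <- [1 .. 9], c <- [1 .. 9] ]

-- Each cell holds at most one digit: for every pair of digits,
-- forbid both at once with a two-literal clause of negations.
atMostOne :: [[Lit]]
atMostOne = [ [ negate (var r c d), negate (var r c d') ]
            | r <- [1 .. 9], c <- [1 .. 9]
            , d <- [1 .. 9], d' <- [d + 1 .. 9] ]

main :: IO ()
main = print (length atLeastOne, length atMostOne)
```

The row, column, and 3×3-box constraints follow the same pattern, and the givens of a particular puzzle become unit clauses; the resulting formula goes straight to any off-the-shelf SAT solver.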

Our representation of incomplete Sudoku solutions last time was very inexpressive compared to general Boolean formulas: it only narrows down possibilities for a single cell. More sophisticated representations of incomplete solutions reduce the need for brute force in search and optimization and replace it by early backtracking and constraint propagation. Examples: high-school algebra, type inference, branch-and-bound in integer programming.



Discuss Mackworth’s paper:


A parse is a witness of a statement, or a (normal-form) constructive proof in a logic. Backtracking parsing can take time exponential in the length of the input string, even when there is no parse in the end. To parse in polynomial time, we apply dynamic programming, or memoization: we name and hence reuse the result of computations. (Compare slowFib and fastFib in Laziness.hs.) A grammar formalism is a domain-specific programming language through which researchers of human languages and parsing algorithms may collaborate. As with any interpreter of a domain-specific language, we can further speed up the parser by partial evaluation, which compiles a grammar (a user of the language) to a native program by fusing it with the parser (the implementation of the language).
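The slowFib/fastFib contrast can be reconstructed in a few lines (these definitions follow the standard idiom; the actual Laziness.hs may differ):

```haskell
-- Exponential time: both recursive calls recompute shared subproblems.
slowFib :: Int -> Integer
slowFib n | n < 2     = fromIntegral n
          | otherwise = slowFib (n - 1) + slowFib (n - 2)

-- Linear time: the lazily built list names every result once, so
-- later entries reuse earlier ones instead of recomputing them.
fastFib :: Int -> Integer
fastFib n = fibs !! n
  where fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

main :: IO ()
main = print (fastFib 30)
```

This is memoization in miniature: naming the stream fibs is what lets its elements be reused, just as a chart parser names and reuses the parses of each substring.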

Homework (due on 10/15):

Probabilities, expectations, and utilities


A probability distribution can be interpreted both as a sampling procedure and as a weighted set of possibilities.
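Both views fit in a few lines of Haskell (a sketch, not the class library): the weighted-set view is a list of probability-value pairs, and the sampling view consumes a random number in [0,1).

```haskell
newtype Dist a = Dist [(Double, a)]   -- weights should sum to 1

uniform :: [a] -> Dist a
uniform xs = Dist [ (1 / fromIntegral (length xs), x) | x <- xs ]

-- Weighted-set view: expectation is a fold over the pairs.
expectation :: Dist Double -> Double
expectation (Dist ps) = sum [ p * x | (p, x) <- ps ]

-- Sampling view: walk the list, spending the random number u.
sampleAt :: Double -> Dist a -> a
sampleAt u (Dist ps) = go u ps
  where
    go _ [(_, x)] = x                 -- last value absorbs rounding
    go r ((p, x) : rest)
      | r < p     = x
      | otherwise = go (r - p) rest
    go _ []       = error "empty distribution"

main :: IO ()
main = print (expectation (uniform [1 .. 6]))
```

The same Dist value supports both interpretations; which one is cheaper depends on whether the set of possibilities is small enough to enumerate.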

Code in class:


Pull constants out of loops to avoid computing them over and over again. Filter possibilities according to observations.
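Filtering by an observation is exactly that, plus renormalization. A sketch over plain weighted lists, with a die roll as a made-up example:

```haskell
-- Keep only the possibilities consistent with the observation, then
-- rescale the surviving weights so they again sum to 1.
condition :: (a -> Bool) -> [(Double, a)] -> [(Double, a)]
condition obs ps = [ (p / total, x) | (p, x) <- kept ]
  where
    kept  = filter (obs . snd) ps
    total = sum (map fst kept)

main :: IO ()
main = print (condition even [ (1 / 6, d) | d <- [1 .. 6 :: Int] ])
```

Observing that the roll is even leaves three possibilities, each now at probability 1/3.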

Code in class: Stretch.hs, Stretch2.hs

Homework (due on 10/25):


A decision process is like a probability distribution, but with two additional primitives: perception and action. A policy specifies how to behave in a decision process. An optimal policy is one that maximizes expected utility.
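For a one-shot decision the definition collapses to: compute each action's expected utility and take the maximum. A sketch with made-up weather numbers (not from the course):

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- An action's outcomes: (probability, utility) pairs.
expectedUtility :: [(Double, Double)] -> Double
expectedUtility outcomes = sum [ p * u | (p, u) <- outcomes ]

-- The optimal one-shot policy: the action with the highest
-- expected utility.
best :: [(name, [(Double, Double)])] -> (name, Double)
best choices =
  maximumBy (comparing snd)
            [ (a, expectedUtility o) | (a, o) <- choices ]

main :: IO ()
main = print (best [ ("umbrella",   [(0.3,  1.0), (0.7, 0.8)])
                   , ("hands-free", [(0.3, -1.0), (0.7, 1.0)]) ])
```

In a full decision process the same maximization happens at every perception, which is why a policy is a function of what has been observed rather than a single action.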

Homework (see also the previous week and the next week):

Herbert Simon wrote in “Rationality as process and as product of thought”:

Complexity is deep in the nature of things, and discovering tolerable approximation procedures and heuristics that permit huge spaces to be searched very selectively lies at the heart of intelligence, whether human or artificial. A theory of rationality that does not give an account of problem solving in the face of complexity is sadly incomplete. It is worse than incomplete; it can be seriously misleading by providing “solutions” to economic questions that are without operational significance.


From probability distributions to decision processes. A discounting factor is used to define utility if the process may continue forever.
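Concretely, with discount factor gamma the utility of a reward stream r0, r1, r2, ... is the sum of gamma^t * rt, so a constant reward of 1 forever is worth 1 / (1 - gamma). A sketch that truncates the infinite sum (1000 terms is plenty for any gamma well below 1):

```haskell
-- Discounted utility of a (possibly infinite) reward stream,
-- approximated by its first 1000 terms.
discounted :: Double -> [Double] -> Double
discounted gamma rewards =
  sum (take 1000 (zipWith (*) (iterate (* gamma) 1) rewards))

main :: IO ()
main = print (discounted 0.5 (repeat 1))   -- close to 1 / (1 - 0.5)
```

Discounting is what keeps the utility of a never-ending process finite, so that expected utilities of different policies remain comparable.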

Please study the code from today, newly commented:

Also check out an application of probabilistic modeling: Joel Spolsky’s Evidence based scheduling.

Mid-course evaluation.

Homework (due on 11/5, in class of course):


Presentations of probability distributions.

We have already covered Bayes nets. It’s all about filtering to update belief states with observations.
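One such filtering step, for a discrete state space, fits in a few lines: predict by pushing the belief through the transition matrix, weight by the observation likelihood, renormalize. (A sketch, not our class code; the numbers in main are the standard two-state umbrella-world example.)

```haskell
import Data.List (transpose)

-- belief:      prior P(state), one entry per state
-- trans:       transition matrix, trans !! i !! j = P(j | i)
-- likelihood:  P(observed evidence | state), one entry per state
step :: [Double] -> [[Double]] -> [Double] -> [Double]
step belief trans likelihood = map (/ total) unnorm
  where
    predicted = [ sum (zipWith (*) belief col)   -- predict one step
                | col <- transpose trans ]
    unnorm    = zipWith (*) likelihood predicted -- weight by evidence
    total     = sum unnorm                       -- renormalize

main :: IO ()
main = print (step [0.5, 0.5] [[0.7, 0.3], [0.3, 0.7]] [0.9, 0.2])
```

Folding step over a sequence of observations maintains the belief state online, which is all a hidden Markov model's forward algorithm does.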


Midterm (90 minutes) and discussion thereof.

A sample midterm is now available. It is about 20%–25% of the length of the actual midterm. The actual midterm, like this sample midterm, refers to several pages of Russell and Norvig’s textbook; we will provide copies of these pages as part of the midterm. The cover page of the actual midterm will be identical to the cover page of this sample midterm, except for the number 2 (the total number of pages). One set of model answers for the sample midterm would be:

  1. 2^n; 2^(n/4)
  2. {Rich}; Allow negated literals to enter the conjunction

Another set of model answers for the sample midterm would be:

  1. 2^n; 2^(n−2)
  2. {Rich, Famous}; Allow disjunctions of literals in addition to conjunctions of literals

If you have any questions or notice any problems with the sample midterm, please speak up!

Today’s code for (hidden) Markov models is online. For a more classical explanation, see (the textbook or):

Lawrence R. Rabiner. 1989. An introduction to hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2):257–286.

On an almost entirely unrelated topic, if you are interested in the recent discussions on the mailing list about performance and about Haskell, you might find it helpful to study C++ versions of the n-queens problem and probability distributions.

Please see a separate page for information on the final assignment, the final class meeting, and the final project, including due dates.


Midterm discussion.

Decision-making under uncertainty. Learning. Continuous variables.

Please see updates about the final project and about King’s paper.

The Kalman filter is a nice example of cleverly representing an infinite distribution in very little space and manipulating it in very little time:

Peter S. Maybeck. 1979. Stochastic models, estimation, and control. Mathematics in Science and Engineering 141, San Diego, CA: Academic Press.
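In one dimension the whole belief is a pair (mean, variance), and the measurement update is two lines. This sketch uses the standard textbook formulas (as in Maybeck), not the course code:

```haskell
-- Fold one noisy observation z (with noise variance r) into a
-- Gaussian belief.  The Kalman gain k says how far to move the mean
-- toward the observation; the variance always shrinks.
update :: (Double, Double)  -- prior (mean, variance)
       -> Double            -- measurement noise variance r
       -> Double            -- observed value z
       -> (Double, Double)  -- posterior (mean, variance)
update (mu, var) r z = (mu + k * (z - mu), (1 - k) * var)
  where k = var / (var + r)

main :: IO ()
main = print (update (0, 1) 1 2)
```

The trick the text describes is visible here: the posterior over a continuum of states is again Gaussian, so two numbers represent an infinite distribution exactly.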

The parameters and structure of a Bayes net can be learned by hill-climbing search:

David Heckerman. 1995. A tutorial on learning with Bayesian networks. Tech. Rep. MSR-TR-95-06, Microsoft Research. Revised Nov. 1996.

Finally, you might be interested in an interview with a driver behind NASA’s Mars Rovers.

In class today, we built the following decision process for the environment of a robotic dog. It is a variation on env in ProcessTest.lhs. The first argument to this function is the number of rounds that the dog is allowed to wait for (say 3). The second argument is the utility for the dog to step forward right away (say 0.2).

data Step = Forward | Wait | Home deriving (Eq, Ord, Show, Read)

env :: (Decide d Bool, Observe d Step) => Int -> Double -> d Double Double ()
env n u = bind (observe ([Forward, Home] ++
                if n == 0 then [] else [Wait])) (\step ->
          case step of
          Forward -> reward u
          Home    -> reward 0.3
          Wait    -> bind (reward (-0.05)) (\() ->
                     bind (decide [(0.5, True), (0.5, False)]) (\ball ->
                     env (n-1) (if ball then 0.75 else 0.15))))


Decision trees and inductive learning illustrate:

Today’s code is in DecisionTree.lhs. A good way to learn more about decision trees is Quinlan’s original paper (Machine Learning 1(1):81–106, 1986).
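The quantity ID3 maximizes when choosing an attribute can be reconstructed directly (a sketch over Bool-labeled examples; DecisionTree.lhs may organize this differently):

```haskell
import Data.List (partition)

-- Entropy of a list of Boolean labels, in bits.
entropy :: [Bool] -> Double
entropy labels
  | p == 0 || p == 1 = 0
  | otherwise        = negate (p * logBase 2 p + q * logBase 2 q)
  where
    p = fromIntegral (length (filter id labels))
          / fromIntegral (length labels)
    q = 1 - p

-- Information gain of splitting on a Boolean attribute: entropy
-- before the split minus the weighted entropy after it.
gain :: (a -> Bool) -> [(a, Bool)] -> Double
gain split examples = entropy (map snd examples) - after
  where
    (yes, no) = partition (split . fst) examples
    weight xs = fromIntegral (length xs)
                  / fromIntegral (length examples)
    after     = weight yes * entropy (map snd yes)
              + weight no  * entropy (map snd no)

main :: IO ()
main = print (gain (< 3) [(1, True), (2, True), (3, False), (4, False)])
```

A perfect split drives the post-split entropy to zero, so its gain equals the entropy of the unsplit data; ID3 greedily picks the attribute with the highest gain at each node.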

Tom Walsh’s talk on Thursday morning is very relevant to the topics we have been covering in this course.


Neural networks.


Constructive proofs and knowledge representation.

Embedding theories for multiagent modeling.