16:198:530, Fall 2007
Classes: Mon 3:20–6:20pm, CoRE 301
This course introduces the following principles of artificial intelligence.
 Represent and solve problems in an abstract language
 Manage uncertainty by tracking possibilities
 Plan behavior by maximizing expected utility
A concomitant goal is for you to learn how to work with other people, especially to talk to them without ESP.
Your instructors are:
 Chung-chieh Shan (ccshan at cs), office hours Thu 1:30–3pm in CoRE 306 or by appointment
 Pai-Hsi Huang (paihuang at cs), office hours Mon 2–3pm in Hill 402 or by appointment
Recommended textbooks:
 Artificial Intelligence: A Modern Approach
 The Haskell School of Expression: Learning Functional Programming through Multimedia
Everyone is required to:
 participate in class meetings;
 send weekly emails to “ai at rams dot rutgers dot edu” that ask questions and explain to your classmates how the ideas covered in class apply or fail to apply to your research and life;
 submit programming exercises, preferably in pairs;
 take a take-home midterm;
 propose, carry out, and write up a final project, preferably in pairs.
Grading will be based on:
 homework assignments (50%);
 participation during and outside class meetings (10%);
 midterm (10%);
 final proposal (15%);
 final project (15%).
From the grader
09/18/2007 I also want to post some material that might help you understand Bayesian Networks.
 Bayesian Networks without Tears
 David Heckerman’s technical report. Feel free to skip sections 1 and 2; sections 3 and 4 and part of section 5 are most important for you. If you decide to read section 2, note that it discusses the Bayesian learning paradigm, not Bayesian networks. It has been argued that “Bayesian network” is actually a misnomer, and that the model has nothing to do with the Bayesian learning paradigm.
 Andrew Moore’s tutorials. I give you the main page, since it contains a list of tutorials on useful models in AI. Related topics are Bayesian networks, inference in Bayesian networks, learning Bayesian networks, naive Bayesian networks, and a short overview of Bayes nets.

09/18/2007 Later on this semester, we will be discussing Bayesian networks. To understand Bayes nets, you must have some basic knowledge of probability. In particular, you need to be familiar with conditional probability and some discrete and continuous probability distributions. I have put together some chapters from a book for you to review basic probability and statistics. Reading the material is optional, provided that you are comfortable with these topics. The ZIP file is password protected; I will send the password via the class email list. I have also attached, after each chapter, some practice problems and solutions to odd-numbered problems. Feel free to test yourself. The chapters cover:
 Basic conditional probability.
 Conditional distributions.
 Bernoulli, Binomial and Multinomial distributions (discrete)
 Normal, uniform and Beta distributions (continuous)
 I was also asked to find materials on the Dirichlet distribution; I believe the Wikipedia entry discusses it. It is a generalization of the Beta distribution, much as the multinomial distribution generalizes the binomial.

09/12/2007 Make sure you are able to log in to handin. You will be submitting your programming assignment using this website. No email submission will be accepted unless explicit permission is given. If you have a problem logging in now (also make sure you find CS530 after you log in), you MUST talk to us. If you talk to us just two days before the assignment is due, then that is too late.
 09/12/2007 We have set up a gradebook for this course. In the future, you can view your grades anytime online, as long as you have a valid RUID. I have also created a dummy assignment at this point and given everyone a grade. Please log in to your gradebook. If you cannot log in or if you cannot see your grade on the dummy assignment, please email me.
Actual schedule
Problems and solutions
9/10
Before writing a program, first try to pin down what problem it is supposed to solve and when one solution is better than another. This initial step is harder in AI but also more helpful, because evaluating a solution can involve chance, interaction, and human judgment. For example, what is a good strategy for rock paper scissors? The formal language of types provides guidance. (What about for parsing or translating text?)
Code in class:
 Rock Paper Scissors: deterministic, random.
 Evaluating a one-dimensional deterministic route.
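To make the rock-paper-scissors example concrete, here is a hedged sketch of how types pin down the problem; the names below are illustrative assumptions and may differ from the class code:

```haskell
-- Illustrative sketch only; names are assumptions, not the class code.
data Move = Rock | Paper | Scissors deriving (Eq, Show)

-- A deterministic strategy maps the opponent's move history
-- (most recent first) to our next move.
type Strategy = [Move] -> Move

-- Does the first move beat the second?
beats :: Move -> Move -> Bool
beats Rock     Scissors = True
beats Paper    Rock     = True
beats Scissors Paper    = True
beats _        _        = False

-- Play whatever beats the opponent's most recent move.
copycatPlus :: Strategy
copycatPlus []      = Rock
copycatPlus (m : _) = case m of
  Rock     -> Paper
  Paper    -> Scissors
  Scissors -> Rock
```

Writing down the type Strategy already answers part of the question "what problem is the program supposed to solve": a strategy is any function of the opponent's history, and a random strategy would need a different type.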
Homework:
 Send the professor an email if you didn’t get an email from him to the class mailing list.
 Read “Why Functional Programming Matters” to familiarize yourself with types, higher-order functions, lazy evaluation, and game trees.
 Send an email to the class mailing list: introduce yourself and ask a question, make a comment, respond to the reading above, or otherwise talk about what in this class is and is not relevant to your work.
9/17
One simple way to view parsing is as a function from String to Bool, that is, a binary classifier of strings. As with most tasks in which human performance constitutes the ground truth, it is worth establishing an upper bound on machine performance (the annotator agreement rate) and a lower bound on machine performance (the baseline) before trying to write a parser. The performance of a binary classifier is a trade-off between recall and precision.
QuickCheck is a Haskell tool that demonstrates how types can guide testing.
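QuickCheck itself generates random test cases from a property's type. As a library-free illustration of the same idea of a type-directed property, the sketch below checks a law of reverse exhaustively over a small domain instead of randomly:

```haskell
-- A property is just a Bool-valued function; its type tells a testing
-- tool what inputs to generate. (QuickCheck would generate these
-- randomly; here we enumerate a small domain by hand.)
propReverse :: [Int] -> [Int] -> Bool
propReverse xs ys = reverse (xs ++ ys) == reverse ys ++ reverse xs

smallLists :: [[Int]]
smallLists = [ take n [1 ..] | n <- [0 .. 3] ]

checkAll :: Bool
checkAll = and [ propReverse xs ys | xs <- smallLists, ys <- smallLists ]
```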
Complexity and representation
9/17
Often a problem is easier to state using one representation but easier to solve using another. For example, brute-force generate-and-test is usually too slow to solve search and optimization problems, so we represent partial solutions in order to test before generating. To implement these ideas, it is useful to think about streams of solutions wholesale and to compose generation and testing separately. One instance of this approach is backtracking search (n-queens).
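As a minimal sketch of refining generate-and-test into backtracking (illustrative names, not the class code), we can place queens one row at a time and test each partial solution before extending it:

```haskell
-- Hedged sketch of backtracking n-queens. A solution lists the column
-- of the queen in each row; partial solutions are tested before being
-- extended, pruning the search long before a full board is generated.
queens :: Int -> [[Int]]
queens n = go n
  where
    go 0 = [[]]
    go k = [ col : rest | rest <- go (k - 1)
                        , col  <- [1 .. n]
                        , safe col rest ]
    -- A new queen is safe if it shares no column or diagonal
    -- with any queen already placed.
    safe col rest = and [ col /= c && abs (col - c) /= d
                        | (d, c) <- zip [1 ..] rest ]
```

Because the list of solutions is produced lazily, asking for only the first solution does only as much search as that solution requires.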
Code in class:
Homework:
 Preferably in pairs (contact the professor first if not): Solve a family of problems you care about by backtracking. Describe to your classmates why you care and how you did it. (If your solution turns out to take too long to run, analyze why.) Get feedback about your problem, solution, and writing using the class mailing list. Submit your program and a description of it (these may be the same thing, if your program is literate) to the handin system. Please submit only one copy of each project but include the name of everyone who worked on it. If you have multiple files to submit, please zip them together. Due 9/24.
9/24
Don’t just solve board puzzles! By representing not just states but also actions as data, we can convert many planning problems into board problems such as finding a shortest path in a graph. Expressing concrete problems from real life as abstract problems in a well-designed language makes it easier for researchers to care about and take advantage of each other’s work. For example, we can translate Sudoku puzzles into SAT problems, then solve them using a fast SAT solver.
Our representation of incomplete Sudoku solutions last time was very inexpressive compared to general Boolean formulas: it only narrows down possibilities for a single cell. More sophisticated representations of incomplete solutions reduce the need for brute force in search and optimization, replacing it with early backtracking and constraint propagation. Examples: high-school algebra, type inference, branch-and-bound in integer programming.
Homework:

Read: Mackworth, Alan K. 1977. Consistency in networks of relations. Artificial Intelligence 8(1):99–118.
The point of reading this article is to see how a general algorithm for constraint propagation can apply to more than one kind of puzzle.

Don’t forget to participate on the class mailing list!
10/1
Discuss Mackworth’s paper:

The author suggests that a solution to the nqueens problem is “a constructive proof for the wff
(∃x_{1})(∃x_{2})…(∃x_{n}) P_{1}(x_{1}) ∧ P_{2}(x_{2}) ∧ … ∧ P_{n}(x_{n}) ∧
P_{12}(x_{1},x_{2}) ∧ P_{13}(x_{1},x_{3}) ∧ … ∧ P_{n−1,n}(x_{n−1},x_{n})”, where P_{i} are certain unary constraints and P_{ij} are certain binary constraints. Give one way to express the n-queens problem by spelling out what these constraints are. Then, if time permits, give an example of a node inconsistency, an arc inconsistency, and a path inconsistency.

Osha: “could someone provide examples of practical ai problems that do not involve these boolean constraint that we’ve been talking about so far?” (Reid: look for a problem that doesn’t feel like a puzzle to humans.)
Parsing:
A parse is a witness of a statement, or a (normal-form) constructive proof in a logic. Backtracking parsing can take time exponential in the length of the input string, even when there is no parse in the end. To parse in polynomial time, we apply dynamic programming, or memoization: we name and hence reuse the results of computations. (Compare slowFib and fastFib in Laziness.hs.) A grammar formalism is a domain-specific programming language through which researchers of human languages and parsing algorithms may collaborate. As with any interpreter of a domain-specific language, we can further speed up the parser by partial evaluation, which compiles a grammar (a user of the language) to a native program by fusing it with the parser (the implementation of the language).
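Since Laziness.hs is not reproduced here, the following hedged sketch illustrates the slowFib/fastFib contrast: naming the whole stream of results lets laziness reuse each one.

```haskell
-- Hedged sketch of the slowFib/fastFib contrast (the actual
-- definitions in Laziness.hs may differ).
slowFib :: Int -> Integer
slowFib n
  | n < 2     = fromIntegral n
  | otherwise = slowFib (n - 1) + slowFib (n - 2)  -- exponential time

-- Naming the list of results memoizes them under lazy evaluation,
-- so each Fibonacci number is computed only once: linear time.
fastFib :: Int -> Integer
fastFib n = fibs !! n
  where fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
```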
Homework (due on 10/15):

 Preferably in pairs (contact the professor first if not): In Parse1.hs is a backtracking parser called parses and a CYK (dynamic programming) parser called parses'. Although the latter is more efficient, both parsers return a list of parse trees. In Parse2.hs is a partially evaluated CYK parser parse, which is faster than parses' not just because it is a parser and a grammar fused together but also because it only bothers to return a Bool to signify whether there is a parse. Change parse to return parse trees. Start by putting import Parse1 (Symbol(..), Parse(..)) at the top of Parse2.hs, to reuse the definition of parse trees and symbols in Parse1.hs. Because you need only change the definitions of word, (|||), and (&&&) in Parse2.hs, you need only submit Parse2.hs to the handin system.
Probabilities, expectations, and utilities
10/8
A probability distribution can be interpreted both as a sampling procedure and as a weighted set of possibilities.
Code in class:

 Distr.lhs defines the Distr interface and implements it two ways.
 DistrTest.lhs uses the Distr interface to model coins and dice.
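The "weighted set of possibilities" reading can be sketched as follows; the real Distr interface in Distr.lhs may well differ, so treat these names as assumptions:

```haskell
-- Assumed, illustrative interface; not the definitions in Distr.lhs.
-- One reading: a distribution is a list of outcomes, each carrying
-- its probability.
newtype Distr a = Distr { possibilities :: [(Double, a)] }

-- Equally weighted outcomes (e.g. a fair die).
uniform :: [a] -> Distr a
uniform xs = Distr [ (1 / fromIntegral (length xs), x) | x <- xs ]

-- Probability of an event: sum the weights of matching possibilities.
-- (The other reading, a sampling procedure, would instead consume a
-- supply of random numbers and return one outcome at a time.)
prob :: (a -> Bool) -> Distr a -> Double
prob p (Distr xs) = sum [ w | (w, x) <- xs, p x ]
```

For example, prob even (uniform [1..6]) models the chance that a fair die shows an even face.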
10/15
Pull constants out of loops to avoid computing them over and over again. Filter possibilities according to observations.
Code in class: Stretch.hs, Stretch2.hs
Homework (due on 10/25):
 Preferably in pairs (contact the professor first if not): Use the implemented representations of probability distributions in Distr.lhs to model a stochastic process you care about. Explain how to use your code and why you care about it. Try to filter possibilities to match observations using choose []. Be prepared for your work to be presented in class by someone else.
10/22
A decision process is like a probability distribution, but with two additional primitives: perception and action. A policy specifies how to behave in a decision process. An optimal policy is one that maximizes expected utility.
Homework (see also the previous week and the next week):
 Read: Stone, Matthew. 2003. Agents in the real world: Computational models in artificial intelligence and cognitive science.
Herbert Simon wrote in “Rationality as process and as product of thought”:
Complexity is deep in the nature of things, and discovering tolerable approximation procedures and heuristics that permit huge spaces to be searched very selectively lies at the heart of intelligence, whether human or artificial. A theory of rationality that does not give an account of problem solving in the face of complexity is sadly incomplete. It is worse than incomplete; it can be seriously misleading by providing “solutions” to economic questions that are without operational significance.
10/29
From probability distributions to decision processes. A discounting factor is used to define utility if the process may continue forever.
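As a hedged sketch of how a discount factor keeps the utility of an unbounded reward stream finite (the names below are illustrative, not from the class code):

```haskell
-- Discounting weights the reward at time t by gamma^t (0 <= gamma < 1),
-- so even a reward stream that never ends has a bounded total utility.
discountedUtility :: Double -> [Double] -> Double
discountedUtility gamma rewards =
  sum (zipWith (*) rewards (iterate (* gamma) 1))
```

For a constant reward of 1 forever, the total converges to 1 / (1 − gamma); truncating the stream approximates it, e.g. discountedUtility 0.9 (replicate 500 1) is within rounding of 10.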
Please study the code from today, newly commented:
 Basic implementation: Process.lhs (needs ProcessUtil.lhs)
 Usage examples: ProcessTest.lhs
 Pretty output: ProcessPretty.lhs (needs Finite.hs and Pretty.hs)
Also check out an application of probabilistic modeling: Joel Spolsky’s Evidence-based scheduling.
Midcourse evaluation.
Homework (due on 11/5, in class of course):
 Each team from the Distr homework should have been assigned randomly to another team. Prepare a five-minute presentation of your assigned group’s Distr homework submission: what did they do, why is it interesting, and what did you learn? If you want, you can email me a PDF file ahead of time or use your own laptop’s VGA output for your presentation.
11/5
Presentations of probability distributions.
We have already covered Bayes nets. It’s all about filtering to update belief states with observations.
11/12
Midterm (90 minutes) and discussion thereof.
 The midterm is not take-home, but you may bring any written material.
 There will be multiple questions asking for short answers. For example, one question might ask whether backtracking takes polynomial time on a given family of parsing problems. Another question might ask you to explain whether a given decision process accurately models a natural environment, then compute an optimal strategy in response to the process. There are no previous exams for you to refer to—sorry!
 You will not have to write any code or an essay. You may have to read a bit of Haskell. You should know how the code we presented works, so that you know what takes a long time and what doesn’t, and what computes the correct result and what doesn’t.
A sample midterm is now available. It is about 20%–25% the length of the actual midterm. The actual midterm, like this sample midterm, refers to several pages of Russell and Norvig’s textbook; we will provide copies of these pages as part of the midterm. The cover page of the actual midterm will be identical to the cover page of this sample midterm, except for the number 2 (the total number of pages). One set of model answers for the sample midterm would be:
 2^{n}; 2^{n}/4
 {Rich}; Allow negated literals to enter the conjunction
Another set of model answers for the sample midterm would be:
 2^{n}; 2^{n−2}
 {Rich, Famous}; Allow disjunctions of literals in addition to conjunctions of literals
If you have any questions or notice any problems with the sample midterm, please speak up!
Today’s code for (hidden) Markov models is online. For a more classical explanation, see (the textbook or):
Lawrence R. Rabiner. 1989. An introduction to hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2):257–286.
On an almost entirely unrelated topic, if you are interested in the recent discussions on the mailing list about performance and about Haskell, you might find it helpful to study C++ versions of the n-queens problem and probability distributions.
Please see a separate page for information on the final assignment, the final class meeting, and the final project, including due dates.
11/19
Midterm discussion.
Decisionmaking under uncertainty. Learning. Continuous variables.
Please see updates about the final project and about King’s paper.
The Kalman filter is a nice example of cleverly representing an infinite distribution in very little space and manipulating it in very little time:
Peter S. Maybeck. 1979. Stochastic models, estimation, and control. Mathematics in Science and Engineering 141, San Diego, CA: Academic Press.
The parameters and structure of a Bayes net can be learned by hillclimbing search:
David Heckerman. 1995. A tutorial on learning with Bayesian networks. Tech. Rep. MSR-TR-95-06, Microsoft Research. Revised Nov. 1996.
Finally, you might be interested in an interview with a driver behind NASA’s Mars Rovers.
In class today, we built the following decision process for the environment of a robotic dog. It is a variation on env in ProcessTest.lhs. The first argument to this function is the number of rounds that the dog is allowed to wait for (say 3). The second argument is the utility for the dog to step forward right away (say 0.2).

data Step = Forward | Wait | Home deriving (Eq, Ord, Show, Read)

env :: (Decide d Bool, Observe d Step) => Int -> Double -> d Double Double ()
env n u = bind (observe ([Forward, Home] ++ if n == 0 then [] else [Wait]))
               (\step -> case step of
                  Forward -> reward u
                  Home    -> reward 0.3
                  Wait    -> bind (reward (-0.05)) (\() ->
                             bind (decide [(0.5, True), (0.5, False)]) (\ball ->
                             env (n-1) (if ball then 0.75 else 0.15))))
11/26
Decision trees and inductive learning illustrate:
 Programs as hypotheses
 Greedy hillclimbing as approximate search
 Measuring discriminative information as bits of entropy
 Validation and pruning to prevent overfitting; Occam’s razor
 The utility of classification; cost sensitivity
 Statistical (macro) vs anecdotal (micro) evidence
Today’s code is in DecisionTree.lhs. A good way to learn more about decision trees is Quinlan’s original paper (Machine Learning 1(1):81–106, 1986).
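The entropy measure mentioned in the list above can be sketched as follows; this is illustrative and may differ from the code in DecisionTree.lhs:

```haskell
import Data.List (group, sort)

-- Illustrative sketch, not the DecisionTree.lhs definition.
-- Entropy in bits of the empirical distribution of a list of labels:
-- the expected amount of information needed to name one label.
entropy :: Ord a => [a] -> Double
entropy xs = negate (sum [ p * logBase 2 p | p <- ps ])
  where
    n  = fromIntegral (length xs)
    ps = [ fromIntegral (length g) / n | g <- group (sort xs) ]
```

A 50/50 mix of two labels costs one bit and a pure node costs zero, which is why greedy decision-tree induction prefers the attribute whose split reduces entropy the most.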
Tom Walsh’s talk on Thursday morning is very relevant to the topics we have been covering in this course.
12/3
Neural networks.
 Key phrases: black-box model; resembling / being inspired by / being a plausible model of nature; understanding intelligence
 Topics: perceptron; expressive power; hidden neurons; training by gradient descent and its variants
 Discussion question: “if neural nets are black boxes, how do they help us understand or imitate intelligence?”
12/10
Constructive proofs and knowledge representation.
Embedding theories for multi-agent modeling.