cs530: Principles of artificial intelligence
2011-12-16 18:10

16:198:530, Fall 2007
Classes: Mon 3:20–6:20pm, CoRE 301

This course introduces the following principles of artificial intelligence.

• Represent and solve problems in an abstract language
• Manage uncertainty by tracking possibilities
• Plan behavior by maximizing expected utility

A concomitant goal is for you to learn how to work with other people, especially to talk to them without ESP.

• Chung-chieh Shan (ccshan at cs), office hours Thu 1:30–3pm in CoRE 306 or by appointment
• Pai-Hsi Huang (paihuang at cs), office hours Mon 2–3pm in Hill 402 or by appointment

Recommended textbooks:

Everyone is required to:

• participate in class meetings;
• send weekly emails to “a-i at rams dot rutgers dot edu” that ask questions and explain to your classmates how the ideas covered in class apply or fail to apply to your research and life;
• submit programming exercises, preferably in pairs;
• take a take-home midterm;
• propose, carry out, and write up a final project, preferably in pairs.

Grading:

• homework assignments (50%);
• participation during and outside class meetings (10%);
• midterm (10%);
• final proposal (15%);
• final project (15%).

• 09/27/2007
• 09/24/2007

• 09/18/2007 I also want to post some material that might help you understand Bayesian Networks.

• 09/18/2007 Later on this semester, we will be discussing Bayesian Networks. To understand Bayes Nets, you must have some basic knowledge of probability: in particular, you need to be familiar with conditional probability and with some discrete and continuous probability distributions. I have put together some chapters from a book for you to review basic probability and statistics. Reading the material is optional, provided that you are comfortable with these topics. The ZIP file is password-protected; I will send the password via the class email list. I have also attached, after each chapter, some practice problems and solutions to the odd-numbered problems. Feel free to test yourself. The chapters cover:

• Basic conditional probability.
• Conditional distributions.
• Bernoulli, binomial, and multinomial distributions (discrete).
• Normal, uniform, and beta distributions (continuous).
• I have also been asked to find materials on the Dirichlet distribution, but I believe there is a Wikipedia entry that discusses it. It is a generalization of the beta distribution, much as the multinomial generalizes the binomial.
• 09/12/2007 Make sure you are able to log in to handin. You will be submitting your programming assignments through this website; no email submission will be accepted unless explicit permission is given. If you have a problem logging in now (also make sure you can find CS530 after you log in), you MUST talk to us. Talking to us just two days before an assignment is due is too late.

## Actual schedule

### Problems and solutions

#### 9/10

Before writing a program, first try to pin down what problem it is supposed to solve and when one solution is better than another. This initial step is harder in AI but also more helpful, because evaluating a solution can involve chance, interaction, and human judgment. For example, what is a good strategy for rock paper scissors? The formal language of types provides guidance. (What about for parsing or translating text?)
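For instance (a sketch, not code from the course), the type of a rock-paper-scissors strategy already pins down what problem it solves: a strategy maps the opponent's past moves to a next move, and a payoff function says when one outcome is better than another.

```haskell
data Move = Rock | Paper | Scissors deriving (Eq, Show)

-- A strategy sees the opponent's past moves (most recent first)
-- and picks our next move.
type Strategy = [Move] -> Move

-- When is one outcome better than another?  +1 win, 0 tie, -1 loss.
payoff :: Move -> Move -> Int
payoff Rock     Scissors = 1
payoff Paper    Rock     = 1
payoff Scissors Paper    = 1
payoff x y | x == y    = 0
           | otherwise = -1

-- A trivial baseline strategy: echo the opponent's last move.
copycat :: Strategy
copycat []      = Rock
copycat (m : _) = m
```

Already the types raise the interesting questions: because the opponent adapts, no single strategy dominates, so "better" must be judged in expectation against a population of opponents.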

Code in class:

Homework:

• Send the professor an email if you didn’t get an email from him to the class mailing list.
• Read “Why Functional Programming Matters” to familiarize yourself with types, higher-order functions, lazy evaluation, and game trees.
• Send an email to the class mailing list: introduce yourself and ask a question, make a comment, respond to the reading above, or otherwise talk about what in this class is and is not relevant to your work.

#### 9/17

One simple way to view parsing is as a function from `String` to `Bool`, that is, a binary classifier of strings. As with most tasks in which human performance constitutes the ground truth, it is worth establishing an upper bound on machine performance (the annotator agreement rate) and a lower bound (the baseline) before trying to write a parser. The performance of a binary classifier is a tradeoff between recall and precision.
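To make the tradeoff concrete, here is a small sketch (hypothetical helper, not course code) that scores a `String -> Bool` classifier against gold labels, assuming at least one string is accepted and at least one is truly grammatical:

```haskell
-- Precision: of the strings we accepted, how many were truly grammatical?
-- Recall:    of the truly grammatical strings, how many did we accept?
score :: (String -> Bool) -> [(String, Bool)] -> (Double, Double)
score classify gold = (tp / accepted, tp / relevant)
  where
    tp       = count (\(s, l) -> l && classify s)
    accepted = count (\(s, _) -> classify s)
    relevant = count (\(_, l) -> l)
    count p  = fromIntegral (length (filter p gold))
```

The trivial classifier `const True` achieves perfect recall but poor precision, which is why a baseline must report both numbers.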

QuickCheck is a Haskell tool that demonstrates how types can guide testing.
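As a one-line illustration (assuming the standard QuickCheck library), a polymorphic property states a law, and QuickCheck reads the property's type to generate random inputs:

```haskell
-- A law about list reversal, stated once for all lists of Ints.
-- Running `quickCheck prop_reverse` (from Test.QuickCheck) would
-- test the law on a hundred random lists.
prop_reverse :: [Int] -> Bool
prop_reverse xs = reverse (reverse xs) == xs
```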

### Complexity and representation

#### 9/17

Often a problem is easier to state using one representation but easier to solve using another. For example, brute-force generate-and-test is usually too slow to solve search and optimization problems, so we represent partial solutions in order to test before generating. To implement these ideas, it is useful to think about streams of solutions wholesale and to compose generation and testing separately. One instance of this approach is backtracking search (n-queens).
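The n-queens search can be sketched in a few lines (a standard formulation, not necessarily the course's code): a partial solution is a prefix of queen placements, and each new queen is tested against it before the search continues, so failing branches are pruned before they are fully generated.

```haskell
-- A partial solution lists the column of the queen in each row placed so far.
queens :: Int -> [[Int]]
queens n = go n
  where
    go 0 = [[]]
    -- Extend each shorter partial solution only with safe columns.
    go k = [ q : qs | qs <- go (k - 1), q <- [1 .. n], safe q qs ]
    -- Safe: no shared column, no shared diagonal.
    safe q qs = and [ q /= c && abs (q - c) /= d | (d, c) <- zip [1 ..] qs ]

-- length (queens 8) evaluates to 92, the classic count of 8-queens solutions.
```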

Code in class:

Homework:

• Preferably in pairs (contact the professor first if not): Solve a family of problems you care about by backtracking. Describe to your classmates why you care and how you did it. (If your solution turns out to take too long to run, analyze why.) Get feedback about your problem, solution, and writing using the class mailing list. Submit your program and a description of it (these may be the same thing, if your program is literate) to the handin system. Please submit only one copy of each project but include the name of everyone who worked on it. If you have multiple files to submit, please zip them together. Due 9/24.

#### 9/24

Don’t just solve board puzzles! By representing not just states but also actions as data, we can convert many planning problems into board problems such as finding a shortest path in a graph. Expressing concrete problems from real life as abstract problems in a well-designed language makes it easier for researchers to care about and take advantage of each other’s work. For example, we can translate Sudoku puzzles into SAT problems, then solve them using a fast SAT solver.
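As a sketch of that translation (hypothetical names, not the course's code): introduce one Boolean variable per (row, column, digit) triple and emit clauses in conjunctive normal form, such as "every cell holds at least one digit" and "no cell holds two digits".

```haskell
-- Encode the variable "cell (r,c) holds digit d" as a positive integer,
-- the way DIMACS-style SAT solvers expect.
var :: Int -> Int -> Int -> Int
var r c d = (r - 1) * 81 + (c - 1) * 9 + d

-- One clause per cell: the disjunction of its nine digit variables.
cellClauses :: [[Int]]
cellClauses = [ [ var r c d | d <- [1 .. 9] ] | r <- [1 .. 9], c <- [1 .. 9] ]

-- "No cell holds two digits": a binary clause of negated literals
-- for every pair of distinct digits in every cell.
uniqueClauses :: [[Int]]
uniqueClauses = [ [ negate (var r c d), negate (var r c d') ]
                | r <- [1 .. 9], c <- [1 .. 9]
                , d <- [1 .. 9], d' <- [d + 1 .. 9] ]
```

The row, column, box, and given-clue constraints follow the same pattern, and the whole list of clauses can be handed to any off-the-shelf SAT solver.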

Our representation of incomplete Sudoku solutions last time was very inexpressive compared to general Boolean formulas: it only narrows down possibilities for a single cell. More sophisticated representations of incomplete solutions reduce the need for brute force in search and optimization and replace it by early backtracking and constraint propagation. Examples: high-school algebra, type inference, branch-and-bound in integer programming.

Homework:

• Read: Mackworth, Alan K. 1977. Consistency in networks of relations. Artificial Intelligence 8(1):99–118.

• Don’t forget to participate on the class mailing list!

#### 10/1

Discuss Mackworth’s paper:

• The author suggests that a solution to the n-queens problem is “a constructive proof for the wff

(∃x₁)(∃x₂)…(∃xₙ) P₁(x₁) ∧ P₂(x₂) ∧ … ∧ Pₙ(xₙ) ∧
P₁₂(x₁,x₂) ∧ P₁₃(x₁,x₃) ∧ … ∧ Pₙ₋₁,ₙ(xₙ₋₁,xₙ)”

where Pi are certain unary constraints and Pij are certain binary constraints. Give one way to express the n-queens problem by spelling out what these constraints are. Then, if time permits, give an example of a node inconsistency, an arc inconsistency, and a path inconsistency.

• Osha: “could someone provide examples of practical a-i problems that do not involve these boolean constraint that we’ve been talking about so far?” (Reid: look for a problem that doesn’t feel like a puzzle to humans.)

Parsing:

A parse is a witness of a statement, or a (normal-form) constructive proof in a logic. Backtracking parsing can take time exponential in the length of the input string, even when there is no parse in the end. To parse in polynomial time, we apply dynamic programming, or memoization: we name and hence reuse the result of computations. (Compare `slowFib` and `fastFib` in Laziness.hs.) A grammar formalism is a domain-specific programming language through which researchers of human languages and parsing algorithms may collaborate. As with any interpreter of a domain-specific language, we can further speed up the parser by partial evaluation, which compiles a grammar (a user of the language) to a native program by fusing it with the parser (the implementation of the language).
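The `slowFib`/`fastFib` contrast from Laziness.hs can be reconstructed roughly as follows (a standard illustration; the course file may differ):

```haskell
-- Naive recursion recomputes the same subproblems exponentially often.
slowFib :: Int -> Integer
slowFib n | n < 2     = fromIntegral n
          | otherwise = slowFib (n - 1) + slowFib (n - 2)

-- Naming the whole stream of results lets each one be computed once and
-- then reused: memoization by lazy evaluation.  This is the same move a
-- chart parser makes when it names the parses of each substring.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

fastFib :: Int -> Integer
fastFib n = fibs !! n
```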

Homework (due on 10/15):

• Preferably in pairs (contact the professor first if not): In Parse1.hs is a backtracking parser called `parses` and a CYK (dynamic programming) parser called `parses'`. Although the latter is more efficient, both parsers return a list of parse trees. In Parse2.hs is a partially evaluated CYK parser `parse`, which is faster than `parses'` not just because it is a parser and a grammar fused together but also because it only bothers to return a `Bool` to signify whether there is a parse. Change `parses'` to return parse trees. Start by putting

```
import Parse1 (Symbol(..), Parse(..))
```

at the top of Parse2.hs, to reuse the definition of parse trees and symbols in Parse1.hs. Because you need only change the definitions of `word`, `(|||)`, and `(&&&)` in Parse2.hs, you need only submit Parse2.hs to the handin system.

### Probabilities, expectations, and utilities

#### 10/8

A probability distribution can be interpreted both as a sampling procedure and as a weighted set of possibilities.
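The two interpretations can be sketched as follows (hypothetical names; the actual `Distr` interface in Distr.lhs may differ):

```haskell
-- Interpretation 1: a weighted set of possibilities -- explicit and exact.
newtype Weighted a = Weighted [(Double, a)]

uniformW :: [a] -> Weighted a
uniformW xs = Weighted [ (1 / fromIntegral (length xs), x) | x <- xs ]

expectation :: Weighted Double -> Double
expectation (Weighted pxs) = sum [ p * x | (p, x) <- pxs ]

-- Interpretation 2: a sampling procedure -- approximate but scalable.
-- Here the procedure consumes a uniform random number u in [0,1)
-- instead of performing IO.
sampleUniform :: [a] -> Double -> a
sampleUniform xs u = xs !! floor (u * fromIntegral (length xs))
```

The exact view supports queries like expectations directly; the sampling view estimates the same quantities by averaging many draws, and scales to distributions too large to enumerate.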

Code in class:

• Distr.lhs defines the `Distr` interface and implements it two ways.
• DistrTest.lhs uses the `Distr` interface to model coins and dice.

#### 10/15

Pull constants out of loops to avoid computing them over and over again. Filter possibilities according to observations.

Code in class: Stretch.hs, Stretch2.hs

Homework (due on 10/25):

• Preferably in pairs (contact the professor first if not): Use the implemented representations of probability distributions in Distr.lhs to model a stochastic process you care about. Explain how to use your code and why you care about it. Try to filter possibilities to match observations using `choose []`. Be prepared for your work to be presented in class by someone else.

#### 10/22

A decision process is like a probability distribution, but with two additional primitives: perception and action. A policy specifies how to behave in a decision process. An optimal policy is one that maximizes expected utility.

Herbert Simon wrote in “Rationality as process and as product of thought”:

Complexity is deep in the nature of things, and discovering tolerable approximation procedures and heuristics that permit huge spaces to be searched very selectively lies at the heart of intelligence, whether human or artificial. A theory of rationality that does not give an account of problem solving in the face of complexity is sadly incomplete. It is worse than incomplete; it can be seriously misleading by providing “solutions” to economic questions that are without operational significance.

#### 10/29

From probability distributions to decision processes. A discounting factor is used to define utility if the process may continue forever.
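Concretely, with discount factor 0 < γ < 1, a reward stream r₀, r₁, r₂, … has utility Σₜ γᵗ rₜ, which stays finite even when the process runs forever (a sketch, not course code):

```haskell
-- Discounted utility of a (possibly truncated) reward stream.
discounted :: Double -> [Double] -> Double
discounted gamma rs = sum (zipWith (*) (iterate (* gamma) 1) rs)

-- A constant reward of 1 forever has utility 1 / (1 - gamma);
-- truncating after many steps approximates it arbitrarily well.
```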

Please study the code from today, newly commented:

Also check out an application of probabilistic modeling: Joel Spolsky’s Evidence Based Scheduling.

Mid-course evaluation.

Homework (due on 11/5, in class of course):

• Each team from the `Distr` homework should have been assigned randomly to another team. Prepare a five-minute presentation of your assigned group’s `Distr` homework submission: what did they do, why is it interesting, and what did you learn? If you want, you can email me a PDF file ahead of time or use your own laptop’s VGA output for your presentation.

#### 11/5

Presentations of probability distributions.

We have already covered Bayes nets. It’s all about filtering to update belief states with observations.

#### 11/12

Midterm (90 minutes) and discussion thereof.

• The midterm is not take-home, but you may bring any written material.
• There will be multiple questions asking for short answers. For example, one question might ask whether backtracking takes polynomial time on a given family of parsing problems. Another question might ask you to explain whether a given decision process accurately models a natural environment, then compute an optimal strategy in response to the process. There are no previous exams for you to refer to—sorry!
• You will not have to write any code or an essay. You may have to read a bit of Haskell. You should know how the code we presented works, so that you know what takes a long time and what doesn’t, and what computes the correct result and what doesn’t.

A sample midterm is now available. It is about 20%–25% the length of the actual midterm. The actual midterm, like this sample midterm, refers to several pages of Russell and Norvig’s textbook; we will provide copies of these pages as part of the midterm. The cover page of the actual midterm will be identical to the cover page of this sample midterm, except for the number 2 (the total number of pages). One set of model answers for the sample midterm would be:

1. 2ⁿ; 2ⁿ/4
2. {Rich}; Allow negated literals to enter the conjunction

Another set of model answers for the sample midterm would be:

1. 2ⁿ; 2ⁿ⁻²
2. {Rich, Famous}; Allow disjunctions of literals in addition to conjunctions of literals

If you have any questions or notice any problems with the sample midterm, please speak up!

Today’s code for (hidden) Markov models is online. For a more classical explanation, see (the textbook or):

Lawrence R. Rabiner. 1989. An introduction to hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2):257–286.

On an almost entirely unrelated topic, if you are interested in the recent discussions on the mailing list about performance and about Haskell, you might find it helpful to study C++ versions of the n-queens problem and probability distributions.

Please see a separate page for information on the final assignment, the final class meeting, and the final project, including due dates.

#### 11/19

Midterm discussion.

Decision-making under uncertainty. Learning. Continuous variables.

The Kalman filter is a nice example of cleverly representing an infinite distribution in very little space and manipulating it in very little time:

Peter S. Maybeck. 1979. Stochastic models, estimation, and control. Mathematics in Science and Engineering 141, San Diego, CA: Academic Press.

The parameters and structure of a Bayes net can be learned by hill-climbing search:

David Heckerman. 1995. A tutorial on learning with Bayesian networks. Tech. Rep. MSR-TR-95-06, Microsoft Research. Revised Nov. 1996.

Finally, you might be interested in an interview with a driver behind NASA’s Mars Rovers.

In class today, we built the following decision process for the environment of a robotic dog. It is a variation on `env` in ProcessTest.lhs. The first argument to this function is the number of rounds that the dog is allowed to wait for (say `3`). The second argument is the utility for the dog to step forward right away (say `0.2`).

```
data Step = Forward | Wait | Home deriving (Eq, Ord, Show, Read)

env :: (Decide d Bool, Observe d Step) => Int -> Double -> d Double Double ()
env n u = bind (observe ([Forward, Home] ++
                         if n == 0 then [] else [Wait])) (\step ->
          case step of
            Forward -> reward u
            Home    -> reward 0.3
            Wait    -> bind (reward (-0.05)) (\() ->
                       bind (decide [(0.5, True), (0.5, False)]) (\ball ->
                       env (n-1) (if ball then 0.75 else 0.15))))
```

#### 11/26

Decision trees and inductive learning illustrate:

• Programs as hypotheses
• Greedy hill-climbing as approximate search
• Measuring discriminative information as bits of entropy
• Validation and pruning to prevent overfitting; Occam’s razor
• The utility of classification; cost sensitivity
• Statistical (macro) vs anecdotal (micro) evidence

Today’s code is in DecisionTree.lhs. A good way to learn more about decision trees is Quinlan’s original paper (Machine Learning 1(1):81–106, 1986).
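The entropy measure listed above can be sketched as follows (the standard definition, not necessarily the code in DecisionTree.lhs):

```haskell
-- Entropy, in bits, of a discrete distribution given as probabilities.
-- Zero-probability outcomes contribute nothing, so we skip them
-- rather than evaluate 0 * log 0.
entropy :: [Double] -> Double
entropy ps = negate (sum [ p * logBase 2 p | p <- ps, p > 0 ])

-- A fair coin carries one bit of uncertainty; a certain outcome, none.
-- A decision-tree learner greedily picks the attribute whose split
-- reduces this quantity the most.
```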

Tom Walsh’s talk on Thursday morning is very relevant to the topics we have been covering in this course.

#### 12/3

Neural networks.

• Key phrases: black-box model; resembling/being-inspired-by/being-a-plausible-model-of nature; understanding intelligence
• Topics: perceptron; expressive power; hidden neurons; training by gradient descent and its variants
• Discussion question: “if neural nets are black boxes, how do they help us understand or imitate intelligence?”
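A perceptron and its training rule can be sketched in a few lines (the standard formulation, with hypothetical names): the weights are nudged toward each misclassified example, a special case of gradient descent.

```haskell
-- A perceptron fires when the weighted sum of its inputs (plus bias)
-- is positive.
predict :: [Double] -> Double -> [Double] -> Double
predict w b x = if sum (zipWith (*) w x) + b > 0 then 1 else 0

-- One pass of the perceptron learning rule over labeled examples:
-- for each example, move the weights by (learning rate * error * input).
-- Correctly classified examples have error 0 and leave the weights alone.
step :: Double -> ([Double], Double) -> [([Double], Double)] -> ([Double], Double)
step lr = foldl update
  where
    update (w, b) (x, t) =
      let err = t - predict w b x
      in (zipWith (\wi xi -> wi + lr * err * xi) w x, b + lr * err)
```

Repeating `step` over the data converges for linearly separable problems (such as AND) but not for XOR, which is why hidden neurons matter.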

#### 12/10

Constructive proofs and knowledge representation.

Embedding theories for multiagent modeling.