Proper Treatment 正當作法/ cs504/ 2007/ midterm
2008-08-17 19:19

Don’t panic!

This midterm is due by email to “ccshan at cs dot rutgers dot edu” by the end of March 9, 2007 (Friday), midnight Eastern time. On March 10, 2007 (Saturday), I will respond to acknowledge each message I get from you by then. Please send me a plain-text, PDF, or PostScript document, not a Word or Excel document.

Please (please!) also send any questions to me at the same email address. If I don’t respond in person, I will respond by email and post the exchange here with your name removed.

Feel free to use code from this course or elsewhere (or not). Your calculations can be approximate, as long as they are precise enough to justify your conclusions. The problems can be solved out of order. In any case, please describe not just your decisions but also how you arrive at them.

Updates

A student helps me clarify:

For problem 2, as described, the train’s leaving time is uniformly distributed between 0 and 5 minutes. My understanding was that the whole interval [0, 5] has the same probability of 0.2.

To be more precise, the probability density is 0.2 between 0 and 5.

Then the last sentence of this problem says “with probability 0.2, I might ascertain that the train will leave in 3 to 4 minutes (uniformly distributed).” I was not quite sure whether this additional information means that the train leaves in 3 to 4 minutes with probability 1 or with probability 0.2, because if it were 0.2, it seemed to me not to add any different (or additional) information. Please confirm, even if my understanding is wrong.

The probability 0.2 here is the probability of my ascertaining, not of the train’s leaving. So, the probability is 0.2 that I discover that the probability is 1 that the train leaves in 3 to 4 minutes. But note also that the probability is 0.2 that I discover that the probability is 1 that the train leaves in 2 to 3 minutes.
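This two-stage reading can be checked numerically. The sketch below (Python, with hypothetical function names) first picks one of the five minute intervals with probability 0.2 each, then picks the leaving time uniformly within that interval; marginally, each interval [n, n+1] still receives probability about 0.2, matching the one-stage uniform distribution on [0, 5].

```python
import random

def sample_leaving_time(rng):
    """Two-stage process from the clarification above: first learn
    (with probability 0.2 each) which minute interval [n, n+1] the
    train leaves in, then draw the leaving time uniformly within it."""
    n = rng.randrange(5)          # which interval I ascertain, each w.p. 0.2
    return n + rng.random()       # uniform within [n, n+1]

def estimate_interval_prob(lo, hi, trials=100_000, seed=0):
    """Monte Carlo estimate of the marginal probability that the
    leaving time falls in [lo, hi)."""
    rng = random.Random(seed)
    hits = sum(lo <= sample_leaving_time(rng) < hi for _ in range(trials))
    return hits / trials
```

Calling `estimate_interval_prob(3, 4)` gives an estimate near 0.2, the same as under the plain uniform distribution on [0, 5]; the information in the second part lies in learning *which* interval holds, not in changing the marginal distribution.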

For problem 1, it’s described that checking the time and being honked at are worth 10 cents, while the benefit, although not given a value, is indicated as giving me an idea of how many minutes until the train leaves, in relation to problem 2. My understanding is that the utility issue is not directly related to the HMM, which considers the states rather than the utility, but in general the utility may still make a difference after solving for the state with the HMM. For example, even though the next state is early green, I may still want to check the time if the reward of doing so outweighs the cost of being honked at. So my question is: in this problem, should we take into account any reward/utility of checking the time or not?

Yes, you’re right. Problem 2 yields the rewards/utility of checking the time.

More on Problem 3:

I’m wondering if the numbers are what you intended. I ask because with the mean of X being 1.01 and the SD being .005, even with the .002 SD in the measurement, the 1.05 and 1.1 estimates for X given by the arrival times seem seriously into the realm of not-going-to-happen.

The numbers are as intended, and indeed they are far from my prior expectations, but isn’t everything in life unlikely or impossible anyway? (Perhaps more so with the Pennsylvania Rail Road…)

Also, minutes, instead of seconds, seem too broad a measure, given the precision by which the standard deviations are known. Speaking of which, do you want the arrival time to the nearest minute or nearest second?

Please bear with my use of minutes as a unit, and feel free to estimate the arrival time to the nearest minute or second as long as you say how you compute it. (To be sure, I know the standard deviations precisely because they are subjective to me.)

I don’t understand to what ‘ratios’ you are referring. Or are you just indicating that the time of travel from New Brunswick to Rahway has a mean of 20 minutes with a standard deviation of .002?

The time of travel from New Brunswick to Rahway is normally distributed with a mean of 20X minutes and a standard deviation of 20×0.002 minutes. Conditional on X, this time is independent of the times beyond Rahway.

More on Problem 1:

If people do start honking, should I still finish checking?

If people do start honking, you should finish checking, because the 10 cents worth of annoyance has already been incurred.

Are you grading based on us using the answer we got from Problem 2, or on what it should have been?

I won’t take off points in Problem 1 just because you used a wrong result from (the second part of) Problem 2.

I am still confused about whether, for problem 1, we should consider the utility or not. One understanding could be ‘not’ (at least not as in problem 2, which is almost entirely focused on computing utility). In other words, for problem 1, if the next state is early green, I should not check the time because I do not want to be honked at (costing 10 cents), and even though I may learn when the train leaves (which is a reward) by checking, that credit is not considered here. What we care about is only whether the next state is early green or late red. Another understanding could be that we should take the credit of checking the time into account: even after we obtain the next state (early green or late red), we still need to compute the total utility before deciding whether ‘I should check the time’. Any more possible clarifications?

But if you don’t take into account the utility of knowing the time, why bother checking the time at all even if you know for sure that the light will stay red?

Yet more on Problem 3:

I take it that we need to find the value of X for the day at issue.

You cannot find out the exact value of X! The problem asks you for the expected value of a random variable. Your prior belief about X is a normal distribution, and it turns out that your posterior belief about X—that is, the conditional probability distribution of X given the two interval times you know—is another normal distribution.
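For readers who want the general mechanism (not the midterm’s specific numbers), here is a minimal sketch of the normal–normal conjugate update that makes the posterior normal; `normal_posterior` is a hypothetical helper, and repeated observations are handled by chaining updates.

```python
def normal_posterior(mu0, sigma0, y, sigma):
    """Conjugate update: prior x ~ N(mu0, sigma0^2), and the
    observation y ~ N(x, sigma^2) given x.  The posterior over x is
    again normal; its precision (inverse variance) is the sum of the
    two precisions, and its mean is the precision-weighted average."""
    prec = 1.0 / sigma0**2 + 1.0 / sigma**2
    mean = (mu0 / sigma0**2 + y / sigma**2) / prec
    return mean, prec ** -0.5   # posterior mean and standard deviation
```

For instance, a standard-normal prior updated with one observation y = 2 of the same unit noise yields posterior mean 1 and variance 1/2; to condition on several observations, feed each update’s output back in as the next prior.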

Yet more on Problem 1:

For an HMM you need to have distinct observables (an alphabet), but we have the fact that the light is red or green and this additional information relating to speed. If we use this to obtain a type of classifier saying, well, if it’s within some distance of the distribution for early red, it is in state early red, we would be directly observing the states. A related question is: even if we went ahead and used Bayes’ rule to update a probability distribution based on the prior (mean 5, stddev 1) and the likelihood (5 mph), how would we reduce the distribution down to a single probability? I guess I have issues with combining an HMM with probability models for the observables.

Observing that the light is red and the cross traffic is 5 mph at the first time step tells you for sure that the state at the first time step can only be “early red” or “late red”, not “early green” or “late green”. Furthermore such an observation provides uncertain information about the state. The conditional probability density of observing 5-mph cross traffic given that the state is early red is greater than the conditional probability density of observing 5-mph cross traffic given that the state is late red. Neither red state bounds the speed of the cross traffic sharply: even when the state is early red, cross traffic could still conceivably go at 10 mph; it’s just much less likely.
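To make the density comparison concrete, a short sketch (Python; `normal_pdf` is a hypothetical helper) evaluates the two conditional densities at 5 mph using the means and standard deviations given in Problem 1:

```python
import math

def normal_pdf(x, mean, sd):
    """Probability density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

# Density of observing 5-mph cross traffic under each red state:
early_red = normal_pdf(5, 5, 1)   # early red: mean 5 mph, sd 1 mph
late_red  = normal_pdf(5, 7, 1)   # late red:  mean 7 mph, sd 1 mph
```

The early-red density is roughly 0.4 (the “0.39” quoted in a later question below), while the late-red density is about 0.054, so a 5-mph observation favors early red without ruling out late red.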

Another question regarding problem 1. I have the information about the observations when the light is red, but can I use these same distributions when the light is green. Are the states symmetric, for example if it is late red on one side of the traffic is it late green on the other? In this way we could model it using only two states. In other words, we know something about the probabilities when the light is red, but what about when the light is green.

There are four states for the entire model (not four for each of the two streets!). Their initial and transition probabilities are pretty symmetric and specified in the problem. What I called the “early red” state is the “early red on George, early green on Hamilton” state; what I called the “late green” state is the “late green on George, late red on Hamilton” state. You need to distinguish the state “early red (on George, early green on Hamilton)” from the state “early green (on George, early red on Hamilton)” because I only get honked at in the latter state.

Don’t we need to know something about the probability densities of the speed when the light is early green or late green? Right now the observations are speed probability, in other words we are in state EarlyRed, the probability of seeing 5mph is 0.39, what about state early green?

We can assume, perhaps fatally, that there is no cross traffic when our light is green (: But actually we don’t need to assume that because each of our observations rules out the possibility that the corresponding state is early green or late green, because we observe that “the light has stayed red”.

In one of the students’ questions you posted, it said “we are in state EarlyRed, the probability of seeing 5mph is 0.39, what about state early green?” I think the number 0.39 is not the probability but the density value at 5 mph. As in problem 3, here in problem 1 we can calculate only the density value of each speed. So, do we have to use the same formula as in problem 3 for calculating probabilities?

You are right that the probability of seeing 5mph in state EarlyRed is actually zero. The probability density of seeing 5mph in state EarlyRed is greater than zero.

In problem 1, do I lose 10 cents if honking makes me annoyed? And otherwise, do I earn 10 cents or not?

There are two equivalent ways to think about this. The first way is: if someone honks, I lose 10 cents; if nobody honks, the utility is zero. The second way is: if someone honks, the utility is zero; if nobody honks, I earn 10 cents.

Problem 1: I feel like I understand what to do for this question (always a good start!), but not how to do it. … My current problem is that I just don’t have the technical skills to implement the Viterbi algorithm. I could learn, but it would take some time. I was thinking of adapting the code in markov.scm so that it dealt with continuously distributed outputs (i.e. the car speeds), but my knowledge of Scheme isn’t yet good enough to know how to do so. Another idea was to use the HMM Toolbox for Matlab, but I couldn’t follow the terminology used in the readme file.

The math is simple enough that you can do it by hand. I apologize for the repetition in the following text.

You want the conditional probability distribution over the states at the 4th time step, given the observations you have at the 1st, 2nd, and 3rd time steps. To get this conditional probability distribution, you need to compute the conditional probability distribution over the states at the 3rd time step, given the observations you have at the 1st, 2nd, and 3rd time steps. To get this conditional probability distribution, you need to compute the conditional probability distribution over the states at the 2nd time step, given the observations you have at the 1st and 2nd time steps. Finally, to get this conditional probability distribution, you need to compute the conditional probability distribution over the states at the 1st time step, given the observations you have at the 1st time step.
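This recursion is the forward (filtering) pass of an HMM, and it is small enough to code directly. The sketch below is a generic Python version (rather than the course’s Scheme or the Matlab toolbox); the four-state transition matrix and the observation densities from Problem 1 are left for you to plug in.

```python
def forward_filter(init, trans, obs_weights):
    """One pass of the recursion described above.

    init        -- prior distribution over states at the 1st time step
    trans       -- trans[i][j]: probability of moving from state i to j
    obs_weights -- obs_weights[t][i]: likelihood (density) of the
                   observation at time step t given state i

    Returns the conditional distribution over states at the step
    AFTER the last observation, given all observations so far (e.g.
    the 4th step given observations at the 1st through 3rd steps)."""
    n = len(init)
    belief = list(init)
    for weights in obs_weights:
        # Condition on the current observation, then renormalize ...
        belief = [b * w for b, w in zip(belief, weights)]
        total = sum(belief)
        belief = [b / total for b in belief]
        # ... and predict one step forward through the transition model.
        belief = [sum(belief[i] * trans[i][j] for i in range(n))
                  for j in range(n)]
    return belief
```

Note that the observation weights may be probability densities rather than probabilities; the renormalization step makes the resulting beliefs proper probabilities either way.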

Even more on Problem 3:

You write “I then assume that, conditional on X, the three ratios… are independently distributed with mean X and standard deviation 0.002”. I understand what a conditional density function is, but only when it is conditional on an event, i.e., a set of worlds. I don’t understand what it is for anything to be conditional on a random variable. My best guess at what you have in mind is that given a particular observation of X, say X=x, the cPDF f(x|X=x) for each ratio is normally distributed. Am I thinking along the right lines?

Yes—to adjust your last sentence a bit (especially because we never observe X directly): Given any value x that the random variable X could possibly take, the conditional probability distribution of each trip segment duration is normal. (I wouldn’t write “f(x|X=x)” because it doesn’t make sense to have the same “x” appear both to the left and to the right of “|”.) Furthermore, again given any value x that the random variable X could possibly take, the durations of the three trip segments are conditionally independent of each other.

Indeed, a similar worry goes for when you say that the mean is, e.g., 20X. You can’t really mean that, but rather that for each world w, the mean is 20X(w).

Yes (note that as soon as you take a segment duration into account, that single world w splits into more worlds, each with the same value for X but a different segment duration).

Even more on Problem 2:

I’ve worked out an answer for the first part of this question, though I’m a little less confident of my reasoning.

I hope the following way of looking at the problem makes sense: the probability “that I do not catch the train if I take the stairs” is equal to the probability that a real number uniformly picked between 0 and 1 is greater than a real number uniformly picked between 0 and 5, assuming that the two picks are independent.
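That reformulation is easy to sanity-check by simulation; the sketch below (Python, with a hypothetical function name) draws the two independent uniform picks and estimates the probability that the first exceeds the second.

```python
import random

def miss_probability(trials=200_000, seed=0):
    """Monte Carlo check of the reformulation above: the chance that
    a number picked uniformly from [0, 1] (the climb time) exceeds an
    independent number picked uniformly from [0, 5] (the time until
    the train leaves)."""
    rng = random.Random(seed)
    misses = sum(rng.random() > 5 * rng.random() for _ in range(trials))
    return misses / trials
```

The same quantity can be found exactly by integrating over the climb time, which makes a good cross-check on the simulation.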

I am wondering, in problem 2, what exactly is meant by the 50 cents of not lugging my bike up the stairs. In other words, when I don’t lug the bike, do I earn the utility of 50 cents? And when I do lug it, do I lose 50 cents, or is there no cost or earning?

There are two equivalent ways to think about this. The first way is: when I don’t lug the bike, I earn the utility of 50 cents; when I do lug the bike, the utility is zero. The second way is: when I don’t lug the bike, the utility is zero; when I do lug the bike, I lose the utility of 50 cents.

I don’t quite follow the meaning of the sentences below, which you mentioned. Do they mean that my subjective belief is, with probability 0.2, that the train definitely leaves in 3 to 4 minutes? If not, could you paraphrase them briefly in easier terms? “So, the probability is 0.2 that I discover that the probability is 1 that the train leaves in 3 to 4 minutes. But note also that the probability is 0.2 that I discover that the probability is 1 that the train leaves in 2 to 3 minutes.”

I think I still need help on understanding part 2 of problem 2. The change here is that I find out which specific minute the train will leave in. But the utilities of A and B are different depending on which minute I believe the train will leave in. I guess I did not understand what you are asking for. Should we look at each minute?

The second part of Problem 2 describes the following process. At the beginning, there are five possibilities, each with probability 0.2. Let’s call these possibilities P0, P1, P2, P3, P4. Exactly one of these possibilities is true; I find out which one immediately. Then, in each possibility Pn, the train leaves in between n and n+1 minutes, uniformly distributed.

I’m asking you to find the difference in expected utility between this process and the process described in the first part of the problem.
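Abstractly, this difference is the expected value of the information, and it can be sketched without committing to the problem’s specific utilities (the interface below is hypothetical): without the information, you pick the one action that is best on average; with it, you pick the best action separately in each possibility.

```python
def value_of_information(possibilities, actions, expected_utility):
    """Expected-utility gain from learning which possibility holds
    before acting.  `possibilities` maps each possibility to its
    probability, and `expected_utility(action, possibility)` is the
    conditional expected utility of taking `action` in that
    possibility.  (Hypothetical interface -- the actual utilities
    must come from the problem.)"""
    # Without the information: commit to one action for all possibilities.
    without = max(
        sum(p * expected_utility(a, w) for w, p in possibilities.items())
        for a in actions)
    # With the information: pick the best action in each possibility.
    with_info = sum(
        p * max(expected_utility(a, w) for a in actions)
        for w, p in possibilities.items())
    return with_info - without
```

This quantity is never negative, since being able to tailor the action to the possibility can only help.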

So at the point when I find out which one of these five is true, the possibility of that one is 1, instead of the previous 0.2, correct?..

At the point when I find out which one of these five is true, the probability (not possibility) of that one is 1, instead of the previous 0.2.

This means that each possibility Pn (the train leaving within [n, n+1]) is 1, rather than 0.2, when it is uniformly distributed within [0, 5]. Right?..

What I called P0, P1, P2, P3, P4 are possibilities, not probabilities. A possibility is not a number; it can be neither 1 nor 0.2.

As for the probability of 0.2 that I discover that the probability is 1 that the train leaves in 3 to 4 minutes, could I take the probability as another value, for example, 0.9? Does that make the question different?

I wrote “the probability is 0.2 that I discover that the probability is 1 that the train leaves in 3 to 4 minutes” to clarify the problem stated in the midterm. If you change “1” in my clarification to “0.9”, it would become inconsistent with the problem stated in the midterm.

Sorry, I meant to ask what if I change “0.2” to “0.9”. The reason was that I did not understand why the probability is picked as 0.2. If I am only 0.2 certain that the train leaves in a specific [n, n+1], what is the difference from part 1, in which the probability of leaving in any [n, n+1] is 0.2 too? And my original understanding was that when I find out in which minute the train leaves, I am 100% (p = 1) certain, not just 0.2. Then later I thought maybe the 0.2 certainty (0.8 uncertainty) was caused by my error in reading the time?.. I guess I thought too much?

The difference between the two parts of Problem 2 is: In the second part, you find out for sure which minute the train will arrive in before you have to decide whether to take the stairs or the escalator. So you can do something different based on which minute the train will arrive in. But even in the second part, the probability (from your perspective before you find out which minute the train will arrive in) that you find out for sure that the train will arrive in the 4th minute (for example) is 0.2.

On Problem 0:

I feel like once I’ve figured out how to implement the HMM in Problem 1, this shouldn’t be too much of a problem.

You don’t need to implement the HMM to solve this problem. In particular, the observation probabilities don’t matter here because there are no observations in this problem.

Problem 3

Pennsylvania Rail Road, 1911

I am on a train from New Brunswick to New York, and wonder when I will arrive. To model the time a given train takes to get from one station to another, I assume a random variable X (the “sluggishness” of the given train), which is normally distributed with mean 1.01 and standard deviation 0.005. I then assume that, conditional on X, the three ratios

are independently and normally distributed with mean X and standard deviation 0.002. Today the train to New York left New Brunswick at 6:49 pm, Rahway at 7:10 pm, and Elizabeth at 7:21 pm. When do I expect the train to arrive in New York?

Problem 2

New Brunswick, 1910

Biking straight south on George Street in New Brunswick, I cross Somerset Street and face a decision to make. Do I board the train to New York

I’d rather not lug my bike up the stairs (that’s worth 50 cents to me), but I’d also rather not wait for the next train (that’s worth 1 dollar to me).

I don’t know exactly how much time I have until this train leaves the station. As far as I know, this duration is uniformly distributed between 0 minutes and 5 minutes. Climbing the stairs takes between 0 and 1 minute (uniformly distributed). Taking the escalators requires a detour to Albany Street and takes between 1 and 4 minutes (uniformly distributed).

Which end of the platform would a rational me choose (and why)?

How much better off would I be if, before I make my decision, I find out which of the next 5 minutes the train will leave in? For example, with probability 0.2, I might ascertain that the train will leave in 3 to 4 minutes (uniformly distributed).

Problem 1

Before crossing Somerset Street, I stop at a red light at Hamilton Street (C). While waiting, should I fumble my cell phone out of my pocket to check the time? If I do, I would find out which minute the train will leave in, as described in Problem 2 above. But if the light turns green during the 10 seconds it takes me to take out my phone, look at it, and put it back, then the motorists behind me will honk, and that annoyance is worth 10 cents to me. For simplicity, let us neglect how the time I spend at this light affects the time left before the train leaves.

I apply the following crude hidden Markov model to the intersection and its traffic light. The traffic light cycles through four states: early red, late red, early green, and late green. Each state has the same initial probability 1/4. At each time step (every 10 seconds), the light advances to the next state with probability 1/2, and stays in the same state with probability 1/2.

Of course, I can see whether the light is red or green at each time step. I cannot directly tell early red from late red (or early green from late green), but I can see the speed of traffic on Hamilton Street. When the light is early red, I model the speed as a normal distribution whose mean is 5 mph and standard deviation is 1 mph. When the light is late red, traffic accelerates to a mean of 7 mph and standard deviation of still 1 mph.

In the three time steps (in other words, 30 seconds) since I arrived at this intersection, the light has stayed red while traffic on Hamilton Street went by at the speeds 5 mph, 6 mph, and 7 mph. (I think these speed estimates are fairly and equally precise.) Meanwhile, I daydreamed instead of checking the time. Should I check the time at the very next time step and risk the honk?

Problem 0

How much time does the traffic-light model in Problem 1 above expect each light cycle to take?

That’s all!