Hi guys!
I want to invite you to the wine and cheese tasting at CBRSS from 4-6. It will
be set up in the main lobby. Also, if you prefer beer and cheese, that could
be arranged :) Just ask.
Hope to see you there!
Marie
Hi, everyone. Having heard from several folks, I'm going to split the
difference between people's schedules and say that we'll do calculus
from 10.30am to 11.30am next Monday. I think it will be most helpful for folks
if we do the session before lecture Monday.
Sound ok?
Ryan
------------------------------------------
Ryan T. Moore ~ Government & Social Policy
Ph.D. Candidate ~ Harvard University
Homepage: http://www.people.fas.harvard.edu/~rtmoore/
Gov1000: http://www.courses.fas.harvard.edu/~gov1000/
Hi, all. I'm planning a vector calculus session of about one hour for
those who haven't done any calculus, other than maybe the Prefresher.
Would 10am Monday work for those who are interested?
Cheers,
Ryan
Hi guys!
Just a reminder that there will be a section today focusing on matrix
algebra and vector geometry. It is optional, though I recommend that
anyone who hasn't taken a course in linear algebra come. (Attendance was
low on Tuesday.)
See you soon!
Alison
> I am having a LaTeX problem. I can't get decent-looking paragraphs
> in my LaTeX document. Is there a proper syntax for indentation of
> paragraphs?
If you leave a blank line between paragraphs, then the first paragraph of a
section will not be indented, but the rest will, I believe. Another
command you might investigate in LaTeX is \indent. Let me know if neither
of these suffices!
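For instance, a minimal sketch (the \parindent length here is an arbitrary choice you can adjust):

```latex
\documentclass{article}
% \setlength{\parindent}{2em} % optional: change the indent width
\begin{document}
First paragraph; directly after a section heading this is not indented.

A blank line starts a new paragraph, which gets the usual indent.

\indent This forces an indent; \noindent suppresses one.
\end{document}
```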
Ryan
> > S_E is just a descriptive
> > measure that tells you how much variability there is around the
> > regression line.
>
> Ok, your email definitely helped clarify what S_E is trying to describe:
> variability around the regression line.
>
Cool.
> But I still don't understand what values tell us the variability is
> low or variability is high. Doesn't the S_E value depend on the
> y-values for our particular experiment? ie, the variability values
> do not mean the same thing for different data sets, unlike r^2,
> where the variability is always between 0 and 1.
S_E does depend on the sample variance of y in a particular dataset.
Because of this, you do not want to compare S_E across models that are
fit to different y variables. Note that, for the same reason, one
should also *NOT* compare R^2 across regression models that are fit to
different y variables. R^2 is simply the following:
R^2 = RegSS / TSS (p. 91 Fox97)
which is the same as
R^2 = Var(\hat{y}) / Var(y) (p. 58 of Achen)
Despite the standardized [0,1] scale, R^2 depends on the observed
variance of y.
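To make this concrete, here is a small sketch in Python/NumPy (the course itself uses R; the data below are simulated purely for illustration) showing that the two formulas agree, and that both are ratios involving the observed variance of y:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 1.5 * x + rng.normal(scale=2.0, size=100)

# OLS fit of y on x (slope = Cov(x, y) / Var(x))
b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

# R^2 = RegSS / TSS
tss = np.sum((y - y.mean()) ** 2)
rss = np.sum((y - yhat) ** 2)
regss = np.sum((yhat - y.mean()) ** 2)
r2_a = regss / tss

# R^2 = Var(yhat) / Var(y) -- the same quantity, since both numerator
# and denominator are just the sums of squares divided by n
r2_b = np.var(yhat) / np.var(y)
```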
As a side note, the discussion on pp. 58-61 of Achen regarding R^2 is
very good.
> So how do we know what range of values for our experiment denote low
> vs high levels of variability? I recall on the section handout
> sheet, Alison said to compare it to the standard deviation of the y
> variable, range, min, max, etc???
Yep, exactly. Note the identity
RegSS = TSS - RSS
on p. 91 of Fox97.
Another way to write this is:
Var(\hat{y}) = Var(y) - Var(\hat{\epsilon})
or with some simple algebra
Var(y) = Var(\hat{y}) + Var(\hat{\epsilon})
in words, the sample variance of y is equal to the sample variance of
the fitted values plus the sample variance of the residuals.
Recall that S_E^2 is the sample variance of the residuals. Since
variances have to be nonnegative it has to be the case that S_E^2
falls somewhere in the range [0, Var(y)] and S_E has to be in the
range [0, SD(y)]. At one extreme, if S_E = 0, the regression
line fits the observed data perfectly (all residuals are 0); at the
other extreme, if S_E = SD(y), the slope coefficient is exactly 0,
so that all the fitted values are equal to the mean of y.
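A quick numerical check of this decomposition, sketched in Python/NumPy (the course uses R; the simulated data here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 0.8 * x + rng.normal(scale=1.5, size=200)

# OLS fit of y on x
b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
resid = y - yhat

# Var(y) = Var(yhat) + Var(resid): the cross term vanishes because OLS
# residuals are uncorrelated with the fitted values
lhs = np.var(y)
rhs = np.var(yhat) + np.var(resid)

# S_E, defined here (as in the email) via the sample variance of the
# residuals, so it must land in [0, SD(y)]
s_e = np.std(resid)
```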
As a side note, recall the RFS plot from Cleveland. This plot is a
graphical depiction of the variance decomposition
Var(y) = Var(\hat{y}) + Var(\hat{\epsilon})
Hope this helps.
Best,
Kevin
Hi, everyone. Problem Set 5 has been posted to the course website. As
Kevin mentioned in lecture yesterday, PS5 will be due in lecture on 15
November 2004.
Happy Voting,
Ryan
> in any case, I am still a little confused about how to evaluate the
> standard error of regression. What, exactly, are "good", "ok", and
> "bad" values?
Right. I wouldn't think about it as a question of "good" vs. "bad"
values of S_E. S_E is an estimator for the standard deviation of the
disturbances in the regression model. S_E is just a descriptive
measure that tells you how much variability there is around the
regression line. It might help to think of it like this-- suppose you
have:
y_i = \beta_0 + \beta_1 * x_i + \epsilon_i, i=1,...,n
and you get estimates \hat{\beta}_0 and \hat{\beta}_1 of the intercept
and slope parameter. These two things determine the regression line
through the data which is (oftentimes) a reasonable and concise way to
summarize the information in a scatterplot of y on x.
Being a summary, the regression line alone doesn't capture some of the
important aspects of the relationship between y and x. In particular,
it doesn't allow us to say anything about the variability around the
regression line. Looking at S_E provides exactly this information--
how much variability there is around the regression line. Larger
values of S_E imply that the distribution of residuals has greater
variance.
Suppose you've never seen a scatterplot of x and y. If I tell you the
values of \hat{\beta}_0, \hat{\beta}_1 and S_E from a regression of y
on x you will be able to more accurately reproduce the scatterplot of
y on x than if I just tell you \hat{\beta}_0, \hat{\beta}_1.
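As an illustration of that last point, here is a sketch in Python/NumPy: given only hypothetical values of \hat{\beta}_0, \hat{\beta}_1, and S_E (all made up here), one can simulate a plausible version of the scatterplot:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical regression summaries (all made up for illustration):
b0_hat = 2.0   # estimated intercept
b1_hat = 1.5   # estimated slope
s_e = 0.5      # standard error of the regression

# Given the x values, points on the fitted line plus noise with
# standard deviation S_E give a plausible reconstruction of the scatter.
x = np.linspace(0, 10, 50)
y_sim = b0_hat + b1_hat * x + rng.normal(scale=s_e, size=x.size)
```

With only the intercept and slope, the best one could do is draw the line itself; adding S_E pins down how tightly the points cluster around it.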
It seems that a common theme running through many of the questions to
the list and some of the questions in class yesterday is that there
should be objectively determined cutoff values of S_E, R^2, etc. that
allow one to make sharp decisions that guide one's data analysis-- "if
R^2 is above some number one should believe the results", etc. I can't
emphasize enough that this sort of thing is *extremely bad practice*.
This is what Cleveland discussed as "rote data analysis". Data
analysis really does involve a great deal of artistry. The two things
we can be assured of are that inferences depend upon assumptions and
that the necessary assumptions almost never hold exactly. The
real question is whether the assumptions are close enough to being
true that our inferences are not wildly misleading.
We haven't yet talked much about how one goes about examining
assumptions (although Cleveland deals with this a fair amount). Once we
get multiple regression under our belts then the bulk of the remainder
of the course will deal with diagnostics and model checking. All of
this will come together quite a bit more over the next few weeks.
Hope this helps.
Best,
Kevin
Hi guys-
We will be having optional section this week for folks coming in without a
strong matrix algebra background. What I plan to cover:
-rank
-orthogonality
-linear independence
-the calculation of an inverse
-working with matrices in R
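As a small preview, a few of these operations sketched in Python/NumPy (in section we'll use R; the matrix here is arbitrary):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Rank: the number of linearly independent rows (or columns).
rank = np.linalg.matrix_rank(A)   # 2 here, so A's columns are independent

# Inverse: A times its inverse equals the identity matrix.
A_inv = np.linalg.inv(A)
identity_check = A @ A_inv

# Orthogonality: two vectors are orthogonal when their dot product is 0.
u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
dot = u @ v
```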
If you feel confident that you understand these concepts well enough to do
the problem set, you're free to spend your time on other things.
Note that those who skip will miss the "matrix algebra quiz bowl" (they
had Halloween candy at half off tonight at Walgreen's...). Practicing is
by far the best way for matrix algebra to become intuitive!
For those worried about the 6pm start for Gary's party, I plan for section
to run approx. 1 hour this week. So you should be out with plenty of time
to reach Gary's place.
Office hours as usual, after section until 6pm on Tuesdays and Thursdays.
Alison
Hi,
I'm sure I'm missing something basic here, but how do I get the Anscombe
data into R (for problem 3)? If I just do data(anscombe), I get data that
looks *nothing* like Table 5.1. It doesn't say which library to load, and I
don't see the data ready for download on the website. Any ideas?
Thanks,
Becky