Hi Everyone,
Revised lecture slides are now available on the course website.
Best,
Kevin
------------------------------------------------------
Kevin Quinn
Assistant Professor
Department of Government and
Center for Basic Research in the Social Sciences
34 Kirkland Street
Harvard University
Cambridge, MA 02138
Hi Everyone,
We just realized that January 17 is MLK Day. As a result, the final
exam will now be due at 5pm on Tue. January 18.
Best,
Kevin
Hi guys,
I don't seem to be able to figure out a way of getting the graphics output
to print to a file I could then use, when there is more than one page in
the R graphics device.
How did you all resolve this?
Thanks
Lucy
Hi guys-
Just wanted to send a follow-up to my last email, because I feel it wasn't
entirely clear.
->"What is meant by a disturbance or error term?"
Suppose we have an underlying data generation process that we assume takes
the following form:
y=Xb + e
Xb represents the systematic component and e represents the
stochastic or random component of y. We assume that e--our
disturbances or errors--are normally distributed and centered at zero.
In modeling, we can't observe e directly and can only infer its
distribution by viewing the residuals (y-yhat). To assess the assumption
that our errors are normally distributed, we compare the distribution of
the studentized residuals from our regression with a t distribution. We
use the t rather than the normal as a reference distribution to compensate
for the fact that we are working with residuals, which are estimates of
the disturbances, rather than the actual disturbances.
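A small simulation can make the distinction concrete. The sketch below is in Python rather than R, and everything in it is an assumption made for illustration: the true coefficients (1 and 2), the sample size, the seed, and the use of a single x variable. Because we simulate the data ourselves, we can see the true disturbances e, which we never can with real data, and compare them with the residuals.

```python
import random

random.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical "true" data generation process: y = b0 + b1*x + e
n = 200
b0, b1 = 1.0, 2.0
x = [random.uniform(0, 10) for _ in range(n)]
e = [random.gauss(0, 1) for _ in range(n)]  # true disturbances, centered at zero
y = [b0 + b1 * xi + ei for xi, ei in zip(x, e)]

# OLS estimates for a one-variable regression with an intercept
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1_hat = sxy / sxx
b0_hat = ybar - b1_hat * xbar

# Residuals are y - yhat, where yhat uses the *estimated* coefficients
yhat = [b0_hat + b1_hat * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, yhat)]

mean_resid = sum(residuals) / n
max_gap = max(abs(r - ei) for r, ei in zip(residuals, e))

print("estimated slope:", round(b1_hat, 3))
print("mean of residuals:", mean_resid)
print("largest |residual - true disturbance|:", max_gap)
```

The residuals average to (numerically) zero whenever the model includes an intercept, but they are not equal to the true disturbances, because b0_hat and b1_hat are themselves only estimates.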
Cheers-
Alison
See questions below on "disturbances"
-in the regression model:
y = Xb + e
where yhat is our estimate of Xb
The disturbances, or errors, represent the difference between y and Xb.
In the real world, we do not know the true errors or disturbances.
We simply have estimates of them, in the form of the regression residuals.
(This is because we have a sample of y values from some distribution
rather than a set of fixed y's.)
One of our OLS regression assumptions is that the disturbances follow a
normal distribution that is centered at zero. This means that
E[error] = 0. To evaluate whether or not this assumption holds, we look
at our residuals. These are not exactly the same as the true
disturbances, but they are the best material we have to work with. This
is why we compare the distribution of the studentized residuals with a t
distribution rather than a normal--to reflect this added uncertainty.
I looked at the distribution of the y values in the handout example after
looking at the distribution of the residuals in the qq.plot because I
wanted to see if a skew in the distribution of the y values might be
partially responsible for the skew in the residual distribution. You can
see how the two are linked directly from the equation above.
On White's estimator, see p. 305 in Fox. It calculates different,
robust standard errors for the coefficients using the residuals from the
regression. White's method does not calculate new regression
coefficients.
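To see the point about coefficients versus standard errors in a small case, here is a sketch in Python rather than R. The data are simulated and entirely hypothetical (true slope 2, error spread that grows with x), and the robust formula shown is the basic White (HC0) form written out for the slope of a one-variable regression, which may differ from the exact variant your software uses.

```python
import math
import random

random.seed(3)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical heteroskedastic data: the error sd grows with x
n = 500
x = [random.uniform(1, 10) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0, 0.5 * xi) for xi in x]

# OLS fit: these coefficients are computed once and are the same
# whichever standard errors we report
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0_hat = ybar - b1_hat * xbar
resid = [yi - (b0_hat + b1_hat * xi) for xi, yi in zip(x, y)]

# Classical slope standard error: assumes one common error variance
s2 = sum(r * r for r in resid) / (n - 2)
classical_se = math.sqrt(s2 / sxx)

# White (HC0) slope standard error: each observation's squared
# residual stands in for its own error variance
robust_se = math.sqrt(
    sum((xi - xbar) ** 2 * r * r for xi, r in zip(x, resid)) / sxx ** 2
)

print("slope estimate:", round(b1_hat, 3))
print("classical SE:  ", round(classical_se, 4))
print("White (HC0) SE:", round(robust_se, 4))
```

Only the last two numbers differ between the two approaches; the slope estimate itself is untouched.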
The last homework assignment will be due the last day of lecture.
Good luck!
Alison
> Hi Alison,
> I have a couple of questions on the material this week:
> 1. What, exactly, does "non-normality of disturbances" mean? It seems
> that we are checking to make sure that the y variable follows the
> t-distribution by looking at qq plots and histograms. Is this right? Or
> what is meant by a "disturbance"? I am getting a little tripped up on the
> vocab.
>
> 2. What does the White heteroscedasticity consistent covariance estimator
> do exactly? In the homework I used it to recalculate the standard errors
> of the regression coefficient estimates. Does using the White method not
> alter the actual regression coefficient estimates? I see that the White
> method is used to correct the estimate of the variance of our regression
> coefficient estimates - is the variance in our regression coefficient
> estimates measured by the standard errors?
>
> Oh, one more question: what is the due date of our
> last homework?
>
For 3B you will want 6 graphs. In the first 3, the y axis is the square
root of the absolute value of the residuals from the regression with
interlocks as the dependent variable, and the x axes are assets,
sqrt(assets), and the fitted values of interlocks from that regression.
In the second 3, the y axis is the square root of the absolute value of
the residuals from the regression with sqrt(interlocks + 1) as the
dependent variable, and the x axes are assets, sqrt(assets), and the
fitted values of sqrt(interlocks + 1) from that regression.
Hope this helps.
Best,
Kevin
> Are we only supposed to have two graphs for 3B? This is the way I
> read it, but I want to make sure. This is what I have:
Hi Becky-
When looking for nonnormality of the disturbances, we want to look at the
distribution of the studentized residuals (this is what the qq.plot
function does). Viewing a histogram of the residuals (and, at times, the
y values, if we think skewed y values may be inducing nonnormality in the
residuals) can help us identify multimodality in the residuals.
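For anyone curious what "studentized" involves, here is a rough sketch in Python rather than R. It assumes the externally studentized version (each residual is scaled by an error estimate computed without that observation) and a one-variable regression, where the leverage has a simple closed form. The data, seed, and coefficients are made up for illustration.

```python
import math
import random

random.seed(2)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical data for a one-variable regression with an intercept
n = 50
x = [random.uniform(0, 10) for _ in range(n)]
y = [1.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

# OLS fit and raw residuals
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0_hat = ybar - b1_hat * xbar
resid = [yi - (b0_hat + b1_hat * xi) for xi, yi in zip(x, y)]
rss = sum(r * r for r in resid)

p = 2  # number of estimated coefficients (intercept and slope)

# Leverage (hat value) of each observation in simple regression
h = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]

# Externally studentized residuals: scale each residual by an error
# variance estimated with that observation left out
studentized = []
for r, hi in zip(resid, h):
    s2_without_i = (rss - r * r / (1.0 - hi)) / (n - p - 1)
    studentized.append(r / math.sqrt(s2_without_i * (1.0 - hi)))

df = n - p - 1  # degrees of freedom of the reference t distribution
print("reference distribution: t with", df, "df")
print("largest |studentized residual|:", max(abs(t) for t in studentized))
```

If the normality assumption holds, these values should look like draws from the t distribution printed on the first output line, which is the comparison a quantile plot of studentized residuals makes.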
There is generally no need to look at the distribution of our individual
x variables, though. You might look to Fox's example analysis, pp. 298-299.
good luck-
Alison
On Sun, 5 Dec 2004, Rebecca Marie Nelson wrote:
> Hi Alison,
> I was looking at histogram plots of the x variables..... are we just
> concerned with the Y variable in non-normality of disturbances?
> Becky
>
> On Sun, 5 Dec 2004, Alison Elizabeth Post wrote:
>
> > Becky-
> >
> > I'm a little confused about how you're going about this.
> >
> > Generally, we generate a qq.plot of the residuals from the lm output:
> >
> > qq.plot(lm.out)
> >
> > So this tells us about the distribution of the residuals generally. It
> > does not relate the distribution of these residuals to a particular
> > explanatory variable like pop15. Are you instead plotting particular x
> > variables using the qq.plot function?
> >
> > A
> >
> > On Sun, 5 Dec 2004, Rebecca Marie Nelson wrote:
> >
> > > Hi,
> > > When correcting for non-normality of disturbances, I understand how to
> > > correct for skewness. I do not understand how to correct for
> > > multi-modality, which it looks like pop15 has. Are we supposed to correct
> > > for this problem this week or just recognize that it is a problem and we
> > > will correct for it later?
> > > Thanks,
> > > Becky
> > > _______________________________________________
> > > gov1000-list mailing list
> > > gov1000-list(a)lists.fas.harvard.edu
> > > http://lists.fas.harvard.edu/mailman/listinfo/gov1000-list
> > >
> >
>
Hi,
When correcting for non-normality of disturbances, I understand how to
correct for skewness. I do not understand how to correct for
multi-modality, which it looks like pop15 has. Are we supposed to correct
for this problem this week or just recognize that it is a problem and we
will correct for it later?
Thanks,
Becky
Hi, folks. Please note that the example I gave below is specifically a
shift from 0% to 10% on the assumed pop75 scale. Because the quadratic
curve is non-linear (by definition), a 10%-point increase in pop75 will
translate into different amounts of change in the dependent variable,
depending on the *levels* of pop75. For example, the shift from 10% to
20% in pop75 would be associated with a
[2*20 + 3*(20^2)] - [2*10 + 3*(10^2)] = 1240 - 320 = 920
unit change in the dependent variable. This is very different from the
320 unit change associated with a change from 0 to 10%. Sorry for any
confusion this (incomplete) example may have caused. This example should
now be easier to reconcile with Alison's note, as well.
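The arithmetic can be checked with a few lines of code. This sketch is in Python rather than R and uses the same made-up coefficients as the example (2 on pop75 and 3 on pop75^2); the function names are just labels chosen for this illustration.

```python
def quadratic_part(x, b1=2.0, b2=3.0):
    """Contribution of x and x^2 to the fitted value: b1*x + b2*x^2."""
    return b1 * x + b2 * x ** 2


def change_in_y(x_from, x_to, b1=2.0, b2=3.0):
    """Change in the fitted value when x moves from x_from to x_to."""
    return quadratic_part(x_to, b1, b2) - quadratic_part(x_from, b1, b2)


def marginal_effect(x, b1=2.0, b2=3.0):
    """Slope of the curve at x (derivative of b1*x + b2*x^2)."""
    return b1 + 2.0 * b2 * x


print(change_in_y(0, 10))    # 320.0
print(change_in_y(10, 20))   # 920.0
print(marginal_effect(10))   # 62.0
```

The same 10-point shift gives different answers because the slope, 2 + 6*pop75 with these coefficients, itself depends on the level of pop75.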
Cheers,
Ryan
> Hi, Lucy. For illustration, let's assume the coefficient on pop75 is 2
> and on pop75^2 is 3, and assume that pop75 is measured in percentage
> points. Then, a 10%-point increase in pop75 would be associated with a
> 2*10 + 3*(10^2) = 320 unit increase in our dependent variable.
>
> Make sense?
>
> Ryan
>
> ------------------------------------------
> Ryan T. Moore ~ Government & Social Policy
> Ph.D. Candidate ~ Harvard University
>
> Homepage: http://www.people.fas.harvard.edu/~rtmoore/
> Gov1000: http://www.courses.fas.harvard.edu/~gov1000/
>
> On Sat, 4 Dec 2004, Lucy Clare Barnes wrote:
>
> > Hi guys,
> >
> > How can we interpret the coefficients/significance of the results of
> > models in which you have both, e.g., x1 and x1^2 (as in lecture slide 390)?
> > With simple numerical models it makes sense, but how should we best
> > interpret if we think that we need to include, say, both population over
> > 75 and population over 75 squared?
> >
> > thanks
> > Lucy