> so we aren't supposed to be doing any regressions at all in problem
> 1 - just looking at the data and doing whatever we can to make the
> relationship more linear?
>
Yes, for problem 1 you don't need to fit any linear regression models.
You could just eyeball the necessary bivariate relationships and, if
the untransformed relationships are not linear, transform the
variables so that the relationships become linear. If the original
relationships look pretty close to linear, you can just report that
fact. My suggestion of using a loess curve is not necessary, but it
might help you discern the empirical relationships in some cases.
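If it helps, here is a minimal sketch of the eyeballing step in R. The data below are simulated (a made-up log relationship, arbitrary seed), so treat it only as an illustration. scatter.smooth() draws a scatterplot with a loess curve on top; if the curve bends on the raw scale but looks roughly straight after transforming x, the transformation has done its job.

```r
# Eyeball a bivariate relationship with a loess smoother, then check
# whether a transformation of x straightens it out.
set.seed(3)                                   # arbitrary seed, for reproducibility only
x <- rexp(200, rate = 0.1)                    # hypothetical right-skewed predictor
y <- 2 + 1.5 * log(x) + rnorm(200, sd = 0.5)  # hypothetical curved relationship

op <- par(mfrow = c(1, 2))                    # two panels side by side
scatter.smooth(x, y, main = "Untransformed x")  # loess curve should bend
scatter.smooth(log(x), y, main = "log(x)")      # loess curve should look roughly straight
par(op)                                       # restore previous graphics settings
```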
Hope this helps.
Best,
Kevin
Hi Everyone,
Here is the answer to Vipin's question about the possibility of
getting arbitrarily good fits to the data (as judged by R^2 and size
of t-statistics) by choosing a particular set of weights for WLS.
This is indeed possible. For instance, suppose you have n observations
and k columns in X. Give k arbitrarily chosen observations weight 1
and the remaining observations weights very close to 0. WLS in this
case is essentially the same as OLS applied to only the k observations
with weight 1. Since you then have as many columns in X as effective
observations, the fit passes exactly through those k points, so you
will get an R^2 of 1 and huge t-statistics.
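In terms of the WLS objective, the argument runs roughly as follows. Here S (the set of k weight-1 observations) and epsilon (the near-zero weight) are notation introduced only for this sketch, and X_S (the k x k block of X for those observations) is assumed to be nonsingular:

```latex
\begin{align*}
\hat\beta_{WLS}
  &= \arg\min_{\beta} \sum_{i=1}^{n} w_i \,(y_i - x_i^\top \beta)^2
   = (X^\top W X)^{-1} X^\top W y, \qquad W = \mathrm{diag}(w_1, \dots, w_n) \\
\sum_{i=1}^{n} w_i \,(y_i - x_i^\top \beta)^2
  &= \sum_{i \in S} (y_i - x_i^\top \beta)^2
   + \varepsilon \sum_{i \notin S} (y_i - x_i^\top \beta)^2 \\
\text{as } \varepsilon \to 0: \quad
\hat\beta_{WLS} &\to X_S^{-1} y_S,
  \quad\text{so } y_i - x_i^\top \hat\beta_{WLS} \to 0 \text{ for } i \in S
  \text{ and the weighted } R^2 \to 1 .
\end{align*}
```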
For example:
> x <- rnorm(100)
> y <- rnorm(100)
> w <- c(1, 1, rep(1e-10, 98))
> ols.out <- lm(y~x)
> wls.out <- lm(y~x, weights=w)
> summary(ols.out)
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-2.52978 -0.77233 0.02590 0.64279 2.05652
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0003379 0.0969588 0.003 0.997
x 0.0708301 0.0921538 0.769 0.444
Residual standard error: 0.9696 on 98 degrees of freedom
Multiple R-Squared: 0.005992, Adjusted R-squared: -0.004151
F-statistic: 0.5908 on 1 and 98 DF, p-value: 0.444
> summary(wls.out)
Call:
lm(formula = y ~ x, weights = w)
Residuals:
Min 1Q Median 3Q Max
-4.653e-05 -1.233e-05 -3.695e-07 8.839e-06 5.572e-05
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.637e-01 1.431e-05 11441 <2e-16 ***
x -1.209e+00 2.493e-05 -48497 <2e-16 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 1.678e-05 on 98 degrees of freedom
Multiple R-Squared: 1, Adjusted R-squared: 1
F-statistic: 2.352e+09 on 1 and 98 DF, p-value: < 2.2e-16
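One way to check that the near-zero weights are effectively deleting observations is to refit plain OLS on just the two weight-1 observations and compare coefficients. This is a minimal sketch of the same setup; the seed is an arbitrary addition so the run is reproducible.

```r
# Near-zero weights effectively delete observations from a WLS fit.
set.seed(1)                        # arbitrary seed, for reproducibility only
x <- rnorm(100)
y <- rnorm(100)
w <- c(1, 1, rep(1e-10, 98))       # same weighting scheme as above

wls.out <- lm(y ~ x, weights = w)
sub.out <- lm(y ~ x, subset = 1:2) # OLS on the two weight-1 points only

# The two coefficient vectors should agree to many digits
cbind(wls = coef(wls.out), ols.subset = coef(sub.out))

# lm() counts every nonzero weight when computing residual df, so the
# WLS fit reports 98 df even though it is effectively fitting 2 points;
# a near-zero residual variance divided over 98 df gives tiny SEs.
df.residual(wls.out)
df.residual(sub.out)
```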
The moral of the story is that if you have to use WLS you should
really believe (based on a priori knowledge combined with diagnostic
plots) that your weights are sensibly related to the error variance,
and that observations receiving weights close to 0 really are so
extreme that it is worth essentially tossing them out of the dataset.
When in doubt, White's heteroskedasticity-consistent var-cov estimator
is a pretty safe way to get standard errors. Reporting several sets of
results (OLS, WLS, and OLS with White SEs) is also a good way to
convince readers that your results are not just an artifact of a crazy
weighting decision.
Hope this helps.
Best,
Kevin
------------------------------------------------------
Kevin Quinn
Assistant Professor
Department of Government and
Center for Basic Research in the Social Sciences
34 Kirkland Street
Harvard University
Cambridge, MA 02138
Just a reminder that the final papers are due JANUARY 3 AT 5PM.
Note that the papers are no longer due on the last day of class. This
will give you two extra weeks to complete the paper.
Best,
Kevin
Hi, everyone. Many people have begun to think about working on Gov1000 or
other course papers over the winter break. For those who may not have
fast internet connections at home, working on your local machine may be
preferable at times to working on the remote XDesktop.
Some students have downloaded R and XEmacs for their local machine, in
order to use these programs without being connected. If you want to do
so, see the 2-page hyperlinked guide at
http://www.courses.fas.harvard.edu/~gov2001/handouts/Windows/Xemacs.pdf
If anyone needs help with this process, let me know and I'm happy to get
your laptop set up. Others in the class might be able to help as well.
Best,
Ryan
------------------------------------------
Ryan T. Moore ~ Government & Social Policy
Ph.D. Candidate ~ Harvard University
Homepage: http://www.people.fas.harvard.edu/~rtmoore/
Gov1000: http://www.courses.fas.harvard.edu/~gov1000/