New subject: Problem 1

1 Dec 2004

Hi Everyone,

Here is the answer to Vipin's question about the possibility of
getting arbitrarily good fits to the data (as judged by R^2 and size
of t-statistics) by choosing a particular set of weights for WLS.

This is indeed possible. For instance, suppose you have n observations
and k columns in X. Give k arbitrarily chosen observations weight 1
and the remaining observations weight very close to 0. WLS in this
case is essentially the same as OLS applied to only the observations
with weight 1. Since you then effectively have as many observations as
columns in X, the fit passes almost exactly through those k points.
The weighted residuals are all nearly 0 while the residual degrees of
freedom are still counted as n - k, so you will get an R^2 of
(essentially) 1 and huge t-statistics.

For example:

...
  x <- rnorm(100)
  y <- rnorm(100)
  w <- c(1, 1, rep(1e-10, 98))
  ols.out <- lm(y~x)
  wls.out <- lm(y~x, weights=w)
  summary(ols.out)
Call:
lm(formula = y ~ x)

Residuals:
     Min       1Q   Median       3Q      Max
-2.52978 -0.77233  0.02590  0.64279  2.05652

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0003379  0.0969588   0.003    0.997
x           0.0708301  0.0921538   0.769    0.444

Residual standard error: 0.9696 on 98 degrees of freedom
Multiple R-Squared: 0.005992,   Adjusted R-squared: -0.004151
F-statistic: 0.5908 on 1 and 98 DF,  p-value: 0.444

...
  summary(wls.out) 
Call:
lm(formula = y ~ x, weights = w)

Residuals:
       Min         1Q     Median         3Q        Max
-4.653e-05 -1.233e-05 -3.695e-07  8.839e-06  5.572e-05

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.637e-01  1.431e-05   11441   <2e-16 ***
x           -1.209e+00  2.493e-05  -48497   <2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 1.678e-05 on 98 degrees of freedom
Multiple R-Squared:     1,      Adjusted R-squared:     1
F-statistic: 2.352e+09 on 1 and 98 DF,  p-value: < 2.2e-16
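
To check the claim that this WLS fit is basically OLS on the two
weight-1 observations, here is a minimal sketch. The seed and the
object names sub.out, max.diff and r2.wls are my own choices for
illustration, so the numbers will not match the output above.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
n <- 100
x <- rnorm(n)
y <- rnorm(n)
w <- c(1, 1, rep(1e-10, n - 2))      # same weighting scheme as in the example

wls.out <- lm(y ~ x, weights = w)    # WLS with 98 near-zero weights
sub.out <- lm(y ~ x, subset = 1:2)   # OLS on the two weight-1 observations only

# Largest absolute difference between the two sets of coefficients
max.diff <- max(abs(coef(wls.out) - coef(sub.out)))
print(max.diff)

# Weighted R^2; typically essentially 1 even though x and y are unrelated
r2.wls <- summary(wls.out)$r.squared
print(r2.wls)
```

The coefficients should agree to many decimal places: the line is
determined almost entirely by the first two points, and the other 98
observations contribute next to nothing to the weighted sum of squares.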

The moral of the story is that if you have to use WLS, you should
really believe (based on a priori knowledge combined with diagnostic
plots) that your weights are sensibly related to the error variance,
and that observations getting close to 0 weight really are so extreme
that it is worth basically tossing them out of the dataset.

When in doubt, White's heteroskedasticity-consistent var-cov estimator
is a pretty safe way to get standard errors. Reporting several sets of
results (OLS, WLS, and OLS with White SEs) is also a good way to
convince readers that your results are not just an artifact of a crazy
weighting decision.
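
For the "OLS with White SEs" column, the following is a rough sketch
that computes the estimator by hand in base R. It assumes the simplest
(HC0) form, (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}, with no small-sample
correction. The simulated data and the names bread, meat, vcov.white
and se.white are illustrative only.

```r
set.seed(1)                           # arbitrary seed, for reproducibility only
n <- 100
x <- rnorm(n)
y <- rnorm(n)
ols.out <- lm(y ~ x)

X <- model.matrix(ols.out)            # n x k design matrix (intercept and x)
e <- residuals(ols.out)               # OLS residuals

bread <- solve(crossprod(X))          # (X'X)^{-1}
meat  <- crossprod(X * e)             # X' diag(e^2) X; row i of X is scaled by e[i]
vcov.white <- bread %*% meat %*% bread
se.white <- sqrt(diag(vcov.white))    # White (HC0) standard errors

# Side-by-side comparison of conventional and White standard errors
print(cbind(estimate = coef(ols.out),
            se.ols   = sqrt(diag(vcov(ols.out))),
            se.white = se.white))
```

With homoskedastic simulated errors like these, the two sets of
standard errors should be close to each other. If you would rather not
code this by hand, the vcovHC function in the sandwich package offers
this and several small-sample variants.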

Hope this helps.

Best,
Kevin

------------------------------------------------------
Kevin Quinn
Assistant Professor
Department of Government and
Center for Basic Research in the Social Sciences
34 Kirkland Street
Harvard University
Cambridge, MA  02138

WLS with bad weights