Dear teachers,
Can you please contrast how different measures describe our level of confidence
in an estimated coefficient (say, psi), and when each should be used? The
measures I am thinking of include:
1. standard error of psi in a single year
2. standard error of mean psi (averaged over many years)
3. 95% confidence interval for psi in a single year
4. 95% confidence interval for mean psi (averaged over many years)
5. variance of psi in a single year
6. variance of mean psi (aggregated over all years)
7. any other important measures for confidence that I'm forgetting...
For example, on the second midterm I calculated and plotted a separate
confidence interval for psi in each year, but I gather that the top 7 exams did
not all do this. Why not? Now that I'm starting to get the hang of running a
regression, I'm trying to get a better handle on describing and interpreting
the reliability of my results.
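To make the question concrete, here is the kind of single-year calculation I have been doing (the data and the coefficient name below are just placeholders, not anything from the course data):

```r
## Placeholder example: "fit" stands in for a single year's regression,
## and "psi" for the coefficient of interest.
set.seed(1)
dat <- data.frame(y = rnorm(50), psi = rnorm(50))
fit <- lm(y ~ psi, data = dat)
tab <- coef(summary(fit))
est <- tab["psi", "Estimate"]      # point estimate of psi
se  <- tab["psi", "Std. Error"]    # measure 1: single-year standard error
ci  <- est + c(-1, 1) * qt(0.975, df = fit$df.residual) * se  # measure 3
```

Is this the right building block, with measures 2, 4, and 6 then coming from treating the per-year estimates as data in their own right?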
These questions about interpreting confidence in results touch on the
substantive interpretation asked for in HW 7, part 1(b). Since there was no
answer provided for 1(b), explanation of this interpretative stuff would be
particularly helpful.
Thanks,
Anna
--
Anna Lorien Nelson
Department of Government,
Harvard University
alnelson(a)fas.harvard.edu
Is there a way to set na.strings within a dataframe I've already loaded?
I'm dealing with some data that has different NA values for each variable
so I need to set it individually.
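Something like the following per-variable recoding is what I am after (the data frame and missing-value codes below are made up):

```r
## Made-up example: each variable uses a different missing-value code.
dat <- data.frame(income = c(40000, -99, 52000),
                  age    = c(34, 999, 51))
dat$income[dat$income == -99] <- NA   # -99 marks missing income
dat$age[dat$age == 999]       <- NA   # 999 marks missing age
```

Is there a cleaner way than doing this one variable at a time?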
Any help would be appreciated.
Thanks,
Andrew
The following question got "lost" in problem set 8.
\item One of the purposes of GOV 1000 is to teach you how to express
your quantitative ideas, both in words and in pictures. For the
latter, \texttt{R}'s ``lattice'' graphics are a very powerful tool.
You can find a very nice introduction to lattice graphics in the June
2002 edition of ``R News,'' available at:
http://cran.r-project.org/doc/Rnews. Using lattice, create a figure
that shows the histogram of the percentage Democratic vote across
districts for each ``0'' year from 1900 through 1990. In other
words, you will be showing 10 individual histograms, one for each
decade-ending year. Make the figure legible and visually appealing.
Of course, this is not a particularly interesting figure, but we
wanted to introduce you to lattice in the simplest way possible. If
you are feeling adventurous, you are free to provide something more
interesting. One ambitious choice would be to show ten scatter plots
(one for each decade year) and fitted regression lines of Democratic
percentage of the vote versus lagged percentage, as GK do in their
Figure 1.
This matters because we expect you to be able to use lattice graphics
on the final. Tao will be updating the answers for problem set 8 in due
time. In the meantime, if anyone wanted to give this a shot for the
class list, I know that Tao would appreciate it.
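To fix ideas, a call of roughly this shape is all that is involved (the data here are fake placeholders, not the course election data):

```r
library(lattice)
## Fake placeholder data: ten "0" years, 50 fake districts each.
fake <- data.frame(year  = rep(seq(1900, 1990, by = 10), each = 50),
                   dperc = runif(500))
## One histogram panel per decade-ending year.
histogram(~ dperc | factor(year), data = fake,
          layout = c(5, 2), type = "percent",
          xlab = "Democratic percentage of the vote")
```

The real figure should of course use the actual Democratic-percentage variable, with labels and layout chosen to make it legible.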
Dave
--
David Kane
Lecturer in Government
617-563-0122
dkane(a)latte.harvard.edu
Does anyone happen to know a good definition of the "simultaneity issue"? It
comes up in a few of the readings. And how is it related to the endogeneity
problem?
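To make the question concrete, the kind of system I have in mind is the textbook supply-and-demand pair (my own notation, not any one reading's):

```latex
q_i = \alpha + \beta p_i + u_i   % demand
p_i = \gamma + \delta q_i + v_i  % supply
```

Since p and q are determined jointly, solving the system makes p_i a function of u_i, so the regressor in the first equation is correlated with its own error term. Is that correlation what makes simultaneity a special case of the endogeneity problem?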
Many thanks.
Best,
Dan
A couple of people have asked for overviews of the material in the QR
33 handouts. Here are three good articles, all on JSTOR, that provide
an overview of the "right" way --- or at least the way that was
emphasized in this class --- for thinking about causal effects.
Although there are sections of each article that are somewhat
advanced, all are worth reading. The tough sections (e.g., Bayesian
stuff) involve material that many of you will see in GOV 2001.
Statistics and Causal Inference (in Theory and Methods)
Paul W. Holland
Journal of the American Statistical Association, Vol. 81, No. 396. (Dec., 1986), pp. 945-960.
Stable URL: http://links.jstor.org/sici?sici=0162-1459%28198612%2981%3A396%3C945%3ASACI…
Practical Implications of Modes of Statistical Inference for Causal Effects and the Critical Role of the Assignment Mechanism
Donald B. Rubin
Biometrics, Vol. 47, No. 4. (Dec., 1991), pp. 1213-1234.
Stable URL: http://links.jstor.org/sici?sici=0006-341X%28199112%2947%3A4%3C1213%3APIOMO…
Bayesian Inference for Causal Effects: The Role of Randomization
Donald B. Rubin
Annals of Statistics, Vol. 6, No. 1. (Jan., 1978), pp. 34-58.
Stable URL: http://links.jstor.org/sici?sici=0090-5364%28197801%296%3A1%3C34%3ABIFCET%3…
Apologies for not coming up with these earlier in the semester, but I
have only recently gained access to the wonder of JSTOR.
Dave
--
David Kane
Lecturer in Government
617-563-0122
dkane(a)latte.harvard.edu
Ryan Thomas Moore writes:
> Dave:
>
> I removed the few Rdata files (Palm and GG related stuff), restarted R,
> and I still don't get the ctest package when I start R in my
> ~/fall02/gov1000 directory. I still do get the ctest package when I start
> R in my ~ directory. Any other ideas on how I can use R in my gov1000
> directory, and still have ctest accessible?
Good question for the list.
cd to the ~/fall02/gov1000 directory
Type pwd
Type ls -al
Start R (from the command prompt)
Type ls() from the R prompt.
Type search() from the R prompt.
Copy and paste the output from all the above as an e-mail to the list.
Dave
> Thanks in advance,
> Ryan
>
> ------------------------------------------
> Ryan T. Moore ~ Government & Social Policy
> Ph.D. Candidate ~ Harvard University
>
> On Sun, 24 Nov 2002, Dave Kane wrote:
>
> > Good question for the list.
> >
> > Hmmm.
> >
> > By deduction, there must be something different about the two
> > directories. You can read about the messy details about how R starts
> > up here:
> >
> > > help(.Rprofile)
> >
> >
> > My best guess is that you have a messed up .Rdata file (or .Rprofile)
> > in your ~/fall02/gov1000 directory. Look for them (or anything else
> > weird) with ls -a. Delete them if you find them. Then try
> > restarting. That should work.
> >
> > Dave
> >
> > Ryan Thomas Moore writes:
> > > Dave:
> > >
> > > I restarted R, and within my ~/fall02/gov1000 directory, I still don't
> > > have the ctest package. But, if I start R in my home (/rmoore) directory,
> > > the package:ctest does appear. I'd rather use R in the ~/fall02/gov1000
> > > directory if I can, but if not, I'm ok with using it in the home
> > > directory. In short, I have a viable solution, but is there any way I can
> > > get the package:ctest to appear in the /gov1000 directory I've created?
> > >
> > > Thanks!
> > > Ryan
> > >
> > > ------------------------------------------
> > > Ryan T. Moore ~ Government & Social Policy
> > > Ph.D. Candidate ~ Harvard University
> > >
> > >
> >
> > --
> > David Kane
> > Lecturer in Government
> > 617-563-0122
> > dkane(a)latte.harvard.edu
> >
>
>
--
David Kane
Lecturer In Government
617-563-0122
dkane(a)latte.harvard.edu
Please avoid sending me Word or PowerPoint attachments.
See http://www.fsf.org/philosophy/no-word-attachments.html
Dear Dave & Gary,
I would like to see at least one example from you that shows us, in the kind of
language you expect us to use on the exam and in scholarly articles (and no
hand-waving!): 1. how to interpret coefficients (i.e., such that reasonable
people can agree the interpretation is good or at least OK), and 2. how to
convey to the reader why we chose a particular model. Otherwise, I think it is
unreasonable to expect us to use such language on the exam.
So, for 1., suppose through some legitimate process we obtained the following
model as summarized below. Could you give us an example of how a *good,
reasonable scholar* could interpret the coefficients in language suitable for
publication in a major academic journal? (If you think this is a really bad
model, feel free to use another one instead).
Thanks,
Phillip.
> summary(arf5)
Call:
lm(formula = dperc ~ dwin.lag2 + dwin.lag6 + dperc.lag4 + dperc.lag2 +
incumb, data = dog)
Residuals:
Min 1Q Median 3Q Max
-0.351964 -0.044175 0.003146 0.048644 0.361068
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.118519 0.008934 13.266 < 2e-16 ***
dwin.lag2 -0.039145 0.009203 -4.254 2.18e-05 ***
dwin.lag6 -0.025457 0.004857 -5.241 1.74e-07 ***
dperc.lag4 0.320400 0.022001 14.563 < 2e-16 ***
dperc.lag2 0.505499 0.023522 21.490 < 2e-16 ***
incumb 0.049977 0.004320 11.569 < 2e-16 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 0.07459 on 2341 degrees of freedom
Multiple R-Squared: 0.7453, Adjusted R-squared: 0.7448
F-statistic: 1370 on 5 and 2341 DF, p-value: < 2.2e-16
-------------------------------------------------
Phillip Y. Lipscy
Perkins Hall Room #129
35 Oxford Street
Cambridge, MA 02138
(617)493-4893
lipscy(a)fas.harvard.edu
Ph.D. Candidate
Harvard University, FAS, Department of Government
-------------------------------------------------
Here are some brief comments on the readings.
1) There are 2 methodology readings (Leamer and McCloskey). The Leamer article
is the best methodology article I have ever read --- and I read a lot of
methodology. Both are valuable, both for framing the discussion on Monday
and for use in your final exams.
2) The other three articles are somewhat verbose. In an ideal world, you would
all have time to read all three very closely. On the off chance that we are
not living in the best of all possible worlds, despite what Dr. Pangloss
tells me, you should (obviously) focus closely on the article that you are
expected to critique. Reading the abstract, introduction, conclusion and
regression result sections of the other two articles will give you enough
background to follow the discussion.
3) The articles occasionally mention things (3 stage least squares, probit, and
so on) that we have not mentioned in class. That's OK. Feel free to skip
those parts.
4) If you have any questions or comments on the papers over the weekend,
please send them to the list. Example: Why do Alt et al. misuse
"multicollinearity"? I think it would be useful to get the discussion going
before Monday.
Back to tending my garden,
Dave
--
David Kane
Lecturer In Government
617-563-0122
dkane(a)latte.harvard.edu
Please avoid sending me Word or PowerPoint attachments.
See http://www.fsf.org/philosophy/no-word-attachments.html
Dear All,
About 1b: I'm not sure what to make of the correlation between P_8 and v_8.
Following your example on the listserv, we've run several regressions with
different combinations of the variables, and we've found that the coefficients
vary considerably. But given that these are both controls that we don't really
care about, are we mainly concerned with what happens to incumb when we do
this? In that case, though, taking out either of the variables merely gives us
biased results (G/K), right? So what are we looking for?
-Phillip.
> arf <- lm(dperc ~ dperc.lag2 + dwin.lag2 + incumb, data = dog)
> summary(arf)
Call:
lm(formula = dperc ~ dperc.lag2 + dwin.lag2 + incumb, data = dog)
Residuals:
Min 1Q Median 3Q Max
-3.809e-01 -4.764e-02 -3.825e-06 4.994e-02 2.996e-01
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.158014 0.008646 18.275 < 2e-16 ***
dperc.lag2 0.719269 0.018685 38.495 < 2e-16 ***
dwin.lag2 -0.038345 0.009417 -4.072 4.82e-05 ***
incumb 0.049108 0.004507 10.897 < 2e-16 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 0.07789 on 2343 degrees of freedom
Multiple R-Squared: 0.722, Adjusted R-squared: 0.7216
F-statistic: 2028 on 3 and 2343 DF, p-value: < 2.2e-16
> cor(dog$dwin.lag2, dog$dperc.lag2)
[1] 0.8051607
> summary(lm(dperc ~ dperc.lag2 + incumb, data = dog))
Call:
lm(formula = dperc ~ dperc.lag2 + incumb, data = dog)
Residuals:
Min 1Q Median 3Q Max
-3.969e-01 -4.835e-02 -2.247e-05 5.131e-02 3.145e-01
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.156361 0.008666 18.04 <2e-16 ***
dperc.lag2 0.685660 0.016819 40.77 <2e-16 ***
incumb 0.034132 0.002613 13.06 <2e-16 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 0.07815 on 2344 degrees of freedom
Multiple R-Squared: 0.72, Adjusted R-squared: 0.7198
F-statistic: 3014 on 2 and 2344 DF, p-value: < 2.2e-16
> summary(lm(dperc ~ dwin.lag2 + incumb, data = dog))
Call:
lm(formula = dperc ~ dwin.lag2 + incumb, data = dog)
Residuals:
Min 1Q Median 3Q Max
-0.3875738 -0.0686709 0.0004885 0.0651227 0.3589614
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.444149 0.005642 78.720 <2e-16 ***
dwin.lag2 0.121790 0.010792 11.285 <2e-16 ***
incumb 0.054079 0.005755 9.398 <2e-16 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 0.0995 on 2344 degrees of freedom
Multiple R-Squared: 0.5462, Adjusted R-squared: 0.5458
F-statistic: 1410 on 2 and 2344 DF, p-value: < 2.2e-16
Dear Colleagues,
One quick notational question about 3c. We are asked to derive beta_1IV and
beta_1IISLS. On page 2 of the handout from section, though, we learn that "the
IISLS estimator coincides with the IV estimator," which seems to tell us that
the two are in fact the same. Or do I misread? It seems clear that we are
going to do two regressions, producing a few different betas. But which beta
is the IISLS, and is the IISLS estimate of beta always equal to the IV
estimate?
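If I write out the just-identified case for myself (standard notation, which may not match the handout's), the two do seem to coincide:

```latex
% One endogenous regressor x, one instrument z (just-identified case).
\hat\beta_{IV}   = (z'x)^{-1} z'y
% 2SLS: first stage \hat{x} = z(z'z)^{-1}z'x; second stage regresses y on \hat{x}:
\hat\beta_{2SLS} = (\hat{x}'\hat{x})^{-1}\hat{x}'y
                 = \left[ x'z(z'z)^{-1}z'x \right]^{-1} x'z(z'z)^{-1}z'y
                 = (z'x)^{-1} z'y
```

So is the right reading that the equality holds exactly when the model is just identified, and that with more instruments than endogenous regressors the two can differ?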
Many thanks.
Best,
Dan