I think I may have sent a mistaken e-mail about this week's
readings. According to the updated syllabus on the web site, we were
supposed to have done Neter chapters 1-5 for last week; chapters 6, 7, and 11
are for this coming Monday, and chapters 8, 9, and 10 for the 4th. I will
also be adding an article or two.
Sorry for any confusion.
Dave
--
David Kane
Lecturer in Government
617-563-0122
dkane(a)latte.harvard.edu
I'll be out trick-or-treating with lovely Belle (of Beauty and the Beast fame)
and Cinderella and then entertaining until who knows when.
But it looks like people are well enough along on the problem set.
Please answer each other's questions on the list and in person.
Please do not kill yourself on this. If you reach the 15-hour mark and you are
not done, then you should just stop.
Do not worry if you cannot replicate the GK results. I can't either. With
luck, we will iterate toward the truth next week.
For anyone who cares, here is what I get for the first few years.
> test[1:10,]
   year advantage
1  1900   -0.0011
2  1904    0.0078
3  1906   -0.0018
4  1908   -0.0090
5  1910   -0.0126
6  1914    0.0075
7  1916   -0.0064
8  1918    0.0081
9  1920    0.0108
10 1924    0.0048
>
Dave
--
David Kane
Lecturer in Government
617-563-0122
dkane(a)latte.harvard.edu
Okay, so it's not a problem, but I guess what I'm wondering, just for clarification, is whether or not including demperc84 and incumbency86 (Pearson's r = .768) will cause you to underestimate the incumbency effect, or whether it will have some other effect.
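(For what it's worth, here is one way to look at this directly. This is a toy
sketch with made-up data and names, not the problem-set objects; the point is
just that a highly correlated regressor inflates the standard error on the
incumbency coefficient rather than biasing it downward, while omitting the
lagged vote share tends to bias the incumbency estimate upward.)

## Toy data, constructed so the two regressors correlate at roughly .75.
set.seed(1)
demperc84    <- runif(200, 0.3, 0.7)
incumbency86 <- demperc84 + rnorm(200, sd = 0.1)
demperc86    <- 0.5 * demperc84 + 0.05 * incumbency86 + rnorm(200, sd = 0.05)
cor(demperc84, incumbency86)
## Full model: unbiased incumbency estimate, but a larger standard error
## than an uncorrelated design would give.
summary(lm(demperc86 ~ demperc84 + incumbency86))$coefficients
## Dropping the lagged share: the incumbency coefficient absorbs part of
## the omitted demperc84 effect and is biased upward.
summary(lm(demperc86 ~ incumbency86))$coefficients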
Dear Friends,
We are in the process of creating a new variable to indicate the party of the
winner in the 1986 election. We have told the computer to tell us when the
percentage of Democratic votes is greater than .5, so now we have a variable
which is "TRUE" when the Democratic candidate won. Can we regress on a
variable that's range is TRUE and FALSE, or do we need to convert it to a
numerical variable? If the latter, any advice as to how?
Best,
Dan
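(A sketch of an answer, with toy data and made-up names: lm() will accept a
logical regressor directly, coercing TRUE/FALSE to 1/0, and as.numeric()
makes the coding explicit if you prefer.)

## Toy data, invented just to show that lm() accepts a logical regressor.
dem.pct  <- c(0.43, 0.61, 0.55, 0.48, 0.70)   # 1986 Democratic share
next.pct <- c(0.45, 0.63, 0.58, 0.46, 0.72)   # next-election share
dem.won  <- dem.pct > 0.5                     # TRUE when the Democrat won
lm(next.pct ~ dem.won)              # coefficient is labeled dem.wonTRUE
lm(next.pct ~ as.numeric(dem.won))  # explicit 0/1; identical fit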
Dear Dave,
After running your code to find the values of psi and trying to replicate GK
Figure 2, we found that our values for psi are very different from those
reported by GK. For example, psi for 1990 comes out as 0.013, and psi for
1948 is 0.019:
> arf <- one.year(clean, 1990)
> summary(arf)

Call:
lm(formula = dpct ~ dpct.old + dwin + incumb, data = x)

Residuals:
      Min        1Q    Median        3Q       Max
-0.153638 -0.043165 -0.006808  0.042711  0.251270

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.236672   0.018930  12.503  < 2e-16 ***
dpct.old    0.436993   0.039420  11.086  < 2e-16 ***
dwin        0.115458   0.016555   6.974 2.01e-11 ***
incumb      0.013574   0.009081   1.495    0.136
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.06727 on 296 degrees of freedom
Multiple R-Squared: 0.8247,     Adjusted R-squared: 0.8229
F-statistic: 464.1 on 3 and 296 DF,  p-value: < 2.2e-16
> arf <- one.year(clean, 1948)
> summary(arf)

Call:
lm(formula = dpct ~ dpct.old + dwin + incumb, data = x)

Residuals:
       Min         1Q     Median         3Q        Max
-0.2018891 -0.0266436  0.0008216  0.0264690  0.1532010

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.221180   0.014100  15.687  < 2e-16 ***
dpct.old    0.616474   0.031057  19.850  < 2e-16 ***
dwin        0.066049   0.007194   9.181  < 2e-16 ***
incumb      0.018888   0.004394   4.299 2.28e-05 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.04622 on 320 degrees of freedom
Multiple R-Squared: 0.8765,     Adjusted R-squared: 0.8753
F-statistic: 756.7 on 3 and 320 DF,  p-value: < 2.2e-16
----
However, this clearly doesn't fit well with Figure 2, where psi for 1990 is
higher than psi for 1948. (We used your code to get these results, so you
should be able to replicate them.) When we use a loop to graph these results,
we get something that looks broadly similar to Figure 2 but differs on the
exact values.
We suspect that this is because GK coded their parameters differently from
us; i.e., in their definition of party victory, they include the case where
one party has a vote total > 0 and the other party has NA (see the codebook).
This would change the parameter results we get for psi. We're not sure if
there might be other factors driving this result. Should we try to account
for this, or should we report our results based on what we have?
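(If that reading of the codebook is right, one way to fold the unopposed-NA
case into the victory indicator would be something like the sketch below.
This assumes the raw vote counts sit in columns named dem and rep, as in the
read.table() call later in this thread; check the names against your own
data.)

## Treat a missing opponent as zero votes, so a party with any votes
## "wins" against an NA opponent, matching the codebook description.
dem.votes  <- ifelse(is.na(clean$dem), 0, clean$dem)
rep.votes  <- ifelse(is.na(clean$rep), 0, clean$rep)
clean$dwin <- as.numeric(dem.votes > rep.votes)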
The ps file of our results is attached.
Thanks,
Phillip.
For reference, here is the code for my loop:
superloop <- function(a){
  ## For each even election year (skipping the first election after each
  ## decennial redistricting), regress this year's Democratic share on the
  ## lagged share, the winner dummy, and incumbency, and store the
  ## incumbency coefficient (psi).
  psi <- data.frame(year = 1900:1990, psi = NA)
  redistricting <- c(1902,1912,1922,1932,1942,1952,1962,1972,1982)
  for(yr in seq(1900, 1990, by = 2)){
    if(! yr %in% redistricting){
      this.year <- a[a$year %in% c(yr),]
      last.year <- a[a$year %in% c(yr - 2),]
      x <- merge(this.year[c("state", "district", "dpct", "incumb", "dwin")],
                 last.year[c("state", "district", "dpct")],
                 by = c("state", "district"), suffixes = c("", ".old"))
      object <- lm(dpct ~ dpct.old + dwin + incumb, data = x)
      psi$psi[psi$year == yr] <- object$coefficients[4]  # the incumb term
    }
  }
  return(psi)
}
> lol <- superloop(clean)
> lol
year psi
1 1900 -0.001121156
2 1901 NA
3 1902 NA
4 1903 NA
5 1904 0.007834282
6 1905 NA
7 1906 -0.001820890
8 1907 NA
9 1908 -0.008986478
10 1909 NA
11 1910 -0.012557491
12 1911 NA
13 1912 NA
14 1913 NA
15 1914 0.007483021
16 1915 NA
17 1916 -0.006402552
18 1917 NA
19 1918 0.008074224
20 1919 NA
21 1920 0.010826889
22 1921 NA
23 1922 NA
24 1923 NA
25 1924 0.004814250
26 1925 NA
27 1926 -0.005773500
28 1927 NA
29 1928 0.003498337
30 1929 NA
31 1930 0.011895348
32 1931 NA
33 1932 NA
34 1933 NA
35 1934 -0.011678172
36 1935 NA
37 1936 -0.013364340
38 1937 NA
39 1938 -0.001759503
40 1939 NA
41 1940 -0.003771221
42 1941 NA
43 1942 NA
44 1943 NA
45 1944 0.023622260
46 1945 NA
47 1946 -0.014708913
48 1947 NA
49 1948 0.018887876
50 1949 NA
51 1950 0.001349521
52 1951 NA
53 1952 NA
54 1953 NA
55 1954 0.009512017
56 1955 NA
57 1956 -0.007121092
58 1957 NA
59 1958 0.024744267
60 1959 NA
61 1960 0.008981457
62 1961 NA
63 1962 NA
64 1963 NA
65 1964 0.016531133
66 1965 NA
67 1966 0.035415505
68 1967 NA
69 1968 0.023452892
70 1969 NA
71 1970 0.024081422
72 1971 NA
73 1972 NA
74 1973 NA
75 1974 0.057502703
76 1975 NA
77 1976 0.019737726
78 1977 NA
79 1978 0.030968983
80 1979 NA
81 1980 0.020595827
82 1981 NA
83 1982 NA
84 1983 NA
85 1984 0.043275505
86 1985 NA
87 1986 0.019672685
88 1987 NA
89 1988 0.038630039
90 1989 NA
91 1990 0.013574458
> newlol <- lol[! lol$psi %in% c(NA),]
> plot(newlol, type = "l")
(see the ps file!)
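(Aside: the %in% trick above does drop the NA rows, since NA %in% NA is TRUE,
but !is.na() says the same thing a bit more directly:)

newlol <- lol[!is.na(lol$psi), ]   # equivalent to the %in% version above
plot(newlol, type = "l")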
Dear Friends,
The predict function is predictably problematic. When we tell it to give us a
single value for v_1 given the specifications below, it instead gives us a
prediction for every single district:
newdt <- data.frame(c(d.perc.84 = .45 , party2 = TRUE , Incumbent68 = 0))
> predict(lmprob1b, newdt, int = "predict", level = 0.9)
        fit        lwr       upr
1 0.7205660 0.52902307 0.9121089
2 0.7065463 0.51492419 0.8981684
3 0.6903992 0.49865573 0.8821427
4 0.7626190 0.57116722 0.9540708
5 0.8102611 0.61864699 1.0018752
6 0.3586044 0.16674447 0.5504644
7 0.3186548 0.12698011 0.5103294
8 0.38
Thoughts? Many thanks.
Best,
Dan
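(A guess at the culprit, based only on the output above: wrapping the values
in c() collapses them into a single numeric vector, so data.frame() builds
one unnamed column instead of three named ones. predict() then cannot find
the model's variables in newdata and silently falls back to the original
data, one prediction per district. Dropping the c() should give one row.)

## Likely fix: one named column per regressor, no c() wrapper.
newdt <- data.frame(d.perc.84 = 0.45, party2 = TRUE, Incumbent68 = 0)
predict(lmprob1b, newdata = newdt, interval = "prediction", level = 0.9)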
As soon as the top 5 send in their exams to Tao (perhaps already done) and
Tao posts them.
Can you give us an update, Tao?
Dave
Anna Lorien Nelson writes:
> Hey Dave,
>
> Since this class moves pretty sequentially, each week building on the last, it
> would be a big help to see the top 5 midterms as soon as possible. When will
> they be posted online?
>
> Thanks,
> Anna
>
> --
> Anna Lorien Nelson
> Department of Government,
> Harvard University
> alnelson(a)fas.harvard.edu
>
>
--
David Kane
Lecturer in Government
617-563-0122
dkane(a)latte.harvard.edu
I have a new document, and I've compiled it a few times. When I try
xdvi <filename> &
to view it, I get "no such file or directory" back. I can see
<filename>.tex
in my directory, though. Any thoughts?
Ryan
------------------------------------------
Ryan T. Moore ~ Government & Social Policy
Ph.D. Candidate ~ Harvard University
Olivia Lau writes:
> Yes, I was. I don't know why it gave me 900+ rows; when I tried to subset
> this, it gave me all the 3's and NAs. Sorry!
That's OK. The purpose of this list is precisely to facilitate a sort of
virtual data-poking-around exercise. More questions, more quickly is more
better. Of course, I will not always respond that quickly (time for dinner and
story time (Harry Potter III) with my little darlings right now, for
example). But I hope that other people are working on the problem set as well
. . .
This is another reason why NA's can be so dangerous. In general, a better way to
subset is to use the %in% operator or the subset() function. These will handle
NA's correctly. See my clean dataset from last week. Or:
> y <- read.table("da6311_LREC.yr982",
+ col.names = c("state", "district", "incumb", "dem", "rep"),
+ na.strings = "-9")
> table(y$incumb)

 -1   0   1   3
161  63 203   5
> summary(y$incumb)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  -1.00   -1.00    0.00    0.13    1.00    3.00    3.00
> y[y$incumb == 3,]  # Grabs the 5 3's and the 3 NA's.
     state district incumb    dem    rep
NA      NA       NA     NA     NA     NA
X12      3        4      3 121802  82804
X53     13       14      3  51728  67626
X61     13       22      3  73124  92266
X199    34        4      3  96388  79565
X209    37       98      3 142122 133530
NA1     NA       NA     NA     NA     NA
NA2     NA       NA     NA     NA     NA
> y[y$incumb %in% c(3),]  # Grabs just the 3's.
    state district incumb    dem    rep
12      3        4      3 121802  82804
53     13       14      3  51728  67626
61     13       22      3  73124  92266
199    34        4      3  96388  79565
209    37       98      3 142122 133530
> subset(y, incumb %in% c(3))  # Also works
    state district incumb    dem    rep
12      3        4      3 121802  82804
53     13       14      3  51728  67626
61     13       22      3  73124  92266
199    34        4      3  96388  79565
209    37       98      3 142122 133530
> subset(y, incumb == 3)  # Works too.
    state district incumb    dem    rep
12      3        4      3 121802  82804
53     13       14      3  51728  67626
61     13       22      3  73124  92266
199    34        4      3  96388  79565
209    37       98      3 142122 133530
>
This last one works because subset() is smart enough to know that you don't
want NA's.
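(The mechanism, for anyone curious: == returns NA rather than FALSE when
either side is NA, and indexing a data frame with an NA produces an all-NA
row. Compare:)

c(-1, NA, 3) == 3      # FALSE    NA  TRUE  -- the NA stays NA
c(-1, NA, 3) %in% 3    # FALSE FALSE  TRUE  -- %in% never returns NA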
Dave
> Olivia.
>
> On Tue, 29 Oct 2002 dkane(a)latte.harvard.edu wrote:
>
> > Olivia Lau writes:
> > > Hi,
> > >
> > > So I looked at Gary King's codebook and it says that when two incumbents
> > > (one democrat and one republican) were both running in the same election,
> > > that he coded this as "3". So, going back to the discussion in class
> > > (about whether or not we should recode the incumbency variable), what
> > > should I do? There are 968 instances (out of the total dataset) of
> > > incumbency = 3. Should I subset this data out?
> >
> > For now, we are just trying to replicate GK. From their write up, it seems
> > clear that they feel that incumbency advantage is not defined in such
> > situations. So, you should delete those rows.
> >
> > Note that I only see 45 of these. Is one of us missing something?
> >
> > > table(x$incumb)
> >
> > -1 0 1 3
> > 7546 2988 9481 45
> > > dim(x)
> > [1] 20954 6
> > >
> >
> > Dave
> >
> > > Thanks,
> > >
> > > Olivia.
> > >
> > >
> > > _______________________________________________
> > > gov1760-l mailing list
> > > gov1760-l(a)fas.harvard.edu
> > > http://www.fas.harvard.edu/mailman/listinfo/gov1760-l
> > >
> >
> > --
> > David Kane
> > Lecturer In Government
> > 617-563-0122
> > dkane(a)latte.harvard.edu
> >
>
>
--
David Kane
Lecturer in Government
617-563-0122
dkane(a)latte.harvard.edu
I think this command just really doesn't like me. It tells me that it
can't find one of the objects I've called in my regression, BUT I've
cleaned the subsetted dataset so that it doesn't include any NAs. How can
it not be finding it??
Ergh. What am I doing wrong?
Thanks,
Olivia.
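(A hedged guess, since the email doesn't show the actual command: "object not
found" in a regression call usually means the variable exists as a column of
a data frame but not as a free-standing object in the workspace, or vice
versa. All names below are hypothetical.)

## First check that the variable really is a column of the cleaned data.
names(my.clean)
## Then name the data frame explicitly rather than relying on objects
## floating in the workspace.
lm(dpct ~ dpct.old + incumb, data = my.clean)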