gov1000-list December 2002

gov1000-list@lists.fas.harvard.edu

19 participants
83 discussions

by Olivia Lau

If I'm running R in an ESS window, what's the command to flush output to the next prompt? I have this long output and it's taken 15 minutes so far and still going strong... thanks Olivia.

21 years, 5 months

Interpreting control variables

by ravishan＠fas.harvard.edu

A basic question: when reporting our results from the regression, should we be interpreting the coefficients of the control variables? If yes, how? When interpreting the key explanatory variable, we say something like a one unit increase in this variable results in blah blah controlling for variables X and Y. Can we say something similar about the control variables?

- Nirmala

--
Nirmala Ravishankar
PhD Candidate, Government Department
Harvard University

Perkins Hall #210
35 Oxford Street
Cambridge, MA 02138
Tel: (617) 493 3460

21 years, 5 months

interaction terms

by Gary King

Ok, suppose your model is

E(Y|X) = a + bX + cZ,

where
Y is starting salary in dollars
X is education in years
Z is parent's income in dollars

and so (if the assumptions hold), when X goes up by one _year_, E(Y|X) goes up by b holding constant Z.

Ok, so now suppose you are worried that b is not constant over the observations as the above model assumes. In particular, suppose there is sex discrimination and so the effect of education is bigger for men than women. In that case, the above specification is wrong, and b is not the average causal effect for men and women. At best, the standard error of b will be too small because there is variation in it that is ignored by the model. At worst, if b and X are correlated over the observations, the least squares estimate of b will be biased (note that I'm asserting this and you haven't seen the proof, but it's true).

But in this case, we know the cause of the variation (according to our theory anyway); it's sex. So let's model it. Let's write this equation, where I'm putting in "_i" to be the subscript i just to be clear what I'm talking about:

E(Y_i|X) = a + b_iX_i + cZ_i,

note the coefficient on X now varies over i. (forget for a moment that you have no idea how to estimate this; we'll fix that in a minute.)

ok, so now let's add a 2nd equation:

b_i = d + fS_i,

where d and f are constant coefficients and S is sex (1 for males and 0 for females). this equation follows our new theory and lets b_i vary as a function of sex. So, in particular b_i = d for females and b_i = d+f for males.

the 2 equations above are called the structural model. when we estimate it (i'll explain shortly) we get estimates only of a, c, d, f. Once you have those, it's easy to interpret. When X (education) goes up by one year, Y (starting salary) goes up by $d on average for females and $(d+f) for males. (Note that this can be easily extended to S representing a continuous variable too.)

ok, so the only question left is how to estimate it. to do it, substitute the 2nd eqn into the first equation:

E(Y_i|X) = a + b_iX_i + cZ_i
         = a + (d + fS_i)X_i + cZ_i
         = a + dX_i + f(S_i*X_i) + cZ_i

This is known as the reduced form equation. so we have an interaction, but ignore that. to estimate this equation you regress Y on a constant term, X, the product of S and X, and Z. this gives unbiased estimates of a, d, f, and c, which is what we need to interpret everything.

So if someone asks you whether f is significant, a good answer is 'who cares?' It is true that f is the difference between the effects for males and for females, but that's beside the point.

If you think hard about the regressions you run, you can come up with a lot of interesting effects by adding additional equations (which produces more complicated interactions of course). To interpret them, ignore the reduced form and only look at the structural model. (To estimate, use the reduced form and ignore the structural model.)

Gary
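
As a rough illustration of the estimation step described in this message, the R sketch below simulates data from the two structural equations and then fits the reduced form by regressing Y on a constant, X, the product S*X, and Z. The parameter values, the sample size, and the variable names (y, x, z, s, c.z) are assumptions made up for the example; none of them come from the course data.

```r
# Simulated illustration only: all numbers below are assumed, not course data.
set.seed(1000)
n <- 5000
a <- 20000; d <- 1000; f <- 500; c.z <- 0.1   # assumed "true" coefficients

x <- runif(n, 8, 20)            # education in years
z <- runif(n, 20000, 100000)    # parent's income in dollars
s <- rbinom(n, 1, 0.5)          # sex: 1 = male, 0 = female

# Structural model: b_i = d + f * S_i, and E(Y_i) = a + b_i * X_i + c * Z_i
b.i <- d + f * s
y <- a + b.i * x + c.z * z + rnorm(n, sd = 1000)

# Reduced form: regress Y on a constant, X, the product S*X, and Z
fit <- lm(y ~ x + I(s * x) + z)
est <- coef(fit)

# Interpret through the structural model, not the interaction term itself
effect.female <- unname(est["x"])                    # estimate of d
effect.male   <- unname(est["x"] + est["I(s * x)"])  # estimate of d + f
print(round(c(female = effect.female, male = effect.male), 1))
```

With data generated this way, the two printed effects should land near the assumed values of d and d + f, which is the interpretation the structural model calls for.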

21 years, 5 months

coding for dem.win

by Olivia Lau

Dave and Tao:

To clarify: Is dem.win supposed to be coded 0 or -1 for a Republican victory? This matters for several reasons:

1) GK footnote 10, on estimating the effect for Democrats and Republicans, says to subtract gamma1 from gamma0, which only makes sense if Republican is coded as -1.
2) It changes my coefficients in 1a and makes the regression results from factor analysis different from my other results (because factors are treated as 0 and 1 if there are only two of them).
3) I just recoded the darned variable for the third time and really don't want to do it again.

All I'm asking is: Is it okay to use either as long as we specify which coding we're using? I think this is the most reasonable solution given the lateness of the hour and the confusion over the codings given in the GK paper, the emails, and the instructions to the problem set. Please let me know ASAP!

Olivia.
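
To make the stakes of the coding choice concrete, here is a small R sketch on simulated data (the outcome y and the numbers are invented; only the name dem.win echoes the post). It fits the same one-variable regression under the 0/1 coding and under the -1/1 coding. The fitted values are identical, but the slope under -1/1 is half the slope under 0/1 and the intercept shifts accordingly, so reported coefficients are only comparable if the coding is stated.

```r
# Simulated illustration only: y and all numeric values are assumed.
set.seed(2002)
n <- 500
dem.win01 <- rbinom(n, 1, 0.5)     # 1 = Democratic win, 0 = Republican win
dem.win11 <- 2 * dem.win01 - 1     # 1 = Democratic win, -1 = Republican win
y <- 0.45 + 0.10 * dem.win01 + rnorm(n, sd = 0.05)

fit01 <- lm(y ~ dem.win01)
fit11 <- lm(y ~ dem.win11)

slope01 <- unname(coef(fit01)["dem.win01"])
slope11 <- unname(coef(fit11)["dem.win11"])
int01   <- unname(coef(fit01)["(Intercept)"])
int11   <- unname(coef(fit11)["(Intercept)"])

# Same fit, different parameterization:
#   slope11 = slope01 / 2   and   int11 = int01 + slope01 / 2
print(c(slope01 = slope01, slope11 = slope11, int01 = int01, int11 = int11))
```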

21 years, 5 months

cleaning the data

by Olivia Lau

So, Nirmala's dataset has 2815 rows. Mine has 3070 rows. I think the problem is in the cleaning, not the pooling, so someone stop me where I'm going wrong:

Load the data from the .txt files.
Generate and append the Democratic percentage and party affiliation vars.
Remove rows where democratic percentage is 0, 1, or NA, and remove rows where there is no/bad incumbency data (incumbency is NA or 3).
Pool the data...

What is everyone else removing that I'm not? Thanks.

Olivia.
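
The removal rules listed in this message can be sketched in R as below. The toy data frame and its column names (dpct for Democratic percentage, incumb for incumbency) are hypothetical stand-ins, not the actual problem-set files. Counting how many rows each rule drops is one way two people could compare cleaning steps when their row totals disagree.

```r
# Toy data only: column names and values are hypothetical.
raw <- data.frame(
  dpct   = c(0.55, 0, 1, NA, 0.48, 0.61, 0.39),
  incumb = c(1, 0, -1, 1, 3, NA, -1)
)

# Keep rows with a usable Democratic percentage and usable incumbency code
ok.dpct   <- !is.na(raw$dpct) & raw$dpct != 0 & raw$dpct != 1
ok.incumb <- !is.na(raw$incumb) & raw$incumb != 3
clean <- raw[ok.dpct & ok.incumb, ]

# How many rows does each rule flag? (a row can be flagged by more than one)
drop.reason <- c(
  dpct.missing   = sum(is.na(raw$dpct)),
  dpct.0.or.1    = sum(raw$dpct %in% c(0, 1)),
  incumb.missing = sum(is.na(raw$incumb)),
  incumb.is.3    = sum(raw$incumb %in% 3)
)
print(drop.reason)
print(nrow(clean))
```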

21 years, 5 months

by Dave Kane

ravishan(a)fas.harvard.edu writes:
>
> Dear Dave,

All questions should be sent to the list for homeworks, unless there is a really good reason not to.

> I am not sure what we are supposed to do in 1e. More specifically, I don't
> understand how to apply the footnote from GK. Here is what I tried...let me
> know if this is right.

Could you say in words what you are trying to do? I am having trouble following your code. The key footnote is 10 on page 1158. One way to think about it is to note that instead of estimating the base equation, you need to estimate a new equation, one with both the incumbency and party variables (as before) and with an *interaction* between the two. I think that you want something more like:

lm(dpct ~ dpct.old + dwin + incum + dwin*incum)

and then you need to figure out how the two sets of estimated coefficients relate to one another.

Dave

> > W$newincumb[W$newincumb == -1] <- 2
> > regi <- lm(dpct ~ dpct.old + dwin + as.factor(newincumb), data = W)
> > summary(regi)
>
> Call:
> lm(formula = dpct ~ dpct.old + dwin + as.factor(newincumb), data = W)
>
> Residuals:
>        Min         1Q     Median         3Q        Max
> -0.3682688 -0.0481769 -0.0006633  0.0512496  0.3475647
>
> Coefficients:
>                        Estimate Std. Error t value Pr(>|t|)
> (Intercept)            0.137786   0.008399  16.405  < 2e-16 ***
> dpct.old               0.731240   0.016616  44.007  < 2e-16 ***
> dwin                  -0.038159   0.008711  -4.380 1.23e-05 ***
> as.factor(newincumb)1  0.064584   0.006570   9.830  < 2e-16 ***
> as.factor(newincumb)2 -0.032554   0.005739  -5.673 1.55e-08 ***
> ---
> Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> Residual standard error: 0.07959 on 2810 degrees of freedom
> Multiple R-Squared: 0.7345, Adjusted R-squared: 0.7342
> F-statistic: 1944 on 4 and 2810 DF, p-value: < 2.2e-16
>
>
> Does this mean that the incumbency advantage on proportion of Democratic vote
> is a positive 6.4 percentage points in the case of a Democratic incumbent and a
> negative 3 percentage points for Republican incumbents?
>
>
> - Nirmala
>
> --
> Nirmala Ravishankar
> PhD Candidate, Government Department
> Harvard University
>
> Perkins Hall #210
> 35 Oxford Street
> Cambridge, MA 02138
> Tel: (617) 493 3460
>

--
David Kane
Lecturer in Government
617-563-0122
dkane(a)latte.harvard.edu

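A note on the formula suggested in the reply above: in R's formula syntax, dwin*incum already expands to dwin + incum + dwin:incum, so listing the main effects again is harmless but redundant. The sketch below checks this on simulated data. The data frame W here is a made-up stand-in that only borrows the variable names from the thread, and the simulated values say nothing about the real election data.

```r
# Simulated stand-in for W: values are assumed, only the names follow the thread.
set.seed(1158)
n <- 400
W <- data.frame(
  dpct.old = runif(n, 0.2, 0.8),
  dwin     = rbinom(n, 1, 0.5),
  incum    = sample(c(-1, 0, 1), n, replace = TRUE)
)
W$dpct <- 0.1 + 0.7 * W$dpct.old + 0.02 * W$dwin + 0.05 * W$incum +
  0.03 * W$dwin * W$incum + rnorm(n, sd = 0.05)

# The formula as written in the reply, and the shorter equivalent
fit.long  <- lm(dpct ~ dpct.old + dwin + incum + dwin*incum, data = W)
fit.short <- lm(dpct ~ dpct.old + dwin*incum, data = W)

# Both fits carry the same terms: main effects plus the dwin:incum interaction
print(names(coef(fit.long)))
print(all.equal(coef(fit.long), coef(fit.short)))
```

The coefficient on dwin:incum is the piece that lets the incumbency effect differ by party, which is what relating the two sets of estimates comes down to.
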
21 years, 5 months

(no subject)

by Y Kirpichevsky

hi, so we are trying to do matching and are failing miserably. this is our code, which is mostly borrowed from the problem set 3. any suggestions?

bound$dpct.old.cut <- cut(dpct.old, 20)

matching.func <- function(n) {
  y <- mean(bound[bound$incumb == 1, "dpct"])
  vector <- array(NA, n)
  ## create a vector to store averages of dem vote percentages among the 40
  ## matching categories
  dpct.inc0 <- array(NA, 40)
  k <- 1
  for (q in 1:n) {
    for (party in 0:1) {
      for (dempct in 1:20) {
        matches <- bound[bound$incumb == 0 & bound$dwin == party &
                         levels(bound$dpct.old.cut) == levels(bound$dpct.old.cut)[dempct], ]
        draws <- matches[sample(nrow(matches),
                                nrow(bound[bound$incumb == 1 & bound$dwin == party &
                                           levels(bound$dpct.old.cut) == levels(bound$dpct.old.cut)[dempct], ])), ]
        dpct.inc0[k] <- mean(draws$dpct)
        k <- k + 1
      }
    }
    ## this vector stores differences between the average democratic percentages
    ## when incumbent was Democrat and the avg. dem. percentage when there was an open seat
    vector[q] <- y - mean(dpct.inc0)
  }
  return(vector)
}
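
A few things in the posted code look like likely trouble spots: cut() is called on dpct.old rather than bound$dpct.old; the row filter compares levels(bound$dpct.old.cut), a vector of 20 labels, instead of the per-row values of bound$dpct.old.cut; the counter k is never reset, so it runs past 40 on the second pass; and sample() without replacement fails when a cell has fewer open seats than incumbents. The R sketch below is one possible way to do the same kind of within-cell matching, not the problem set's intended solution. The simulated bound data frame, the choice to sample with replacement, and the choice to skip cells lacking either group are all assumptions of this example.

```r
# Simulated stand-in for `bound`: values are assumed, names follow the post.
set.seed(1)
n.obs <- 2000
bound <- data.frame(
  dpct.old = runif(n.obs, 0.2, 0.8),
  dwin     = rbinom(n.obs, 1, 0.5),
  incumb   = rbinom(n.obs, 1, 0.5)   # 1 = Democratic incumbent, 0 = open seat
)
bound$dpct <- 0.1 + 0.8 * bound$dpct.old + 0.05 * bound$incumb +
  rnorm(n.obs, sd = 0.05)
bound$dpct.old.cut <- cut(bound$dpct.old, 20)

matching.func <- function(data, n.draws) {
  # One matching cell per (party, previous-vote bin) combination
  cell <- interaction(data$dwin, data$dpct.old.cut, drop = TRUE)
  treated <- split(data$dpct[data$incumb == 1], cell[data$incumb == 1])
  control <- split(data$dpct[data$incumb == 0], cell[data$incumb == 0])

  # Assumption: use only cells containing both incumbents and open seats
  ok <- lengths(treated) > 0 & lengths(control) > 0
  y <- mean(unlist(treated[ok]))

  diffs <- numeric(n.draws)
  for (q in seq_len(n.draws)) {
    # Per cell, draw as many open seats (with replacement) as incumbents
    draws <- mapply(
      function(tr, co) co[sample.int(length(co), length(tr), replace = TRUE)],
      treated[ok], control[ok], SIMPLIFY = FALSE
    )
    diffs[q] <- y - mean(unlist(draws))
  }
  diffs
}

res <- matching.func(bound, 50)
print(summary(res))
```

Because the simulated incumbency effect is set to 0.05, the differences returned here should cluster near that value; on real data the spread across draws gives a sense of the matching uncertainty.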

21 years, 5 months

by Tao Li

for 4), i think it's pretty clear what we need. if you find it puzzling, go back to the definition of identity and diagonal matrix and define an arbitrary one to evaluate it. or use a numerical example to try for yourself.
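
Along the lines of the numerical example suggested here, a short R sketch follows. The matrices are arbitrary choices made for illustration (the ones in the problem set are not reproduced in this thread), with I an identity matrix, A a diagonal matrix, and B a general 2x2 matrix.

```r
# Arbitrary example matrices, chosen only for illustration.
I <- diag(2)                            # 2x2 identity matrix
A <- diag(c(2, 3))                      # diagonal matrix with 2 and 3 on the diagonal
B <- matrix(c(1, 2, 3, 4), nrow = 2)    # columns are (1, 2) and (3, 4)

# %*% is matrix multiplication; note that ^ in R works element by element
IB <- I %*% B    # multiplying by the identity leaves B unchanged
I2 <- I %*% I    # the identity times itself is the identity
AB <- A %*% B    # a diagonal matrix on the left rescales the rows of B
A2 <- A %*% A    # squaring a diagonal matrix squares its diagonal entries

print(IB); print(I2); print(AB); print(A2)
```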

21 years, 5 months

Q1C

by dhopkins＠fas.harvard.edu

Dear Colleagues,

Pardon my all-too-constant presence in your in-boxes. We are trying to create a matrix on Nirmala's recommendation, but have not yet been successful. Our attempt is below. Thoughts on how to successfully do this are much appreciated.

> help(matrix)
> m1 <- matrix(clean8a$dempct.08)
> m2 <- matrix(clean8a$demwin.08)
> m3 <- matrix(clean8a$incum.10)
> dim(m1)
[1] 2357    1
> m4 <- cbind(m1, m2, m3)
> dim(m4)
[1] 2357    3

So it has the right dimensions...

> is.matrix(m4)
[1] TRUE

And it is a matrix...

> library(MASS)
> help(ginv)
> m5 <- ginv(m4)
Error in svd(X) : NA/NaN/Inf in foreign function call (arg 1)

And yet, the "ginv" function doesn't seem able to operate on it...

> m5 <- t(m4)
> dim(m5)
[1]    3 2357
>

...despite the fact that "t" does.

Best,
Dan
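
The error message in this session points at NA, NaN, or Inf values in the matrix: t() only rearranges entries and tolerates them, while ginv() goes through svd(), which does not. The R sketch below reproduces the situation on a tiny made-up matrix and shows one possible fix, dropping incomplete rows before calling ginv(). Whether dropping rows is appropriate for the actual clean8a data is an assumption here, and the exact wording of the error differs across R versions.

```r
library(MASS)   # provides ginv()

# Tiny made-up matrix with one missing value (stand-in for m4 in the post)
m4 <- cbind(dempct = c(0.55, 0.40, NA, 0.62, 0.48),
            demwin = c(1, 0, 1, 1, 0),
            incum  = c(1, -1, 0, 1, 0))

# t() works even with the NA present
print(dim(t(m4)))

# ginv() fails on the NA; capture the error message instead of stopping
result <- tryCatch(ginv(m4), error = function(e) conditionMessage(e))
print(result)

# Locate the offending rows, drop them, and try again
print(which(!complete.cases(m4)))
m4.clean <- m4[complete.cases(m4), , drop = FALSE]
m5 <- ginv(m4.clean)
print(dim(m5))
```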

21 years, 5 months

Matrix Notation--Q4a

by dhopkins＠fas.harvard.edu

Dear All, Questions 4a and 4b ask us to find IB, I^2 and AB, A^2. Are those commas some form of matrix notation, or should we simply find each of the four matrices given separately? Best and thanks as always, Dan

21 years, 5 months
