Here's a much easier way to deal with NAs:
Instead of writing cor(x,y), you could write out
var(x,y)/sqrt(var(x)*var(y)).
Notice that when you give var both x and y, it returns the covariance.
But var has an option allowing you to ignore entries with missing
values, 'na.rm' (for "remove NAs"):
var(x,y,na.rm=TRUE)/sqrt(var(x,na.rm=TRUE)*var(y,na.rm=TRUE))
That gives you the correlation between x and y.
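A quick sketch of that formula on made-up data. (For what it's worth, cor itself also takes a 'use' argument; the two approaches agree here because the NAs fall in the same rows of x and y.)

```r
x <- c(1, 2, 3, 4, NA)
y <- c(2, 4, 5, 4, NA)

# var(x, y) is the covariance; dividing by the product of the
# standard deviations gives the correlation
r <- var(x, y, na.rm = TRUE) /
  sqrt(var(x, na.rm = TRUE) * var(y, na.rm = TRUE))

# agrees with cor on the complete cases in this example
cor(x, y, use = "complete.obs")
```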
(1) The code is doing what it is supposed to do.
lmlast is the last model which passed an F test over its predecessor.
lmnew is the most recent innovation. So after the code you mailed me, I
find that ftest(lmlast,lmnew) = .059, meaning that lmnew is not
sufficiently better than lmlast, so the while loop closes and you keep
lmlast.
(2) As for the problem with the residuals: that line runs fine. If you
want me to try to replicate your problem, you need to send me everything
you ran prior to getting that error message. I also replied last night
with some help on debugging the problem yourself.
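A minimal sketch of the kind of check I have in mind. The object names match the homework code, but the data here are made up purely for illustration:

```r
# toy stand-ins for the homework objects
data.approval <- data.frame(approval = c(60, 62, 58, 65, 70),
                            unrate   = c(5.0, 5.2, 5.1, 4.9, 4.8))
lmfinal <- lm(approval ~ unrate, data = data.approval)

# suppose 'approval' was later modified so it no longer matches the data set
approval <- c(data.approval$approval, 55)   # one stray extra element

# compare lengths before subtracting; a mismatch like this is what
# triggers the "longer object length" warning
length(approval)                            # 6
length(predict.lm(lmfinal, data.approval))  # 5
```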
-----Original Message-----
From: Bettina Raquel Scholz [mailto:scholz@fas.harvard.edu]
Sent: Friday, January 09, 2004 10:32 PM
To: jkline(a)fas.harvard.edu
Subject: Fwd: Re: Fwd: RE: Gov 1000 question
# Prospective variables
>
>
> ftestvalue <- 0
> i <- 1
> lmnew <- lmlast
> varsnew <- vars
>
> while (ftestvalue < .05) {
+ lmlast <- lmnew
+ vars <- varsnew
+ text <- paste("inflationMonthlytp",i,"<- c(inflationMonthly[",(i+1),":length(Date)],rep(NA,",i,"))",sep="")
+ eval(parse(text=text))
+ text <- paste("unratetp",i,"<- c(unrate[",(i+1),":length(Date)],rep(NA,",i,"))",sep="")
+ eval(parse(text=text))
+ varsnew <- paste(vars," + inflationMonthlytp",i," + unratetp",i,sep="")
+ text <- paste("lmnew <- lm(approval ~ ",varsnew,",subset=usable)",sep="")
+ eval(parse(text=text))
+ ftestvalue <- ftest(lmlast,lmnew)
+ i <- i + 1
+ }
> summary(lmlast)
Call:
lm(formula = approval ~ inflationMonthly + unrate + eisenhower +
kennedy + nixon + ford + carter + regan + bush + inflationMonthlytp1
+
unratetp1 + inflationMonthlytp2 + unratetp2 + inflationMonthlytp3 +
unratetp3 + inflationMonthlytp4 + unratetp4, subset = usable)
Residuals:
Min 1Q Median 3Q Max
-23.6309 -4.9153 0.2136 4.5721 25.3518
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 82.5270 2.2346 36.931 < 2e-16 ***
inflationMonthly -5.0226 1.6678 -3.012 0.002734 **
unrate -0.4890 1.8639 -0.262 0.793159
eisenhower 6.8865 1.2501 5.509 5.85e-08 ***
kennedy 15.9619 1.6339 9.769 < 2e-16 ***
nixon 1.3001 1.4739 0.882 0.378164
ford 10.3949 2.1778 4.773 2.40e-06 ***
carter 10.3501 2.1426 4.831 1.83e-06 ***
regan 11.0687 1.4910 7.424 5.13e-13 ***
bush 13.0309 1.5404 8.460 3.15e-16 ***
inflationMonthlytp1 -3.1411 1.6976 -1.850 0.064876 .
unratetp1 -1.6046 2.7180 -0.590 0.555224
inflationMonthlytp2 -3.6101 1.7142 -2.106 0.035714 *
unratetp2 -2.5850 2.7267 -0.948 0.343589
inflationMonthlytp3 -5.6802 1.6877 -3.366 0.000824 ***
unratetp3 -0.7133 2.7153 -0.263 0.792899
inflationMonthlytp4 -4.7337 1.6414 -2.884 0.004102 **
unratetp4 1.0130 1.9014 0.533 0.594426
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 8.001 on 487 degrees of freedom
Multiple R-Squared: 0.5612, Adjusted R-squared: 0.5458
F-statistic: 36.63 on 17 and 487 DF, p-value: < 2.2e-16
> ftestvalue
[1] 0.05912646
Jacob,
Here is the code that gave us the .059 result. We kind of concluded
that we might have gotten this because some of the prospective months
were insignificant and others were significant, and the .059 was a
composite F score.
While I am at it, what about this question:
We keep getting the same error when we try to run the residual part:
> residual <- approval - predict.lm(lmfinal,data.approval)
Warning in approval - predict.lm(lmfinal, data.approval) :
longer object length
is not a multiple of shorter object length
Prior to this message we followed your code as given on the solutions
set.
Thanks
Bettina and company
Quoting soroka(a)fas.harvard.edu:
>
>
> ----- Forwarded message from Jacob Kline <jkline(a)fas.harvard.edu> -----
> Date: Fri, 9 Jan 2004 20:43:11 -0500
> From: Jacob Kline <jkline(a)fas.harvard.edu>
> Reply-To: Jacob Kline <jkline(a)fas.harvard.edu>
> Subject: RE: Gov 1000 question
> To: soroka(a)fas.harvard.edu
>
> interspersed
>
> -----Original Message-----
> From: soroka(a)fas.harvard.edu [mailto:soroka@fas.harvard.edu]
> Sent: Friday, January 09, 2004 8:35 PM
> To: jkline(a)fas.harvard.edu
> Cc: scholz(a)fas.harvard.edu; sarabyn(a)fas.harvard.edu;
> tung(a)fas.harvard.edu
> Subject: Gov 1000 question
>
>
> Hi Jacob,
>
> The Theory + 1 Soft Comparativist Group is studying tonight. We were
> running the code you gave as an answer to homework 7/#3 and had the
> following question arise:
>
> We ran the prospective variables and then did the "ftestvalue", which
> gave us an answer of .05912646. However, when we ran "summary(lmlast)"
> we found prospective values were included. Why is this? Wouldn't they
> only be included if the ftestvalue <= .05?
>
> It is always possible that something is wrong with my code. That said,
> I need you to send me the code that produced the .059 in order to
> investigate. One possible confusion: the way I wrote the code, it adds
> only one prospective variable at a time and compares models with n and
> n-1 prospective variables. Maybe your .059 corresponds to a comparison
> between a model with n prospective variables and one with zero? In
> retrospect, that is the right comparison to make, and the way I have
> done it in the key is somewhat unsatisfactory.
>
> Second, looking at lag approval variables we ran:
>
> residual <- approval - predict.lm(lmfinal,data.approval) but got back a
> message "warning in approval...longer object length is not a multiple
> of shorter object length"
>
> Probably you have modified the variable 'approval' from its length in
> the original data set. A quick check: type length(approval),
> length(data.approval$approval), and
> length(predict.lm(lmfinal,data.approval)). Which of these do not agree?
>
> What's going on?
>
> Thanks,
>
> George
>
> ----- End forwarded message -----
>
>
>
>
----- End forwarded message -----
1. OK - I send you the password tomorrow afternoon.
2. Well -- I didn't exactly personally choose to include all those
different lags. I let the computer keep adding month after month until
the additional r^2 wasn't great enough to justify adding any more
months. That may not be the best way to choose the best model, but it
satisfies the criterion I set up at the beginning of the write-up.
Additionally, it does make sense. We can all agree that the gist of MES
is reasonable, i.e. that my (presumably accurate) expectations about
the future economy affect my support of the President. OK -- but what
in particular about the future economy, and when? What justifies picking
out one quarter, or one month? No -- what seems reasonable to me is that
my fuzzy sense of the future of the economy is some vague combination of
all the economic indicators for the next year or so. You might say: why
not then just look at the average unemployment or the average inflation
rate over the next few months? To which I would answer: "Sure, OK.
That is very defensible." But try to understand the point of view I
offered in the solutions (which admittedly is not my real point of
view). What I say in the write-up is: "Why make any kind of decision
about what combination of economic indicators is best? (For instance,
average unemployment six months out is just the average of the six
months that are eventually included.) In principle the F test can tell
me whether I'm justified in adding in some more variables."
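A minimal sketch of that kind of nested-model F test in base R. (The homework code uses a helper called ftest; anova() on two nested lm fits reports the equivalent test. The data below are simulated purely for illustration.)

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)   # x2 contributes nothing here

small <- lm(y ~ x1)
big   <- lm(y ~ x1 + x2)

# p-value of the F test: is the bigger model significantly better?
pval <- anova(small, big)[2, "Pr(>F)"]
```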
Now what also comes up in the write-up is that this approach isn't very
good for answering the kinds of questions in the hw. For instance, if I
want to know how people react to rising unemployment, the model I end up
with is sort of annoying -- too many coefficients, all in different
directions. But bear in mind that the definition of 'best model' which
I started with had nothing to do with answering those kinds of
questions. If I had said "Give me the best model for answering
questions a through e," the model I came up with definitely wouldn't
have been my choice. In that case, I would probably want one clear
indicator of past, present, and future economic well-being.
I am going in to Littauer for the next few hours, if you want to stop by
the lounge and chat about this.
As for the control variables: sure, you don't have to use the Q-Q plot.
I hope that was clear enough in the write-up. If you have specific
information about what months are screwy, you should use it. The point
of my ignoring that kind of information was to show you that a totally
blind approach can still figure out which events are outliers -- and
that approach is in some ways superior. (For instance, I never have to
assume that one kind of bad month is just as bad, or twice as bad as the
next month. When you do that, you add an assumption to your work that is
a little ridiculous, but justifiable.)
-----Original Message-----
From: Paul Bodnar [mailto:bodnar@fas.harvard.edu]
Sent: Thursday, January 08, 2004 6:59 PM
To: Jacob Kline
Subject: Re: [gov1000-list] updated list of delayed passwords
Jacob,
Two things. First, I'd like to start the exam on Friday at 5.
Second, my group has a couple of questions about the way you determined
the "best" model.
Basically we're confused about why you put so many measures of the same
variable into one model. What's the rationale for including 6 different
lags of the same variable in a model, given that those must be
correlated?
Also, I'm just confirming that it is OK to code control events as we've
been doing instead of using the Q-Q plot as you did in the solution set,
assuming we state our assumptions, etc.
Thanks,
Paul
Quoting Jacob Kline <jkline(a)fas.harvard.edu>:
> Saturday.
>
> Soroka
> Tung
> Scholz
> Liebman
> Mohandas
> Bodnar
>
> Anybody else want to wait until later?
>
>
--
Paul Bodnar
Department of Government
Harvard University
+1 617 230 4525
bodnar(a)fas.harvard.edu
Inflation is seasonally adjusted. inflationMonthly is log(CPI[t]) -
log(CPI[t-1]).
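A minimal sketch of that transformation on a made-up CPI series (the numbers are illustrative, not the class data):

```r
# hypothetical monthly CPI values
CPI <- c(180.0, 180.5, 181.2, 181.0, 182.1)

# monthly inflation as the log difference log(CPI[t]) - log(CPI[t-1]);
# the first month has no predecessor, so it gets NA
inflationMonthly <- c(NA, diff(log(CPI)))
```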
-----Original Message-----
From: Adam N Gossen [mailto:agossen@fas.harvard.edu]
Sent: Saturday, January 10, 2004 9:57 AM
To: gov1000(a)fas.harvard.edu
Subject: inflationMonthly variable
Hi Jacob,
I think you might have mentioned this in the past at some point, but can
you refresh my memory about what the "inflationMonthly" variable is? And
what the difference between it and "inflation" is?
Thanks,
Nick
Unlike Stata, R has no easy undo command.
If you know exactly what you did, you could undo it.
So unrate <- unrate - 1 would undo your transformation.
If you are going to mess around with the data, it is best to save the
altered variables to different variable names.
The safest bet is to reload the data set entirely if you think anything
might be amiss.
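A minimal sketch of the save-a-copy habit ('unrate' matches the variable in the question; the values are made up):

```r
unrate <- c(5.0, 5.2, 5.1)   # pretend this came from the data set
unrate_orig <- unrate        # keep an untouched copy before transforming

unrate <- unrate + 1         # the transformation
unrate <- unrate - 1         # manual undo works if you know what you did

unrate <- unrate_orig        # or simply restore from the saved copy
```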
-----Original Message-----
From: Paul Bodnar [mailto:bodnar@fas.harvard.edu]
Sent: Friday, January 09, 2004 9:44 PM
To: Jacob Kline
Subject: R question
When you mess around with variables, like unrate, and then want to start
again
with the original version as it is in the data set, is there an R
command that
lets you reset that variable to its original values? So for example, if
I did
unrate <-unrate+1
and then thought, no I'd like unrate back again, what would I do?
Paul
No need for anything complicated -- you are just supposed to provide R
code which can be cut and pasted into R to produce the results of that
table. Also, you want to provide a brief comment describing your
result. Does your finding support the estimates of the original
researcher?
-----Original Message-----
From: Adam N Gossen [mailto:agossen@fas.harvard.edu]
Sent: Friday, January 09, 2004 6:13 PM
To: gov1000(a)fas.harvard.edu
Subject: Question 4
Hi Jacob,
Excuse me if you've answered this already -- I get the digest and
sometimes I'm a bit behind.
On question 4 we are just asked to replicate the table.
...
I'm not having any trouble getting perfect results. So what are we
supposed to write up? Are we supposed to imagine the rationale for
constructing such a table?
Thanks,
Nick