Anna Lorien Nelson writes:
> Hi Dave,
>
> I can't get the code from pp. 6-7 of HW 5 to run.
Hmmm. Perhaps the author of this code would be willing to lend a helping
hand. Perhaps not surprisingly, I prefer my own code, as sent to the mailing
list.
> Every line and function goes
> in fine, without errors, and I have read through my typed code line by line to
> make sure it matches the solution. But then when I try to run:
>
> > superloop(clean)
>
> I get this message:
>
> Error in "[<-"(*tmp*, psi$year == yr, value = NULL) :
> incompatible types
At this point, traceback() will tell you precisely where the error is.
> Huh? I have no idea what this error means.
Alas, errors in R are not the clearest thing. The "[<-" means that the problem
occurred during assignment, and the stuff about incompatible types means that you
are assigning something of the wrong "type".
> So I've been running the HW 5 code
> line by line to error-check, and I'm pretty sure the problem is somewhere in
> these two lines toward the end (in the superloop function):
>
> psi$year[psi$year == yr] <- yr
> psi$psi[psi$year == yr] <- objects$coefficients[4]
>
> Now I am stuck. Below, I've attached the whole code I've been running, in case
> you need it. (Unless I've missed some typo, it should be exactly what's on pp.
> 6-7 of HW 5.)
Ahhh. Seems like a typo. You create "object" using lm and then try to get a
piece of it by calling it "objects". Using:
psi$psi[psi$year == yr] <- object$coefficients[4]
will probably work. (R probably didn't complain that "objects" was undefined
because objects() is a built-in R function, a synonym for ls().)
In debugging, another way to see that would be to look at
objects$coefficients[4] in the browser. You would have seen that it is NULL, I
suspect.
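A small sketch of the difference, with made-up toy data standing in for one year of merged election data (the data frame x and its values are hypothetical, not the homework data). The fourth coefficient of the lm fit is a single number that drops straight into psi$psi, while assigning NULL into that same kind of slot fails with a "[<-" error. The exact wording of that error depends on your version of R.

```r
# Hypothetical toy data standing in for one year of merged data.
x <- data.frame(dempct     = c(0.55, 0.62, 0.41, 0.48, 0.70, 0.35, 0.58, 0.45),
                dempct.old = c(0.52, 0.60, 0.44, 0.50, 0.66, 0.38, 0.49, 0.47),
                incumb     = c(1, 1, -1, -1, 1, -1, 0, 0))
x$dwin <- as.integer(x$dempct > 0.5)

object <- lm(dempct ~ dempct.old + dwin + incumb, data = x)
object$coefficients[4]   # the "incumb" coefficient: a single number

psi <- data.frame(year = c(1986, 1988), psi = NA)
psi$psi[psi$year == 1986] <- object$coefficients[4]   # works

# Assigning NULL (what a misspelled name can hand you) into a slot fails.
result <- tryCatch({
  psi$psi[psi$year == 1988] <- NULL
  "no error"
}, error = function(e) conditionMessage(e))
result   # the error message, not "no error"
```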
Dave
> Thanks,
> Anna
>
> ## Start of code:
>
> load.data <- function(end.year){
> result <- data.frame()
> for(year in seq(898, end.year, by = 2)){
> file.name <- paste("~/hw4/house/da6311_LREC.yr", year, sep="")
> x <- read.table(file.name, col.names=c("state", "district", "incumb",
> "dem", "rep"), na.strings="-9")
> x$year <- 1000 + year
> result <- rbind(result, x)
> }
> result
> }
>
> y <- load.data(992)
> y$dempct <- y$dem/(y$dem + y$rep)
> y$dwin <- as.integer(y$dempct > 0.5)
> clean <- y[! y$dempct %in% c(0, 1, NA) & ! y$incumb %in% c(3, NA),]
>
> one.year = function(x, year){
> this.year <- x[x$year %in% c(year),]
> last.year <- x[x$year %in% c(year - 2),]
> x <- merge(this.year[c("state", "district", "dempct", "incumb", "dwin")],
> last.year[c("state", "district", "dempct")], by=c("state", "district"),
> suffixes = c("", ".old"))
> reg <- lm(dempct ~ dempct.old + dwin + incumb, data = x)
> return(reg)
> }
>
> superloop <- function(dataframe){
> psi <- data.frame(year = 1900:1990, psi = NA)
> years <- seq(1900, 1990, by = 2)
> years.with.a.2 <- years %% 10 == 2
> for(yr in seq(1900, 1990, by = 2)){
> if(! 2 %in% (yr %% 10)){
> this.year <- dataframe[dataframe$year %in% c(yr),]
> last.year <- dataframe[dataframe$year %in% c(yr - 2),]
> x <- merge(this.year[c("state", "district", "dempct", "incumb",
> "dwin")],
> last.year[c("state", "district", "dempct")],
> by = c("state", "district"), suffixes = c("", ".old"))
> object <- lm(dempct ~ dempct.old + dwin + incumb, data = x)
> psi$year[psi$year == yr] <- yr
> psi$psi[psi$year == yr] <- objects$coefficients[4]
> }
> }
> psi <- psi[! psi$psi %in% c(NA),]
> return(psi)
> }
>
> superloop(clean)
>
> ## End of code, this is where I get the error message ##
>
>
--
David Kane
Lecturer In Government
617-563-0122
dkane(a)latte.harvard.edu
Weird.
Is anyone else having this problem?
Are you using a .Rprofile file? If you have "detach(ctest)" in there,
then that would explain the problem.
Try starting up a new xterm and/or logging out of everything and
logging back in.
Dave
Ryan Thomas Moore writes:
> There's the problem...I restarted R, typed "search()", and got the three
> OTHER objects there. package:ctest is missing. What would you suggest?
>
> Thanks in advance,
> Ryan
>
> ------------------------------------------
> Ryan T. Moore ~ Government & Social Policy
> Ph.D. Candidate ~ Harvard University
>
> On Tue, 19 Nov 2002 dkane(a)latte.harvard.edu wrote:
>
> > Ryan Thomas Moore writes:
> > >
> > > When I try to load the libraries ctest and MASS, I get the following
> > > error:
> > >
> > > > library(ctest)
> > > Error in length(data)/nrow:non-numeric argument to binary operator
> > >
> > > Any thoughts?
> >
> > Hmm. ctest should be loaded by default when you start R. Try restarting R and
> > then typing search(). You should see something like this:
> >
> > > search()
> > [1] ".GlobalEnv" "package:ctest" "Autoloads" "package:base"
> >
> > You should then be able to do library(MASS) without problem.
> >
> > Dave
> >
> >
> > > Ryan
> > >
> > > ------------------------------------------
> > > Ryan T. Moore ~ Government & Social Policy
> > > Ph.D. Candidate ~ Harvard University
> > >
> > >
> > >
> >
> > --
> > David Kane
> > Lecturer In Government
> > 617-563-0122
> > dkane(a)latte.harvard.edu
> >
>
Ryan Thomas Moore writes:
>
> When I try to load the libraries ctest and MASS, I get the following
> error:
>
> > library(ctest)
> Error in length(data)/nrow:non-numeric argument to binary operator
>
> Any thoughts?
Hmm. ctest should be loaded by default when you start R. Try restarting R and
then typing search(). You should see something like this:
> search()
[1] ".GlobalEnv" "package:ctest" "Autoloads" "package:base"
You should then be able to do library(MASS) without problem.
Dave
> Ryan
>
> ------------------------------------------
> Ryan T. Moore ~ Government & Social Policy
> Ph.D. Candidate ~ Harvard University
>
>
>
Anna Lorien Nelson writes:
> Hi,
>
> Wasn't sure if I could send these 3 questions to the list...
All perfectly reasonable questions for the list. Others should feel free to
chime in.
> HW5, part 1a: The function "one.year" is created to run a regression on a pair
> of election years. What are the input variables "x" and "year"?
x is a dataframe with all the data that you need. "year" is the year for which
you want to explain the Democratic percentage; that is, year = 1986 will place
the Democratic percentage for 1986 on the left-hand side.
> In the code,
> the values "clean" and "1986" are entered. But this doesn't work for me
> because I haven't created those objects. What were those objects, and with
> what code were they created?
I think that this is the code that you are looking for.
http://www.fas.harvard.edu/pipermail/gov1760-l/2002-November/000551.html
> GK article question: When regressing votes in 1986 on votes in 1984, HW 5
> refers to P_2 as the party of the winner in 1986. I think people have said
> that actually, P_2 indicates party of the winner in 1984. Just wanted to make
> sure: what is the meaning of P_2?
P_2 is the party of the winner in 1984, not 1986. See the
discussion in the above referenced e-mail. Let me know if this is not
clear. The key issue is that using the party of the winner in 1986 would mean
using a variable that is not "causally prior" to the key variable of interest,
incumbency status in 1986.
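One way to get the 1984 winner into the regression, sketched with made-up data for a single state (the toy data frame and its values are hypothetical): keep dwin in both halves of the merge, so that the suffixes argument renames the 1984 copy to dwin.old, and put dwin.old rather than dwin on the right-hand side. Note that suffixes only applies to columns that appear in both data frames, which is why dwin is selected from this.year as well.

```r
# Hypothetical toy data: six districts in one state, elections of 1984 and 1986.
toy <- data.frame(state    = 1,
                  district = rep(1:6, times = 2),
                  year     = rep(c(1984, 1986), each = 6),
                  dempct   = c(0.60, 0.45, 0.52, 0.38, 0.66, 0.41,   # 1984
                               0.58, 0.47, 0.49, 0.40, 0.63, 0.52),  # 1986
                  incumb   = c(1, -1, 1, -1, 1, -1,                  # 1984
                               1, -1, 0, -1, 1, 0))                  # 1986
toy$dwin <- as.integer(toy$dempct > 0.5)

this.year <- toy[toy$year == 1986, ]
last.year <- toy[toy$year == 1984, ]

# dwin appears in both selections, so merge() names the 1984 copy "dwin.old".
x <- merge(this.year[c("state", "district", "dempct", "incumb", "dwin")],
           last.year[c("state", "district", "dempct", "dwin")],
           by = c("state", "district"), suffixes = c("", ".old"))

# P_2 = party of the 1984 winner, which is causally prior to 1986 incumbency.
fit <- lm(dempct ~ dempct.old + dwin.old + incumb, data = x)
coef(fit)
```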
> Another GK article question: Incumbency is coded as -1, 0, or 1. What, if
> any, difference do these particular assigned values make in the regression?
> What if, for instance, a Republican incumbent were coded 0, an open seat = 5,
> and a Democratic incumbent = 10?
This is a great thing to test out for yourself. In fact, we will be doing this
testing in the next problem set. Recall the meaning of a regression coefficient:
it is the amount of change in the left-hand side variable for a "one
unit" change in the right-hand side variable (that is, the variable it is the
coefficient for). With your coding, it is not clear what a "one unit" change would mean.
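A quick, stripped-down way to try Anna's recoding, using simulated data (the variables, seed, and effect sizes are made up, and the model leaves out dwin for brevity). Assuming -1 is a Republican incumbent and 1 a Democratic one, her 0/5/10 coding is 5 * old + 5, a linear transformation: the fitted values are identical and the incumbency coefficient is exactly one fifth as large. A coding that is not a linear function of -1/0/1 (say 0, 1, 10) would change the fit itself.

```r
set.seed(1)
n <- 50
incumb <- sample(c(-1, 0, 1), n, replace = TRUE)   # assumed: -1 Rep. incumbent, 0 open, 1 Dem. incumbent
dempct.old <- runif(n, 0.3, 0.7)
dempct <- 0.1 + 0.8 * dempct.old + 0.03 * incumb + rnorm(n, sd = 0.03)
d <- data.frame(dempct, dempct.old, incumb)
d$inc.alt <- 5 * d$incumb + 5                      # Anna's coding: 0, 5, 10

fit1 <- lm(dempct ~ dempct.old + incumb, data = d)
fit2 <- lm(dempct ~ dempct.old + inc.alt, data = d)

# Each new "unit" is a fifth of a step, so the coefficient shrinks by a factor of 5.
coef(fit1)["incumb"] / coef(fit2)["inc.alt"]
all.equal(fitted(fit1), fitted(fit2))              # same fitted values
```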
These are all good (and fair) questions. Please follow up (anyone) if it
doesn't make sense.
Dave
> Thanks,
>
> Anna
>
>
A perfectly reasonable question for the list.
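It depends on the value you pass in for "range" (1.5 by default). Points lying
more than range times the length of the box beyond either end of the box are
drawn individually as outliers, so the rule is based on the quartiles (strictly,
Tukey's hinges), not the standard deviation. Have you seen help(boxplot)?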
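boxplot() hands range on to boxplot.stats() as its coef argument, so you can look at the rule directly. A minimal sketch with a made-up vector:

```r
x <- c(1:10, 50)                    # made-up data with one far-out value

s <- boxplot.stats(x)               # coef = 1.5, the same default as boxplot(range = 1.5)
s$stats                             # whisker, hinge, median, hinge, whisker: 1.0 3.5 6.0 8.5 10.0
s$out                               # 50

# The same rule by hand: 1.5 box lengths beyond each hinge.
hinges <- s$stats[c(2, 4)]
limits <- c(hinges[1] - 1.5 * diff(hinges), hinges[2] + 1.5 * diff(hinges))
x[x < limits[1] | x > limits[2]]    # 50 again

boxplot.stats(x, coef = 10)$out     # a wider range: no outliers
```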
Dave
Traci Burch writes:
> Hi Guys
>
> When using the boxplot in R, what rubric does R use to determine the outlier points? Quantiles? Standard Deviation?
Hi everybody,
In "Making the Most of Statistical Analyses," KTW comment that results should
be reported with the "correct level of precision." In general, how do we know
what the correct level of precision is for reporting decimal results? (I.e.,
how do we judge how many decimal places to use in our estimations of
coefficients and errors?) Is it intuition, or...?
Thanks!
Anna
Yongwook Ryu writes:
> I am doing homework 5 Question 1(a).
> I've done the following and got the error message at the end.
> Could you please tell me what I am doing wrong here?
>
> > load<-function(end.year){
> + result<-data.frame()
> + for (year in seq(898, end.year, by=2)){
> + file.name<-paste("~/housedata/da6311_LREC.yr",year, sep="")
> + x<-read.table(file.name, na.strings="-9", col.names=c
> ("state","dist", "incum", "dem", "rep"))
> + x$year<-1000+year
> + result<-rbind(result,x)
> + }
> + result}
>
> > GK1<-load(992)
> > row.names(GK1)<-seq(nrow(GK1))
> > clean<-GK1[! GK1$dpct %in% c(0,1,NA) & !GK1$incum %in% c(3,NA),]
> > one.year<-function(a,yr){
> + this.year<-a[a$year %in% c(yr),]
> + last.year<-a[a$year %in% c(yr-2),]
> + x<-merge(this.year[c("state", "dist", "dpct", "incum", "dwin")], last.year[c
> ("state", "dist", "dpct")], by=c("state", "dist"), suffixes=c("", ".old"))
> + lm(dpct~dpct.old+dwin+incum, data=x)}
> > prob1<-one.year(clean, 1986)
> Error in "[.data.frame"(this.year, c("state", "dist", "dpct", "incum", :
> undefined columns selected
Hmmmm. Whenever trying to debug something in R, you should do the following:
0) Never give one of your functions (like "load") the same name as a built in R
function. This will only cause you headaches.
1) Issue the traceback() command after getting the error. This will give you a
better idea of where the error is.
2) When a function with many steps produces an error, you should break it up
into parts, executing one line at a time. This will help to isolate the
problem.
3) For tricky cases, consider the use of browser().
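Point 0 is easy to see for yourself. A minimal sketch (the one-line load() here is a hypothetical stand-in for Yongwook's function):

```r
load <- function(end.year) end.year       # hypothetical user function named "load"
masked <- !identical(load, base::load)    # TRUE: the workspace copy hides base::load
rm(load)                                  # remove the user's copy
restored <- identical(load, base::load)   # TRUE: "load" finds the base function again
```

Giving the function a distinct name, such as load.data() in Anna's code, avoids the problem altogether.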
In this case, give us some more details.
Does GK1 look correct? Show us dim(GK1) and names(GK1). Does clean look
correct? In general, the error message would suggest that clean does not have
all the variables in it (dpct? dwin?) that you think that it does.
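A quick check along these lines, using a tiny made-up data frame with the same column names that Yongwook's load() creates (state, dist, incum, dem, rep, year; the values are hypothetical). setdiff() lists the columns that one.year() asks for but the data does not have; adding dpct and dwin, the way Anna's code adds dempct and dwin, fills the gap.

```r
# Hypothetical stand-in for GK1: same column names as load() produces, made-up values.
GK1 <- data.frame(state = c(1, 1), dist = c(1, 2), incum = c(1, -1),
                  dem = c(60000, 40000), rep = c(40000, 55000), year = c(1986, 1986))

needed <- c("state", "dist", "dpct", "incum", "dwin")
missing.cols <- setdiff(needed, names(GK1))
missing.cols                                  # "dpct" "dwin"

# Create the two missing variables before subsetting and merging.
GK1$dpct <- GK1$dem / (GK1$dem + GK1$rep)
GK1$dwin <- as.integer(GK1$dpct > 0.5)
setdiff(needed, names(GK1))                   # character(0)
```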
Don't hesitate to send us a follow-up.
Dave
> yongwook
>
> -----------------------------
> Yongwook Ryu
> PhD Candidate
> Department of Government
> Harvard University
> Tel:617-493-3397
> Email: yryu(a)fas.harvard.edu
> -----------------------------
>
>
These are perfectly reasonable subjects to discuss with the whole class.
Yongwook Ryu writes:
> here comes a series of stupid questions.
> I went through Olivia's R code for homework 4 and could not figure out what
> the following commands meant.
Also check out my various code postings in the e-mail archive.
> > x<-c("state", "dist", "incum72", "demo72", "rep72")
> > e72<-read.table("da6311_LREC.yr972", na.strings="-9", col.names=x)
> > y<-c("state", "dist", "incum74", "demo74", "rep74")
> > e74<-read.table("da6311_LREC.yr974", na.strings="-9", col.names=y)
> > dpct72<-e72$demo72/(e72$demo72+e72$rep72)
> > e72<-cbind(e72, dpct72)
> > dpct74<-e74$demo74/(e74$demo74+e74$rep74)
> > e74<-cbind(e74, dpct74)
> > data1<-merge(e72, e74, all=FALSE, all.y=TRUE)
>
> Q1. I don't understand how she knew what names to give to the data. She
> created the columns of state, dist, etc, but how did she know to enter those
> names? Is there some kind of code book for the data?
There is a code book for the data. It is "codebook.txt" and should have
appeared after you unzipped the data.
> Q2. When she cbinds e72 and dpct72, is she basically combining e72 with
> dpct72?
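cbind stands for (C)olumn bind, so yes: she is adding dpct72 to e72 as a new
column, placing the two side by side. (Stacking one on top of the other would be
rbind.) See help(cbind) for more details.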
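A minimal sketch with a made-up two-district stand-in for e72 (the numbers are hypothetical), showing that cbind keeps the rows and adds one column named after the vector:

```r
# Hypothetical two-district stand-in for e72 (made-up numbers).
e72 <- data.frame(state = c(1, 1), dist = c(1, 2),
                  demo72 = c(55000, 40000), rep72 = c(45000, 60000))
dpct72 <- e72$demo72 / (e72$demo72 + e72$rep72)
e72 <- cbind(e72, dpct72)   # same 2 rows, one extra column named "dpct72"
dim(e72)                    # 2 5

# An equivalent, arguably clearer, way to add the column:
# e72$dpct72 <- e72$demo72 / (e72$demo72 + e72$rep72)
```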
> Q3. I can see that the last line is about creating a dataframe by merging e72
> and e74. But what do the rest (all=FALSE and all.y=TRUE) mean?
The help page for merge is quite good. As a rule of thumb, you should read the
help page for any R command that you use. I actually think that the usage here
is somewhat confusing. In general, you shouldn't specify both "all" and "all.y"
in the same call. Since all.x and all.y both default to the value of "all", the
all=FALSE here only sets all.x to FALSE (its default anyway), and all.y=TRUE
then ensures that all the rows in e74 are kept, regardless of whether or not
they have a matching row in e72. Rows of e72 with no match in e74 are dropped.
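A small sketch with made-up stand-ins for e72 and e74 (three districts each, hypothetical numbers), showing what all.y=TRUE changes and that all=FALSE adds nothing:

```r
e72 <- data.frame(state = c(1, 1, 2), dist = c(1, 2, 1), dpct72 = c(0.55, 0.40, 0.61))
e74 <- data.frame(state = c(1, 2, 2), dist = c(1, 1, 2), dpct74 = c(0.58, 0.63, 0.47))

# With no "by", merge() matches on the shared columns, here state and dist.
inner <- merge(e72, e74)                              # only (1,1) and (2,1) are in both
right <- merge(e72, e74, all.y = TRUE)                # every e74 row; (2,2) gets NA for dpct72
same  <- merge(e72, e74, all = FALSE, all.y = TRUE)   # Olivia's form
identical(right, same)                                # TRUE
```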
> I will probably have some more stupid questions over the next couple of days.
> Let me thank in advance for helping me out. ^^
Questions are good. Feel free to send questions like this directly to the list.
Dave
> yongwook
> -----------------------------
> Yongwook Ryu
> PhD Candidate
> Department of Government
> Harvard University
> Tel:617-493-3397
> Email: yryu(a)fas.harvard.edu
> -----------------------------
>
>
A student writes:
> one small question. can i discuss with classmates about homeworks
> #4 and #5? Does this breach the rules concerning the proper
> mid-term conduct? my questions are really basic. for example, how
> to upload data into my home directory, how to make the
> diagrams/graphs show on the screen, how to create data frames, etc.
> you can rest assured that i won't discuss at all about the mid-term
> and they wouldn't want to help me with it either.
This is a slippery slope and you and the students who might help you
need to be very careful. At one extreme, if you are having trouble
printing, then, of course, you can ask someone for help. The point of
the midterm is not to test your ability to print. At the other
extreme, if you are having trouble interpreting a coefficient on a
regression, then, of course, you can *not* ask someone. The ability to
interpret regression coefficients is one of the things that we are
testing.
So, perhaps a better way of expressing the rule is that you may not
discuss the exam *privately* with anyone. To the extent that you are
having difficulties with things like printing, ftp'ing and so on, you
should feel free to ask questions to the class list directly (and
people should feel free to answer them). To the extent that you are
having trouble making graphs and dataframes, then it would appear that
you didn't do homeworks 4 and 5 that thoroughly. Your best bet is to
do them and then to move on to the midterm. You may find the e-mail
archive to be helpful in this regard. The "cost" of not doing the
homeworks when they were assigned is that things like the midterm will
take you much, much longer . . . which is one of the reasons that we
have exams, of course.
;-)
Dave
on a mundane note, my "delete" key has suddenly become a backspace key. Any ideas??
-p
-------------------------------------------------
Phillip Y. Lipscy
Perkins Hall Room #129
35 Oxford Street
Cambridge, MA 02138
(617)493-4893
lipscy(a)fas.harvard.edu
Ph.D. Candidate
Harvard University, FAS, Department of Government
-------------------------------------------------