gov1000-list December 2002

gov1000-list@lists.fas.harvard.edu

19 participants
83 discussions

Computer Server Problems

by Andrew Reeves

Server is frozen!!!

21 years, 5 months

Another Problem 2f

by Traci Burch

Dave: In 2e, we estimate the regression model with 10 variables and then test it on our holdout sample that has only 10 rows or observations. R gave me the smack down and actually used exclamation points and capital letters to tell me that there are no residual degrees of freedom. What should we do? T
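One possible reading of the error above: if `lm()` is refit on the 10-row holdout with 10 predictors, the intercept plus 10 slopes make 11 coefficients for 10 observations, so no residual degrees of freedom are left. A minimal sketch of the alternative, assuming the intent of 2e is to evaluate the training-sample model out of sample rather than to re-estimate it. The data are simulated, and the names `train`, `holdout`, `fit`, and `rmse` are hypothetical, not from the assignment.

```r
set.seed(1)

# Simulated stand-in for the course data: 110 rows, 10 predictors (V1..V10).
n <- 110
p <- 10
d <- as.data.frame(matrix(rnorm(n * p), n, p))
d$Y <- 0.5 * d$V1 - 0.3 * d$V2 + rnorm(n)

train   <- d[1:100, ]    # estimation sample
holdout <- d[101:110, ]  # 10-row holdout sample

# Estimate on the training rows only.
fit <- lm(Y ~ ., data = train)

# Evaluate on the holdout by predicting, not by refitting.
pred <- predict(fit, newdata = holdout)
rmse <- sqrt(mean((holdout$Y - pred)^2))

# For contrast: refitting on the holdout uses up every degree of freedom.
refit <- lm(Y ~ ., data = holdout)
df.residual(refit)
```

The out-of-sample error (`rmse` here) can then be compared with the in-sample fit; whether this is what the problem set wants is a question for Dave.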

21 years, 5 months

1(a) dimensions?

by Anna Lorien Nelson

Hello all,

My group members and I have each written our own code, and we are getting different dimensions for the loaded and the cleaned data frames in 1(a). We've compared the code, and it seems identical, but the dimensions are very different:

Dimensions from code #1:
  3958 18 (uncleaned)
  2347 28 (cleaned)

Dimensions from code #2:
  3959 (uncleaned)
  2815 (cleaned)

What dimensions are other people getting for 1(a)? We're trying to figure out who's on the right track.

Thanks,
Anna

21 years, 5 months

Re: [gov1000-list] 2d

by Tiffany Chiemi Nagano

well i got 0.3975 in the multiple R-squared value in my plain, original top ten regression... but I don't know if this "counts."

> lm1 <- lm(x$Y ~ V25 + V82 + V79 + V50 + V72 + V55 + V73 + V9 + V36 + V57,
            data = x)
> summary(lm1)

Call:
lm(formula = x$Y ~ V25 + V82 + V79 + V50 + V72 + V55 + V73 +
    V9 + V36 + V57, data = x)

Residuals:
     Min       1Q   Median       3Q      Max
-1.84731 -0.69144 -0.02001  0.65444  2.69899

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.001831   0.096681  -0.019  0.98493
V25         -0.186392   0.108434  -1.719  0.08910 .
V82         -0.265918   0.100981  -2.633  0.00997 **
V79         -0.181536   0.101551  -1.788  0.07724 .
V50          0.266542   0.106989   2.491  0.01458 *
V72         -0.299618   0.118109  -2.537  0.01293 *
V55          0.275047   0.097828   2.812  0.00606 **
V73         -0.245171   0.097415  -2.517  0.01364 *
V9          -0.135546   0.112570  -1.204  0.23174
V36          0.213352   0.102484   2.082  0.04023 *
V57          0.171292   0.095630   1.791  0.07666 .
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.9543 on 89 degrees of freedom
Multiple R-Squared: 0.3975, Adjusted R-squared: 0.3298
F-statistic: 5.871 on 10 and 89 DF, p-value: 9.407e-07

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tiffany C. Nagano
Harvard College '05
nagano(a)fas.harvard.edu
313 Mather House Mail Center
(617) 493-7370

On Tue, 10 Dec 2002, Phillip Y. Lipscy wrote:

> This seems to work.
>
> The highest value I'm getting is in the 0.34 range though, so I'm not sure if
> this is the best method after all. for cor4, it seems to be better if you use
> something like cor^10 rather than abs(cor) or cor^2. This is an empirical issue
> though.
>
> Anybody get something higher??
>
> warning - this function can go on a long time if it hits on a really high value.
> Use C-c C-c.
>
> -Phillip.
>
> ---
>
> superfindfunc <- function(){
>   r <- 0
>   compare <- 0
>   k <- 0
>
>   while(k < 100000){
>     r <- 0
>     while(r <= compare){
>
>       a <- sample(2:100, 10, prob = parta$cor4)
>       sumry <- summary(lm(x$Y ~ x[,a[1]] + x[,a[2]] + x[,a[3]] + x[,a[4]] + x[,a[5]] +
>                           x[,a[6]] + x[,a[7]] + x[,a[8]] + x[,a[9]] + x[,a[10]]))
>
>       r <- sumry$adj.r.squared
>       k <- 1 + k
>     }
>     cat(paste(a)," R^2: ", r, "\n")
>     compare <- r
>   }
>
> }
>
> ----
>
> > superfindfunc()
> 82 25 79 43 73 36 50 55 72 84 R^2: 0.3127964
> 25 82 50 79 72 43 73 36 55 9 R^2: 0.3171445
> 25 82 79 9 73 36 84 50 72 55 R^2: 0.3209903
> 25 82 9 72 79 50 73 55 84 85 R^2: 0.3289005
> 25 82 72 50 79 36 55 9 57 73 R^2: 0.3297745
> 82 25 84 72 55 36 79 50 73 57 R^2: 0.3313713
> 25 82 79 50 73 72 33 55 57 36 R^2: 0.3351744
> 25 72 79 82 50 73 55 36 45 57 R^2: 0.337514
> 25 79 82 57 36 72 55 50 73 51 R^2: 0.3438499
> 57 25 50 79 72 55 82 73 63 36 R^2: 0.3475270

21 years, 5 months

by Phillip Y. Lipscy

This seems to work.

The highest value I'm getting is in the 0.34 range though, so I'm not sure if this is the best method after all. for cor4, it seems to be better if you use something like cor^10 rather than abs(cor) or cor^2. This is an empirical issue though.

Anybody get something higher??

warning - this function can go on a long time if it hits on a really high value. Use C-c C-c.

-Phillip.

---

superfindfunc <- function(){
  r <- 0
  compare <- 0
  k <- 0

  while(k < 100000){
    r <- 0
    while(r <= compare){

      a <- sample(2:100, 10, prob = parta$cor4)
      sumry <- summary(lm(x$Y ~ x[,a[1]] + x[,a[2]] + x[,a[3]] + x[,a[4]] + x[,a[5]] +
                          x[,a[6]] + x[,a[7]] + x[,a[8]] + x[,a[9]] + x[,a[10]]))

      r <- sumry$adj.r.squared
      k <- 1 + k
    }
    cat(paste(a)," R^2: ", r, "\n")
    compare <- r
  }

}

----

> superfindfunc()
82 25 79 43 73 36 50 55 72 84 R^2: 0.3127964
25 82 50 79 72 43 73 36 55 9 R^2: 0.3171445
25 82 79 9 73 36 84 50 72 55 R^2: 0.3209903
25 82 9 72 79 50 73 55 84 85 R^2: 0.3289005
25 82 72 50 79 36 55 9 57 73 R^2: 0.3297745
82 25 84 72 55 36 79 50 73 57 R^2: 0.3313713
25 82 79 50 73 72 33 55 57 36 R^2: 0.3351744
25 72 79 82 50 73 55 36 45 57 R^2: 0.337514
25 79 82 57 36 72 55 50 73 51 R^2: 0.3438499
57 25 50 79 72 55 82 73 63 36 R^2: 0.3475270

-------------------------------------------------
Phillip Y. Lipscy
Perkins Hall Room #129
35 Oxford Street
Cambridge, MA 02138
(617)493-4893
lipscy(a)fas.harvard.edu

Ph.D. Candidate
Harvard University, FAS, Department of Government
-------------------------------------------------
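The search in this post can run for a very long time because the inner loop only exits when a better draw turns up. A minimal sketch of a bounded variant, assuming the same idea (draw 10 predictors with probability weighted by correlation with Y, keep the best adjusted R-squared). The data are simulated, and `random_search`, `n_iter`, and the weight vector `w` are hypothetical names standing in for `superfindfunc` and `parta$cor4`; this is not the author's function.

```r
set.seed(42)

# Simulated stand-in for the course data: Y plus 30 candidate predictors.
n <- 100
x <- as.data.frame(matrix(rnorm(n * 30), n, 30))
names(x) <- paste0("V", 2:31)
x$Y <- 0.4 * x$V5 - 0.3 * x$V9 + rnorm(n)

predictors <- setdiff(names(x), "Y")

# Sampling weights: |cor|^10, mirroring the cor^10 suggestion in the post.
w <- abs(cor(x[predictors], x$Y))[, 1]^10

random_search <- function(data, predictors, weights, n_vars = 10, n_iter = 500) {
  best <- list(adj_r2 = -Inf, vars = character(0))
  for (i in seq_len(n_iter)) {            # fixed budget, so it always stops
    vars <- sample(predictors, n_vars, prob = weights)
    fit  <- lm(reformulate(vars, response = "Y"), data = data)
    adj  <- summary(fit)$adj.r.squared
    if (adj > best$adj_r2) {
      best <- list(adj_r2 = adj, vars = vars)
    }
  }
  best
}

best <- random_search(x, predictors, w)
best$vars
best$adj_r2
```

Building the formula with `reformulate()` keeps the variable names in the fitted model, which makes `summary()` output easier to read than the `x[,a[1]]` terms.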

21 years, 5 months

update...

by Phillip Y. Lipscy

Actually, while I was sending that e-mail, I found one that is 0.22. So I'm raising it a little bit for the all-nighter version. :)

-Phillip.

-------------------------------------------------
Phillip Y. Lipscy
Perkins Hall Room #129
35 Oxford Street
Cambridge, MA 02138
(617)493-4893
lipscy(a)fas.harvard.edu

Ph.D. Candidate
Harvard University, FAS, Department of Government
-------------------------------------------------

21 years, 5 months

step

by Phillip Y. Lipscy

Dear all,

So I'm trying to figure out "step." For starters, it doesn't like getting fed all of the independent vars, since this exhausts the degrees of freedom, causing an error. So I fed it a bunch of the data, and it gives me a nice model, but I can't figure out how to limit the number of parameters used to 10. Is there a way to specify this? The help for "scope" is cryptic.

Thanks,
Phillip.

> lm1 <- lm(Y ~ V10 + V11 + V12 + V13 + V14 + V15 + V16 + V17 + V18 + V19 +
            V20 + V21 + V22 + V23 + V24 + V25 + V26 + V27 + V28 + V29 +
            V30 + V31 + V32 + V33 + V34 + V35 + V36 + V37 + V38 + V39 +
            V40 + V41 + V42 + V43 + V44 + V45 + V46 + V47 + V48 + V49 +
            V50 + V51 + V52 + V53 + V54 + V55 + V56 + V57 + V58 + V59 +
            V60 + V61 + V62 + V63 + V64 + V65 + V66 + V67 + V68 + V69 +
            V70 + V71 + V72 + V73 + V74 + V75 + V76 + V77 + V78 + V79 , data = x)
> slm1 <- step(lm1)
> summary(slm1)

Call:
lm(formula = Y ~ V12 + V13 + V14 + V15 + V17 + V18 + V21 + V23 +
    V25 + V27 + V28 + V29 + V33 + V36 + V41 + V42 + V43 + V44 +
    V45 + V46 + V48 + V49 + V50 + V51 + V54 + V55 + V57 + V58 +
    V59 + V60 + V63 + V71 + V72 + V73 + V75 + V76 + V78 + V79, data = x)

Residuals:
     Min       1Q   Median       3Q      Max
-1.68564 -0.42775 -0.03593  0.33893  1.28641

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.06315    0.08801   0.718 0.475759
V12         -0.23544    0.09586  -2.456 0.016909 *
V13          0.21286    0.09096   2.340 0.022555 *
V14          0.19611    0.09732   2.015 0.048308 *
V15         -0.24926    0.09067  -2.749 0.007848 **
V17         -0.36518    0.09416  -3.878 0.000260 ***
V18         -0.26382    0.09225  -2.860 0.005797 **
V21          0.09008    0.07777   1.158 0.251280
V23         -0.34988    0.10223  -3.423 0.001113 **
V25          0.13527    0.10529   1.285 0.203750
V27         -0.23714    0.09567  -2.479 0.015960 *
V28         -0.36008    0.08791  -4.096 0.000126 ***
V29          0.16312    0.09642   1.692 0.095789 .
V33          0.45835    0.09992   4.587 2.29e-05 ***
V36          0.32018    0.08845   3.620 0.000600 ***
V41         -0.13782    0.08341  -1.652 0.103619
V42         -0.20141    0.09487  -2.123 0.037810 *
V43         -0.27979    0.08810  -3.176 0.002345 **
V44          0.15616    0.10470   1.491 0.140992
V45         -0.10602    0.08966  -1.183 0.241569
V46         -0.10912    0.08905  -1.225 0.225129
V48          0.26458    0.08754   3.022 0.003664 **
V49         -0.30894    0.08973  -3.443 0.001045 **
V50          0.27884    0.09523   2.928 0.004788 **
V51         -0.39035    0.10456  -3.733 0.000418 ***
V54          0.20855    0.09488   2.198 0.031753 *
V55          0.22992    0.09541   2.410 0.018990 *
V57          0.25984    0.08055   3.226 0.002022 **
V58          0.22328    0.07809   2.859 0.005805 **
V59         -0.13445    0.08293  -1.621 0.110107
V60          0.39149    0.10381   3.771 0.000369 ***
V63          0.29437    0.10381   2.836 0.006197 **
V71          0.26058    0.09621   2.708 0.008763 **
V72         -0.38626    0.11125  -3.472 0.000955 ***
V73         -0.58436    0.08923  -6.549 1.37e-08 ***
V75         -0.23510    0.08818  -2.666 0.009810 **
V76          0.23500    0.09067   2.592 0.011927 *
V78         -0.16313    0.08718  -1.871 0.066120 .
V79         -0.17518    0.09724  -1.801 0.076571 .
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.7373 on 61 degrees of freedom
Multiple R-Squared: 0.7535, Adjusted R-squared: 0.5999
F-statistic: 4.907 on 38 and 61 DF, p-value: 1.952e-08
>

-------------------------------------------------
Phillip Y. Lipscy
Perkins Hall Room #129
35 Oxford Street
Cambridge, MA 02138
(617)493-4893
lipscy(a)fas.harvard.edu

Ph.D. Candidate
Harvard University, FAS, Department of Government
-------------------------------------------------
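On limiting `step` to 10 variables: one approach that may work is forward selection from the intercept-only model with `steps = 10`, which caps the number of additions. With the default penalty (`k = 2`) the search can stop earlier when AIC no longer improves; setting `k = 0` removes the penalty so each step simply adds the variable that lowers the residual sum of squares most. The sketch below uses simulated data, and `null_model`, `upper`, and `fwd` are hypothetical names. Note that `k = 0` is a greedy fit criterion rather than AIC selection, so it says nothing about out-of-sample performance.

```r
set.seed(7)

# Simulated stand-in for the course data: Y plus 40 candidate predictors.
n <- 100
x <- as.data.frame(matrix(rnorm(n * 40), n, 40))
names(x) <- paste0("V", 2:41)
x$Y <- 0.5 * x$V3 - 0.4 * x$V10 + rnorm(n)

# Start from the intercept-only model, so degrees of freedom are not exhausted.
null_model <- lm(Y ~ 1, data = x)

# Upper scope: every candidate predictor, as a one-sided formula.
upper <- reformulate(setdiff(names(x), "Y"))

# Forward search, at most 10 additions, no complexity penalty (k = 0).
fwd <- step(null_model,
            scope = list(lower = ~ 1, upper = upper),
            direction = "forward",
            steps = 10,
            k = 0,
            trace = 0)

formula(fwd)
length(coef(fwd)) - 1   # number of selected predictors
```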

21 years, 5 months

add, drop, step

by ravishan(a)fas.harvard.edu

Dear Dave,

Can you explain how 'step', 'add' and 'drop' work? The online help menu is not very helpful -- or maybe it's just me. My understanding is that 'add' and 'drop' add and drop one variable at a time, but step performs 'add' and 'drop' recursively.

1) Is it right to start with a regression model that includes all the variables?
2) Should one define the scope as the whole model using something like "~."?
3) How do we get R to select the top 10? I tried setting steps = 10 and steps = 89, but neither worked. Using just step(lmobj) gave output in some order. Is the order significant -- like the top 10 or the bottom 10 are the best variables?

Help will be much appreciated.

Thanks,
Nirmala

My object is the regression model with all variables.

> lmobj <- lm(x$Y ~ ., data = x)
> exp <- step(lmobj,direction = c("backward"))
Start: AIC= -Inf
x$Y ~ V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9 + V10 + V11 + V12 + V13 +
    V14 + V15 + V16 + V17 + V18 + V19 + V20 + V21 + V22 + V23 + V24 +
    V25 + V26 + V27 + V28 + V29 + V30 + V31 + V32 + V33 + V34 + V35 +
    V36 + V37 + V38 + V39 + V40 + V41 + V42 + V43 + V44 + V45 + V46 +
    V47 + V48 + V49 + V50 + V51 + V52 + V53 + V54 + V55 + V56 + V57 +
    V58 + V59 + V60 + V61 + V62 + V63 + V64 + V65 + V66 + V67 + V68 +
    V69 + V70 + V71 + V72 + V73 + V74 + V75 + V76 + V77 + V78 + V79 +
    V80 + V81 + V82 + V83 + V84 + V85 + V86 + V87 + V88 + V89 + V90 +
    V91 + V92 + V93 + V94 + V95 + V96 + V97 + V98 + V99 + V100

         Df Sum of Sq       RSS   AIC
<none>                        0  -Inf
- V100    1 5.520e-07 5.520e-07 -1703
- V29     1 1.230e-04 1.230e-04 -1163
- V67     1 1.124e-03 1.124e-03  -942
- V79     1 3.507e-03 3.507e-03  -828
- V86     1 7.614e-03 7.614e-03  -750
- V3      1 1.679e-02 1.679e-02  -671
- V56     1 2.228e-02 2.228e-02  -643
- V98     1 3.112e-02 3.112e-02  -610
- V37     1 3.127e-02 3.127e-02  -609
- V35     1 3.638e-02 3.638e-02  -594
- V61     1 3.984e-02 3.984e-02  -585
- V36     1 4.629e-02 4.629e-02  -570
- V74     1 5.339e-02 5.339e-02  -556
- V87     1 6.568e-02 6.568e-02  -535
- V90     1 7.678e-02 7.678e-02  -519
- V95     1 8.774e-02 8.774e-02  -506
- V18     1 8.796e-02 8.796e-02  -506
- V50     1 9.531e-02 9.531e-02  -498
- V7      1 9.885e-02 9.885e-02  -494
- V63     1 1.004e-01 1.004e-01  -492
- V99     1 1.023e-01 1.023e-01  -491
- V4      1 1.144e-01 1.144e-01  -479
- V2      1 1.184e-01 1.184e-01  -476
- V55     1 1.328e-01 1.328e-01  -464
- V46     1 1.372e-01 1.372e-01  -461
- V34     1 1.399e-01 1.399e-01  -459
- V68     1 1.408e-01 1.408e-01  -459
- V53     1 1.425e-01 1.425e-01  -457
- V26     1 1.570e-01 1.570e-01  -448
- V20     1 1.612e-01 1.612e-01  -445
- V49     1 1.634e-01 1.634e-01  -444
- V27     1 1.821e-01 1.821e-01  -433
- V51     1 1.869e-01 1.869e-01  -430
- V5      1 1.988e-01 1.988e-01  -424
- V85     1 2.103e-01 2.103e-01  -418
- V57     1 2.114e-01 2.114e-01  -418
- V78     1 2.117e-01 2.117e-01  -418
- V93     1 2.156e-01 2.156e-01  -416
- V72     1 2.173e-01 2.173e-01  -415
- V13     1 2.207e-01 2.207e-01  -414
- V83     1 2.237e-01 2.237e-01  -412
- V96     1 2.293e-01 2.293e-01  -410
- V71     1 2.316e-01 2.316e-01  -409
- V40     1 2.427e-01 2.427e-01  -404
- V54     1 2.440e-01 2.440e-01  -404
- V88     1 2.465e-01 2.465e-01  -403
- V52     1 2.552e-01 2.552e-01  -399
- V10     1 2.603e-01 2.603e-01  -397
- V41     1 2.619e-01 2.619e-01  -396
- V17     1 2.692e-01 2.692e-01  -394
- V16     1 2.692e-01 2.692e-01  -394
- V14     1 2.715e-01 2.715e-01  -393
- V48     1 2.748e-01 2.748e-01  -392
- V11     1 2.798e-01 2.798e-01  -390
- V65     1 2.933e-01 2.933e-01  -385
- V25     1 3.016e-01 3.016e-01  -382
- V92     1 3.037e-01 3.037e-01  -382
- V81     1 3.046e-01 3.046e-01  -381
- V64     1 3.088e-01 3.088e-01  -380
- V39     1 3.197e-01 3.197e-01  -377
- V82     1 3.204e-01 3.204e-01  -376
- V21     1 3.279e-01 3.279e-01  -374
- V76     1 3.320e-01 3.320e-01  -373
- V84     1 3.350e-01 3.350e-01  -372
- V45     1 3.437e-01 3.437e-01  -369
- V19     1 3.488e-01 3.488e-01  -368
- V24     1 3.500e-01 3.500e-01  -367
- V60     1 3.677e-01 3.677e-01  -363
- V66     1 3.704e-01 3.704e-01  -362
- V9      1 3.756e-01 3.756e-01  -360
- V44     1 3.929e-01 3.929e-01  -356
- V62     1 3.930e-01 3.930e-01  -356
- V97     1 3.984e-01 3.984e-01  -355
- V58     1 4.030e-01 4.030e-01  -353
- V47     1 4.071e-01 4.071e-01  -352
- V80     1 4.119e-01 4.119e-01  -351
- V30     1 4.164e-01 4.164e-01  -350
- V59     1 4.178e-01 4.178e-01  -350
- V77     1 4.296e-01 4.296e-01  -347
- V38     1 4.610e-01 4.610e-01  -340
- V89     1 4.623e-01 4.623e-01  -340
- V94     1 4.929e-01 4.929e-01  -333
- V6      1         1         1  -327
- V8      1         1         1  -321
- V12     1         1         1  -318
- V23     1         1         1  -318
- V73     1         1         1  -314
- V22     1         1         1  -310
- V28     1         1         1  -303
- V75     1         1         1  -300
- V31     1         1         1  -298
- V32     1         1         1  -292
- V33     1         1         1  -281
- V91     1         1         1  -267
- V70     1         1         1  -246
- V69     1         1         1  -243
- V15     1         1         1  -223
- V43     1         2         2  -187
- V42     1         2         2  -181

--
Nirmala Ravishankar
Perkins Hall #210
35 Oxford Street
Cambridge, MA 02138
Tel: (617) 493 3460
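A note on the `AIC= -Inf` start in the output above: the full model has 99 predictors (V2 to V100) plus an intercept, and the 10-variable summary elsewhere in this thread reports 89 residual degrees of freedom, which implies 100 observations. The full model therefore fits the data exactly, leaving RSS = 0 and no residual degrees of freedom. The sketch below reproduces that situation on a small simulated data set and shows what a single `drop1` and `add1` call return. The names `d`, `full`, `small`, `dropped`, and `added` are hypothetical.

```r
set.seed(3)

# Simulated data: 20 observations, 19 predictors (V2..V20) plus Y.
n <- 20
d <- as.data.frame(matrix(rnorm(n * (n - 1)), n, n - 1))
names(d) <- paste0("V", 2:n)
d$Y <- rnorm(n)

# Saturated model: intercept + 19 slopes = 20 coefficients for 20 rows.
full <- lm(Y ~ ., data = d)
df.residual(full)   # no residual degrees of freedom

# A small model leaves room to see what drop1 and add1 do in one step.
small <- lm(Y ~ V2 + V3, data = d)

# drop1: one row per term that could be removed from the current model.
dropped <- drop1(small)

# add1: one row per term in the scope that is not yet in the model.
added <- add1(small, scope = ~ V2 + V3 + V4 + V5)

dropped
added
```

Each table lists candidate moves with the AIC that would result; `step` repeats this comparison and takes the best move until nothing improves or `steps` runs out.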

21 years, 5 months

sorting dataframes/matrices

by Phillip Y. Lipscy

Dear all,

sort() works nicely for vectors but not for dataframes and matrices (for matrices it doesn't sort by columns properly). Since even MS Excel can sort dataframes by columns, I'm assuming R can do that too, but so far I haven't found anything that seems to let me do that. Any ideas?

Thanks,
Phillip.

-------------------------------------------------
Phillip Y. Lipscy
Perkins Hall Room #129
35 Oxford Street
Cambridge, MA 02138
(617)493-4893
lipscy(a)fas.harvard.edu

Ph.D. Candidate
Harvard University, FAS, Department of Government
-------------------------------------------------
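A common way to do this is with `order()`, which returns the row permutation that sorts a column; indexing the rows with it reorders the whole data frame or matrix. A minimal sketch on a made-up data frame (`df` and its columns are illustrative only):

```r
# Made-up example data.
df <- data.frame(id    = c("a", "b", "c", "d"),
                 score = c(3, 1, 2, 1),
                 age   = c(30, 22, 41, 25))

# Sort rows by one column (ascending).
by_score <- df[order(df$score), ]

# Sort by score ascending, breaking ties by age descending.
by_score_age <- df[order(df$score, -df$age), ]

# The same indexing works for a matrix.
m <- as.matrix(df[, c("score", "age")])
m_sorted <- m[order(m[, "score"]), ]

by_score
by_score_age
m_sorted
```

`order()` also takes `decreasing = TRUE` for a descending sort on all keys.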

21 years, 5 months

cleaning data

by Phillip Y. Lipscy

Dear All,

Cleaning away all the extreme years (dpct) and the questionable dwin/incumb combos in all the lagged years is rather tedious. My gut feeling is that it won't make much of a difference. Should we spend a lot of time making sure?

-Phillip.

-------------------------------------------------
Phillip Y. Lipscy
Perkins Hall Room #129
35 Oxford Street
Cambridge, MA 02138
(617)493-4893
lipscy(a)fas.harvard.edu

Ph.D. Candidate
Harvard University, FAS, Department of Government
-------------------------------------------------

21 years, 5 months
