DFBETAs question - gov1000-list

Alison Elizabeth Post

16 Dec

10:04 a.m.

Hi Andy-

You'll want to look at the dfbetas associated with each independent variable. There are two ways to construct these plots. One version simply contains the dfbetas for one regression coefficient (an "index plot"). The other plots the dfbetas for one coefficient on the x axis against the dfbetas for another coefficient on the y axis. The latter type of plot contains the same dfbeta values but lets you make half the number of graphs. I would recommend plotting dfbetas for each of the regression coefficients for this problem. Each of these various types of outlier diagnostics is getting at something slightly different...

Alison

On Thu, 16 Dec 2004, Andy Harris wrote:

...

On plots of DFBETAs: How do we decide what deserves a plot and what does not? It seems like we could just do a DFBETA plot matrix and plot the effect of each independent variable against the others. Is this shotgun approach correct, or is there a more systematic way of approaching the problem?

Wearily,
Andy

_______________________________________________
gov1000-list mailing list
gov1000-list(a)lists.fas.harvard.edu
http://lists.fas.harvard.edu/mailman/listinfo/gov1000-list
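The two plot types Alison describes can be sketched in R as follows. This is a minimal illustration, not code from the thread: the built-in swiss data stand in for the problem-set data (which the thread does not show), and the choice of the Agriculture and Education coefficients is arbitrary.

```r
# Stand-in data: the built-in swiss data replace the problem-set data.
lm.out <- lm(Fertility ~ Agriculture + Education, data = swiss)
dfb <- dfbetas(lm.out)  # n x k matrix: one row per observation, one column per coefficient

op <- par(mfrow = c(1, 2))

# Index plot: DFBETAS for a single coefficient against observation number.
plot(dfb[, "Education"], type = "h",
     xlab = "Observation number", ylab = "DFBETAS for Education")
abline(h = 0, lty = 3)

# Bivariate plot: one point per observation, with the DFBETAS for one
# coefficient on the x axis and those for another coefficient on the y axis.
plot(dfb[, "Agriculture"], dfb[, "Education"],
     xlab = "DFBETAS for Agriculture", ylab = "DFBETAS for Education")
abline(h = 0, v = 0, lty = 3)

par(op)
```

In an interactive session, calling identify(dfb[, "Agriculture"], dfb[, "Education"]) right after drawing the bivariate plot lets you click on points to label them with their observation numbers.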


Andy Harris

12:21 p.m.

Hi Kevin,

The index plots make sense, but how should we systematically evaluate the points? In each plot (there will be 10 of them, including the intercept), there are over 50 points over the benchmark. We don't get easily usable output of those points that we can compare across all 10 plots, so, as far as I can tell, the only way to do that is to copy the command-line output we get from identify(), insert commas between the identified points, and coerce the result into a vector. Is there an efficient way to do this? Or am I trying to be too systematic? Do we just need to show that there are a good number of high-leverage points?

andy

On Dec 16, 2004, at 10:13 AM, Kevin Quinn wrote:

...

Hi Andy,

Several options here. In what follows I'm assuming that the output from my fitted model is in lm.out and that I've created a matrix of DFBETAS called dfb using

dfb <- dfbetas(lm.out)

The first option is to construct k index plots (one for each column of dfb), where k is the number of estimated coefficients:

plot(dfb[,1])
plot(dfb[,2])
...
plot(dfb[,k])

This will tell you which observation numbers tend to exert a large influence on each coefficient. With a lot of data points you may want to identify points interactively with identify().

You could also look at all pairwise scatterplots (either by hand or, as you suggest, with a scatterplot matrix). The scatterplot matrix is much easier to do (1 line of code), but the identify function doesn't work with the R pairs() function. Doing the plots by hand is a lot more work, but it does allow you to use identify(). The bivariate plots give you information about which observations have a large joint influence on 2 coefficients of interest.

With a lot of covariates it is typically easiest to look at index plots.

Hope this helps.

Best,
Kevin

------------------------------------------------------
Kevin Quinn
Assistant Professor
Department of Government and
Center for Basic Research in the Social Sciences
34 Kirkland Street
Harvard University
Cambridge, MA 02138
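Kevin's two options can be sketched as one short script. This is a minimal sketch under stated assumptions, not his code: the swiss data stand in for the problem-set model, and the cutoff of 2/sqrt(n) is an assumed rule of thumb, since the thread mentions "the benchmark" without giving its value. Points beyond the cutoff are labeled with text() rather than identify() so the script runs without clicking.

```r
# Stand-in data: the built-in swiss data replace the problem-set data.
lm.out <- lm(Fertility ~ ., data = swiss)   # 6 coefficients, including the intercept
dfb <- dfbetas(lm.out)
n <- nrow(dfb)
k <- ncol(dfb)
cutoff <- 2 / sqrt(n)   # assumed benchmark; substitute the course's own if it differs

# Option 1: k index plots, one per column of dfb, drawn in a loop on one page.
op <- par(mfrow = c(2, ceiling(k / 2)))
for (j in seq_len(k)) {
  plot(dfb[, j], type = "h", main = colnames(dfb)[j],
       xlab = "Observation number", ylab = "DFBETAS",
       ylim = range(dfb[, j], -cutoff, cutoff))
  abline(h = c(-cutoff, cutoff), lty = 2)
  # Label observations beyond the cutoff automatically instead of clicking;
  # in an interactive session, identify(dfb[, j]) does this by hand.
  big <- which(abs(dfb[, j]) > cutoff)
  if (length(big) > 0) text(big, dfb[big, j], labels = big, pos = 4, cex = 0.7)
}
par(op)

# Option 2: every pairwise scatterplot in one line. identify() does not work
# on a pairs() plot, so points cannot be labeled by clicking here.
pairs(dfb)
```

Drawing all k index plots on one page makes it easy to see at a glance which observation numbers recur across coefficients.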


Kevin Quinn

1:35 p.m.

Hi Andy,

...

The index plots make sense, but how should we systematically evaluate the points? In each plot (there will be 10 of them, including the intercept), there are over 50 points over the benchmark. We don't get easily usable output of those points that we can compare across all 10 plots, so, as far as I can tell, the only way to do that is to copy the command-line output we get from identify(), insert commas between the identified points, and coerce the result into a vector. Is there an efficient way to do this? Or am I trying to be too systematic?

Yes, there is an easier way to do this. Recall that you can select elements of a matrix using the logical operators in R. For instance,

dfb[dfb[,1] > c, 1]

will select all of the elements in the first column of dfb that are greater than some constant c.

Hope this helps.

Best,
Kevin
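Kevin's logical-indexing idea extends to all coefficients at once, which answers the question of comparing flagged points across every index plot. The sketch below is illustrative only: the swiss data stand in for the problem-set model, the 2/sqrt(n) cutoff is an assumed benchmark, and the names cutoff, over, by.coef, flagged and counts are hypothetical. It compares abs(dfb) with the cutoff so that large negative DFBETAS are caught as well as large positive ones.

```r
# Stand-in data: the built-in swiss data replace the problem-set data.
lm.out <- lm(Fertility ~ ., data = swiss)
dfb <- dfbetas(lm.out)
cutoff <- 2 / sqrt(nrow(dfb))   # assumed benchmark

# Kevin's idiom for the first column, using abs() so that large negative
# values are selected as well as large positive ones.
dfb[abs(dfb[, 1]) > cutoff, 1]

# TRUE wherever an observation exceeds the cutoff for a coefficient.
over <- abs(dfb) > cutoff

# Observation numbers over the cutoff, one list element per coefficient.
by.coef <- lapply(seq_len(ncol(dfb)), function(j) which(over[, j]))
names(by.coef) <- colnames(dfb)

# The same information as (row, col) index pairs in a single matrix.
flagged <- which(over, arr.ind = TRUE)

# For each observation, the number of coefficients it exceeds the cutoff for,
# largest first: one list to compare across all the index plots.
counts <- sort(rowSums(over), decreasing = TRUE)
counts[counts > 0]
```

The names of counts are the observation labels (province names in this stand-in data), so the observations at the top of that list are the ones worth inspecting across several coefficients, with no need to copy identify() output by hand.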
