Hope everyone's had a good weekend. Three preliminary questions about HW 8:
1(a) asks for a correlation matrix with 18 variables, but I only see 15
possible variables from P, I, and v for years ending with 0, 8, 6, 4, and 2.
Am I missing something, or is 18 a typo? Wanted to be sure.
3(c) asks for "non-matrix notation," but I'm not sure what you mean by this.
My instinct would be to derive B-hat_{1,IV} and B-hat_{1,IISLS} using the
usual algebraic matrix notation, e.g. A'A or B^{-1}X, etc.
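(My guess, for what it's worth, is that "non-matrix notation" means the
scalar/summation form; in the simple one-instrument case that would be
something like

B-hat_{1,IV} = sum_i (z_i - z-bar)(y_i - y-bar) / sum_i (z_i - z-bar)(x_i - x-bar)

but that is only a guess at what the problem set intends.)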
1(e) wants us to estimate the "best" model we can. Is there some technique
(like propensity scores in matching, except for regression) that would
efficiently guide us to a certain model? Or do we just need to try out umpteen
possible combinations?
Thanks,
Anna
--
Anna Lorien Nelson
Department of Government,
Harvard University
alnelson(a)fas.harvard.edu
Hi All,
One of the problems with the listserv is that every once in a while I think of
something someone said long ago, but I can't locate it... So this time, I'm
trying to set the number of decimal places in the R output. I tried searching
the list for "decimal", I tried looking it up on Google and in Intro to R, but
couldn't find it. Any ideas?
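(A couple of candidates, in case they are what you're after; "x" below is
just a stand-in for whatever object you're printing:)
options(digits = 3)   ## default number of significant digits R prints
round(x, 2)           ## round a particular object to 2 decimal places
signif(x, 3)          ## or keep 3 significant digits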
Thanks,
Phillip.
-------------------------------------------------
Phillip Y. Lipscy
Perkins Hall Room #129
35 Oxford Street
Cambridge, MA 02138
(617)493-4893
lipscy(a)fas.harvard.edu
Ph.D. Candidate
Harvard University, FAS, Department of Government
-------------------------------------------------
Dear all,
I was rather disappointed with the answer key to problem set 7. Specifically, it totally skips problem 1f, which is the one in which everyone had a lot of problems interpreting the interaction effect between incumbency and year. I was totally lost on that, and it just wasn't in the solution key at all. (I.e., the solution key goes from 1e straight to problem 2.)
Upon further examination, it seems that a lot of the code for problems 1 and 2 came from a member of the class, which is interesting because the solution key was handed out at 7pm and this member of the class didn't finish the problem set until just before 7pm, and he wasn't credited for the code.
I am not sure how I'm going to learn when I try for a week to figure out what the R output means; ask for help and (reasonably) receive hints but not answers before the problem set is due; but (unreasonably, in my view) don't get answers when the solution is distributed. It doesn't help me to see the best code in the class, because I've already seen the best code in the class (because the class has turned into a pooled study group).
I need someone who really understands statistics (i.e., Tao, Dave, or Gary) to interpret the code's output. Perhaps it might be useful to go over the R summary(lm()) output for problem 1f line by line in class on Monday.
Does anyone else feel this way?
Thanks,
Olivia.
P.S. -- For those of you who would like to limit access to your course1 directory, type "chmod 700 course1" at the unix prompt. (Dave distributed this command earlier in the semester...or some command very similar to it. Dave, please correct me if I got the command wrong.)
The fact that I dropped the "outlier" elections is the main difference
between the size of my dataset and Nirmala's. Here are the comments
that I just added to my code at this point:
## Just to be clean, I get rid of rows in which the Democratic or
## Republican victory is too extreme. I think that GK do this. Note
## that there are a surprisingly large (more than 25%) number of
## elections affected by this rule. In more serious work, you would
## want to try the analysis both with and without these observations
## and double check that your key conclusions are unaffected. If
## they are affected, you should argue about why including them or
## not makes sense.
x <- x[x$d.perc > 0.3 & x$d.perc < 0.7, ]
One way to think about empirical analysis is that there is a large
space of possible assumptions. Each assumption leads to a particular
conclusion about the quantity of interest, say incumbency
advantage. The purpose of empirical analysis is to provide a *mapping*
from the space of possible assumptions to the space of
conclusions. If "most" of the "reasonable" assumptions lead (map to) the
same conclusion, then everything is OK. If changing assumptions (by,
for example, dropping all the extreme observations) changes your
conclusions (incumbency effect goes from 4% to 0), then you need to
tell the reader so. You are then free to make the argument that
certain parts of the space of possible assumptions are much more
plausible than others.
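In code, that check might look something like the following (a sketch only;
x.all is a hypothetical untrimmed version of x with the same columns):

fit.trimmed <- lm(d.perc ~ party + incumb, data = x)      ## extreme races dropped
fit.all     <- lm(d.perc ~ party + incumb, data = x.all)   ## extreme races kept (hypothetical)
coef(fit.trimmed)["incumb"]   ## do the two incumbency estimates tell
coef(fit.all)["incumb"]       ## roughly the same story?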
We will be assigning an article by Leamer for next week that touches
on this point. It is one of the best articles that I have ever read.
Dave
Nirmala Ravishankar writes:
> No, I didn't drop the extreme values. I don't know what effect that has
> on the results.....
>
> On Sat, 7 Dec 2002, Dave Kane wrote:
>
> > Hmmm.
> >
> > Now that I have fixed my previous errors, I have:
> >
> > dim(x)
> > [1] 2024 7
> > > table(x$year)
> >
> > 1910 1920 1930 1940 1950 1960 1970 1980 1990
> > 249 182 206 264 273 263 233 192 162
> > >
> >
> > I could understand a little difference from yours, but 800 is too
> > many. Did you drop extreme election results (d.perc > 0.7 or < 0.3) as
> > I did? Does your table of years look like mine? I am somewhat worried
> > about how few observations I have in later years . . .
> >
> > Thanks for taking the time to help out.
> >
> > Dave
> >
> >
> >
> >
> >
> > Nirmala Ravishankar writes:
> > > Dear Dave,
> > >
> > > I initially had 2815 observations in my dataset. Then I removed instances
> > > where the winner of the previous election was Democratic but the
> > > incumbent this year was coded as Republican and vice
> > > versa. The final number came to 2796.
> > >
> > > I hope this helps.
> > >
> > > - Nirmala
> > >
> > > On Sat, 7 Dec 2002, Dave Kane wrote:
> > >
> > > > Do other people agree with my final dataset for this problem? The
> > > > closer that I look at it, the more weird it seems. For example, here
> > > > is an interesting table for both the "clean" dataframe and my "x"
> > > > dataframe.
> > > >
> > > > > table(clean$party, clean$incumb)
> > > >
> > > > -1 0 1
> > > > 0 5113 1186 650
> > > > 1 734 944 4159
> > > > > table(x$party, x$incumb)
> > > >
> > > > -1 0 1
> > > > 0 639 133 288
> > > > 1 197 100 441
> > > > >
> > > >
> > > > What's weird here (and I think that others have commented on something
> > > > related to it) is that there are many observations that strike me as
> > > > unlikely, if not impossible.
> > > >
> > > > For example, in the full dataset (in clean) there are 5,113 races in
> > > > which a Republican won the last race and in which the incumbent is a
> > > > Republican. So far, so good. But then there are 650 races in which a
> > > > Republican won the last election but in which a Democrat is the
> > > > incumbent.
> > > >
> > > > Of course, there are real cases when this happens: deaths in office,
> > > > resignations, special elections and the like. But the magnitude looks
> > > > wrong.
> > > >
> > > > Is this what other people see? I keep thinking that my loading code is
> > > > screwed up in some way . . .
> > > >
> > > > Whoops! Just discovered my problem. I was combining it with data from
> > > > 10 years ago instead of 2. The error was in my combine.data code.
> > > >
> > > > Back in a bit,
> > > >
> > > > Dave
> > > >
> > > >
> > > > --
> > > > David Kane
> > > > Lecturer in Government
> > > > 617-563-0122
> > > > dkane(a)latte.harvard.edu
> > > >
> > >
> >
> > --
> > David Kane
> > Lecturer in Government
> > 617-563-0122
> > dkane(a)latte.harvard.edu
> >
>
--
David Kane
Lecturer in Government
617-563-0122
dkane(a)latte.harvard.edu
First, let us add year.f to our base model.
> summary(lm(d.perc ~ year.f + d.perc + party + incumb, data = x))
Call:
lm(formula = d.perc ~ year.f + d.perc + party + incumb, data = x)
Residuals:
Min 1Q Median 3Q Max
-0.1999082 -0.0455751 -0.0004351 0.0445913 0.1866350
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.487046 0.005069 96.084 < 2e-16 ***
year.f1920 -0.090550 0.006243 -14.505 < 2e-16 ***
year.f1930 0.010860 0.006010 1.807 0.070899 .
year.f1940 -0.023756 0.005651 -4.204 2.74e-05 ***
year.f1950 -0.054303 0.005601 -9.696 < 2e-16 ***
year.f1960 -0.041514 0.005649 -7.349 2.88e-13 ***
year.f1970 -0.020193 0.005815 -3.473 0.000526 ***
year.f1980 -0.045050 0.006201 -7.265 5.29e-13 ***
year.f1990 -0.017937 0.006471 -2.772 0.005625 **
party 0.080067 0.007012 11.419 < 2e-16 ***
incumb 0.036550 0.003745 9.760 < 2e-16 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 0.06369 on 2013 degrees of freedom
Multiple R-Squared: 0.579, Adjusted R-squared: 0.5769
F-statistic: 276.9 on 10 and 2013 DF, p-value: < 2.2e-16
>
There is a lot of interesting stuff going on here. (Really!).
First, we see that the incumbency effect is around 3.6%. This is
similar to the results that we have been seeing all along. Second, we
see that the coefficient on party is larger and its sign is now the
"correct" one. This says that elections in which the Democrats won the
previous election are associated with an 8% higher Democratic vote
percentage than elections in which the Republicans won the previous
election.
It could be that the negative sign that we were seeing before was an
artifact of our failure to "account for" or "control for" the fact
that some years Democrats do well and some years they do poorly. We
can talk more about this in class, if anyone is interested.
Now, is year.f itself significant? Well, first note that 1910 has been
dropped to avoid multicollinearity in the presence of an
intercept. Second, Neter has lots of ways of testing these sorts of
things. In essence, we are asking if all the coefficients could be
zero, taken as a group. One way to do this in R is to use anova to
compare lm objects with and without year effects. But, just
eye-balling the results makes it pretty clear that they are
significant.
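For completeness, the anova comparison just mentioned would look something
like this (a sketch):

anova(lm(d.perc ~ party + incumb, data = x),
      lm(d.perc ~ year.f + party + incumb, data = x))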
But even if they weren't, you would probably want to include them in a
regression in which you tested for differences across years in
incumbency. To be honest, I just know this as a statistical rule of
thumb: "Always include in the regression on their own any terms that
you include as interactions." Perhaps Gary could shed light on why
this is a rule of thumb. I just always do it.
This would make the real regression to answer 1f be:
> lm.obj.3 <- lm(d.perc ~ year.f + d.perc + party + incumb, data = x)
> lm.obj.4 <- lm(d.perc ~ year.f + d.perc + party + incumb + year.f*incumb, data = x)
> lm.obj.4
Call:
lm(formula = d.perc ~ year.f + d.perc + party + incumb + year.f * incumb, data = x)
Coefficients:
(Intercept) year.f1920 year.f1930
0.480587 -0.091166 0.018203
year.f1940 year.f1950 year.f1960
-0.017890 -0.048079 -0.035452
year.f1970 year.f1980 year.f1990
-0.009497 -0.042235 -0.014499
party incumb year.f1920:incumb
0.080731 0.009735 -0.004116
year.f1930:incumb year.f1940:incumb year.f1950:incumb
0.029023 0.017767 0.029116
year.f1960:incumb year.f1970:incumb year.f1980:incumb
0.025392 0.059654 0.037803
year.f1990:incumb
0.041070
As you include more and more terms in a regression, you need to be
clear about precisely what everything means. You also need to worry
about overfitting -- recall Gary's discussion of the curse of
dimensionality.
To interpret this, you should note that the coefficient of incumbency
is 1%. Because the 1910 year has been dropped, the 1% is the estimate
for 1910. The estimate for 1920 is lower than this, while the estimates
for later years are higher. Looks like the biggest incumbency effect
was in 1970. summary shows us more details:
> summary(lm.obj.4)
Call:
lm(formula = d.perc ~ year.f + d.perc + party + incumb + year.f *
incumb, data = x)
Residuals:
Min 1Q Median 3Q Max
-0.1910283 -0.0437213 0.0007353 0.0417632 0.2152378
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.480587 0.004998 96.161 < 2e-16 ***
year.f1920 -0.091166 0.006245 -14.597 < 2e-16 ***
year.f1930 0.018203 0.006398 2.845 0.00449 **
year.f1940 -0.017890 0.005568 -3.213 0.00133 **
year.f1950 -0.048079 0.005520 -8.710 < 2e-16 ***
year.f1960 -0.035452 0.005573 -6.361 2.48e-10 ***
year.f1970 -0.009497 0.005759 -1.649 0.09926 .
year.f1980 -0.042235 0.006225 -6.785 1.53e-11 ***
year.f1990 -0.014499 0.006394 -2.268 0.02347 *
party 0.080731 0.006804 11.866 < 2e-16 ***
incumb 0.009735 0.005588 1.742 0.08163 .
year.f1920:incumb -0.004116 0.006928 -0.594 0.55249
year.f1930:incumb 0.029023 0.007012 4.139 3.63e-05 ***
year.f1940:incumb 0.017767 0.006133 2.897 0.00381 **
year.f1950:incumb 0.029116 0.006030 4.828 1.48e-06 ***
year.f1960:incumb 0.025392 0.006138 4.137 3.67e-05 ***
year.f1970:incumb 0.059654 0.006331 9.423 < 2e-16 ***
year.f1980:incumb 0.037803 0.006839 5.527 3.68e-08 ***
year.f1990:incumb 0.041070 0.006910 5.943 3.29e-09 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 0.06163 on 2005 degrees of freedom
Multiple R-Squared: 0.6073, Adjusted R-squared: 0.6038
F-statistic: 172.3 on 18 and 2005 DF, p-value: < 2.2e-16
>
Note that the significance values for various years are claims that
the incumbency effect in that year, say 1970, was different from the
effect in the "base" year (the year that was dropped to avoid
multicollinearity and that is estimated by the coefficient of
incumb). So, this is one way of seeing that the incumbency effect does
vary by year. Note also that, issues of statistical significance
aside, the size of the effects is worth paying attention to.
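To pull out the estimated effect for a particular year, add the base
incumb coefficient to that year's interaction term. For example, with
lm.obj.4 from above:

cf <- coef(lm.obj.4)
cf["incumb"]                            ## 1910, the base year: about 0.010
cf["incumb"] + cf["year.f1970:incumb"]  ## 1970: 0.010 + 0.060, roughly a 7% effect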
Another way of seeing this (more advanced -- not needed for the final)
is to compare models 3 and 4 (the only difference between them is an
interaction between year and incumbency).
> anova(lm.obj.3, lm.obj.4)
Analysis of Variance Table
Model 1: d.perc ~ d.perc + year.f + party + incumb
Model 2: d.perc ~ d.perc + year.f + party + incumb + year.f:incumb
Res.Df RSS Df Sum of Sq F Pr(>F)
1 2013 8.1644
2 2005 7.6165 8 0.5480 18.031 < 2.2e-16 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
This shows that there is a lot of evidence that the incumbency effect
differs across years. I *think* that the value of the F statistic
listed here (18) is equivalent to what you would get if, as in Neter,
you calculated a test of the hypothesis that the coefficients of all
the year*incumbency interactions are zero. The answer is that,
given our data, it is *highly* unlikely that this is the case.
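As a sanity check (again, not needed for the final), the F value can be
reproduced by hand from the two residual sums of squares in the anova table:

rss.3 <- 8.1644; df.3 <- 2013   ## model without the interactions
rss.4 <- 7.6165; df.4 <- 2005   ## model with them
((rss.3 - rss.4) / (df.3 - df.4)) / (rss.4 / df.4)   ## about 18.03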
That is about it for 1f. If there are other questions that people
would like to read pages and pages of my pensées on, please let me
know.
Dave
--
David Kane
Lecturer in Government
617-563-0122
dkane(a)latte.harvard.edu
I thought I'd ask this now since 1f is on the table - in this problem set, we
treated year as a factor variable to see if incumbency varied by year. Could we
also include the absolute level of year as an explanatory variable (& as an
interaction w/ psi) to see if incumbency varies systematically over time? I.e.,
if we had 100 years instead of the 10 in this case, assigning dummies to all
100 would seem to produce results that are difficult to interpret. So would
including "year*incumb" be a legitimate way to do it?
Thanks,
Phillip.
-------------------------------------------------
Phillip Y. Lipscy
Perkins Hall Room #129
35 Oxford Street
Cambridge, MA 02138
(617)493-4893
lipscy(a)fas.harvard.edu
Ph.D. Candidate
Harvard University, FAS, Department of Government
-------------------------------------------------
Here are some thoughts on 1f.
First, you want to create year as a factor variable. I create a new
variable for this.
> x$year.f <- as.factor(x$year)
> table(x$year.f)
1910 1920 1930 1940 1950 1960 1970 1980 1990
249 182 206 264 273 263 233 192 162
Now, this looks just like the result that I get when I try
table(x$year), so what was the point?
The point is that the statistical functions in R will treat year as a
number and year as a factor (a categorical variable) in very different
ways. For example:
> lm(d.perc ~ year, data = x)
Call:
lm(formula = d.perc ~ year, data = x)
Coefficients:
(Intercept) year
-0.2359013 0.0003716
That regression is looking for a time trend in Democratic
percentage. That is, if the Democrats had been getting consistently
stronger over the last 100 years, we would expect to see a positive
and significant coefficient.
Now, with a factor version of year, we get:
> lm(d.perc ~ year.f, data = x)
Call:
lm(formula = d.perc ~ year.f, data = x)
Coefficients:
(Intercept) year.f1920 year.f1930 year.f1940 year.f1950
0.504257 -0.082359 -0.005893 -0.003557 -0.035178
year.f1960 year.f1970 year.f1980 year.f1990
-0.024998 -0.009435 0.001361 0.017585
Recall that, with factors, there is no relationship between 1920 and
1930; they are just different categories. Note that 1920 is not
"closer" to 1930 than it is to 1990. There is no notion of "distance"
between categories.
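One way to see what the factor is actually doing is to peek at the design
matrix that lm builds; a quick sketch (output omitted):

head(model.matrix(~ year.f, data = x))

Each year other than the dropped one gets its own 0/1 dummy column.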
These results suggest that, on average, the Democratic percentage has
been around 50%, but that 1920 was a bad year and 1990 was a good
year. Note that we can calculate the mean values directly by dropping
the intercept.
> lm(d.perc ~ -1 + year.f, data = x)
Call:
lm(formula = d.perc ~ -1 + year.f, data = x)
Coefficients:
year.f1910 year.f1920 year.f1930 year.f1940 year.f1950
0.5043 0.4219 0.4984 0.5007 0.4691
year.f1960 year.f1970 year.f1980 year.f1990
0.4793 0.4948 0.5056 0.5218
>
Note that this is the same as the answer for:
> tapply(x$d.perc, x$year.f, mean)
1910 1920 1930 1940 1950 1960
0.5042568 0.4218975 0.4983633 0.5007000 0.4690785 0.4792588
1970 1980 1990
0.4948222 0.5056173 0.5218420
>
which is sort of cool (or, obvious, as Gary would say).
Anyway, what is the answer to 1f?
Next e-mail.
Dave
--
David Kane
Lecturer in Government
617-563-0122
dkane(a)latte.harvard.edu
I am working on my own set of answers for questions 1e and 1f. I'll be
sending along updates to the list as I go along. Of course, there are
those who would argue that I should make everything all perfect before
I reveal anything, but I thought that it couldn't hurt to see the
process in action.
I welcome questions, comments and criticisms of all kind.
First, how does my cleaned-up data set look? I will be working with x
throughout.
> dim(x)
[1] 1798 7
> names(x)
[1] "state" "district" "year" "d.perc"
[5] "incumb" "d.perc.old" "party"
> summary(x)
state district year d.perc
Min. : 1.00 Min. : 1.00 Min. :1910 Min. :0.3003
1st Qu.:14.00 1st Qu.: 3.00 1st Qu.:1930 1st Qu.:0.4114
Median :24.00 Median : 6.00 Median :1950 Median :0.4864
Mean :29.66 Mean :10.96 Mean :1950 Mean :0.4893
3rd Qu.:44.00 3rd Qu.:13.00 3rd Qu.:1970 3rd Qu.:0.5639
Max. :81.00 Max. :98.00 Max. :1990 Max. :0.6998
incumb d.perc.old party
Min. :-1.00000 Min. :0.3003 Min. :0.0000
1st Qu.:-1.00000 1st Qu.:0.4102 1st Qu.:0.0000
Median : 0.00000 Median :0.4754 Median :0.0000
Mean :-0.05951 Mean :0.4820 Mean :0.4105
3rd Qu.: 1.00000 3rd Qu.:0.5476 3rd Qu.:1.0000
Max. : 1.00000 Max. :0.6995 Max. :1.0000
> table(x$year)
1910 1920 1930 1940 1950 1960 1970 1980 1990
213 177 163 191 249 252 212 174 167
> table(x$incumb)
-1 0 1
836 233 729
>
Although this does not look exactly like some of the results that
Nirmala and others posted recently, I think that it is close
enough. Here is the code that I used to generate x.
load.data <- function(end.year = 992){
## Simple function for looping through years and loading up the
## appropriate data. Note that different years have different
## numbers of rows (as well as different district and state (!)
## numbering schemes) so you need to be careful. Thanks to Olivia
## for helpful clues. In our ASCII files, the data runs from 1898 to
## 1992. This code assumes that you are running it in the directory
## with the ASCII files in it. By default, I load up all the
## data. There are 20,954 rows.
## First we need an empty dataframe that we will add things on to.
result <- data.frame()
## Then we do our loop. Note that this code is somewhat inefficient
## since each time we read in a new dataframe we append it to our
## running total. This is (computationally) costly since it forces R
## to find somewhere to put the new (larger) object as opposed to it
## having already allocated the space for it, as we did in previous
## simulations. Alas, pre-allocation would be hard to do since,
## before we start, we do not know how big the files are. But this
## goes quickly enough that the inefficiency isn't too bad. The key is to
## save the resulting dataframe in a .Rdata file so that you don't
## need to re-run this code every time you do the analysis.
for(year in seq(898, end.year, by = 2)){
file.name <- paste("da6311_LREC.yr", year, sep = "")
x <- read.table(file.name,
col.names = c("state", "district", "incumb", "dem", "rep"),
na.strings = "-9")
x$year <- 1000 + year
result <- rbind(result, x)
}
result
}
clean.data <- function(x){
## Simple function that cleans up the results of load.data. Mainly just an
## exercise in new variables and bad data deletion. Note that it is not clear
## if this is the correct set of deletions, but it seems OK to me.
x <- x[! (x$dem %in% c(0, 1, NA) | x$rep %in% c(0, 1, NA) | x$incumb %in% c(NA, 3)),]
x$d.perc <- x$dem / (x$dem + x$rep)
## Just to be clean, I get rid of rows in which the Democratic
## victory is too extreme. I think that GK do this.
x <- x[x$d.perc > 0.3 & x$d.perc < 0.7, ]
## In the GK article, this is coded 1 if Democrats win the election
## and -1 if Republicans do. Just for kicks, we switched this to 1
## and 0 for this problem set.
x$party <- ifelse(x$dem > x$rep, 1, 0)
x
}
combine.data <- function(x){
## For each decade year, pair this year's races with the races from the
## previous decade year (matching on state and district), so that the
## earlier election's d.perc comes through as d.perc.old and party
## records who won that earlier election. Want to do this in a more
## robust way for problem set 8.
result <- data.frame()
for(year in seq(1910, 1990, by = 10)){
this.year <- x[x$year == year, ]
last.year <- x[x$year == (year - 10), ]
both.years <- merge(this.year[c("state", "district", "year", "d.perc", "incumb")],
last.year[c("state", "district", "d.perc", "party")],
by = c("state", "district"),
suffix = c("", ".old")
)
result <- rbind(result, both.years)
}
result
}
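For completeness, the three functions chain together along these lines
(a sketch; the .Rdata file name is arbitrary):

x <- combine.data(clean.data(load.data()))
save(x, file = "ps7.Rdata")   ## so the slow loading loop only runs once
## later sessions can simply load("ps7.Rdata")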
--
David Kane
Lecturer in Government
617-563-0122
dkane(a)latte.harvard.edu
Hi, generic question:
how do you convert jpeg/gif files to ps in emacs? I downloaded
jpeg2ps.tar.gz and unzipped the file, but it doesn't seem to work. LaTeX only
likes ps files, so you can't incorporate a jpg directly.
Thanks,
Phillip.
-------------------------------------------------
Phillip Y. Lipscy
Perkins Hall Room #129
35 Oxford Street
Cambridge, MA 02138
(617)493-4893
lipscy(a)fas.harvard.edu
Ph.D. Candidate
Harvard University, FAS, Department of Government
-------------------------------------------------
Dear Colleagues,
I am trying to build a matching function from the ground up, in small steps,
and encountering problems that perhaps people could assist me in working
through. I have two requests. The first is help dealing with this specific
error; the second is whether people could look over my code more generally,
especially my loop function at the bottom, and let me know whether this is the
right way to approach it.
1. The error message I get is this:
> tttt <- many.bins()
Error in sample(length(x), size, replace, prob) :
Invalid first argument
However, when I test the individual sampling function--my set of functions is
below, and very much invites suggestions!--it works just fine:
> t2 <- sampler(aaf, 100)
> t2
[1] 0.5426789 100.0000000
Any thoughts on puzzling through this?
2. HERE IS MY CODE:
how.many <- function(party, vote.old){
control.size <- nrow(clean8b[clean8b$incum.10 == 0 & clean8b$demwin.08 == party
& clean8b$dempct.08.bin == vote.old,])
control.size
}
match.maker <- function(control.size, party, vote.old) {
draw.from <- clean8b[clean8b$incum.10 == 1 & clean8b$demwin.08 == party &
clean8b$dempct.08.bin == vote.old, "dempct.10"]
draw.from
}
sampler <- function(draw.from, control.size){
holding <- sample(draw.from, control.size)
pct <- mean(holding)
result <- c(pct, control.size)
}
many.bins <- function(){
result <- rep(NA, 12)
for(party in 0:1){
for(vote.old in 1:6){
control.size <- how.many(party, vote.old)
draw.from <- match.maker(control.size, party, vote.old)
holding[i,] <- sampler(draw.from, control.size)
}
}
result <- as.data.frame(holding)
names(result) <- c("percent", "n")
result
}
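One thing that jumps out, separate from the sample() error itself (which I
can't pin down from here): inside many.bins, holding and i are never defined,
so nothing can ever be stored, and if a bin has no treated rows then
draw.from is empty by the time it reaches sample(). Below is a sketch of a
version that pre-allocates and skips empty bins; it keeps the how.many /
match.maker / sampler helpers exactly as written above. Also beware the
classic sample() quirk: a single number >= 1 is treated as a population size
(1:n) rather than as the value to sample from.

many.bins <- function(){
  ## Pre-allocate: 2 parties x 6 vote bins = 12 rows, 2 columns.
  holding <- matrix(NA, nrow = 12, ncol = 2)
  i <- 1
  for(party in 0:1){
    for(vote.old in 1:6){
      control.size <- how.many(party, vote.old)
      draw.from <- match.maker(control.size, party, vote.old)
      ## Only sample when there is something to sample from and the
      ## control group isn't bigger than the treated group for this bin.
      if(length(draw.from) > 0 && control.size <= length(draw.from)){
        holding[i, ] <- sampler(draw.from, control.size)
      }
      i <- i + 1
    }
  }
  result <- as.data.frame(holding)
  names(result) <- c("percent", "n")
  result
}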
As always, many many thanks.
Best,
DAn