Dear Applied Statistics Workshop Community,
Our next meeting will be on October 11 at 12:00 PM ET. Soichiro Yamauchi
will present "Statistical Analysis with Machine Learning Predicted Variables."
<When>
October 11, 12:00 to 1:30 PM ET
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Scholars in the social sciences are increasingly relying on machine
learning (ML) techniques to construct data from large corpora of text and
images. The ML-generated variables are subsequently utilized in statistical
analysis to address substantive questions through regression and hypothesis
testing. However, this approach can introduce substantial bias and lead to
incorrect inferences due to prediction errors during the machine learning
stage. In this paper, we present an approach that incorporates ML-generated
variables into regression analysis while ensuring consistency and
asymptotic normality. The proposed approach leverages a small-scale
human-coded sample to capture the bias in the naive estimator, without the
need for strict assumptions about the structure of prediction errors.
Furthermore, we develop diagnostic tools to assess whether
additional human coding can further reduce variance in the main analysis.
We illustrate the effectiveness of our method by revisiting a study on the
sources of election fraud with ballot image data and regression analysis.
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li@g.harvard.edu