Hi everyone!
This week (the last meeting of the semester!) at the Applied Statistics
Workshop we will be welcoming *Jeff Gill*, Professor of Statistics and
Government at American University. He will be presenting work entitled *Models
for Identifying Substantive Clusters and Fitted Subclusters in Social
Science Data*. Please find the abstract below and on the Applied Stats
website here
<https://projects.iq.harvard.edu/applied.stats.workshop-gov3009>.
As usual, we will meet at noon in CGIS Knafel Room 354 and lunch will be
provided. See you all there!
-- Dana Higgins
*Title:* *Models for Identifying Substantive Clusters and Fitted
Subclusters in Social Science Data*
*Abstract:* Unseen grouping, often called latent clustering, is a common
feature in social science data. Subjects may intentionally
or unintentionally group themselves in ways that complicate the
statistical analysis of substantively important relationships. This work
introduces a new model-based clustering design which incorporates two
sources of heterogeneity. The first source is a random effect that
introduces substantively unimportant grouping but must be
accounted for. The second source is more important and more difficult to
handle since it is directly related to the relationships of interest in the
data. We develop a model to handle both of these challenges and apply it
to data on terrorist groups, which are notoriously hard to model with
conventional tools.
Hi everyone!
This week at the Applied Statistics Workshop we will be welcoming *Xiang
Zhou*, Professor of Government at Harvard University. He will be presenting
work entitled *Two residual-based methods to adjust for treatment-induced
confounding in causal inference*. Please find the abstract below and on
the Applied Stats website here
<https://projects.iq.harvard.edu/applied.stats.workshop-gov3009>.
As usual, we will meet at noon in CGIS Knafel Room 354 and lunch will be
provided. See you all there!
-- Dana Higgins
*Title:* *Two residual-based methods to adjust for treatment-induced
confounding in causal inference*
*Abstract:* Treatment-induced confounding arises in both causal inference
of time-varying treatments and causal mediation analysis where
post-treatment variables affect both the mediator and outcome. Existing
methods to adjust for treatment-induced confounding include, among others,
Robins's structural nested mean model (SNMM) with its g-estimation and
marginal structural models (MSM) with inverse probability weighting (IPW).
In this talk, I describe two alternative methods, one called
"regression-with-residuals" (RWR) and the other called "residual
balancing," for estimating the marginal means of potential outcomes. The
RWR method is a simple extension of Almirall et al.'s (2010) two-stage
estimator for studying effect moderation to the estimation of marginal
effects. In special cases, it is equivalent to Vansteelandt's (2009)
sequential g-estimator for estimating controlled direct effects. The
residual balancing method, on the other hand, can be considered a
generalization of Hainmueller's (2012) entropy balancing method to
time-varying settings. Numeric simulations show that the residual balancing
method tends to be more efficient and more robust than IPW in a variety of
settings.
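
As a rough illustration of the residualization idea behind RWR, the following is a minimal sketch and not the speaker's implementation. It assumes a simulated linear, no-interaction setting in which a treatment `a` affects a post-treatment confounder `l`, which in turn affects both the mediator `m` and the outcome `y`. All variable names, coefficients, and helper functions are hypothetical. The confounder is first residualized on treatment, and the residual (rather than the raw confounder) enters the outcome regression, so the treatment coefficient targets the controlled direct effect (2.5 under the assumed coefficients) instead of being attenuated by conditioning on `l` directly.

```python
import numpy as np


def ols(y, *columns):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate(n=20000, seed=0):
    """Hypothetical data: treatment a -> confounder l -> mediator m and outcome y."""
    rng = np.random.default_rng(seed)
    a = rng.binomial(1, 0.5, n).astype(float)
    l = 1.0 * a + rng.normal(size=n)              # confounder induced by treatment
    m = 0.5 * a + 0.8 * l + rng.normal(size=n)    # mediator
    y = 1.0 * a + 2.0 * m + 1.5 * l + rng.normal(size=n)
    return a, l, m, y


def rwr_direct_effect(a, l, m, y):
    """Residualize l on a, then regress y on a, m, and the residual."""
    b = ols(l, a)
    l_res = l - (b[0] + b[1] * a)
    return ols(y, a, m, l_res)[1]


def naive_direct_effect(a, l, m, y):
    """Condition on the raw confounder, which blocks the a -> l -> y path."""
    return ols(y, a, m, l)[1]


a, l, m, y = simulate()
rwr = rwr_direct_effect(a, l, m, y)
naive = naive_direct_effect(a, l, m, y)
# Under the assumed coefficients, the controlled direct effect is 1.0 + 1.5 * 1.0 = 2.5.
print(f"RWR estimate: {rwr:.2f}, naive estimate: {naive:.2f}")
```

In this simulated setting the naive regression recovers only the part of the effect that bypasses `l`, while the residualized regression should land close to the full controlled direct effect.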
Hi everyone!
This week at the Applied Statistics Workshop we will be welcoming *Michael
Windzio*, Professor of Sociology at the University of Bremen. He will be
presenting work entitled *Does schoolwork cooperation improve pupils’
grades and well-being in school? Results from social network and propensity
score analysis*. Please find the abstract below and on the Applied Stats
website here
<https://projects.iq.harvard.edu/applied.stats.workshop-gov3009>.
As usual, we will meet at noon in CGIS Knafel Room 354 and lunch will be
provided. See you all there!
-- Dana Higgins
*Title:* *Does schoolwork cooperation improve pupils’ grades and
well-being in school? Results from social network and propensity score
analysis*
*Abstract:* Using panel data of school-class networks and outcomes of
11-13-year-old students, effects of collaboration in schoolwork networks on
grades and school-related well-being will be investigated. The analysis
might suffer from endogeneity bias because pupils actively select their
peers also with regard to their school performance. This selectivity will
be demonstrated by using p* models for ties in schoolwork networks at t1
based on data from 1,289 pupils in 76 classrooms. Predictions from this model
will be used to generate propensity scores. Stochastic actor-based models
(SOAM) for the co-evolution of networks and behavior/attitudes (N=244, k=10)
result in a systematic loss of data, whereas propensity score matching
appropriately limits the data to the area of common support. However,
violation of the SUTVA requires that indicators of network embeddedness are
controlled, which can be done in a propensity score weighting regression.
Overall, results of SOAMs and propensity score matching suggest that
schoolwork networks do not have significantly positive effects on either
grades or well-being.
Hi everyone!
This week at the Applied Statistics Workshop we will be welcoming *Francesca
Dominici*, Professor of Statistics at the Harvard School of Public Health
and Co-Director of the Harvard Data Science Initiative. She will be
presenting work entitled *Data Science and Our Environment*. Please find
the abstract below and on the Applied Stats website here
<https://projects.iq.harvard.edu/applied.stats.workshop-gov3009>.
As usual, we will meet at noon in CGIS Knafel Room 354 and lunch will be
provided. See you all there!
-- Dana Higgins
*Title:* *Data Science and Our Environment*
*Abstract:* What if I told you I had evidence of a serious threat to
American national security – a terrorist attack in which a jumbo jet will
be hijacked and crashed every 12 days? Thousands will continue to die
unless we act now. This is the question before us today – but the threat
doesn’t come from terrorists. The threat comes from climate change and air
pollution.
We have developed an artificial neural network model that uses
on-the-ground air-monitoring data and satellite-based measurements to estimate
daily pollution levels across the continental U.S., breaking the country up
into 1-square-kilometer zones. We have paired that information with health
data contained in Medicare claims records from the last 12 years, covering
97% of the population aged 65 or older. We have developed statistical
methods and computationally efficient algorithms for the analysis of over
460 million health records.
Our research shows that short- and long-term exposure to air pollution is
killing thousands of senior citizens each year. This data science platform
is telling us that federal limits on the nation’s most widespread air
pollutants are not stringent enough. This type of data is the sign of a new
era for the role of data science in public health, and also for
the associated methodological challenges. For example, with enormous
amounts of data, the threat of unmeasured confounding bias is amplified,
and causality is even harder to assess with observational studies. These
and other challenges will be discussed.